China News .online

OpenAI Finds RLHF Instability as Models "Go Off-Script" with Stronger o3

20 April 2025 · Uncategorized

Source: https://view.inews.qq.com/a/20250420A03XCN00

New AI Yuan reports that o3, lauded as one of OpenAI's most advanced models and credited with coding proficiency comparable to the top 200 human contestants globally, has a critical flaw: it hallucinates far more often than its predecessors. Technical reports indicate that both o3 and o4-mini hallucinate significantly more often than previous reasoning models, and even more often than traditional non-reasoning models such as GPT-4o. Hallucination rates reach as high as 48% across the new models, and o3's rate is roughly double that observed in o1.

On the PersonQA benchmark, o3 produced false information on 33% of questions, roughly twice the rate of its predecessor o1 (16%). The problem is worse in o4-mini, whose hallucination rate rises to 48%.

Researchers at AI2 attribute the problem directly to over-optimization during reinforcement learning (RL). They contend that RL may amplify issues that post-training usually mitigates but does not entirely eliminate.

The report further notes that o3's design contributes significantly to the hallucination problem. The model relies on chain-of-thought reasoning and on reinforcement learning for reasoning tasks such as solving complex math problems or writing test code. While this approach improves performance in those areas, it also makes the model more likely to generate false information when it faces a problem it cannot solve.

The report concludes that although o3 demonstrates impressive capabilities across multiple benchmarks, its tendency toward over-optimization and its higher hallucination rate pose a significant challenge for practical applications.

© 2025 CHINA NEWS .online beta