Anthropic Moves to Ease Fable 5 Over-Refusals

Editor J

Anthropic has pledged to cut Claude Fable 5's false positives after users reported harmless scientific prompts being routed to the older Opus 4.8.

Just two days after its June 9 launch, Anthropic's Claude Fable 5 has drawn criticism in an over-refusal controversy. Users report that the model's safety guardrails, which redirect sensitive queries to the lower-tier Opus 4.8, are frequently flagging entirely harmless prompts.

A safety classifier screens each query for risks related to cybersecurity, biochemistry, and model distillation. Rather than refusing outright, Anthropic routes flagged queries to Opus 4.8. Although the company states that this switch affects fewer than 5% of user sessions, developers and researchers report a significantly higher fallback rate.

In response to mounting user feedback, Anthropic acknowledged that the conservative safety tuning was intentional and announced it has begun refining the classifiers to minimize false positives.

Mitochondria Queries Trip Safety Classifier

A compilation of false positives published by The Wall Street Journal illustrates the scope of the over-refusal problem. Harmless prompts rerouted to Opus 4.8 included basic biology queries about mitochondria, PCR primer design, abstract algebra topics such as cyclic groups, and even a grocery list for pulled pork.

The underlying mechanism is straightforward. When the classifier detects potential risks in cybersecurity, biochemistry, or distillation, the query is automatically redirected to Opus 4.8, accompanied by an on-screen notification. Users argue that this safety net is cast far too wide, catching benign academic and everyday requests.
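The routing described above can be illustrated with a short sketch. This is a toy model only: Anthropic has not published how its classifier works, so the keyword lists, the model identifiers, and the `classify` and `route` functions below are all hypothetical. The sketch is meant to show how an overly broad rule sends a benign question to the fallback model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists, one per risk area named in the article.
# The real classifier is not public; "mitochondria" is included on purpose
# to show how a broad rule catches a harmless biology question.
RISK_KEYWORDS = {
    "cybersecurity": {"exploit", "malware"},
    "biochemistry": {"pathogen", "toxin", "mitochondria"},
    "distillation": {"distill", "distillation"},
}

@dataclass
class RoutingDecision:
    model: str                # model that will answer (hypothetical identifiers)
    category: Optional[str]   # risk area that was flagged, if any
    notice: Optional[str]     # on-screen notification shown to the user

def classify(prompt: str) -> Optional[str]:
    """Return the first risk area whose keywords appear in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for category, keywords in RISK_KEYWORDS.items():
        if words & keywords:
            return category
    return None

def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model instead of refusing."""
    category = classify(prompt)
    if category is None:
        return RoutingDecision("fable-5", None, None)
    notice = f"Switched to Opus 4.8 ({category} safeguard)"
    return RoutingDecision("opus-4.8", category, notice)
```

In this sketch, a question such as "What do mitochondria do?" is rerouted even though it is harmless, while an unrelated prompt stays on the newer model. Narrowing the keyword set is the toy equivalent of the classifier refinements Anthropic has promised.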

On Reddit, researchers in biology, neuroscience, and clinical sciences report that the fallback rate they see in practice is much higher than Anthropic's official figure. Because Claude Fable 5 is priced at twice the rate of Opus, the frequent downgrades to the older model have fueled skepticism over the new release's value proposition.

Anthropic Pledges Rapid Classifier Adjustments

Anthropic addressed these concerns directly. In an official announcement, the company admitted that safeguards were tuned conservatively to ensure a safe and timely launch, resulting in benign requests being flagged. Acknowledging user frustration, the developer promised to reduce false positives as quickly as possible.

Anthropic supported its position with two arguments. First, the fallback mechanism is triggered in fewer than 5% of all sessions on average, meaning Claude Fable 5 performs identically to the unrestricted Mythos 5 in over 95% of sessions. Second, the company argued that routing queries to Opus 4.8 provides a better user experience than a blunt refusal.

To address the issue, Anthropic plans to launch a 'trusted access' program within weeks, granting vetted biomedical researchers and enterprise clients access to the unrestricted Mythos 5. The company will also analyze 30 days of Mythos-class traffic logs to identify and resolve false-positive patterns. Safeguards on biological and chemical queries will be narrowed first to minimize disruption to scientific research.

No Fix Has Shipped Yet: The Next Few Weeks Will Decide

Despite these commitments, no technical updates have been deployed. The promised mitigations remain confined to the initial announcement, with no official timeline for classifier updates. The coming weeks will determine whether Anthropic delivers on its promises.

The fallback routing is not the only source of friction. As detailed in our launch-day backlash report, users continue to criticize the two-tier access structure, under which governments and vetted partners receive the unrestricted Mythos 5 while general users pay premium rates for a heavily filtered version.

As frontier AI models launch at an accelerating pace, this controversy highlights a recurring structural tension in safety engineering. Finding the balance between preventing misuse and avoiding over-refusal will be the defining challenge for Anthropic in the Mythos-class era.
