Claude Mythos Early Reviews Split Sharply
Early feedback on Anthropic's Claude Mythos splits sharply: some testers praise its automated security skills while others call the performance mediocre.
Early Mythos tester reviews from individual users of Anthropic's top-tier Claude Mythos Preview are beginning to emerge, presenting a stark contrast to the 'paradigm shift' promised by corporate press releases. Among the developers and security engineers who have tested the model, the verdicts are sharply divided between high praise and disappointment.
With Anthropic's Claude Mythos still restricted to Project Glasswing partners and select early testers amid speculation of an imminent public release, these early assessments carry significant weight. While testers agree that the model represents a notable departure from its predecessors, few are accepting the company's marketing claims at face value.
Praise for 'Elite Red Teaming on Demand'
Positive assessments highlight the model's capabilities. One developer who spent a full 24 hours testing the model immediately after its launch characterized the Mythos Preview release as a paradigm shift rather than an incremental update. When tasked with auditing a long-established network stack using a simple prompt to find vulnerabilities, the model flagged a 20-year-old integer overflow combined with a NULL pointer dereference within hours.
The Mythos Preview's capacity for exploit generation was even more notable. Given a kernel-module vulnerability, the model autonomously constructed a return-oriented programming (ROP) chain, bypassed kernel address space layout randomization (KASLR), defeated stack canaries, and generated a working proof-of-concept (PoC). In testing with a concurrency bug, it identified the race condition and immediately proposed a fix using mutexes and atomic operations.
Similar reports emerged on Reddit's r/singularity community, where users noted that colleagues with Project Glasswing access to the Mythos Preview were impressed by its ability to locate vulnerabilities and demonstrate exploit methods. Within the cybersecurity domain, there is broad consensus that the model represents a significant advancement.
Claude Mythos Draws an 'Overwhelmingly Mid' Verdict
While few question the technological progress, reviews diverge on whether the model justifies its marketing claims. A cybersecurity professional at a healthcare provider noted that while the Mythos Preview is powerful, it is not as revolutionary as promotional metrics suggest, adding that it still requires significant human supervision, planning, and prompt engineering.
Other assessments were more critical. Security engineer PKIProtector described the Mythos Preview as a marginally improved version of Opus 4.7 that relies on brute-force methods, noting that Anthropic Mythos failed to detect a known, previously patched vulnerability in legacy code. Another tester reported numerous false positives, concluding that the model currently introduces more complications than utility.
Scrutiny from the open-source community was particularly critical. The lead developer of curl reviewed five vulnerability reports generated by Mythos and dismissed the promotional claims as marketing hype, noting that only one report—concerning a low-severity CVE—was valid. A summary on an r/cybersecurity megathread was even more direct, describing the model's performance as 'overwhelmingly mid.'
The Necessity of System Harnesses and Human Triage
Despite the divide between praise and criticism, the Mythos tester reviews converge on the conclusion that Claude Mythos cannot function effectively in isolation. As Project Glasswing participants have noted, success depends on a system harness that narrows the operational scope, paired with human triage to verify results. Frequent false positives, inconsistent refusal behavior, and slow execution speeds remain common concerns.
Running Claude Mythos at scale presents another significant cost challenge. While identifying a single vulnerability may cost under $50, systematically auditing a large repository requires thousands of parallel executions, potentially driving the cost of finding a few genuine bugs into tens of thousands of dollars. The confirmed Claude Mythos pricing for Project Glasswing partners stands at $25 per million input tokens and $125 per million output tokens, roughly five times the cost of Opus.
Public pricing remains unconfirmed even as Project Glasswing access gradually expands. While estimates of $16 to $18 per million input tokens and $80 per million output tokens circulated on YouTube and X in early June, these are unofficial rumors based on video analysis rather than official documentation. The original leaked material did not specify pricing, noting only that the Mythos Preview would be expensive to run and use. Ultimately, the Mythos tester reviews point to a clear conclusion: Anthropic Mythos represents genuine progress, though it falls short of the promotional claims.
- Anthropic Frontier Red Team - Claude Mythos Preview
- Reddit r/singularity - Those who have access to Claude Mythos, what are your impressions?
- Reddit r/cybersecurity - Anthropic Claude Mythos Preview Megathread
- Medium - Claude Mythos Preview: My 24-Hour Deep Dive Into a Security Game-Changer
- Cloudflare - Project Glasswing: what Mythos showed us