Anthropic and OpenAI’s cyber-capable AI models may still require significant human expertise to operate effectively, according to new findings from users testing the systems in real-world environments.
Why it matters: The new phase of AI-powered cybersecurity may depend less on fully autonomous hacking and more on how effectively humans can direct, validate, and operationalize increasingly powerful systems.
The Big Picture: AI Models Uncover Thousands of Vulnerabilities
When Anthropic unveiled Mythos Preview, it warned that the model was so powerful it had found tens of thousands of bugs spanning nearly every operating system. Third-party testing suggests that OpenAI’s GPT-5.5-Cyber is just as capable at identifying bugs and writing exploits.
Major companies and governments worldwide have been eager to access these models to prepare for the risks when similar capabilities fall into the hands of attackers.
Early Adopter Experiences: AI Models Show Promise but Need Human Guidance
Several early adopters of Mythos and GPT-5.5 shared their experiences testing the models this week:
- Palo Alto Networks reported finding 75 bugs using both Anthropic’s and OpenAI’s models, compared to the 5-10 bugs it typically discovers each month. Researchers also noted the models’ ability to chain seemingly low-severity vulnerabilities into functional attack sequences.
- Microsoft announced on Tuesday that its new agentic security system, which runs on several frontier and distilled models, uncovered 16 new vulnerabilities in the Windows networking and authentication stack. The company also warned that AI tools will likely increase the overall volume of discovered vulnerabilities over time, putting additional pressure on defenders to triage and patch flaws more quickly.
- Cisco this week released the “Foundry Security Spec,” an open-source blueprint outlining how organizations should integrate advanced AI models into their security frameworks.
- XBOW, an AI-powered penetration testing startup, described Mythos as “extremely powerful for source code audits” in a blog post on Tuesday detailing its internal tests.
Human Oversight Remains Critical: Models Struggle with Validation and False Positives
Vendors consistently found that the models performed best when paired with experienced security researchers who could validate findings, guide workflows, and distinguish exploitable vulnerabilities from noise.
XBOW noted that while Mythos was effective, it was “good, but less powerful, at validating exploits” and could be “too literal and conservative,” sometimes overstating the practical significance of its findings.
Palo Alto Networks, which has been working with Mythos, Opus 4.7, and GPT-5.5-Cyber, observed a 30% false-positive rate across its products, though that rate decreased as the company trained the model on the specific environment it was scanning.
Daniel Stenberg, the lead developer for the open-source project Curl, stated on Monday that Mythos identified one low-severity bug in its code, along with several false positives and another issue ultimately deemed insignificant. This underscores the ongoing need for human review.
Cisco’s Blueprint Highlights AI Model Limitations
Cisco’s new “Foundry Security Spec” includes critical insights into the capabilities and limitations of these AI models. The company wrote:
“A frontier model produces fluent, confident, plausible vulnerability claims that are wrong at a rate that makes unreviewed output worthless.”
Instead of simply instructing models to be more cautious, Cisco researchers found better results when they instructed systems to make claims “checkable” and then verified each claim before acting on it.
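The spec does not publish reference code, but the pattern can be sketched. Below is a minimal illustration, assuming a hypothetical `VulnClaim` record in which the model must attach a runnable proof-of-concept command to every claim; all names and the triage logic here are invented for illustration, not taken from Cisco’s spec. A claim reaches a human reviewer only if its attached check actually reproduces the issue.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class VulnClaim:
    """A model-emitted vulnerability claim paired with a concrete check."""
    description: str       # what the model says is wrong
    location: str          # file/function the claim points at
    check_cmd: list[str]   # command whose exit status 0 confirms the claim


def triage(claims: list[VulnClaim]) -> list[VulnClaim]:
    """Keep only claims whose attached check reproduces the issue.

    Claims with checks that fail, hang, or cannot run are dropped
    rather than forwarded to human reviewers.
    """
    confirmed = []
    for claim in claims:
        try:
            result = subprocess.run(
                claim.check_cmd, capture_output=True, timeout=60
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # unrunnable or hung check -> treat as unverified
        if result.returncode == 0:
            confirmed.append(claim)
    return confirmed


if __name__ == "__main__":
    # Hypothetical example: the check is a PoC script that exits 0
    # only if it actually triggers the reported flaw.
    claims = [
        VulnClaim(
            description="Path traversal in archive extraction",
            location="extractor.py:unpack()",
            check_cmd=["python", "poc_traversal.py"],  # hypothetical PoC
        ),
    ]
    for claim in triage(claims):
        print(f"CONFIRMED: {claim.description} at {claim.location}")
```

The design choice mirrors Cisco’s point: a fluent claim with no passing check is discarded outright, so false positives fail in a cheap automated step instead of consuming analyst time.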