Last month, Anthropic cited heightened cybersecurity risks posed by its Mythos Preview model as its reason for limiting the model's initial release to critical industry partners.

However, fresh research from the UK's AI Security Institute (AISI) indicates that OpenAI's GPT-5.5, which debuted publicly last week, performed comparably in cybersecurity evaluations to Mythos Preview, which the institute tested the previous month.

AI Security Institute’s Cybersecurity Benchmark Tests

Since 2023, the AISI has subjected frontier AI models to 95 Capture the Flag challenges, designed to assess skills in reverse engineering, web exploitation, and cryptography.

Expert-Level Task Performance

On the most advanced “Expert” tasks, GPT-5.5 recorded an average success rate of 71.4%, narrowly edging out Mythos Preview’s 68.6%, though the gap falls within the margin of error.
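
To see why a roughly three-point gap can fall within the margin of error, consider a standard two-proportion comparison. The article does not say how many of the 95 challenges are rated “Expert,” so the sample size of 35 tasks per model below is purely an assumption for illustration:

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Normal-approximation z-score for the difference between two proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical sample sizes: 35 "Expert" tasks per model is an assumption,
# not a figure from the AISI report.
z = two_proportion_z(0.714, 35, 0.686, 35)
print(f"z = {z:.2f}")  # far below the 1.96 threshold for 95% significance
```

At plausible sample sizes, the z-score is a small fraction of the 1.96 cutoff, so the 71.4% versus 68.6% difference is statistically indistinguishable from noise.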

In one particularly demanding challenge, GPT-5.5 was tasked with building a disassembler to decode a Rust binary. The AISI reported:

“GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73 in API calls.”

Progress on “The Last Ones” (TLO) Test

GPT-5.5 also matched Mythos Preview in its progress on “The Last Ones” (TLO), an AISI test simulating a 32-step data extraction attack on a corporate network.

GPT-5.5 succeeded in 3 out of 10 attempts on TLO, compared to Mythos Preview’s 2 out of 10. Notably, no prior AI model had ever passed the test even once.

Limitations in Critical Infrastructure Scenarios

Despite these advancements, GPT-5.5, like all previously tested AI models, failed the AISI’s more rigorous “Cooling Tower” simulation, which models an attempt to disrupt control software for a power plant.