Last month, Anthropic cited heightened cybersecurity risks posed by its Mythos Preview model as its reason for limiting the model's initial release to critical industry partners.

However, fresh research from the UK's AI Security Institute (AISI) indicates that OpenAI's GPT-5.5, which debuted publicly last week, performed comparably in cybersecurity evaluations to Mythos Preview, which the institute tested the previous month.

AI Security Institute’s Cybersecurity Benchmark Tests

Since 2023, the AISI has subjected frontier AI models to 95 Capture the Flag challenges, designed to assess skills in reverse engineering, web exploitation, and cryptography.

Expert-Level Task Performance

On the most advanced “Expert” tasks, GPT-5.5 recorded an average success rate of 71.4%, narrowly edging out Mythos Preview’s 68.6%, though the gap falls within the margin of error.
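
To see why a roughly three-point gap can fall within the margin of error, consider a standard two-proportion comparison. The article does not say how many of the 95 challenges are rated “Expert,” so the sample size of 35 tasks per model below is purely an assumption for illustration:

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Normal-approximation z-score for the difference between two proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical sample sizes: 35 "Expert" tasks per model is an assumption,
# not a figure from the AISI report.
z = two_proportion_z(0.714, 35, 0.686, 35)
print(f"z = {z:.2f}")  # far below the 1.96 threshold for 95% significance
```

At plausible sample sizes, the z-score is a small fraction of the 1.96 cutoff, so the 71.4% versus 68.6% difference is statistically indistinguishable from noise.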

In one particularly demanding challenge, GPT-5.5 was tasked with building a disassembler to decode a Rust binary. The AISI reported:

“GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73 in API calls.”

Progress on “The Last Ones” (TLO) Test

GPT-5.5 also matched Mythos Preview in its progress on “The Last Ones” (TLO), an AISI test simulating a 32-step data extraction attack on a corporate network.

GPT-5.5 succeeded in 3 out of 10 attempts on TLO, compared to Mythos Preview’s 2 out of 10. Notably, no prior AI model had ever passed the test even once.

Limitations in Critical Infrastructure Scenarios

Despite these advancements, GPT-5.5, like all previously tested AI models, failed the AISI’s more rigorous “Cooling Tower” simulation, which models an attempt to disrupt control software for a power plant.