GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests
New results suggest Mythos' cyber threat isn't "a breakthrough specific to one model."
Signal weather
Rising
Momentum is building quickly, so this card is a good early entry point into the topic.
Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to restrict the initial release to “critical industry partners.” But new research from the UK's AI Security Institute (AISI) suggests that OpenAI's GPT-5.5, which launched publicly last week, reached "a similar level of performance on our cyber evaluations" as Mythos Preview, which the group evaluated last month. Since 2023, the AISI has run a variety of frontier AI models through 95 different Capture the Flag challenges designed to test capabilities on cybersecurity tasks, such as reverse engineering, web exploitation, and cryptography. On the highest-level "Expert" tasks, GPT-5.5 passed an average of 71.4 percent, slightly higher than the 68.6 percent achieved by Mythos Preview (though within the margin of error). In one particularly difficult task that involved building a disassembler to decode a Rust binary, AISI notes that "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls. GPT-5.5 also matched Mythos Preview in its progress on "The Last Ones" (TLO), an AISI test range set up to simulate a 32-step data extraction attack on a corporate network. GPT-5.5 succeeded in 3 of 10 attempts on TLO, compared to 2 of 10 for Mythos Preview—no previous model had ever succeeded at the test even once. But GPT-5.5 still fails at AISI's more difficult "Cooling Tower" simulation of an attempted disruption of the control software for a power plant, as every previously tested AI model also has. Read full article Comments
Stay on the signal
Follow GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests
Follow this story beyond a single article: new follow-ups, adjacent sources, and the evolving storyline.
Story map
Understand this topic fast
A quick entry into the story: why it matters now, who is involved, and where to go next for context.
Why it matters now
Topic constellation
Open the live map for this story
See which entities, story threads, sources, and follow-up articles shape this story right now.
Click nodes to continue
Story timeline
Continue with this story
A short sequence of events and follow-up stories to understand the arc quickly.
How reliable this looks
Signal and trust for Ars Technica
This source works at a rapid pace: 100% of recent stories land in the hot window, and 0% carry visible search signal.
Reliability
92
Freshness
100
Sources in storyline
1
Related articles
More stories that share tags, source, or category context.
Scorpions go terminator mode and reinforce their weapons with metal
Different hunting patterns seem to dictate different distributions of metal.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Is your Purosangue SUV not sharp enough? Ferrari has you covered.
We'll soon get to see the brand's first EV; first, a more honed V12 four-seater.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Атаки на больницы и взлом электростанций. Нейросеть Mythos начала играть по своим правилам
США прервали развитие Mythos.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Virgin Galactic reveals new ship, but it's running out of time and cash
It's not clear whether Virgin Galactic has the cash reserves to fund a prolonged test phase.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
More from Ars Technica
Fresh reporting and follow-up coverage from the same newsroom.
Scorpions go terminator mode and reinforce their weapons with metal
Different hunting patterns seem to dictate different distributions of metal.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Is your Purosangue SUV not sharp enough? Ferrari has you covered.
We'll soon get to see the brand's first EV; first, a more honed V12 four-seater.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Virgin Galactic reveals new ship, but it's running out of time and cash
It's not clear whether Virgin Galactic has the cash reserves to fund a prolonged test phase.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Apple may take "several months" to catch up to Mac mini and Studio demand
Chip shortages and demand from AI enthusiasts are both playing a part.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.