The UK’s AI Security Institute (AISI) has released evaluation results showing that OpenAI’s GPT-5.5 matches the cybersecurity performance of Anthropic’s much-hyped Mythos Preview. The findings come just weeks after Anthropic restricted access to Mythos Preview, citing concerns over its potential misuse in cyber threats.
Benchmarking frontier AI models in cybersecurity tasks
Since its establishment in 2023, the AISI has systematically assessed frontier AI models across 95 Capture the Flag (CTF) challenges. These challenges evaluate reverse engineering, web exploitation, cryptography, and other critical cybersecurity skills. GPT-5.5 demonstrated an average success rate of 71.4% on the most advanced "Expert" level tasks, narrowly outperforming Mythos Preview’s 68.6%—a difference that falls within the margin of error for such evaluations.
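The claim that a 71.4% vs 68.6% gap "falls within the margin of error" can be sanity-checked with a standard two-proportion z-test. A minimal sketch follows; the number of Expert-level tasks per model (n = 35 here) is an assumption for illustration, since the article does not state the size of the Expert subset.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """z-statistic for comparing two success rates under the pooled null."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Assumed sample size: 35 Expert-level tasks per model (not stated in the article).
z = two_proportion_z(0.714, 0.686, 35, 35)
print(round(z, 2))  # ≈ 0.26, far below the ~1.96 cutoff for 95% significance
```

At any plausible task count of this order, the z-statistic stays well under 1.96, consistent with the article's characterization that the difference is not statistically meaningful.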
One standout performance came on a complex reverse engineering task. GPT-5.5 autonomously wrote a disassembler to analyze a Rust binary, completing the challenge in just 10 minutes and 22 seconds without human intervention. The total cost for API calls during the task was approximately $1.73.
Progress on high-stakes cyberattack simulations
AISI’s "The Last Ones" (TLO) scenario, designed to simulate a multi-stage corporate data extraction attack, presented another critical test. GPT-5.5 achieved success in 3 out of 10 attempts, while Mythos Preview managed 2 out of 10. Both models surpassed previous AI systems, which had failed to complete even a single attempt on TLO. This milestone highlights a significant advancement in AI-driven cybersecurity threats.
However, neither GPT-5.5 nor Mythos Preview could crack AISI’s more challenging "Cooling Tower" simulation. This scenario models an attempt to disrupt control software for a power plant, a task that has stumped every AI model tested to date. The inability to pass this test underscores the persistent gaps in AI’s reliability for high-risk infrastructure scenarios.
Contrasting release strategies and public perception
Anthropic’s decision to restrict Mythos Preview to "critical industry partners" was framed as a precaution against potential cybersecurity risks. The move sparked debate about balancing public transparency with safety concerns. Meanwhile, OpenAI took a more open approach with GPT-5.5’s public launch, positioning the model as a dual-use tool with both defensive and offensive cybersecurity applications.
The AISI’s findings suggest that release strategies may not directly correlate with performance. Both models now set a new benchmark for AI-driven cybersecurity capabilities, raising questions about the future of AI regulation, model access, and the evolving role of AI in both offensive and defensive cyber operations.
As AI systems grow more sophisticated, the cybersecurity landscape faces both unprecedented opportunities and challenges. Researchers and policymakers will need to collaborate closely to ensure that these tools are deployed responsibly, without sacrificing transparency or safety.
AI summary
New AISI tests show that OpenAI's GPT-5.5 is as capable as Mythos Preview. Explore AI's role and future in cybersecurity.