AI Just Broke Every Benchmark for Autonomous Cyber Capability

The doubling time for autonomous AI cyber capability has collapsed from eight months to roughly four months — and the latest models have broken past every trend line. This is not a benchmark story. This is an infrastructure story with a fuse.

By I F

TL;DR

The UK's AI Security Institute (AISI) and Palo Alto Networks independently confirmed that Claude Mythos Preview and GPT-5.5 have shattered all previous autonomous cyber capability benchmarks.
Claude Mythos Preview became the first model to complete both of the AISI's cyber ranges, solving a 32-step corporate network attack in 6 of 10 attempts and a previously unsolved range in 3 of 10.
Palo Alto Networks issued 26 CVEs covering 75 issues found by AI model scanning — compared to a typical monthly volume of fewer than five.
The doubling time for autonomous cyber task completion has accelerated from ~8 months (November 2025) to ~5 months (early 2026) to approximately 4 months now — and the latest models have exceeded even that trajectory.
METR, an independent nonprofit, arrived at a nearly identical doubling-time figure, confirming the trend is robust across methodologies.

What Happened

On Wednesday, 13 May 2026, the United Kingdom's AI Security Institute published findings showing that two frontier AI models, Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5, have substantially surpassed the already-accelerating pace at which AI systems complete autonomous cybersecurity tasks. Palo Alto Networks released corroborating findings the same day.

The AISI had been tracking a doubling trend since late 2024, measuring what it calls the "80% reliability cyber time horizon": the length of task, gauged by how long it would take a human expert, that a model can complete with 80% reliability. The institute uses this as a proxy for AI autonomy. Earlier this year, it estimated that this horizon was doubling approximately every five months. That was already well below the eight-month doubling time estimated in November 2025.

Now, Mythos Preview and GPT-5.5 have outperformed any trend lines the institute has measured.

The clearest evidence came from the AISI's cyber ranges — structured simulations of multi-stage attacks against small, undefended enterprise networks. A newer checkpoint of Claude Mythos Preview became the first model to complete both of the institute's ranges. It solved "The Last Ones," a 32-step simulated corporate network attack, in 6 of 10 attempts. It completed "Cooling Tower" — previously unsolved by any model — in 3 of 10 attempts. GPT-5.5 solved "The Last Ones" in 3 of 10 attempts.
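Ten attempts per range is a small sample, so the reported success rates carry wide error bars. The sketch below is an illustration added here, not part of the AISI's published analysis. It computes a 95% Wilson score interval for each reported result, under the assumption that attempts are independent trials with a fixed success probability.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width

# Attempt counts as reported in the article; the labels are just display names.
reported = {
    "Claude Mythos Preview, The Last Ones": (6, 10),
    "Claude Mythos Preview, Cooling Tower": (3, 10),
    "GPT-5.5, The Last Ones": (3, 10),
}

for label, (successes, trials) in reported.items():
    low, high = wilson_interval(successes, trials)
    print(f"{label}: {successes}/{trials}, 95% interval {low:.0%} to {high:.0%}")
```

Under those assumptions, 6 of 10 is compatible with a true success rate anywhere from roughly 31% to 83%, and 3 of 10 with roughly 11% to 60%. The headline result is that the ranges were solved at all, rather than the exact percentages.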

Palo Alto Networks, which began testing Claude Mythos in April as a launch partner for Anthropic's Project Glasswing, and has since tested Claude Opus 4.7 and OpenAI's GPT-5.5-Cyber, described the models as "extraordinarily capable at finding vulnerabilities and changing them into critical exploit paths in near-real-time."

The company released security advisories covering 26 CVEs representing 75 issues identified through AI model scanning across more than 130 products. All important vulnerabilities in its SaaS products had been patched.

What It Actually Means

This is not a story about AI getting better at cybersecurity. It is a story about the rate of change in AI getting better at cybersecurity — and what that rate implies for every organisation that runs software.

The AISI was careful to note the limits of its data: the estimates are based on a relatively small number of models, and the hardest tasks in the test suite have the least amount of human comparison data. But the institute also said that dropping any single model from the analysis barely moves the needle, shifting the estimated doubling time by less than a month in either direction. METR's independent analysis arrived at a nearly identical figure.
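The robustness check described above is a leave-one-out analysis. A minimal sketch of the idea follows, assuming a plain least-squares fit of log2(time horizon) against release date. The AISI's actual fitting procedure is not described in this article, and every data point below is a synthetic placeholder, not a measured result.

```python
import math

def doubling_time_months(points: list[tuple[float, float]]) -> float:
    """Fit log2(horizon) against time by least squares; return months per doubling.

    Each point is (months since first model, time horizon in minutes).
    """
    xs = [month for month, _ in points]
    ys = [math.log2(horizon) for _, horizon in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per month
    return 1.0 / slope

# SYNTHETIC placeholder values for illustration only. These are NOT AISI data.
synthetic_points = [(0, 5.0), (4, 11.0), (8, 19.0), (12, 42.0), (16, 78.0), (20, 165.0)]

full_estimate = doubling_time_months(synthetic_points)
print(f"all points: {full_estimate:.2f} months per doubling")

# Leave-one-out: refit with each point removed and report the shift.
for i, dropped in enumerate(synthetic_points):
    subset = synthetic_points[:i] + synthetic_points[i + 1:]
    estimate = doubling_time_months(subset)
    print(f"without point {dropped}: {estimate:.2f} months (shift {estimate - full_estimate:+.2f})")
```

If no single removal shifts the estimate much, the trend is not being driven by one outlier model. That is the property the institute reports for its own data.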

"Frontier AI's autonomous cyber and software capability is advancing quickly: the length of cyber tasks that frontier models can complete autonomously has doubled on the order of months, not years," the AISI wrote.

The key word is months. Not years. Months.

If the doubling time holds at four months, then by September 2026 these models will handle tasks roughly twice as long, measured in human expert time, as they can today. By January 2027, four times as long. The AISI does not yet know whether the latest results represent an isolated capability jump or the start of a new, faster trajectory, but the direction is unambiguous.
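The projection is plain exponential arithmetic: after m months with a doubling time of d months, the time horizon grows by a factor of 2^(m/d). The sketch below reproduces the figures above and converts each reported doubling time into a growth factor per year. It assumes May 2026 as the baseline and a constant doubling time, which the AISI itself says it cannot yet confirm. The function names are illustrative.

```python
from datetime import date

def horizon_multiplier(months_elapsed: float, doubling_months: float) -> float:
    """Growth factor of the task time horizon after months_elapsed months."""
    return 2 ** (months_elapsed / doubling_months)

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

baseline = date(2026, 5, 1)  # assumption: "today" is the month the findings were published

# Projection at a constant four-month doubling time.
for target in (date(2026, 9, 1), date(2027, 1, 1), date(2027, 5, 1)):
    elapsed = months_between(baseline, target)
    factor = horizon_multiplier(elapsed, doubling_months=4.0)
    print(f"{target:%B %Y}: {factor:.0f}x today's horizon")

# What each reported doubling time implies over twelve months.
for doubling in (8.0, 5.0, 4.0):
    yearly = horizon_multiplier(12, doubling)
    print(f"{doubling:.0f}-month doubling: about {yearly:.1f}x per year")
```

The move from an eight-month to a four-month doubling time is the difference between roughly 2.8x and 8x growth per year.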

Hype Deconstruction

This is not a story about AI "going rogue" or attacking networks autonomously in the wild. The AISI's cyber ranges are structured simulations against undefended enterprise networks. Real-world networks have active defences, segmentation, monitoring, and human operators. The AISI is developing more demanding evaluations that include active cyber defences.

It is also not a story about AI replacing human security researchers. Palo Alto Networks used AI to find vulnerabilities — human engineers still triaged, verified, and patched them. The AI accelerated discovery; it did not replace the human in the loop.

But the direction of travel matters more than the absolute capability level. The models are improving faster than anyone's threat models assumed.

Stakeholder Landscape

Who is directly affected:

Enterprise security teams. The window between vulnerability discovery and exploitation is shrinking. AI-assisted attackers will find and weaponise vulnerabilities faster than manual patch cycles can respond.
Software vendors. Palo Alto Networks found 75 issues across more than 130 products in a single AI scanning pass. That volume is unsustainable for traditional vulnerability management programmes.
Government cybersecurity agencies. The AISI and its US counterpart CAISI are now racing to build evaluation frameworks that keep pace with model capability.

Who benefits:

Defenders who adopt AI-first tooling. The same models that find vulnerabilities for attackers can find them for defenders. Palo Alto Networks' own results demonstrate this.
Anthropic and OpenAI. Both companies can now point to government-validated capability data — though the framing cuts both ways, as it also strengthens the case for mandatory pre-deployment testing.

Who is second-order affected:

Cyber insurers. Actuarial models that assume human-speed attack development cycles are obsolete.
Regulators. The UK's AISI findings will almost certainly accelerate mandatory pre-deployment evaluation frameworks in the EU and UK.

Cross-Layer Implications

The CAISI connection. The same week these findings dropped, Reuters reported that the US Commerce Department removed details from its website about agreements with Google, xAI, and Microsoft to test their AI models for security vulnerabilities. The CAISI page was deleted, then redirected. The timing is not coincidental — the US government is simultaneously expanding and obscuring its AI security testing infrastructure at precisely the moment the capability data says it is most needed.

The talent market. Security engineers who can operate AI-assisted vulnerability discovery tooling are about to become the most in-demand technical role in the industry. The supply is nowhere near the demand.

The open-source ecosystem. If AI models can scan more than 130 products and find 75 issues in a single pass, the open-source software supply chain, where vulnerability discovery has historically been slow and under-resourced, is about to be stress-tested at unprecedented speed.

What This Means for Security Practitioners

If you run a security team, four things need to happen this quarter:

Deploy AI-assisted vulnerability scanning across your entire codebase and dependency tree. The models that found 75 issues across more than 130 Palo Alto Networks products are available now. If you are not using them, assume adversaries are.
Reduce your mean time to patch. Palo Alto Networks' guidance is blunt: build security operations fast enough to respond in minutes, because AI-powered attacks may soon unfold that quickly. If your patch cycle is measured in weeks, you are already behind.
Shrink your attack surface. Use AI to identify and remediate security misconfigurations. The models are better at finding configuration gaps than humans — and faster.
Deploy ML-driven detection and response across all systems. Signature-based detection cannot keep pace with AI-generated attack paths. Behavioural detection is now table stakes.
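Reducing mean time to patch starts with measuring it. The sketch below is one possible way to compute the figure from a ticket export. The record layout, identifiers, timestamps, and the 24-hour target are all hypothetical placeholders, not guidance from Palo Alto Networks or the AISI.

```python
from datetime import datetime
from statistics import mean, median

def hours_to_patch(disclosed: datetime, patched: datetime) -> float:
    """Elapsed hours between disclosure and the deployed fix."""
    if patched < disclosed:
        raise ValueError("patch time precedes disclosure time")
    return (patched - disclosed).total_seconds() / 3600.0

# HYPOTHETICAL records: (identifier, disclosed, patched). Replace with your own export.
records = [
    ("example-1", datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 15, 0)),
    ("example-2", datetime(2026, 1, 7, 9, 0), datetime(2026, 1, 21, 9, 0)),
    ("example-3", datetime(2026, 1, 10, 9, 0), datetime(2026, 1, 13, 9, 0)),
]

TARGET_HOURS = 24.0  # assumed internal target, chosen only for illustration

durations = {ident: hours_to_patch(d, p) for ident, d, p in records}
mean_hours = mean(durations.values())
median_hours = median(durations.values())
over_target = sorted(ident for ident, hours in durations.items() if hours > TARGET_HOURS)

print(f"mean time to patch: {mean_hours:.1f} h, median: {median_hours:.1f} h")
print(f"over the {TARGET_HOURS:.0f} h target: {over_target}")
```

Tracking the median alongside the mean helps, because a single slow patch can dominate the mean and hide how the typical fix is going.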

If you are a software vendor: The volume of CVEs Palo Alto Networks generated in a single AI scanning pass — 26 CVEs covering 75 issues — should be treated as a preview of what your own codebase will face. Run the same models against your own products before external researchers or adversaries do.

Uncertainty Ledger

Is this a step change or a new trajectory? The AISI explicitly states it does not know whether the Mythos/GPT-5.5 results represent an isolated jump or the start of a faster doubling curve. The next checkpoint — likely within 2–3 months — will answer this.
How do these models perform against defended networks? The AISI's ranges are undefended. Active defences, segmentation, and monitoring will reduce success rates — but by how much is unknown.
What is the attacker adoption timeline? State-sponsored actors are almost certainly already using these models. The timeline for criminal adoption is less clear but likely measured in months, not years.
Where are the other labs? Google DeepMind and xAI were notably absent from both the AISI and Palo Alto Networks testing. Their models may be equally capable — or more so — but the data is not public.

Bottom Line

The doubling time for autonomous AI cyber capability has collapsed to approximately four months. Two independent assessments, one from a government institute and one from the largest cybersecurity vendor, confirm the same trend. The models are not just improving; the rate of improvement is itself accelerating. Every organisation that runs software needs to assume that the window between vulnerability discovery and exploitation may soon be measured in minutes, not weeks. If your security operations cannot respond at that speed, the gap is not closing; it is widening.

Sources:

CyberScoop, "Researchers say AI just broke every benchmark for autonomous cyber capability," 13 May 2026 (Tier 2 — specialist cybersecurity publication)
UK AI Security Institute (AISI), findings published 13 May 2026, cited via CyberScoop (Tier 1 — government institute)
Palo Alto Networks, security advisories and testing results, 13 May 2026, cited via CyberScoop (Tier 1 — primary source, publicly traded cybersecurity vendor)
METR, independent analysis confirming ~4-month doubling time, cited via CyberScoop (Tier 2 — established AI safety nonprofit)
Reuters, "Microsoft, Google, xAI security test details deleted from US government website," 11 May 2026 (Tier 1 — wire service)