Categories: Web and IT News

When AI Learns to Hack: Inside the UK’s Alarming Test of Anthropic’s Most Capable Model

="">

The question isn’t whether artificial intelligence will become a potent weapon in cyberattacks. The question is how close we are to that threshold — and whether the safety testing infrastructure can keep pace with the models themselves.

A detailed evaluation published by the UK’s AI Safety Institute offers the most concrete public evidence yet that frontier AI models are approaching meaningful autonomous cyber capabilities. The AISI’s assessment of Anthropic’s Claude “Mythos” preview models — early versions of what would become the Claude 4 family — found that the AI could independently complete multi-step cybersecurity challenges that previously required human expertise. Not toy problems. Real capture-the-flag exercises modeled on actual attack scenarios.

And the results should make every CISO pay attention.

What the UK Found — and Why It Matters Beyond the Lab

The AI Safety Institute tested two preview models, codenamed Mythos-minor and Mythos-major, against its ATLAS benchmark — a set of challenges drawn from real-world cybersecurity competitions. These aren’t abstract reasoning tests. They require an AI agent to probe systems, identify vulnerabilities, write and execute exploit code, and chain multiple steps together to compromise a target. The kind of work that, until recently, demanded years of specialized training in offensive security.

Mythos-major solved 26 out of 78 ATLAS challenges autonomously. That’s a 33.3% completion rate. Mythos-minor hit 22 of 78, or 28.2%. For context, the previous Claude 3.5 Sonnet model — already considered highly capable — managed just 14 of those same challenges, a 17.9% rate. The jump from 17.9% to 33.3% represents nearly a doubling in autonomous cyber capability in a single model generation.
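The completion rates follow directly from the raw challenge counts. As a quick sanity check, here is a minimal Python sketch using only the figures reported above (the model names and counts come from the article; the script itself is illustrative):

```python
# Reported ATLAS results: challenges solved out of 78 total, per the article.
TOTAL_CHALLENGES = 78

solved = {
    "Mythos-major": 26,
    "Mythos-minor": 22,
    "Claude 3.5 Sonnet": 14,
}

# Completion rate for each model, as a percentage.
rates = {model: count / TOTAL_CHALLENGES * 100 for model, count in solved.items()}
for model, rate in rates.items():
    print(f"{model}: {rate:.1f}%")

# Generational jump: Mythos-major relative to the Claude 3.5 Sonnet baseline.
jump = solved["Mythos-major"] / solved["Claude 3.5 Sonnet"]
print(f"Relative improvement: {jump:.2f}x")
```

The ratio works out to roughly 1.86x, which is what the article means by "nearly a doubling" in a single model generation.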

The numbers alone are striking. The details are more so.

AISI researchers observed the models successfully performing tasks across the full attack chain: reconnaissance, exploitation, privilege escalation, and lateral movement. Mythos-major demonstrated what the institute described as an ability to “credibly perform some of the individual steps required in a real-world cyberattack.” It could scan networks, identify vulnerable services, craft working exploits, and escalate privileges on compromised machines — all without human guidance.

But the institute was careful to draw a line. Completing isolated CTF challenges, even complex ones, isn’t the same as executing a full end-to-end cyberattack against hardened production infrastructure. The models still struggled with the longest challenge chains, those requiring sustained planning across many sequential steps. They’d lose context. Make errors they couldn’t recover from. Get stuck in loops.

Not yet autonomous cyber operators. But closer than anything that came before them.

The AISI evaluation also introduced a new dimension to its testing methodology. Alongside the fully autonomous runs, researchers conducted “human uplift” assessments — measuring whether the models could meaningfully enhance the capabilities of human attackers at different skill levels. This is arguably the more immediately relevant threat vector. Most cyberattacks aren’t launched by AI agents operating alone. They’re launched by people, many of whom lack elite technical skills but possess enough intent and basic knowledge to be dangerous with the right assistance.

Here, the findings were nuanced. The models could explain attack concepts clearly, generate functional exploit code, and help debug failed attempts. For a moderately skilled attacker — someone with basic penetration testing knowledge but not deep expertise — Claude Mythos could serve as an effective force multiplier. The AISI’s report stopped short of declaring a specific uplift threshold had been crossed, but the directional trend is unmistakable.

Anthropic, for its part, has been transparent about these capabilities. The company’s own Responsible Scaling Policy classifies models into AI Safety Levels, and the Mythos evaluations fed directly into Anthropic’s decision-making about deployment safeguards. According to the AISI report, Anthropic provided pre-release access to the preview models specifically so the UK institute could conduct independent testing before public release — a practice that remains voluntary but that the UK government has been pushing to formalize.

The Arms Race Between Capability and Containment

The timing of this evaluation is significant. It arrives as governments worldwide are grappling with how to regulate AI systems whose capabilities are advancing faster than the policy frameworks designed to govern them.

The UK’s approach, centered on its AI Safety Institute, has emphasized technical evaluation as the foundation for governance. Rather than prescriptive rules about what models can or can’t do, AISI has focused on building the measurement infrastructure to understand what models actually do when tested rigorously. The Claude Mythos evaluation represents one of the most detailed public examples of this approach in practice.

But there’s a tension embedded in the model. AISI’s evaluations are conducted on pre-release versions with the cooperation of AI companies. That cooperation has been voluntary. And while Anthropic has been among the most willing participants, the broader industry’s commitment to pre-deployment safety testing remains uneven. OpenAI has engaged with AISI on some evaluations. So has Google DeepMind. But the depth and timing of access varies, and there’s no legal requirement compelling any company to submit models for independent testing before releasing them.

The European Union’s AI Act takes a different tack, imposing mandatory obligations on providers of general-purpose AI models above certain capability thresholds. Under that framework, the kind of cyber capability demonstrated by Claude Mythos would likely trigger additional compliance requirements — including adversarial testing and incident reporting obligations. Whether those requirements will prove effective or merely bureaucratic remains an open question.

In the United States, the picture is more fragmented. The Biden administration’s executive order on AI safety included provisions for reporting on dual-use foundation models, but enforcement mechanisms remain thin. The current political environment has shown limited appetite for new AI regulation, even as the technical case for oversight grows stronger with each model generation.

So the safety testing that does happen relies heavily on the goodwill of companies and the technical capacity of institutions like AISI. And AISI itself acknowledged limitations in its evaluation. The ATLAS benchmark, while more realistic than many academic alternatives, still operates in controlled environments that don’t fully replicate the complexity of real-world networks. Challenges are self-contained. Defenders aren’t actively responding. The fog of war that characterizes actual cyber operations is absent.

This matters because the gap between benchmark performance and real-world impact is where much of the genuine risk assessment lives. A model that solves 33% of CTF challenges might be far more or far less dangerous in practice than that number suggests, depending on how those capabilities translate to actual attack scenarios with real defenders, real network architectures, and real consequences for failure.

The cybersecurity community has been watching these developments with a mix of alarm and pragmatism. Offensive security professionals have noted that current AI models, including the most capable ones, still lack the kind of adaptive reasoning that elite human hackers bring to novel situations. When a model encounters an unexpected configuration or a defense it hasn’t seen in training data, it tends to fall back on generic approaches rather than creatively improvising. That’s a meaningful limitation — for now.

The defensive implications are equally important and often underweighted in public discussion. The same capabilities that make AI models useful for attacking systems also make them useful for defending them. Automated vulnerability discovery, code auditing, anomaly detection, incident response triage — these are areas where AI is already being deployed by security teams. The question is whether the offense-defense balance shifts as models become more capable, and in which direction.

Historical precedent from other dual-use technologies offers limited guidance. Nuclear technology, genetic engineering, cryptography — each followed its own trajectory of capability development, governance response, and eventual equilibrium (or lack thereof). AI’s unique characteristics, particularly the speed of capability improvement and the low marginal cost of deployment, suggest that waiting for problems to manifest before responding may not be a viable strategy.

The AISI evaluation also raises questions about the adequacy of current model safeguards. Anthropic has implemented various safety measures in the Claude model family, including training-time interventions designed to make models refuse requests for malicious assistance and system-level monitoring for potentially harmful outputs. The AISI testing was conducted on pre-release versions with some safety measures potentially not yet fully implemented, which complicates direct comparisons to the models ultimately deployed to users.

But the fundamental challenge remains: a model capable enough to solve complex cybersecurity challenges autonomously is, by definition, a model that possesses knowledge and reasoning abilities applicable to offensive operations. The difference between a helpful security research assistant and a cyber weapon is largely one of intent, context, and guardrails — all of which can be manipulated, bypassed, or simply absent in certain deployment scenarios.

Fine-tuning, jailbreaking, and prompt injection techniques continue to evolve alongside the models themselves. And open-weight models from other providers, once released, can be modified without any safety constraints at all. The AISI evaluation focused on Anthropic’s closed models, but the broader capability trajectory applies across the industry.

What Comes Next

The Claude Mythos evaluation is a snapshot. A single frame in a rapidly advancing film. The models tested were previews — not even final release versions. The next generation will be more capable. And the one after that.

AISI has signaled its intention to continue and expand its evaluation program, including more sophisticated benchmarks that better approximate real-world conditions. The institute is also working on evaluations that test models’ ability to assist with other categories of catastrophic risk, including biological weapons development and the generation of disinformation at scale. Cyber capabilities are one piece of a larger puzzle.

For industry practitioners, the practical takeaways are concrete. First, AI-assisted cyberattacks are no longer theoretical. The capability exists in current-generation models, albeit at a level below what elite human operators can achieve. Security teams should be modeling AI-augmented threat actors in their risk assessments now, not waiting for a dramatic public incident to force the issue.

Second, the defensive applications of these same models deserve equal investment. If AI can identify and exploit vulnerabilities autonomously, it can also find and flag them before attackers do. Organizations that integrate AI into their defensive operations — thoughtfully, with appropriate validation — will have a meaningful advantage over those that don’t.

Third, the governance question isn’t going away. Whether through voluntary frameworks like Anthropic’s Responsible Scaling Policy, institutional evaluations like AISI’s, or regulatory mandates like the EU AI Act, some form of structured oversight for frontier AI capabilities is taking shape. Companies building, deploying, or relying on AI systems should be engaging with these frameworks proactively rather than reactively.

The UK AI Safety Institute deserves credit for publishing this evaluation in detail. Transparency about AI capabilities — including uncomfortable capabilities — is a prerequisite for informed governance. Too much of the AI safety discussion has operated in the abstract, trading in hypotheticals and thought experiments. The AISI report grounds the conversation in empirical data. Here’s what the model can do. Here’s what it can’t. Here are the gaps in our ability to measure the difference.

That kind of clarity is rare. And necessary.

The models will keep getting better. The evaluations need to keep up. And the rest of us — the people building systems, defending networks, and making policy — need to be paying very close attention to the gap between what AI can do today and what it will be able to do tomorrow. Because that gap is closing faster than most people realize.
