Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged

Across the frontier labs, the highest prompt injection figures published this spring are Anthropic’s. Point a red-teamer at its newest model in a browser, and the attacker hijacked it 31.5% of the time before safeguards engaged. OpenAI, Google, and Meta never gave security leaders a comparable number to set beside it. That figure looks like a liability. In this comparison, it is the opposite. It is the only solid piece of ground.

Four frontier labs each shipped a prompt injection disclosure, and no two match. Anthropic put 244 pages and four agentic surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the subject out of the model card and into a separate safety framework. Meta shipped no closed-model card at all. The Cross-Vendor Prompt Injection Disclosure Grid below maps what each lab tested, what each measured, and the four places a side-by-side comparison falls apart.

A prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or a tool result. One planted line can exfiltrate data or fire off actions nobody approved, and these cards are a buyer’s only first-party evidence.

There is no industry standard for measuring any of this, and that is the root of the problem. Carter Rees, VP of AI at Reputation, told VentureBeat that prompt injection breaks the assumption that every legacy tool was built on. “A phrase as innocuous as, ‘ignore previous instructions’ can carry a payload as devastating as a buffer overflow, yet it shares no commonality with known malware signatures.” With no shared signature to scan for, each lab built its own yardstick, and the results don’t line up.

Adam Meyers, Senior Vice President of Counter Adversary Operations at CrowdStrike, said that the exposure is now the buyer’s to manage. “As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models against adversary misuse or data poisoning or prompt injection.” CrowdStrike’s own frontline data shows the threat side is not standing still. In its 2026 Financial Services Threat Landscape Report, released in May, the company reported adversaries using AI to compress the time from initial access to impact faster than legacy defenses can respond.

Anthropic measured four surfaces. The numbers swing by an order of magnitude depending on which one you read.

The Opus 4.8 card does what others don’t: It breaks prompt injection out by surface, and the spread is the story.

Put the model in a coding environment, and an adaptive attacker from Gray Swan’s Shade tool got through on 7.03% of single attempts with thinking on. Safeguards pulled that to 2.09%.

Move the same class of attack into a browser, the surface behind Claude in Chrome and Claude Cowork, and the floor gives way. Anthropic put expert red-teamers on 129 web environments held out from training and published every result in Table 5.2.2.4.A on page 81 of the system card. Per-attempt is the share of all injection attempts that got through across 129 environments at 10 tries each. Per-scenario is the harder cut, the share of environments where at least one try landed.
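To make the two cuts concrete, here is a minimal Python sketch of how a per-attempt and a per-scenario rate could be computed from a grid of pass/fail outcomes. The function name and the toy data are illustrative assumptions, not Anthropic’s evaluation harness or its results.

```python
from typing import Sequence


def injection_rates(outcomes: Sequence[Sequence[bool]]) -> tuple[float, float]:
    """Return (per_attempt, per_scenario) attack success rates.

    outcomes[i][j] is True when attempt j against environment i succeeded.
    """
    total_attempts = sum(len(env) for env in outcomes)
    if total_attempts == 0:
        raise ValueError("no attempts recorded")
    # Per-attempt: successful attempts over all attempts, pooled across environments.
    successes = sum(sum(env) for env in outcomes)
    per_attempt = successes / total_attempts
    # Per-scenario: environments where at least one attempt landed.
    per_scenario = sum(1 for env in outcomes if any(env)) / len(outcomes)
    return per_attempt, per_scenario


# Toy data (hypothetical): 4 environments, 5 attempts each.
toy = [
    [False, False, False, False, False],
    [True, False, False, False, False],
    [True, True, False, False, False],
    [False, False, False, False, False],
]
per_attempt, per_scenario = injection_rates(toy)
print(f"per-attempt:  {per_attempt:.1%}")
print(f"per-scenario: {per_scenario:.1%}")
```

On the toy grid, 3 of 20 attempts land (15.0% per attempt) while 2 of 4 environments are breached at least once (50.0% per scenario). When every environment gets the same number of attempts, the per-scenario rate can never be lower than the per-attempt rate, which is why it is the harder cut.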

Read down the per-attempt column without safeguards, thinking on, and the raw rate drops with each generation, from Sonnet 4.6 at 50.7% to Opus 4.8 at 31.5%. The lowest in the table, 5.9%, belongs to Mythos Preview, which nobody can buy yet. Turn safeguards on, and Opus 4.8 drops to 0.5%. Turn thinking off and it drops to zero across all 129 environments.

OpenAI measured one surface, with attacks it already knew.

The GPT-5.5 card, published April 23 and updated April 24, handles prompt injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports it as a robustness score where higher is better, the inverse of an attack success rate. GPT-5.5 came in at 0.963, down from 0.998 for GPT-5.4-thinking. That one figure is the whole disclosure.

Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a one-week bug bounty where red-teamers tried to break the model live. When the coding results came back worse than Opus 4.7, the card said so.

Lay the 0.963 next to the 31.5%, and they look like they belong on a scoreboard. They don’t. One is a robustness score against known attacks on one surface. The other is a per-attempt attack success rate across 129 browser environments against an attacker that adapted in real time.

Google and Meta never put the number in the card at all

Google’s Gemini 3 files prompt injection under mitigations, and the launch materials describe stronger resistance with no number attached. The Frontier Safety Framework report does run red teaming, but across its capability domains, and prompt injection is not one of them. No model card, no framework page, no per-surface number a buyer can lift into a risk review.

Meta ships open weights with no closed-model card. Prompt injection defense sits in a separate stack, Purple Llama’s LlamaFirewall. A PromptGuard 2 classifier and an AlignmentCheck auditor, run against the public AgentDojo benchmark and its 97 tasks, cut attack success from 17.6% with no defense to 1.75% combined. Real numbers. They grade the guardrails on a public benchmark, not the model on a deployment surface a security team would recognize.

The Cross-Vendor Prompt Injection Disclosure Grid

The grid below works on any frontier model security teams are weighing. Each row marks a place where the four labs are split. Each split is where a quick comparison breaks. The Anthropic figures come from the Opus 4.8 system card. Everything for the other three comes from each vendor’s published safety documentation.

Dimension	Anthropic, Opus 4.8	OpenAI, GPT-5.5	Google, Gemini 3.x	Meta, Llama stack
Safety doc	System card, May 28 2026, 244 pages	System card, April 23 2026, updated April 24	Model card plus a separate Frontier Safety Framework report	No closed-model card. Open weights plus the Purple Llama stack
Injection benchmark or dataset	ART from Gray Swan and UK AISI, the Shade tool, plus an internal browser eval, 129 environments	Internal connectors evaluation, known attacks	None for injection	AgentDojo, 97 tasks
Surfaces with an injection eval	Four. Tool use, coding, computer use, browser	One. Connectors	None published for injection	One. AgentDojo agent tasks
Multi-attempt escalation shown	Yes. ART benchmark at 1, 10, 100. Coding and computer use at 1 and 200	No. A single score	No	No
Headline metric and unit	Attack success rate. Browser, with thinking, 31.5% raw, 0.5% safeguarded	Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking	None published. Increased resistance claimed qualitatively	Attack success rate on AgentDojo. 17.6% baseline to 1.75% combined
Live external bounty	Yes. One-week live injection bounty with external red-teamers	No injection bounty. Bio bounty only	None found	None found
Regression disclosed	Yes, explicit, with numbers	Number fell 0.998 to 0.963, not framed as a regression	Increased resistance claimed, no numbers	Not applicable

Five factors security teams need to consider now

Anthropic tested four surfaces and published every number. OpenAI tested one. Google published no per-surface rate. Meta graded its guardrails, not the model. The four disclosures don’t add up to a comparison. These five steps build one.

Pull every agent you have deployed or scoped and tag each by the surface it touches: browser, code, connectors, or desktop. Anthropic’s rate for Opus 4.8 runs 2.09% on coding and 0.5% on browser. A blended number covers neither. Pull the vendor’s published rate for your specific surface. If the vendor never published one, treat it as untested.
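A rough Python sketch of that inventory step follows. The agent names, the model label string, and the lookup table layout are hypothetical; the only figures in it are the two safeguarded Opus 4.8 rates quoted above. Any surface without an entry comes back as untested rather than falling back to a blended number.

```python
from dataclasses import dataclass
from typing import Optional

# Safeguarded rates quoted in this article for Opus 4.8. Every other cell is
# left out on purpose so that a missing entry reads as "untested".
PUBLISHED_RATES: dict[tuple[str, str], float] = {
    ("opus-4.8", "coding"): 0.0209,
    ("opus-4.8", "browser"): 0.005,
}


@dataclass
class Agent:
    name: str     # hypothetical internal agent name
    model: str    # hypothetical model label used as a lookup key
    surface: str  # "browser", "coding", "connectors", or "desktop"


def published_rate(agent: Agent) -> Optional[float]:
    """Return the vendor's per-surface rate, or None when none was published."""
    return PUBLISHED_RATES.get((agent.model, agent.surface))


inventory = [
    Agent("pr-review-bot", "opus-4.8", "coding"),
    Agent("research-browser", "opus-4.8", "browser"),
    Agent("crm-sync", "opus-4.8", "connectors"),
]

for agent in inventory:
    rate = published_rate(agent)
    status = "untested" if rate is None else f"{rate:.2%}"
    print(f"{agent.name:<18} {agent.surface:<11} {status}")
```

Keying the table on the (model, surface) pair keeps a coding number from being quietly reused for a connectors deployment.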

Send the Cross-Vendor grid to every vendor under evaluation. A 0.963 connectors score and a 31.5% browser rate were never on one scale. Demand a per-surface attack success rate, raw and safeguarded, with the attacker methodology named. The blank cells are the surfaces with no first-party evidence.

Confirm in writing which number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full safeguard stack. On the API, the model ships without them. Don’t accept a product number for an API deployment.

Add two clauses to the RFP. The vendor tested with an adaptive attacker that rewrites payloads against the model, and someone outside the company tried to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a one-week paid bounty. OpenAI tested known attacks on one surface. Adversaries don’t submit known payloads.

Run your own injection test before any agent ships. Vendor numbers come from vendor environments with vendor system prompts. Your stack has its own prompts, permissions, and data access. Set a pass threshold. Anything above it doesn’t go live.
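As a minimal sketch of such a gate in Python: the function names, the 200-attempt run, and the 1% threshold below are all illustrative assumptions, not figures from any vendor card. The point is only that the ship decision is computed from your own test results.

```python
def attack_success_rate(results: list[bool]) -> float:
    """Share of injection attempts that succeeded (True = agent was hijacked)."""
    if not results:
        raise ValueError("run at least one injection attempt before gating")
    return sum(results) / len(results)


def may_ship(results: list[bool], pass_threshold: float) -> bool:
    """An agent ships only when its measured rate is at or below the threshold."""
    return attack_success_rate(results) <= pass_threshold


# Hypothetical run: 200 attempts against your own stack, 3 of them landed.
run = [True] * 3 + [False] * 197
PASS_THRESHOLD = 0.01  # illustrative value; set your own

print(f"measured rate: {attack_success_rate(run):.2%}")
print("ship" if may_ship(run, PASS_THRESHOLD) else "block")
```

Here the measured rate is 1.50%, above the 1% threshold, so the agent is blocked. An empty result set raises an error instead of passing, so an agent that was never tested cannot slip through the gate.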

The bottom line. No standard exists for this yet. A vendor’s number tells you what it chose to measure. Your own red team tells you what you are exposed to.
