
Across the frontier labs, the best prompt injection figures published this spring are Anthropic’s. Point a red-teamer at its latest model in a browser, and the attacker hijacked it 31.5% of the time before safeguards engaged. OpenAI, Google, and Meta never gave security leaders a comparable number to set beside it. That figure looks like a liability. In this comparison, it’s the opposite. It is the one solid piece of ground.
Four frontier labs each shipped a prompt injection disclosure, and no two match. Anthropic put 244 pages and four agentic surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the subject out of the model card and into a separate safety framework. Meta shipped no closed-model card at all. The Cross-Vendor Prompt Injection Disclosure Grid below maps what each lab tested, what each measured, and the four places a side-by-side comparison falls apart.
A prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or a tool result. One planted line can exfiltrate data or fire off actions nobody approved, and these cards are a buyer’s only first-party evidence.
There is no industry standard for measuring any of this, and that is the root of the problem. Carter Rees, VP of AI at Reputation, told VentureBeat that prompt injection breaks the assumption that every legacy tool was built on. “A phrase as innocuous as, ‘ignore previous instructions’ can carry a payload as devastating as a buffer overflow, yet it shares no commonality with known malware signatures.” With no shared signature to scan for, each lab built its own yardstick, and the results don’t line up.
Adam Meyers, Senior Vice President of Counter Adversary Operations at CrowdStrike, said that the exposure is now the buyer’s to manage. “As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models against adversary misuse or data poisoning or prompt injection.” CrowdStrike’s own frontline data shows the threat side is not standing still. In its 2026 Financial Services Threat Landscape Report, released in May, the company reported adversaries using AI to compress the time from initial access to impact faster than legacy defenses can respond.
Anthropic measured four surfaces. The numbers swing by an order of magnitude depending on which one you read.
The Opus 4.8 card does what others don’t: It breaks prompt injection out by surface, and the spread is the story.
Put the model in a coding environment, and an adaptive attacker from Gray Swan’s Shade tool got through on 7.03% of single attempts with thinking on. Safeguards pulled that to 2.09%.
Move the same class of attack into a browser, the surface behind Claude in Chrome and Claude Cowork, and the floor gives way. Anthropic put skilled red-teamers on 129 web environments held out from training and published every result in Table 5.2.2.4.A on page 81 of the system card. Per-attempt is the share of all injection attempts that got through across 129 environments at 10 tries each. Per-scenario is the harder cut, the share of environments where at least one try landed.
Read down the per-attempt column without safeguards, thinking on, and the raw rate drops with each generation, from Sonnet 4.6 at 50.7% to Opus 4.8 at 31.5%. The lowest in the table, 5.9%, belongs to Mythos Preview, which nobody can buy yet. Turn safeguards on, and Opus 4.8 drops to 0.5%. Turn thinking off and it drops to zero across all 129 environments.
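The difference between the per-attempt and per-scenario cuts is easiest to see in code. The sketch below is illustrative only: the function name `injection_rates` and the toy outcome grid are assumptions made for this example, not anything taken from the system card, and the data is invented.

```python
from typing import Sequence


def injection_rates(outcomes: Sequence[Sequence[bool]]) -> tuple[float, float]:
    """Return (per_attempt, per_scenario) attack success rates.

    outcomes[i][j] is True when attempt j in environment i hijacked the agent.
    """
    if not outcomes or any(len(env) == 0 for env in outcomes):
        raise ValueError("every environment needs at least one attempt")

    total_attempts = sum(len(env) for env in outcomes)
    successful_attempts = sum(sum(env) for env in outcomes)
    # An environment counts as breached if any single attempt landed.
    breached_environments = sum(1 for env in outcomes if any(env))

    per_attempt = successful_attempts / total_attempts
    per_scenario = breached_environments / len(outcomes)
    return per_attempt, per_scenario


# Toy data, not figures from any system card: 4 environments, 5 attempts each.
toy = [
    [False, False, False, False, False],
    [True, False, False, False, False],
    [False, False, False, False, False],
    [True, True, False, False, False],
]
per_attempt, per_scenario = injection_rates(toy)
print(f"per-attempt:  {per_attempt:.1%}")   # 3 of 20 attempts
print(f"per-scenario: {per_scenario:.1%}")  # 2 of 4 environments
```

On the toy grid, the same test run reads as 15% per attempt and 50% per scenario, which is why a rate means little until the cut behind it is named.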
OpenAI measured one surface, with attacks it already knew.
The GPT-5.5 card, published April 23 and updated April 24, handles prompt injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports it as a robustness score where higher is better, the inverse of an attack success rate. GPT-5.5 came in at 0.963, down from 0.998 for GPT-5.4-thinking. That one figure is the whole disclosure.
Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a one-week bug bounty where red-teamers tried to break the model live. When the coding results came back worse than Opus 4.7, the card said so.
Lay the 0.963 next to the 31.5%, and they look like they belong on a scoreboard. They don’t. One is a robustness score against known attacks on one surface. The other is a per-attempt attack success rate across 129 browser environments against an attacker that adapted in real time.
Google and Meta never put the number in the card at all
Google’s Gemini 3 files prompt injection under mitigations, and the launch materials describe stronger resistance with no number attached. The Frontier Safety Framework report does run red teaming, but across its capability domains, and prompt injection is not one of them. No model card, no framework page, no per-surface number a buyer can lift into a risk review.
Meta ships open weights with no closed-model card. Prompt injection defense sits in a separate stack, Purple Llama’s LlamaFirewall. A PromptGuard 2 classifier and an AlignmentCheck auditor, run against the public AgentDojo benchmark and its 97 tasks, cut attack success from 17.6% with no defense to 1.75% combined. Real numbers. They grade the guardrails on a public benchmark, not the model on a deployment surface a security team would recognize.
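For readers who want the size of that guardrail effect in relative terms, the arithmetic can be sketched as follows. The helper name `relative_reduction` is an assumption for illustration, and the two inputs are the AgentDojo figures quoted above.

```python
def relative_reduction(baseline_pct: float, defended_pct: float) -> float:
    """Fraction of the baseline attack success removed by the defense."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_pct - defended_pct) / baseline_pct


# Figures as quoted in the text for LlamaFirewall on AgentDojo.
baseline, combined = 17.6, 1.75
print(f"absolute drop: {baseline - combined:.2f} percentage points")
print(f"relative drop: {relative_reduction(baseline, combined):.1%}")
```

The result describes the guardrails on that benchmark only. It says nothing about the model’s own rate on a deployment surface.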
The Cross-Vendor Prompt Injection Disclosure Grid
The grid below works on any frontier model security teams are weighing. Each row marks a place where the four labs are split. Each split is where a quick comparison breaks. The Anthropic figures come from the Opus 4.8 system card. Everything for the other three comes from each vendor’s published safety documentation.
| Dimension | Anthropic, Opus 4.8 | OpenAI, GPT-5.5 | Google, Gemini 3.x | Meta, Llama stack |
| --- | --- | --- | --- | --- |
| Safety document | System card, May 28 2026, 244 pages | System card, April 23 2026, updated April 24 | Model card plus a separate Frontier Safety Framework report | No closed-model card. Open weights plus the Purple Llama stack |
| Injection benchmark or dataset | ART from Gray Swan and UK AISI, the Shade tool, plus an internal browser eval, 129 environments | Internal connectors evaluation, known attacks | None for injection | AgentDojo, 97 tasks |
| Surfaces with an injection eval | Four. Tool use, coding, computer use, browser | One. Connectors | None published for injection | One. AgentDojo agent tasks |
| Multi-attempt escalation shown | Yes. ART benchmark at 1, 10, 100. Coding and computer use at 1 and 200 | No. A single score | No | No |
| Headline metric and unit | Attack-success rate. Browser, with thinking, 31.5% raw, 0.5% safeguarded | Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking | None published. Increased resistance claimed qualitatively | Attack-success rate on AgentDojo. 17.6% baseline to 1.75% combined |
| Live external bounty | Yes. One-week live injection bounty with external red-teamers | No injection bounty. Bio bounty only | None found | None found |
| Regression disclosed | Yes, explicit, with numbers | Number fell 0.998 to 0.963, not framed as a regression | Increased resistance claimed, no numbers | Not applicable |
Five factors security teams need to consider now
Anthropic tested four surfaces and published every number. OpenAI tested one. Google published no per-surface rate. Meta graded its guardrails, not the model. The four disclosures don’t add up to a comparison. These five steps build one.
Pull every agent you have deployed or scoped and tag each by the surface it touches: browser, code, connectors, or desktop. Anthropic’s rate for Opus 4.8 runs 2.09% on coding and 0.5% on browser. A blended number covers neither. Pull the vendor’s published rate for your specific surface. If the vendor never published one, treat it as untested.
Send the Cross-Vendor grid to every vendor under evaluation. A 0.963 connectors score and a 31.5% browser rate were never on one scale. Demand a per-surface attack success rate, raw and safeguarded, with the attacker methodology named. The blank cells are the surfaces with no first-party evidence.
Confirm in writing which number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full safeguard stack. On the API, the model ships without them. Don’t accept a product number for an API deployment.
Add two clauses to the RFP. The vendor tested with an adaptive attacker that rewrites payloads against the model, and someone outside the company tried to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a one-week paid bounty. OpenAI tested known attacks on one surface. Adversaries don’t submit known payloads.
Run your own injection test before any agent ships. Vendor numbers come from vendor environments with vendor system prompts. Your stack has its own prompts, permissions, and data access. Set a pass threshold. Anything above it doesn’t go live.
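The first and last steps, tagging agents by surface and gating on a pass threshold, can be sketched as a small release check. Everything named here is hypothetical: the `AgentResult` record, the `may_ship` helper, the 1% `PASS_THRESHOLD`, and the inventory counts are made up for illustration, and each team would substitute its own test harness and tolerance.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; each team sets its own.
PASS_THRESHOLD = 0.01  # maximum tolerated attack success rate (1%)


@dataclass
class AgentResult:
    name: str
    surface: str        # "browser", "code", "connectors", or "desktop"
    attempts: int = 0   # injection attempts in your own test
    successes: int = 0  # attempts that hijacked the agent

    @property
    def attack_success_rate(self) -> Optional[float]:
        # No attempts means no evidence, which is not the same as a 0% rate.
        if self.attempts == 0:
            return None
        return self.successes / self.attempts


def may_ship(result: AgentResult, threshold: float = PASS_THRESHOLD) -> bool:
    """An agent ships only with a measured rate at or below the threshold."""
    rate = result.attack_success_rate
    return rate is not None and rate <= threshold


# Illustrative inventory with made-up counts.
inventory = [
    AgentResult("invoice-bot", "connectors", attempts=500, successes=2),
    AgentResult("research-agent", "browser", attempts=500, successes=40),
    AgentResult("desktop-helper", "desktop"),  # never tested
]

for agent in inventory:
    rate = agent.attack_success_rate
    shown = "untested" if rate is None else f"{rate:.1%}"
    verdict = "ship" if may_ship(agent) else "block"
    print(f"{agent.name:<15} {agent.surface:<11} {shown:>9}  {verdict}")
```

The design choice worth noting is that an untested surface blocks by default, mirroring the advice to treat a missing vendor number as untested and not as zero.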
The bottom line. No standard exists for this yet. A vendor’s number tells you what it chose to measure. Your own red team tells you what you are exposed to.

