Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it

A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required.

Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions doesn’t expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but doesn’t eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.

Per Guan’s disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday.

Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack operates beneath OpenAI’s safeguard layer at the agent runtime; that conclusion rests on what OpenAI’s system card doesn’t document, not on a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect.

OpenAI and Google didn’t respond to requests for comment by publication time.

“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.”

What the system cards tell you

Anthropic’s Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card tells readers that the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope (Claude Code Auto Mode, for example, applies runtime-level protections), but the system card itself doesn’t document these runtime safeguards or their coverage.

OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales access to thousands. The system card tells you what red teamers tested. It doesn’t tell you how resistant the model is to the attacks they found.

Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program.

Dimension	Anthropic (Opus 4.7)	OpenAI (GPT-5.4)	Google (Gemini 3.1 Pro)
System card depth	232 pages. Quantified hack rates, classifier scores, and injection resistance metrics.	Extensive. Red teaming hours documented. No injection resistance rates published.	Few pages. Defers to older Gemini 3 Pro card. No quantified results.
Cyber verification program	CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Doesn’t address prompt injection defense. Platform and data-retention exclusions not yet publicly documented.	TAC. Scaled to thousands. Constrains ZDR.	None. No external defender pathway.
Restricted model strategy	Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed.	No restricted model. Full capability released, access gated.	No restricted model. No stated plan for one.
Runtime agent safeguards	Claude Code Security Review: system card states it isn’t hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card.	Not documented. TAC governs access, not agent operations.	Not documented. ART internal only.
Exploit response (Comment and Control)	CVSS 9.4 Critical. $100 bounty. Patched. No CVE.	Not directly exploited. Structural gap inferred from TAC design, not demonstrated.	$1,337 bounty per Guan disclosure. Patched. No CVE.
Injection resistance data	Published. Quantified rates in the system card.	Model-layer injection evals published. No agent-runtime or tool-execution resistance rates.	Not published. No quantified data available.

Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”

Seven threat classes neither safeguard approach closes

Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.

Threat Class	What Breaks	Why Your Controls Miss It	What Comment and Control Proved	Recommended Action
1. Deployment surface mismatch	CVP is designed for authorized offensive security research, not prompt injection defense. It doesn’t extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface.	Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither.	The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place.	Email your Anthropic and OpenAI reps today. One question, in writing: ‘Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.’ File the response in your vendor risk register.
2. CI secrets exposed to AI agents	ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents.	The default GitHub Actions config doesn’t scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets.	The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub’s API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub’s own API; the platform itself became the C2 channel.	Run: grep -r 'secrets\.' .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI).
3. Over-permissioned agent runtimes	AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do.	Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task.	The agent had bash access it didn’t need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely.	Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step.
4. No CVE signal for AI agent vulnerabilities	CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green.	No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for.	A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously.	Create a new category in your supply chain risk register: ‘AI agent runtime.’ Assign a 48-hour check-in cadence with each vendor’s security contact. Don’t wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure.
5. Model safeguards don’t govern agent actions	Opus 4.7 blocks a phishing email prompt. It doesn’t block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation.	Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined.	The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered.	Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer.
6. Untrusted input parsed as instructions	PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions.	No input sanitization layer between GitHub and the agent instruction set. The agent can’t distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk.	A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation.	Implement input sanitization as defense-in-depth, but don’t rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions.
7. No comparable injection resistance data across vendors	Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model.	No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one.	Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production.	Write one sentence for your next vendor meeting: ‘Show me your quantified injection resistance rate for my model version on my platform.’ Document refusals for EU AI Act high-risk compliance. Deadline: August 2026.

OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation don’t publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams can’t verify what they can’t measure.

Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research (removing cyber safeguards for vetted actors) and isn’t a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).

Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.

What to do before your next vendor renewal

“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”

Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)

Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)
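The grep pass above can be extended into a small script that lists which named secrets each workflow file references. What follows is a minimal sketch under stated assumptions, not a vetted tool: the names secrets_in_text and audit_workflows are illustrative, and the pattern only catches literal secrets.NAME references, so secrets forwarded through reusable workflows or inherited wholesale would be missed.

```python
import re
from pathlib import Path

# Matches literal references such as secrets.ANTHROPIC_API_KEY.
# Assumption: secret names follow the usual identifier shape.
SECRET_REF = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def secrets_in_text(text: str) -> set:
    """Return the set of secret names referenced in one workflow file's text."""
    return set(SECRET_REF.findall(text))


def audit_workflows(repo_root: str) -> dict:
    """Map each workflow file under .github/workflows to the secrets it names."""
    workflows = Path(repo_root) / ".github" / "workflows"
    report = {}
    if not workflows.is_dir():
        return report  # no workflows directory, nothing to audit
    for path in sorted(workflows.iterdir()):
        if path.suffix not in {".yml", ".yaml"}:
            continue
        names = secrets_in_text(path.read_text(encoding="utf-8"))
        if names:
            report[str(path)] = sorted(names)
    return report
```

Calling audit_workflows(".") from a repository root returns a mapping of workflow path to secret names, which can seed the per-repo inventory before credentials are rotated.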

Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)

Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation)

Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex won’t catch non-deterministic prompt injections on its own.
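As a rough illustration of what such a screening layer might look like, the sketch below flags a few instruction-like patterns in an untrusted field before it reaches an agent. Everything in it is an assumption made for illustration: the pattern list, the labels, and the function name flag_untrusted_field are hypothetical, and the sample strings are not the payload from the disclosure. Consistent with the caveat above, a filter like this is a tripwire, not a defense.

```python
import re

# Illustrative patterns only. Real injections vary widely and will evade
# a static list like this; treat a match as a signal, not a verdict.
PATTERNS = {
    "instruction-override": re.compile(
        r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions",
        re.IGNORECASE,
    ),
    # Upper-case names ending in common credential suffixes, e.g. FOO_API_KEY.
    "secret-name": re.compile(r"\b[A-Z0-9_]*(?:API_KEY|TOKEN|SECRET)\b"),
    "env-dump": re.compile(
        r"\b(?:printenv|environment\s+variables?)\b", re.IGNORECASE
    ),
}


def flag_untrusted_field(text: str) -> list:
    """Return the labels of every pattern that matches the given field."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]
```

A non-empty result is a reason to route the pull request to a human before any agent runs; an empty result is not a clearance, which is why the permission and credential controls still carry most of the weight.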

Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Don’t wait for CVEs. None have come yet for this class of vulnerability.

Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)
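Two of these mitigations leave traces in the workflow file itself, so a first pass can be automated. The sketch below is a line-based heuristic, assuming workflow text is passed in as a string; the function names are illustrative, and it deliberately avoids a YAML parser to stay within the standard library, which means unusual formatting can fool it. Environment protection rules and contributor approval gates live in repository settings and are not visible to this check.

```python
import re


def strip_comments(text: str) -> str:
    """Drop everything after '#' on each line (heuristic: ignores quoting)."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def workflow_findings(text: str) -> list:
    """Return human-readable findings for one workflow file's text."""
    body = strip_comments(text)
    findings = []
    # pull_request_target runs with secrets available, even for fork PRs.
    if re.search(r"\bpull_request_target\b", body):
        findings.append("uses pull_request_target (secrets are injected)")
    # Without a permissions key, GITHUB_TOKEN keeps the repository default scope.
    if not re.search(r"^\s*permissions\s*:", body, re.MULTILINE):
        findings.append("no explicit permissions key for GITHUB_TOKEN")
    return findings
```

An empty list means only that neither heuristic fired; it does not establish that the workflow is hardened.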

Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.

“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
