As businesses and governments increasingly rely on AI agents to perform high-level tasks online, security researchers continue to uncover serious vulnerabilities in large language models that malicious actors can exploit. The latest such discovery comes from LayerX, a browser security firm, which identified a critical bug in the Chrome extension for Anthropic’s Claude AI model.
The vulnerability allows any other browser plugin—even those without special permissions—to inject hidden instructions that can hijack the AI agent. Aviad Gispan, senior researcher at LayerX, explained the flaw’s origin: “The bug stems from an instruction in the extension’s code that permits any script running in the origin browser to communicate with Claude’s LLM without verifying the script’s source.”
This oversight enables any extension to invoke a content script—which requires no special permissions—and issue commands directly to the Claude extension. Gispan demonstrated the exploit by executing arbitrary prompts, bypassing Claude’s safety guardrails, evading user confirmation, and performing cross-site actions across Google’s ecosystem.
In a proof-of-concept attack, LayerX successfully:
- Extracted files from a user’s Google Drive and shared them with unauthorized parties.
- Surveilled recent email activity and sent emails on behalf of the victim.
- Pilfered private source code from a connected GitHub repository.
The vulnerability effectively breaks Chrome’s extension security model, Gispan noted, creating “a privilege escalation primitive across extensions—a scenario Chrome’s security framework is explicitly designed to prevent.”
Claude’s decision-making relies on text, UI semantics, and screenshot interpretation—all of which an attacker can manipulate. The researchers altered the AI’s interface to remove labels and indicators around sensitive data, such as passwords and sharing prompts, then tricked Claude into sharing files with an external server. Such attacks leave little obvious malicious activity for defenders to detect. Even when traces exist, the model can be prompted to delete emails or cover its tracks.
Ax Sharma, Head of Research at Manifold Security, called the flaw “a stark reminder that monitoring AI agents at the prompt layer is fundamentally insufficient.” He added, “The most insidious part of this attack isn’t the injection itself but the manipulation of the agent’s perceived environment to produce actions that appear legitimate from within. This is the kind of threat the industry must prioritize defenses against.”
LayerX reported the vulnerability to Anthropic on April 27. However, the company responded the next day, claiming the bug was a duplicate of another vulnerability already slated for a future update. Anthropic released a partial fix on May 6, introducing new approval flows for privileged actions to harden the system. Despite this, Gispan noted that he could still hijack Claude’s agent in certain scenarios, stating, “Switching to ‘privileged’ mode, even without...”