Almost every AI cost-control tool on the market works the same way: your app talks to a gateway, the gateway talks to OpenAI or Anthropic, and every prompt, every completion, every API key passes through a server somebody else runs. That's not a criticism — routing through a shared choke point is the obvious way to add budgets, logging, and fallback across a whole organization's traffic. It's also the design that turns the gateway into the single thing that, if compromised, sees everything.
In March 2026, that stopped being theoretical.
What actually happened to LiteLLM
For about forty minutes on March 24, two releases of LiteLLM, a widely used open-source AI gateway with over 20,000 GitHub stars, were live on PyPI carrying a credential-stealing payload. Versions 1.82.7 and 1.82.8 were pulled once the compromise was discovered, but not before the malware could run on any machine that installed one of them in that window.
The payload wasn't crude. According to Kaspersky's write-up, it went after AWS, Kubernetes, email, database, and WireGuard configs, SSH keys, .env files, Terraform and Helm state, TLS certificates, crypto wallet configs, and Slack and Discord webhooks. Separately, it queried the cloud metadata service directly (169.254.169.254) to pull the temporary IAM credentials AWS issues to your instance at runtime, which don't live in a file at all. Everything it found was AES-encrypted, bundled into an archive, and shipped to a server the attackers controlled, with the AES key itself wrapped in RSA so only they could open it.
LiteLLM's own postmortem traces the entry point to their build pipeline, not their code: Trivy, the security scanner they ran in CI to catch exactly this kind of thing, had itself been compromised upstream. The tool meant to catch supply chain attacks became the supply chain attack.
Here's the detail that matters for what I'm about to argue, and I'm not going to leave it out just because it's inconvenient: LiteLLM states plainly that customers running the project's official Docker image were never exposed, because that image pins its dependency versions. The people actually at risk were the ones who ran pip install litellm without a version pin during that forty-minute window. That's a narrower blast radius than "an AI gateway got hacked" suggests, and pretending otherwise would be exactly the kind of overstatement I'd criticize in someone else's launch post.
What it's still evidence of
The incident doesn't prove gateways get backdoored routinely. It proves something narrower and, for this argument, still sufficient: a gateway is a separate running service, built from its own dependency tree, that every one of your LLM calls has to pass through, and every one of those dependencies is one more thing that can be compromised on a timeline you don't control. The malware in March didn't target LiteLLM's rate-limiting logic. It went after what a gateway process ends up holding simply by running in your infrastructure: your cloud credentials, your SSH keys, your .env file.
RateGuard doesn't add another process holding that access, structurally, because it isn't a service. rg.WrapClient(&http.Client{}) runs inside your own process, using the credentials your app already has, over a connection your app already owns. There's no separate service to compromise, no separate credential store for it to hold, no extra network hop for a prompt to leak across. That's not a security guarantee. RateGuard is a package pulled in with go get, npm install, or pip install like any other dependency, and packages can be compromised too; LiteLLM's CI pipeline is proof of that. What the in-process design removes is the specific exposure a gateway adds on top: a second service, holding your secrets, in the path of every call.
When you still want a gateway
I'm not going to pretend RateGuard replaces what a real gateway is for. If you're routing traffic from forty microservices across three clouds and you want one team to own budgets and fallback centrally, a gateway is the right shape for that job — you want the aggregation point. If you need to normalize wildly different provider APIs into one schema for a platform team that never touches raw SDKs, that's gateway territory too. RateGuard is for the far more common case: one application, one team, that wants its own LLM calls budgeted and observable without standing up infrastructure to do it. Different shape, different job.
Next post: the meta one, about building this mostly by directing AI agents, and the morning two of them edited the same file at once.
github.com/varbees/rateguard · rateguard.antharmaya.com/docs
