Why RateGuard runs inside your app instead of in front of it

Every AI gateway sees 100% of your traffic and holds your provider credentials to do it. In March 2026, two compromised LiteLLM releases showed what that costs when the gateway itself gets hit — even though the actual damage was narrower than the headlines made it sound.

Harsha Beesabathina · 7 July 2026 · 4 min read
[Image: a monumental open gateway glowing cold green beside a small precise doorway glowing warm amber, both set in a dark wall.]

Almost every AI cost-control tool on the market works the same way: your app talks to a gateway, the gateway talks to OpenAI or Anthropic, and every prompt, every completion, every API key passes through a server somebody else runs. That's not a criticism — routing through a shared choke point is the obvious way to add budgets, logging, and fallback across a whole organization's traffic. It's also the design that turns the gateway into the single thing that, if compromised, sees everything.

In March 2026, that stopped being theoretical.

What actually happened to LiteLLM

For about forty minutes on March 24, two releases of LiteLLM — a widely used open source AI gateway with over 20,000 GitHub stars — were live on PyPI carrying a credential-stealing payload. Versions 1.82.7 and 1.82.8 got pulled once discovered, but not before the malware had a window to run on anyone who installed during it.

The payload wasn't crude. According to Kaspersky's writeup, it went after config files for AWS, Kubernetes, email, databases, and WireGuard, along with SSH keys, .env files, Terraform and Helm state, TLS certificates, crypto wallet configs, and Slack and Discord webhooks. Separately, it queried the cloud metadata service directly (169.254.169.254) to pull the temporary IAM credentials an AWS instance is issued at runtime, which don't live in a file at all. Everything it found was AES-encrypted, bundled into an archive, and shipped to a server the attackers controlled, with the AES key itself wrapped in RSA so only they could open it.

LiteLLM's own postmortem traces the entry point to their build pipeline, not their code: Trivy, the security scanner they ran in CI to catch exactly this kind of thing, had itself been compromised upstream. The tool meant to catch supply chain attacks became the supply chain attack.

Here's the detail that matters for what I'm about to argue, and I'm not going to leave it out just because it's inconvenient: LiteLLM states plainly that customers running their official Docker image were never exposed, because that image pins dependency versions. The people actually at risk were the ones who ran pip install litellm without a version pin during a forty-minute window. That's a narrower blast radius than "an AI gateway got hacked" suggests, and pretending otherwise would be exactly the kind of overstatement I'd criticize in someone else's launch post.

What it's still evidence of

The incident doesn't prove gateways get backdoored routinely. It proves something narrower and, for this argument, still sufficient: a gateway is a separate running service, built from its own dependency tree, that every one of your LLM calls has to pass through — and every one of those dependencies is one more thing that can be compromised on a timeline you don't control. The malware in March didn't target LiteLLM's rate-limiting logic. It targeted the fact that a gateway process, by the nature of being a gateway, ends up holding your cloud credentials, your SSH keys, your .env file, because that's what a service running in your infrastructure has access to.

RateGuard doesn't have that access, structurally, because it isn't a service. rg.WrapClient(&http.Client{}) runs inside your own process, using the same credentials your app already has, over a connection your app already owns. There's no separate service to compromise, no separate credential store for it to hold, no separate network hop for a prompt to leak across. That's not a security guarantee: RateGuard is a package installed with go get, npm install, or pip install, same as anything else, and packages can be compromised too. LiteLLM's CI pipeline is the proof of that. What it removes is the specific exposure a gateway adds on top: a second service, holding your secrets, in the path of every call.

When you still want a gateway

I'm not going to pretend RateGuard replaces what a real gateway is for. If you're routing traffic from forty microservices across three clouds and you want one team to own budgets and fallback centrally, a gateway is the right shape for that job — you want the aggregation point. If you need to normalize wildly different provider APIs into one schema for a platform team that never touches raw SDKs, that's gateway territory too. RateGuard is for the far more common case: one application, one team, that wants its own LLM calls budgeted and observable without standing up infrastructure to do it. Different shape, different job.

Next post: the meta one, about building this mostly by directing AI agents, and the morning two of them edited the same file at once.

github.com/varbees/rateguard · rateguard.antharmaya.com/docs
