Field note

RateGuard was built mostly by directing AI agents. Here's what that actually looked like.

Not a highlight reel but the actual mix: real infrastructure shipped and tested in one sitting, an image wired to the wrong blog post, a bug that turned out to be two agents editing the same file at once, and a statistic that almost made it into a launch post before the standing rule caught it.

Harsha Beesabathina · 9 July 2026 · 4 min read

A human hand and a geometric robotic hand holding a glowing amber crystal between them at equal height.

RateGuard's core rate limiter, its outbound spend tracking, the MCP tools, the budget delegation system, the docs site, most of the blog posts including this one — I directed AI agents to write nearly all of it. I want to be specific about what "directing" meant in practice, because the honest version is more useful than the highlight reel, and it isn't the clean story of typing a prompt and getting a finished feature back.

What went right without me watching closely

The bulk of it, honestly. A lock-free rate limiter with a randomized test checking that it makes identical decisions to the one it replaced. An AIMD controller for adaptive rate limiting, cited against the actual TCP congestion-control literature it's modeled on. A cryptographic budget delegation system using Ed25519 signatures, with narrowing enforced at the code level, not just described in a comment. Four runnable example programs, each one actually executed, its output read, before I'd call it done. That's the case where an agent with the right constraints does careful, tested, correctly-scoped work faster than I could write it myself.
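For readers who haven't seen one, a randomized equivalence test of the kind mentioned above can be sketched in a few lines of Python. This is not RateGuard's code: the two sliding-window limiters, the class names `ReferenceLimiter` and `DequeLimiter`, the helper `first_disagreement`, and the parameter ranges are all hypothetical, and neither limiter is lock-free. It only shows the shape of the check: feed both implementations the same random traffic and report the first request on which they disagree.

```python
import random
from collections import deque


class ReferenceLimiter:
    """Hypothetical 'old' limiter: simple to reason about, slow."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.accepted = []  # timestamps of accepted requests

    def allow(self, now_ms):
        # Keep only accepted requests newer than (now - window).
        cutoff = now_ms - self.window_ms
        self.accepted = [t for t in self.accepted if t > cutoff]
        if len(self.accepted) < self.limit:
            self.accepted.append(now_ms)
            return True
        return False


class DequeLimiter:
    """Hypothetical replacement: same policy, expires entries from the front.

    Assumes timestamps never go backwards.
    """

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.accepted = deque()

    def allow(self, now_ms):
        cutoff = now_ms - self.window_ms
        while self.accepted and self.accepted[0] <= cutoff:
            self.accepted.popleft()
        if len(self.accepted) < self.limit:
            self.accepted.append(now_ms)
            return True
        return False


def first_disagreement(seed, steps=2000):
    """Replay identical random traffic through both limiters.

    Returns the index of the first request they decide differently,
    or None if every decision matched.
    """
    rng = random.Random(seed)  # seeded so any failure can be replayed
    limit = rng.randint(1, 8)
    window_ms = rng.randint(1, 50)
    reference = ReferenceLimiter(limit, window_ms)
    candidate = DequeLimiter(limit, window_ms)
    now_ms = 0
    for step in range(steps):
        now_ms += rng.randint(0, 20)  # non-decreasing clock, bursts allowed
        if reference.allow(now_ms) != candidate.allow(now_ms):
            return step
    return None


if __name__ == "__main__":
    failures = [s for s in range(50) if first_disagreement(s) is not None]
    print("seeds with disagreements:", failures)
```

A clean run across many seeds is evidence that the two implementations agree, not a proof; it is only as strong as the traffic the generator happens to produce, which is why the seed is kept so a failing case can be replayed.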

What needed me watching closely

A hero image ended up on the wrong blog post. Two different posts had similar working titles — one about a context-injection tool, one about rate limiting — and the agent working on it matched a slug to a title it remembered from an old planning doc without opening the file to check what was actually in it. The image sat there, wrong, until the same agent stumbled onto it later while checking something unrelated, and fixed it without me having asked.

Separately: code for a feature — hero thumbnails on the writing index, next/previous navigation between posts — got written, deployed, and then never actually committed to git. It sat in the working directory the whole time. The live site had it. The repository didn't. That one also surfaced on its own, mid-session, not because I went looking for it.

There was also the morning two separate agent sessions were working the same repository at the same time without either one knowing about the other — one of them mine, one an orchestrator running independently. Both had noticed the same test was expecting the wrong number of MCP tools and were fixing it in parallel. From where I sat, it looked like a flaky test: the same file returning a different answer a few seconds apart. It wasn't flaky. It was two agents editing the same line while I was mid-investigation into what I thought was a timing bug.

And there was a moment I'd call the most important one, even though nothing broke: a launch post was about to state that a competitor's security incident affected "roughly 36% of cloud environments", a specific-sounding number that had been carried forward through a few rounds of research notes. Before it went anywhere near a public post, the agent writing it stopped and checked the claim against the affected company's own account of the incident and two independent security write-ups, on its own, without being told to. None of the three sources contained that figure. The real scope was narrower than the number implied. The post got rewritten around what was actually verifiable, including the detail that cuts against the point it was trying to make.

The first three weren't the agents lying. They were the agents being confident about something they hadn't actually checked — the image, the commit, the file collision — which, if you've managed people who are good at their jobs, you'll recognize as a familiar failure mode, not a uniquely AI one. The difference is that an AI agent will state it with exactly the same confidence whether it's right or wrong, and won't flag the uncertainty unless something in its process forces it to. The fourth one, the statistic, is what it looks like when that same process forces the check before the mistake ships instead of after.

Why I'd still call this the right way to build

One person can't type fast enough to build a rate limiter, a docs site, and a cryptographic delegation protocol in the time it took to write this sentence. What I can do instead is set the standing rule the agents actually work under: run the example instead of trusting the summary that says it works, trace a statistic back to its source before it goes into anything with my name on it, and fix a mistake the moment it's found rather than after someone else notices it live. None of the four catches above happened because I was staring over anyone's shoulder in real time — they happened because that's the instruction, not an afterthought.

The gap between "an agent produced this" and "this is actually correct" doesn't close on its own. Somebody has to make checking the default, not the exception. That's the part that's actually my job here: not typing faster than the agents, but making sure confident-sounding work doesn't ship unverified.

RateGuard is open source and MIT-licensed, with three SDKs, and every claim in these three posts was checked against something I could point to. github.com/varbees/rateguard · rateguard.antharmaya.com/docs
