What 'production AI' actually costs
2026-01-15 · 5 min read
Most of the AI work I see falls into two categories. The first looks like magic in a tweet thread. The second looks like an unending series of small, unglamorous engineering decisions. The first is what attracts founders. The second is what determines whether anything you build is still running six months from now.
A working demo of an LLM-powered feature can be assembled in a few hours. The hard work begins the moment you ask: what happens if the provider rate-limits us at 3 a.m. on a Saturday? What happens if a user sends an unbounded conversation history? What happens if the same user retries the same request because the network blipped? What happens if the output is plausible but wrong, and someone was billed for it?
None of these questions are interesting in isolation. Together they are the entire reason production AI takes ten times longer than the demo suggested. The provider gets slow: you need a way to fall back, and a way to know you fell back. The history gets long: you need a strategy for what to keep and what to compress. The user retries: you need every operation to be idempotent, so that repeating it is safe. The output is wrong: you need to know which model produced it, on which prompt, with which context, and you need to be able to roll back the change that caused the regression without rolling back unrelated work.
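To make the retry and fallback cases concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `CompletionService`, `Completion`, and `idempotency_key` are names invented for illustration, the providers are plain callables rather than any real SDK, and the result store is an in-memory dict where a real system would likely want something durable. Deriving the key from the user and prompt is also an assumption; a client-supplied request ID is a common alternative.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Completion:
    text: str
    provider: str    # which backend actually produced the answer
    fell_back: bool  # True if an earlier provider failed first


# A provider is a (name, callable) pair. The callable is a stand-in for
# whatever client call you really make; it takes a prompt, returns text.
Provider = Tuple[str, Callable[[str], str]]


def idempotency_key(user_id: str, prompt: str) -> str:
    """Derive a stable key so a retried request maps to the same result."""
    payload = json.dumps({"user": user_id, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CompletionService:
    """Hypothetical wrapper: idempotent requests plus ordered fallback."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)
        # In-memory only; assumed to be replaced by a shared store.
        self._results: Dict[str, Completion] = {}

    def complete(self, user_id: str, prompt: str) -> Completion:
        key = idempotency_key(user_id, prompt)
        # A retry of the same request returns the stored result instead
        # of calling (and paying for) a provider a second time.
        if key in self._results:
            return self._results[key]

        errors: List[str] = []
        for index, (name, call) in enumerate(self._providers):
            try:
                text = call(prompt)
            except Exception as exc:  # sketch only: narrow this in real code
                errors.append(f"{name}: {exc}")
                continue
            # Record who answered, so a fallback is visible afterwards.
            result = Completion(text=text, provider=name, fell_back=index > 0)
            self._results[key] = result
            return result

        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Constructed with a primary and a backup provider, a call that fails on the primary returns a `Completion` whose `provider` and `fell_back` fields say so, and a retry with the same user and prompt returns the stored result without calling either provider again. The sketch does not attempt the other two cases: history compression, and tracing which model, prompt, and context produced an output.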
This is the part of the work that doesn’t ship in a launch tweet. It’s also the part that decides whether you are a product team or a demo team.
What I’ve learned, building two of these things by myself, is that the discipline matters more than the model. The frontier model you picked last quarter will not be the one you ship on. The retrieval technique you obsessed over will look quaint in a year. Idempotency, replay safety, observability, and rollback paths will not. They are how you keep the system alive long enough to find out what’s actually worth doing.
A startup that ships an AI product without budgeting for the unsexy half of the work isn’t building a product. It’s hosting a demo with a payment form attached.