LLM SLOs in Production: Latency, Quality, Cost, and Availability Targets That Actually Move Decisions
Introduction

The first time I argued for LLM SLOs at our weekly platform review, the head of product told me the number I wanted to track was "user happiness." I laughed politely and asked how we measured user happiness today. He said the customer-success team had a feeling.

The team had a feeling because the dashboards we had spent two quarters building did not answer the one question the product owner cared about: whether the model was getting better or worse for actual users this week. We had p99 latency, token cost per request, and a green pie chart of the HTTP 200 rate. None of those moved when the model regressed. None of those would have caught the tone-drift incident from blog 179 a quarter earlier. None of those gave a CTO a number to put in a board update.

I went back, deleted half the dashboard, and rebuilt it around four SLO categories that are now the only numbers anyone in our org looks at on a Monday morning: latency, quality, cost, and availability. ...