When we set the time-to-engineer target at 90 seconds, the engineering team disagreed about whether it was a goal or a tagline. After a thousand sessions in the wild, the median is 74 seconds. Here’s how we got there, and what broke along the way.
The press itself is cheap. The hard work happens between the press and the moment an engineer types their first line into the customer’s session. Three steps, in order: classify the session, find the engineer, hand them the context. We measured each.
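The shape of the measurement is simple: wrap each of the three stages in a timer and record per-press timings. A minimal sketch, with placeholder stage functions standing in for the real implementations (all names here are illustrative, not our actual code):

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PressTimings:
    # Seconds spent in each stage of a single press, keyed by stage name.
    stages: dict = field(default_factory=dict)


def timed_stage(timings: PressTimings, name: str, fn: Callable[..., Any], *args: Any) -> Any:
    start = time.monotonic()
    result = fn(*args)
    timings.stages[name] = time.monotonic() - start
    return result


# Placeholder stages; each one is a real subsystem in production.
def classify(session: dict) -> str:
    return "routing-tag"

def match(tag: str) -> str:
    return "engineer-1"

def hand_off(engineer: str, session: dict) -> None:
    return None


def handle_press(session: dict) -> PressTimings:
    timings = PressTimings()
    tag = timed_stage(timings, "classify", classify, session)
    engineer = timed_stage(timings, "match", match, tag)
    timed_stage(timings, "hand_off", hand_off, engineer, session)
    return timings
```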
Classify (3–8s). When you press, we look at the AI tool you’re in, the language and framework, the size and recency of the diff, the error in the terminal if there is one, and any structured signals the integration provides (Cursor sends us project metadata, Lovable sends a snapshot URL, Claude sends the conversation handle). A small classifier turns this into a five-token routing tag. We rebuild this classifier monthly; it’s the part of the system that gets meaningfully better the more presses we see.
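To make the five-token shape concrete, here is a heuristic sketch of signals collapsing into a routing tag. The production classifier is learned, not rule-based, and every field name below is an assumption:

```python
def routing_tag(session: dict) -> str:
    """Collapse session signals into a compact five-token routing tag.

    In production this is a trained classifier rebuilt monthly; this
    sketch substitutes simple heuristics to show the input/output shape.
    """
    tool = session.get("tool", "unknown")          # e.g. "cursor", "claude"
    lang = session.get("language", "unknown")      # e.g. "typescript"
    framework = session.get("framework", "none")   # e.g. "nextjs"
    diff = "big-diff" if session.get("diff_lines", 0) > 200 else "small-diff"
    error = "has-error" if session.get("terminal_error") else "no-error"
    return ":".join([tool, lang, framework, diff, error])
```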
Match (12–28s). The bench is sharded by routing tag. We don’t broadcast; we use a priority queue per shard, with a fairness term so no engineer gets buried. The queue is shallow by design; if it’s ever deeper than three we page an on-call to hire faster. The thing we learned the hard way: latency in the match step is dominated not by finding an available engineer but by their accept-the-handoff round trip. We dropped that from 11s to 4s by pre-loading customer context into the engineer’s desktop the moment we route, before they accept.
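A per-shard queue with a fairness term can be sketched with a standard heap. The post doesn’t give the exact formula, so the penalty on recent assignments below is an assumption; the point is only that priority mixes skill match with load:

```python
import heapq
import itertools


class Bench:
    """Priority queue of available engineers for one routing-tag shard.

    Lower priority value pops first. The 0.1-per-assignment fairness
    penalty is an illustrative assumption, not the production weighting.
    """

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equals

    def add(self, engineer_id: str, skill: float, assignments_today: int) -> None:
        # Strong skill match pulls priority down; recent assignments push it
        # back up so no single engineer absorbs every session.
        priority = -skill + 0.1 * assignments_today
        heapq.heappush(self._heap, (priority, next(self._counter), engineer_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```

With this weighting, a slightly weaker skill match who has taken nothing today outranks a stronger match who has already absorbed five sessions.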
Hand off (35–55s). The single biggest variable. If the integration is rich (Claude, Cursor, Replit), we’re landing an engineer with the customer’s repo open, the relevant file selected, and the last 20 turns of the AI conversation summarized in a side panel. If the integration is light (a copy-paste from a chat), the engineer is essentially walking into a room cold; we slow the customer down with a 30-second “tell me what you’re trying to do” that turns out to be the best thing we’ve added.
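The rich/light split amounts to how much of the handoff bundle we can fill before the engineer lands. A sketch under assumed field names (the `summarize` stand-in just keeps first lines; the real one is a proper summarizer):

```python
from dataclasses import dataclass
from typing import Optional

RICH_INTEGRATIONS = {"claude", "cursor", "replit"}


@dataclass
class HandoffContext:
    repo_url: Optional[str] = None
    focus_file: Optional[str] = None
    conversation_summary: Optional[str] = None
    customer_intent: Optional[str] = None  # the "tell me what you're trying to do" answer


def summarize(turns: list) -> str:
    # Stand-in summarizer: first line of each turn, joined.
    return " | ".join(t.splitlines()[0] for t in turns if t)


def build_handoff(session: dict) -> HandoffContext:
    ctx = HandoffContext(
        repo_url=session.get("repo_url"),
        focus_file=session.get("active_file"),
        conversation_summary=summarize(session.get("turns", [])[-20:]),
    )
    if session.get("integration") not in RICH_INTEGRATIONS:
        # Light integration: capture the customer's own statement of intent
        # while the engineer is connecting.
        ctx.customer_intent = session.get("intent_answer")
    return ctx
```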
The interesting bug: in the first month, our median was 106 seconds, and we couldn’t explain a 12-second hump in the distribution. It turned out to be Slack: the engineers’ on-call notifications were going through Slack mobile push, which is not built for sub-second delivery. We replaced it with a desktop-native nudge and the hump disappeared the next day.
The system fights for every second, and we still think we can take ten more out of the median before the end of the year. The point is not that 90 is magic. It’s that the press is only useful if the answer arrives inside the same attention span the question lived in.
Posted from the engineering team. Comments and questions welcome at support@relay.green.