Airbnb Says AI Writes 60% of Their Code. Here Is Why That Is the Wrong Metric.

Airbnb announced alongside its May earnings that AI now writes 60% of its new code. The number is real. The interpretation most people are running with is wrong.

Every engineering leader we have talked to in the last two weeks has raised the same anxious question: "Are we behind? Should our number be that high?" The premise is broken. The percentage of lines written by AI is to engineering what lines of code per day was in the 1990s: a metric that sounds quantitative but measures the wrong thing.

Why the metric breaks

Code is cheap. Decisions are expensive. The cost of a software project is not in keystrokes; it is in the choices: which library to adopt, how to structure the data, where to place the abstraction boundary, what to defer and what to lock in. A model can generate 200 lines of correct Python in 30 seconds. The real question is whether those 200 lines should exist in your codebase at all.

So when Airbnb says "AI wrote 60%", the only honest read is that 60% of the keystrokes are AI-generated. The 40% that humans wrote is overwhelmingly the high-leverage 40%: the integration glue, the cross-cutting concerns, the unusual case the autocomplete missed, the part that needed someone to sit down and think.

The 40% is the only number that matters

What was the human-written 40% actually doing? In our shop, it falls into three buckets.

Decisions. Naming things. Choosing data shapes. Picking the boundary between two services. Deciding what to test and what to mock. None of these is typing-heavy. None of them shows up as "lines of code", so none of them is represented in any percent-written-by-AI metric.

Integration. Wiring the new code into the rest of the system. Updating the surrounding callers. Making sure the deployment configuration knows about the new service. The model generates the new function in seconds; the human spends 90 minutes making sure it fits.

Review. Reading the AI-generated 60% with enough attention to catch what is subtly wrong. A model will confidently produce code that compiles, passes its own tests, and is wrong in a way you only catch by understanding what the system is supposed to do. The reviewing is the work.

What we actually track

Two things at our studio.

Time to merged feature. For a given scoped feature, how many wall-clock days pass from kickoff to production. This is the real productivity question. A team that ships a feature in three days using AI to write 80% of the code is more productive than a team that ships in five days writing 100% by hand. It is also more productive than a team that ships in three weeks writing 95% with AI and then rewriting most of it.
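A minimal sketch of how this metric could be computed, assuming you record a kickoff date and a production date for each scoped feature. The `Feature` record, its field names, and the sample features are hypothetical. Taking the median rather than the mean is a choice made for this sketch, so that one stalled feature does not swamp a quarter's number.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Feature:
    # Hypothetical record: one scoped feature, from kickoff to production.
    name: str
    kickoff: date
    in_production: date


def days_to_production(feature: Feature) -> int:
    """Wall-clock calendar days from kickoff to production for one feature."""
    return (feature.in_production - feature.kickoff).days


def median_time_to_merged_feature(features: list[Feature]) -> float:
    """Median wall-clock days across a set of comparable scoped features."""
    return median(days_to_production(f) for f in features)


if __name__ == "__main__":
    # Hypothetical sample quarter: features taking 3, 5 and 3 days.
    quarter = [
        Feature("search-filters", date(2025, 4, 1), date(2025, 4, 4)),
        Feature("export-csv", date(2025, 4, 7), date(2025, 4, 12)),
        Feature("audit-log", date(2025, 4, 14), date(2025, 4, 17)),
    ]
    print(median_time_to_merged_feature(quarter))  # 3
```

Comparing this median quarter over quarter, on features of similar scope, is what tells you whether delivery actually got faster.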

Defect rate. What fraction of merged changes trigger a follow-up bug fix within two weeks. This catches the failure mode where AI assistance speeds up the writing but slows down everything downstream because the code is subtly wrong.
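A sketch of the defect-rate calculation, assuming each bug fix can be traced back to the merged change it repairs (for example, through a reference in the fix's commit message). That linkage, the `MergedChange` record, and the sample data are hypothetical. The 14-day window is the two weeks described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Window from the text: a bug fix within two weeks of the original merge.
FOLLOW_UP_WINDOW = timedelta(days=14)


@dataclass
class MergedChange:
    # Hypothetical record for one merged change (for example, one pull request).
    change_id: str
    merged_at: datetime
    # Merge times of bug fixes that were traced back to this change.
    fix_merged_at: list[datetime]


def needed_follow_up_fix(change: MergedChange) -> bool:
    """True if any traced bug fix landed within the window after the merge."""
    return any(
        timedelta(0) <= fix - change.merged_at <= FOLLOW_UP_WINDOW
        for fix in change.fix_merged_at
    )


def defect_rate(changes: list[MergedChange]) -> float:
    """Fraction of merged changes that triggered a follow-up fix in the window."""
    if not changes:
        return 0.0
    flagged = sum(needed_follow_up_fix(c) for c in changes)
    return flagged / len(changes)


if __name__ == "__main__":
    changes = [
        MergedChange("pr-101", datetime(2025, 5, 1), [datetime(2025, 5, 6)]),
        MergedChange("pr-102", datetime(2025, 5, 2), []),
        # A fix seven weeks later falls outside the window and is not counted.
        MergedChange("pr-103", datetime(2025, 5, 3), [datetime(2025, 6, 20)]),
        MergedChange("pr-104", datetime(2025, 5, 4), []),
    ]
    print(defect_rate(changes))  # 0.25
```

The window is the important design choice: a fix that lands months later is more likely to reflect changed requirements than a defect in the original change.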

These two numbers, tracked over a quarter, tell you whether your AI integration is actually working. The percent-of-code number tells you nothing.

What Airbnb's real story probably is

Companies the size of Airbnb that integrate AI well tend to see a 20 to 40% reduction in time to merged feature on greenfield work, and less on maintenance. The reduction is concentrated in initial scaffolding and in refactoring at scale. They see roughly stable defect rates if their review culture is strong, and rising defect rates if it is not.

Those are the numbers worth comparing yourselves against. If your AI percentage is 30% and your features ship in half the time they used to, you are winning. If your AI percentage is 80% and your features take longer because you spend three days fixing what the model generated, you are losing. The Airbnb number tells you nothing about which side of that line you are on.
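The comparison above can be sketched as a quarter-over-quarter check that deliberately ignores the AI share. The `QuarterMetrics` record, the verdict labels, and the one-point `defect_tolerance` used to mean "roughly stable" are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class QuarterMetrics:
    # Hypothetical per-quarter summary. ai_share is recorded but never read below.
    median_days_to_feature: float
    defect_rate: float
    ai_share: float


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative means lower)."""
    return (after - before) / before * 100


def verdict(before: QuarterMetrics, after: QuarterMetrics,
            defect_tolerance: float = 0.01) -> str:
    """Judge AI adoption on delivery time and defect rate, not on AI share."""
    if after.median_days_to_feature > before.median_days_to_feature:
        return "losing"  # features take longer, whatever the AI share
    if after.defect_rate > before.defect_rate + defect_tolerance:
        return "check review"  # delivery held or improved, but defects are rising
    if after.median_days_to_feature < before.median_days_to_feature:
        return "winning"  # faster, with roughly stable defects
    return "no change"


if __name__ == "__main__":
    baseline = QuarterMetrics(10.0, 0.10, ai_share=0.0)
    team_a = QuarterMetrics(5.0, 0.10, ai_share=0.30)   # half the time, 30% AI
    team_b = QuarterMetrics(12.0, 0.18, ai_share=0.80)  # slower, 80% AI
    print(percent_change(10.0, 5.0))  # -50.0
    print(verdict(baseline, team_a))  # winning
    print(verdict(baseline, team_b))  # losing
```

Changing `ai_share` never changes the verdict, which is the point of the section: the percentage is not an input to the judgment.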

How to reframe the conversation

When someone asks if you are using AI "enough", reframe. The question is not how much code AI writes. It is whether your engineers are spending more time on decisions and less time on typing. If yes, you are using it right, regardless of the percentage. If no, you have a workflow problem that the percentage will not fix.

Airbnb's number is a headline. Your team's time to merge is the metric. Track the right thing.

Airbnb says AI writes 60% of their code. The 40% is the only number that matters.