Three Weeks With Claude Opus 4.7: What Changed in Practice

Anthropic shipped Claude Opus 4.7 on April 16. We have been using it daily for client work for about three weeks. The release notes and marketing copy already exist, so this is just what changed in practice for a small studio that ships production code.

What actually got better

A few things, in roughly the order they changed our day to day.

Vision is finally useful. 4.6 could read screenshots well enough to caption them. 4.7 can read a UI screenshot and tell you which button is misaligned, which input has the wrong padding, what the visual hierarchy is doing wrong. We have stopped writing "the third button on the right" in design feedback prompts. We paste the screenshot and ask. The image resolution limit also went up, which means full page screenshots no longer lose the small text.

Long task consistency improved a lot. Our agent loops used to drift after maybe 30 tool calls. The character of the responses would shift toward more verbose, more hedging, more "let me clarify what we are doing." 4.7 holds character through much longer runs. We have had 8 hour migration jobs that produced output as crisp at hour seven as at hour one. For anyone running agents, this alone is worth the upgrade.

Instruction precision is the third one. If you tell 4.6 "do not add comments unless the WHY is non obvious", it follows about 70% of the time, with the other 30% being explanatory prose. 4.7 follows more like 95%. The remaining 5% is when the instruction itself was ambiguous, which is on us.

What did not change

A few things we hoped would improve and did not.

Architecture decisions. Still a humans only conversation. 4.7 will confidently recommend whatever pattern is trending in its training data, regardless of your team's strengths, your existing stack, or your two year roadmap. It will tell you to migrate to event sourcing because event sourcing is currently fashionable in the engineering blogs it has read. The pattern recognition is good. The judgment is not.

Knowing when to stop. Ask 4.7 to refactor a function and you get a refactored function. Ask it to refactor a function and clean up the surrounding file and you get a rewritten module. The threshold between helpful additional fix and scope creep we did not ask for has not shifted noticeably. We still review every diff.

Cost aware reasoning. 4.7 happily suggests adding a Redis cluster to fix a problem a query plan tweak would have solved. It does not weigh operational complexity against immediate solve. You still need a human to push back with "is the added infrastructure worth it for this gain."

The Mythos question

The most interesting line in the 4.7 announcement was the one Anthropic almost buried. They publicly conceded that 4.7 does not match the performance of a separate, unreleased model internally called Mythos. The reason given was safety. Mythos can do things they are not comfortable shipping to the public yet.

This is a new posture for frontier model labs. They have historically not advertised what they were holding back. The fact that Anthropic is saying it out loud is a signal. The gap between what frontier labs ship and what they have running internally is now large enough to be worth disclosing.

What this means for shops like ours is that the model you are integrating with is not the most capable model that exists. Plan your AI roadmap accordingly. The assistant you ship today will look conservative in twelve months, not because the public API got better but because what the public API gives you access to expanded.

How we think about it now

Treat each Opus release as a step change in two specific dimensions: instruction following and context length. Those are the two that compound across every workflow. Speed and cost matter at scale but do not change how you build. Vision and reasoning are useful for specific tasks but do not reshape your daily loop.

If you skipped 4.5 and 4.6 because the marginal gains did not justify the migration, 4.7 is the version to upgrade for. The instruction precision and long context consistency are the kind of improvement that you stop noticing as an improvement and just start expecting. Once you have used it for a week, going back to 4.6 feels like talking to a model that has not had its coffee.

For client work specifically, the longer agents stay coherent, the more we can hand off to them. We are now running overnight migration jobs we would not have trusted to 4.6. That is the practical headline.

Three weeks with Claude Opus 4.7: what changed, what did not

What actually got better

What did not change

The Mythos question

How we think about it now

Get the next post in your inbox.

Airbnb says AI writes 60% of their code. The 40% is the only number that matters.