Anthropic's Claude Sonnet series has evolved rapidly, with Sonnet 4.6 marking the latest milestone, released just weeks after Opus 4.6. Building on predecessors like Sonnet 4.5—which scored 77.2% on SWE-Bench—this model delivers a across-the-board upgrade in coding, computer use, and agentic tasks, and becomes the default for Free and Pro users on Claude.ai. In early Claude Code testing, developers preferred it over Sonnet 4.5 70% of the time, and even over Opus 4.5 59% of the time.
Key features include a 1M-token context window in beta—enabling analysis of entire codebases or hundreds of documents—alongside adaptive and extended thinking modes. It supports web search with sandboxed code execution, persistent memory, programmatic tool calling (now generally available), and Opus-level prompt injection resistance. Pricing stays at $3 per million input tokens and $15 per million output tokens for requests up to 200K tokens, unchanged from Sonnet 4.5, keeping high-end capabilities affordable.
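To make the quoted rates concrete, here is a minimal cost-estimate sketch. The function name and example token counts are illustrative, not part of any official SDK; it simply applies the $3/$15 per-million-token rates above, which the article says hold for requests up to 200K tokens (long-context beta pricing may differ).

```python
# Rates quoted for Sonnet 4.6 (USD per million tokens, up to 200K-token requests)
INPUT_RATE_PER_MTOK = 3.00
OUTPUT_RATE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_MTOK

# Example: a 150K-token codebase prompt with a 4K-token answer
print(round(estimate_cost(150_000, 4_000), 2))  # 0.51
```

At these rates, even a prompt near the standard 200K-token ceiling costs well under a dollar per request, which is the economics behind the article's affordability claim.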
On benchmarks, Sonnet 4.6 scores 79.6% on SWE-Bench Verified (coding tasks), nearly matching Opus 4.6's 80.8%; 72.5% on OSWorld (computer use), just behind Opus at 72.7%; 74.1% on GPQA Diamond (scientific reasoning); and 89% on math evals, a 27-point jump from Sonnet 4.5. It matches Opus on OfficeQA for document analysis and excels on Vending-Bench for long-horizon planning.
The road ahead involves broader enterprise adoption via Amazon Bedrock and Google Vertex AI, with potential expansions into agent teams and multi-modal tasks, though Opus retains an edge in deep reasoning benchmarks like GPQA. The lower pricing cuts deployment costs by roughly 80% versus Opus for coding and agentic work, fueling high-volume usage in the agentic AI boom against cheaper rivals.
DEMOCRATIZING FRONTIER AI FOR EVERY DEVELOPER, NOT JUST THE ELITE.
