JUN 16, 2026 | AI Insights Hub | anthropic.com

Anthropic details Claude 3.5 Sonnet benchmark results surpassing Claude 3 Opus on coding and reasoning

Anthropic published an evaluation report for Claude 3.5 Sonnet showing that it outperforms Claude 3 Opus on coding, math, and reasoning benchmarks such as GSM8K (math word problems) and HumanEval (code generation), while also being faster and cheaper to run in production.
