OpenAI shipped GPT-5.5 on April 23, narrowly beating Anthropic’s Claude Mythos Preview on a flagship benchmark and putting the capability behind a $20 ChatGPT Plus subscription. The release lands seven days after Anthropic locked Mythos behind a Cyber Verification Program. The two labs are running explicit counter-plays in the same release week.
Two weeks ago we wrote that Anthropic’s real bet wasn’t Opus 4.7 — it was the model the company chose not to ship widely. OpenAI just called that bet. GPT-5.5 is, by OpenAI’s own benchmarks, in the same capability class as Mythos. By Anthropic’s release framework, it shouldn’t be available to anyone but verified security professionals. By OpenAI’s, it’s available to anyone with a credit card.
What OpenAI actually shipped
GPT-5.5 launched on April 23 in two variants — standard and Pro — rolling out to ChatGPT Plus, Pro, Business, and Enterprise users in both ChatGPT and Codex. The model carries the internal codename “Spud,” according to VentureBeat’s reporting, and OpenAI says nearly 200 trusted early-access partners helped pressure-test it before launch.

The benchmark gains are concrete. GPT-5.5 hits 82.7% on Terminal-Bench 2.0 (vs 69.4% for Opus 4.7 and 75.1% for the prior GPT-5.4). Expert-SWE, OpenAI’s internal coding benchmark for tasks with median 20-hour human completion times, jumps from 68.5% to 73.1%. FrontierMath Tier 4 nearly doubles to 35.4%. CyberGym — the cybersecurity-capability eval — reaches 81.8%, ahead of Opus 4.7’s 73.1%. Crucially, GPT-5.5 matches GPT-5.4’s per-token latency despite being more capable; OpenAI attributes the parity to hardware-software co-design on NVIDIA GB200 and GB300 NVL72 systems, including custom partitioning algorithms the model wrote itself.
One important caveat: API access is not available yet. OpenAI says it’s “coming very soon” but that “API deployments require different safeguards.” That’s a meaningful split — the consumer surface ships first, the developer surface waits. Enterprise integrators looking to build on GPT-5.5 don’t have the door open yet.
The Mythos comparison is the actual story
The benchmarks-against-Opus-4.7 framing is what most outlets led with. The sharper read is the Mythos comparison. CNBC framed the launch as arriving “just weeks after Anthropic unveiled Claude Mythos Preview” — the model Anthropic deemed too dangerous for broad release because of its cybersecurity capabilities. OpenAI claims GPT-5.5 narrowly beats Mythos on Terminal-Bench 2.0, essentially a statistical tie on a benchmark Mythos was supposed to dominate.
The most pointed reaction came from Albert Ziegler at security firm Xbow, who tested GPT-5.5 against software vulnerabilities using Xbow’s internal benchmarks. The model missed 10% of vulnerabilities — down from 40% on GPT-5 and 18% on Anthropic’s Opus 4.6. Ziegler called it “Mythos-like hacking, open to all,” a direct rebuke of Anthropic’s gated-cyber strategy. Hacker News skeptics pushed back that the comparison is hard to verify because Mythos isn’t publicly available, and other researchers have argued that smaller open-weight models can already reproduce much of what Anthropic showed in Mythos’s own demos.
This is where it stops being a benchmark story and starts being a strategy story. The premise of Anthropic’s calibrated-capability play was that the gap between “what we trained” and “what we ship” was a moat. OpenAI just made a credible claim to have closed the gap from the other direction — with no verification program, no gated access, just a $20 subscription.
The two-track race, more clearly
We’ve been writing for weeks that the AI race has bifurcated — OpenAI optimizing the consumer surface, Anthropic optimizing the enterprise stack. The 70% enterprise win-rate shift we covered last week validated half of that thesis. GPT-5.5 lands the other half. OpenAI is making a credible case that consumer reach — ChatGPT’s 800-million-user surface — is its own form of moat, and that putting frontier capability on the cheapest paid tier is how you defend it.
The trade-off is visible in what each lab won’t do. Anthropic won’t ship Mythos-class capability without a verification program. OpenAI won’t gate GPT-5.5 on the consumer surface at all; the only friction it imposes is holding the API back for additional safeguards. Both are coherent. Both are bets about which constraint matters more. Anthropic is betting that enterprise buyers and regulators will reward the lab that imposes friction on capability when the capability is dangerous. OpenAI is betting that consumer mindshare and developer adoption compound faster than the safety-shaped backlash.
The benchmark scoreboard makes it look like OpenAI took the lead this week. The procurement scoreboard from last week makes it look like Anthropic still holds enterprise. They aren’t the same scoreboard.
What this means if you’re building on it
For consumer-app developers and individual professionals, GPT-5.5 is now the strongest model anyone with $20 can use. The agentic coding gains in particular — 73.1% on long-horizon tasks, fewer tokens used per Codex job — will compress how long it takes to ship a personal project, a side hustle, or a small-team product. The latency parity with GPT-5.4 means you don’t pay a speed tax to get the upgrade.
For enterprise integrators, the lack of API access is the load-bearing detail. Until OpenAI ships GPT-5.5 to the API, anyone building production workflows can’t actually deploy it — they’re stuck on GPT-5.4 (still excellent) or making the case to switch to Anthropic’s Claude Opus 4.7, which has full API availability across AWS Bedrock, Vertex AI, and Microsoft Foundry. That timing gap matters. Enterprise procurement cycles don’t pause for “coming very soon.” Anthropic now has at least a few weeks of unimpeded selling on a model that, by benchmarks alone, is no longer the most capable available.
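For teams that want to adopt GPT-5.5 the moment the API lands without shipping a code change that day, one option is to pin a preference order and select the newest model the provider actually serves. A minimal sketch, assuming hypothetical model IDs like "gpt-5.5" and "gpt-5.4" (the article does not confirm the eventual API identifiers):

```python
# Hypothetical sketch: pin GPT-5.4 in production today and switch to
# GPT-5.5 automatically once it shows up in the provider's model list.
# The model IDs below are assumptions, not confirmed API names.

PREFERRED = ["gpt-5.5", "gpt-5.4"]  # newest first

def pick_model(available: list[str], preferred: list[str] = PREFERRED) -> str:
    """Return the most-preferred model the API actually serves."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    # Nothing on the preference list is served; the caller can fall back
    # to another vendor (e.g. Claude Opus 4.7 via Bedrock) here.
    raise RuntimeError("no preferred model available")

# In production, `available` would come from the provider's model-listing
# endpoint, e.g. with the OpenAI SDK:
#   available = [m.id for m in client.models.list()]
print(pick_model(["gpt-5.4", "gpt-4o"]))   # gpt-5.4 while 5.5 is chat-only
print(pick_model(["gpt-5.5", "gpt-5.4"]))  # gpt-5.5 once the API ships
```

The point of the indirection is that the cutover becomes a data change, not a deploy: when OpenAI’s “coming very soon” arrives, the router picks it up on the next model-list refresh.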
For security teams in particular, Ziegler’s framing — “Mythos-like hacking, open to all” — is a recruiting pitch and a warning at once. If GPT-5.5 really does close the cyber gap to Mythos, every red team and every adversary is now operating with the same baseline. That’s a structural shift, not a feature update.