Archive - Epoch AI

Do the biorisk evaluations of AI labs actually measure the risk of developing bioweapons?

With the recent release of Claude Opus 4, Anthropic activated their AI Safety Level 3 protections.

Jun 14

Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning

Examining o3-mini’s math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and formal human rigor.

Jun 8

May 2025

GPQA Diamond: What’s Left?

Investigate GPQA Diamond benchmark’s validity: uncover flawed questions, model challenges, and why it still informs AI evaluation.

May 30

Is AI already superhuman on FrontierMath?

How do humans and AIs compare on FrontierMath? We ran a competition at MIT to put this to the test.

May 23

How Fast Can Algorithms Advance Capabilities?

This week’s issue is a guest post by Henry Josephson, who is a research manager at UChicago’s XLab and an AI governance intern at Google DeepMind.

May 16

How far can reasoning models scale?

Available evidence suggests that rapid growth in reasoning training can continue for a year or so.

May 9

Where’s my ten minute AGI?

Why don’t AIs automate more real-world tasks if they can handle 1-hour ones? Anson Ho explores key capability and context bottlenecks.

May 4

April 2025

The case for multi-decade AI timelines

In this Gradient Updates weekly issue, Ege discusses the case for multi-decade AI timelines.

Apr 26

The Epoch AI Brief - April 2025

GATE model, “Train Once, Deploy Many”, data insights, hiring, and more.

Apr 17

Is it 3 Years, or 3 Decades Away?

Disagreements on AGI Timelines

Apr 3 •

and

Matthew Barnett

March 2025

The real reason AI benchmarks haven’t reflected economic impacts

The real reason that AI benchmarks haven’t reflected real-world impacts historically is that they weren’t optimized for this, not because of fundamental…

Mar 28

Most AI value will come from broad automation, not from R&D

AI's biggest impact will come from broad labor automation—not R&D—driving economic growth through scale, not scientific breakthroughs.

Mar 21 •

and

Matthew Barnett
