Do the biorisk evaluations of AI labs actually measure the risk of developing bioweapons?
With the recent release of Claude Opus 4, Anthropic activated their AI Safety Level 3 protections.
Jun 14 • Anson Ho and Arden Berg
Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning
Examining o3-mini’s math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and formal human rigor.
Jun 8 • Anson Ho, JS Denain, and Elliot Glazer
May 2025
GPQA Diamond: What’s Left?
Investigate GPQA Diamond benchmark’s validity: uncover flawed questions, model challenges, and why it still informs AI evaluation.
May 30 • Greg Burnham
Is AI already superhuman on FrontierMath?
How do humans and AIs compare on FrontierMath? We ran a competition at MIT to put this to the test.
May 23 • Anson Ho
How Fast Can Algorithms Advance Capabilities?
This week’s issue is a guest post by Henry Josephson, who is a research manager at UChicago’s XLab and an AI governance intern at Google DeepMind.
May 16
How far can reasoning models scale?
Available evidence suggests that rapid growth in reasoning training can continue for a year or so.
May 9 • Josh You
Where’s my ten minute AGI?
Why don’t AIs automate more real-world tasks if they can handle 1-hour ones? Anson Ho explores key capability and context bottlenecks.
May 4 • Anson Ho
April 2025
The case for multi-decade AI timelines
In this Gradient Updates weekly issue, Ege discusses the case for multi-decade AI timelines.
Apr 26 • Ege Erdil
The Epoch AI Brief - April 2025
GATE model, “Train Once, Deploy Many”, data insights, hiring, and more.
Apr 17 • Epoch AI
Is it 3 Years, or 3 Decades Away?
Disagreements on AGI Timelines
Apr 3 • Ege Erdil and Matthew Barnett
March 2025
The real reason AI benchmarks haven’t reflected economic impacts
The real reason that AI benchmarks haven’t reflected real-world impacts historically is that they weren’t optimized for this, not because of fundamental…
Mar 28 • Anson Ho and JS Denain
Most AI value will come from broad automation, not from R&D
AI's biggest impact will come from broad labor automation—not R&D—driving economic growth through scale, not scientific breakthroughs.
Mar 21 • Ege Erdil and Matthew Barnett