Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.
Learn more about this graph
One way of interpreting an Epoch Capabilities Index (ECI) score is by calculating the expected performance on individual benchmarks based only on the ECI score. We plot this relationship for GSM8K, OTIS Mock AIME 2024-2025, and FrontierMath Tier 1-3, three benchmarks that vary considerably in difficulty. We also show 90% confidence intervals as a shaded region.
Data
Analysis
Limitations
Explore this data
Benchmark results featuring the performance of leading AI models on challenging tasks.