Epoch AI’s benchmarking brings together results from many of the most informative AI benchmarks—both those we run ourselves and those reported by reputable external sources—into one consistent, searchable place.
AI capabilities are moving quickly, but results can be scattered and hard to compare. We track diverse, informative, and challenging benchmarks so that researchers, practitioners, and policymakers can see where the state of the art is, and how it is changing.
Our internal evaluations are powered by Inspect and are run with consistent, well-documented settings across models. External results come from official leaderboards or primary sources. For full details (prompting, temperatures, implementations), see the FAQ below.
Epoch AI’s data is released under the Creative Commons Attribution license: it is free to use, distribute, and reproduce, provided the source and authors are credited.
This hub also includes data sourced from external projects, which retains its original licensing. Users are responsible for ensuring compliance with the respective license terms for the specific data they use, and appropriate credit should be given to the original sources as indicated.