US models currently outperform non-US models

By Jean-Stanislas Denain

The best US models consistently score higher than the best non-US models on both GPQA Diamond and MATH Level 5. On GPQA Diamond, the top-performing model is OpenAI's o1; on MATH Level 5, the leader is OpenAI's o3-mini.


However, the release of DeepSeek-R1 in January 2025 narrowed the gap between US and non-US models substantially. DeepSeek-R1 trails o3-mini by only 2 percentage points on MATH Level 5 and scores only 4 percentage points below o1 on GPQA Diamond.

Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.
