AI Companies Documentation


Overview

Epoch’s AI Companies Dataset is a collection of key economic data on some of the most important frontier AI companies, including data on revenue, funding, staff counts, compute spending, and usage.

This documentation describes which companies are included in the dataset; the dataset's records, data fields, and definitions; and the changelog and acknowledgements.

The data is available on our website as a visualization or a table, and can be downloaded in CSV format; it is updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
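For readers who prefer to work outside the notebook, here is a minimal sketch of loading a downloaded CSV with only Python's standard library. The function names are hypothetical, and the sketch assumes the file has a single header row. It keeps every value as a string and converts numeric cells explicitly, treating blank cells as missing rather than zero.

```python
import csv
from typing import Iterable, Optional


def load_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV text (a header row followed by data rows) into one dict per record."""
    return list(csv.DictReader(lines))


def load_csv(path: str) -> list[dict[str, str]]:
    """Read a downloaded CSV file from disk (the path is up to you)."""
    # newline="" is what the csv module expects for file objects;
    # "utf-8-sig" also strips a byte-order mark if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return load_rows(f)


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a numeric cell to float, treating blank cells as missing (None)."""
    if value is None or not value.strip():
        return None
    return float(value)
```

For example, `rows = load_csv("revenue_reports.csv")` (a hypothetical file name) returns a list of dicts keyed by the file's own column headers.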

If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.

If this dataset is useful for you, please cite it.

Use This Work

Epoch’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.

Citation

Josh You, John Croxton, Venkat Somala, Yafah Edelman, 'Data on AI Companies'. Published online at epoch.ai. Retrieved from 'https://epoch.ai/data/ai-companies-documentation' [online resource]. Accessed 21 Mar 2026.

BibTeX Citation

@misc{EpochAICompanies2025,
  title = {Data on AI Companies},
  author = {{Josh You, John Croxton, Venkat Somala, Yafah Edelman}},
  year = {2025},
  month = {9},
  url = {https://epoch.ai/data/ai-companies-documentation},
  note = {Accessed: 21 Mar 2026}
}

Inclusion

For the initial phase of the AI Companies Dataset, we focus on foundation model developers, or AI developers for whom training their own models is a core business priority.

There are currently no formal selection criteria. We prioritize companies whose models are near the frontier of general-purpose AI capabilities, or that are among the most commercially significant AI companies.

Other major types of AI companies include AI application developers that use third-party models (e.g., Anysphere and Perplexity), cloud compute companies (e.g., Microsoft and Amazon), semiconductor companies (e.g., NVIDIA and TSMC), and AI data vendors (e.g., Scale AI and Mercor).

Records

The AI Companies database contains key economic data about many frontier AI companies. Records in the database cover the following:

Financials, including revenue, funding and valuations, and spending

Operations, such as staff counts and product usage

Metadata, such as notes on the above fields with supporting evidence and context, and our confidence in the key figures

Fields

Below is a guide to the database’s fields, with example values for reference; examples are taken from OpenAI unless otherwise indicated. For each company, we collect time-series data on revenue, valuation, and other metrics, and store each data point as an individual record in a separate table for each data type, as detailed below.
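Because each data point is stored as its own record, a common first step is to reduce a time-series table to one row per company, for example each company's most recent revenue report. The sketch below assumes hypothetical column names `Company` and `Date`, with dates written as ISO 8601 `YYYY-MM-DD` strings; adjust both to the headers in the file you download.

```python
from datetime import date


def latest_per_company(
    rows: list[dict[str, str]],
    company_col: str = "Company",  # hypothetical column name
    date_col: str = "Date",  # hypothetical column name, assumed YYYY-MM-DD
) -> dict[str, dict[str, str]]:
    """Return the most recent record for each company.

    Rows with a blank company or date are skipped. If several records for
    one company share the latest date, the one appearing last in the input wins.
    """
    latest: dict[str, tuple[date, dict[str, str]]] = {}
    for row in rows:
        # `or ""` guards against None, which csv.DictReader uses for short rows.
        company = (row.get(company_col) or "").strip()
        raw_date = (row.get(date_col) or "").strip()
        if not company or not raw_date:
            continue
        when = date.fromisoformat(raw_date)  # raises ValueError on other formats
        current = latest.get(company)
        if current is None or when >= current[0]:
            latest[company] = (when, row)
    return {company: row for company, (_, row) in latest.items()}
```

Keeping the whole row, rather than a single value, preserves the notes and confidence metadata attached to that record.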

If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.

Companies

Column | Type | Definition | Example value | Coverage

Revenue reports

Column | Type | Definition | Example value | Coverage

Funding rounds

Column | Type | Definition | Example value | Coverage

Staff counts

Column | Type | Definition | Example value | Coverage

Usage reports

Column | Type | Definition | Example value | Coverage

Compute spend

Column | Type | Definition | Example value | Coverage

Downloads

Download the AI Companies dataset as individual CSV files for specific data types, or as a complete package containing all datasets.
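If you download the complete package, the following sketch loads every CSV file in it at once, keyed by file name. It assumes the package is a ZIP archive of UTF-8 CSV files; the function name `read_package` and any file names are hypothetical, so inspect the actual download before relying on it.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def read_package(source) -> dict[str, list[dict[str, str]]]:
    """Load all CSV files in a ZIP archive as lists of row dicts.

    `source` may be a file path or a binary file-like object. Keys are the
    CSV file names without directory or extension; files with the same name
    in different folders would overwrite each other.
    """
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Skip directories, non-CSV files, and hidden files such as "._x.csv".
            if path.suffix.lower() != ".csv" or path.name.startswith("."):
                continue
            # ZipFile.open yields bytes; TextIOWrapper decodes them for csv.
            with io.TextIOWrapper(
                archive.open(name), encoding="utf-8-sig", newline=""
            ) as text:
                tables[path.stem] = list(csv.DictReader(text))
    return tables
```

For example, `tables = read_package("ai_companies.zip")` (a hypothetical file name) returns one list of records per data type in the archive.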

Acknowledgements

This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.

This documentation was written by Josh You and Venkat Somala.