Epoch’s AI Companies Dataset is a collection of key economic data on some of the most important frontier AI companies, including data on revenue, funding, staff counts, compute spending, and usage.
This documentation describes which companies are included in the dataset, the dataset's records, data fields, and definitions, and provides a changelog and acknowledgements.
The data is available on our website as a visualization or table, and can be downloaded as a CSV file, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.
If this dataset is useful for you, please cite it.
Epoch’s data is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.
For the initial phase of the AI Companies Dataset, we focus on foundation model developers, or AI developers for whom training their own models is a core business priority.
There are currently no formal criteria for selection: we prioritized companies whose models are near the frontier in general-purpose AI capabilities, or that are among the most commercially significant AI companies.
Other major types of AI companies include: AI application developers that use third-party models (e.g. Anysphere and Perplexity), cloud compute companies (e.g. Microsoft and Amazon), semiconductor companies (e.g. NVIDIA and TSMC), and AI data vendors (e.g. Scale AI and Mercor).
The AI companies database contains key economic data about many frontier AI companies. Records in the database have information covering the following:
- Financials, including revenue, funding and valuations, and spending
- Operations, such as staff counts and product usage
- Metadata, such as notes on the above fields with supporting evidence and context, our confidence in the key figures, etc.
We provide a comprehensive guide to the database’s fields below. This includes example field values as reference, which are taken from OpenAI unless otherwise indicated. Note that for each company, we collected rich time-series data on the company’s revenue, valuation, etc. We store these as individual records in separate tables, detailed below.
If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
Download the AI Companies dataset as individual CSV files for specific data types, or as a complete package containing all datasets.
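As a rough illustration of working with one of the individual CSV files, the sketch below groups time-series records by company and picks out the most recent record for each. It uses only Python's standard library. The column names (`company`, `date`, `revenue_usd`), the helper name `load_time_series`, and the sample rows are all hypothetical placeholders, not the dataset's actual schema or values; substitute the real column names from the field tables in this documentation.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only. The company names, column names,
# and values are invented and do NOT come from the dataset.
SAMPLE_CSV = """company,date,revenue_usd
ExampleCo,2024-06-30,200
ExampleCo,2023-12-31,100
OtherCo,2024-03-31,50
"""


def load_time_series(csv_file, company_col="company", date_col="date",
                     value_col="revenue_usd"):
    """Group time-series records by company, sorted by date.

    The default column names are assumptions; pass the real ones.
    """
    series = defaultdict(list)
    for row in csv.DictReader(csv_file):
        raw = (row.get(value_col) or "").strip()
        if not raw:
            continue  # skip records with no reported value
        series[row[company_col]].append((row[date_col], float(raw)))
    for records in series.values():
        records.sort()  # ISO 8601 date strings sort chronologically
    return dict(series)


series = load_time_series(io.StringIO(SAMPLE_CSV))
# Most recent (date, value) pair per company.
latest = {name: records[-1] for name, records in series.items()}
print(latest)
```

To run this against a downloaded file, replace the `io.StringIO(SAMPLE_CSV)` argument with a file object such as `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV you saved locally.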
This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.
This documentation was written by Josh You and Venkat Somala.