Equitable representation of human diversity in cell atlases is paramount for reducing health disparities and ensuring the broad applicability of medical research and healthcare interventions. However, genomics research has historically overrepresented Northern European ancestries (e.g., ~94% of GWAS participants). To realize their full potential as extensive references of cells in the human body, single-cell atlas initiatives must include diverse donor cohorts.
A critical barrier has been the infrequent reporting of genetic ancestry in single-cell studies, as the data conventionally used for ancestry inference (whole genome sequencing or genotype arrays) are often unavailable. To overcome this limitation, we developed scAncestry, a novel computational tool that accurately estimates genetic ancestry using genetic variants called from single-cell sequencing reads (r = 0.995 compared with whole genome sequencing). Notably, scAncestry identifies 18-fold more genetic variants than previous methods, yielding robust concordance with gold-standard ancestry estimations (R² = 1).
Applying scAncestry to 15,185 samples from four major single-cell atlas consortia (the Human Cell Atlas, CZ CELLXGENE, HuBMAP, and the Human Tumor Atlas Network) revealed a predominance of European-like ancestries (77%), with substantial underrepresentation of samples with African-like (8%), Indigenous American-like (3%), South Asian-like (1%), and Oceanian-like (0.01%) ancestries. These results expose critical gaps in donor diversity and point to opportunities for targeted efforts to enhance representation. Furthermore, by integrating other crucial demographic metadata (sampling site, sex, age, and disease status), we reveal the impact of human diversity on cellular profile variation.
Beyond ancestry inference, this unprecedented catalogue of genetic variants from 15,185 single-cell samples establishes a transformative resource for downstream investigations, including genetic regulation, cell type proportions, and the genetic underpinnings of disease. Collectively, our work highlights opportunities to increase representation across single-cell atlases and lays the groundwork for uncovering variability across ancestries that will lead to equitable medical interventions.