Equitable implementation of genomic medicine requires large genomic datasets that are representative of the whole community, but existing global genomic resources substantially under-represent many non-European ancestry groups. The OurDNA program is currently working with 10 genomically under-represented Australian communities to recruit up to 10,000 individuals, collecting blood samples to enable whole-genome sequencing and multi-omic analysis. Our team recently released the first version of the OurDNA browser, which builds on the Genome Aggregation Database (gnomAD) browser framework to display allele frequency data from 12,882 individuals from OurDNA and other large Australian population cohorts.
Analysis of genetic ancestry within these cohorts confirms that Australians of many non-European genetic ancestries are under-represented in existing cohorts. It also highlights an implicit tension: while we still need discrete categories to identify under-sampled ancestries and fill in the blanks of global genetic diversity, the concept of discrete genetic ancestry “boxes” is socially problematic and scientifically and clinically inadequate, especially considering the large proportion of Australians with complex genetic ancestries.
We find, for example, that around 10% of individuals in the Mackenzie's Mission exomes (part of the first OurDNA browser release) are not ‘neatly’ classified by a single genetic ancestry label, compared to a self-reported rate of 5-7% for ‘mixed ancestry’. This is likely an underestimate for the Australian population as a whole, given selection bias into this cohort, and it highlights the ongoing challenges in making genomic resources that benefit everyone.
This presentation will discuss the benefits and risks of alternative approaches to accommodating complex ancestry in allele frequency display. These range from approaches that retain some degree of discrete ancestry, such as local ancestry inference, through PCA-based allele frequency interpolation, to the potential of reframing the problem as one of estimating allele age abstracted from ancestry.
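As a rough illustration of what PCA-based allele frequency interpolation could look like, the sketch below estimates an allele frequency at an arbitrary point in principal-component space by weighting each sampled individual with a Gaussian kernel on their distance from that point. This is not the method used by OurDNA or the gnomAD browser: the function name, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for the example, and genotypes are assumed to be diploid alternate-allele counts (0, 1 or 2).

```python
import math


def interpolate_allele_frequency(target_pc, sample_pcs, alt_allele_counts, bandwidth=1.0):
    """Kernel-weighted alternate allele frequency at a point in PC space.

    Hypothetical helper for illustration only.
    target_pc          -- PC coordinates of the point of interest
    sample_pcs         -- PC coordinates of each sampled individual
    alt_allele_counts  -- alternate allele count (0, 1 or 2) per individual
    bandwidth          -- assumed Gaussian kernel width, in PC units
    """
    if len(sample_pcs) != len(alt_allele_counts):
        raise ValueError("one genotype is required per sample")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    weighted_alt = 0.0
    total_weight = 0.0
    for pc, count in zip(sample_pcs, alt_allele_counts):
        # Squared Euclidean distance between this sample and the target point.
        sq_dist = sum((a - b) ** 2 for a, b in zip(pc, target_pc))
        # Nearby samples contribute more; distant samples contribute less.
        weight = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        weighted_alt += weight * count
        total_weight += weight

    if total_weight == 0.0:
        raise ValueError("no sample is close enough to the target point")
    # Each diploid individual carries two alleles.
    return weighted_alt / (2.0 * total_weight)
```

Under these assumptions, an individual whose PC coordinates fall between two sampled clusters receives a frequency estimate blended from both, rather than being assigned the frequency of a single discrete ancestry group. The choice of kernel and bandwidth strongly shapes the estimate and would need careful validation in practice.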