Large-scale cohort studies that link genomic data with longitudinal health and medical data are highly valuable for understanding health and disease. The 45 and Up Study[1], for instance, includes 267,357 participants aged 45 years and over, recruited in 2005-2009, with extensive information on health and sociodemographic characteristics[2-3]. As part of the Australian Cancer Risk Study, we invited 30,541 participants to provide a DNA sample. These included a randomly selected sub-cohort (n=9,986) and all participants diagnosed with prostate, breast, melanoma or colorectal cancer (identified via linked NSW Cancer Registry data) who were alive in October 2021.
Overall, 8,311 participants consented (27%). Consent was higher among individuals aged <85 years, with a university education, with excellent self-reported health and with higher household incomes (p<0.0001, adjusting for a wide range of health and sociodemographic characteristics). We then generated new genomic data for 7,408 participants using low-coverage whole-genome sequencing (minimum coverage 0.4X, median 0.8X) followed by genotype imputation with GLIMPSE2, using Gencove's well-established analysis pipeline.
Following in-depth sample- and variant-level quality control (QC), we retained a final high-quality genomic dataset of 6,741 participants, including 6,545 unrelated individuals of inferred European genetic ancestry.
Among the 6,545 unrelated European-ancestry participants, minor allele frequencies of >955,000 post-QC HapMap3 variants were highly correlated with European-ancestry 1000 Genomes reference data (r=0.998, p<10⁻¹⁰).
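As an illustration of this allele-frequency comparison, the minimal Python sketch below computes each variant's minor allele frequency from imputed alt-allele dosages and its Pearson correlation with reference frequencies. The function names and toy numbers are hypothetical, and the sketch assumes variants have already been matched to the European-ancestry 1000 Genomes reference by position and alleles; it is not the study's actual QC code.

```python
from math import sqrt


def minor_allele_frequency(dosages):
    """Minor allele frequency from per-participant alt-allele dosages (0 to 2)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    # Fold so the value refers to the less common allele.
    return min(alt_freq, 1 - alt_freq)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data (made up): rows are variants, columns are participants.
study_dosages = [
    [0, 1, 2, 0],
    [0, 0, 1, 0],
    [2, 1, 1, 2],
]
reference_maf = [0.36, 0.13, 0.26]  # hypothetical reference MAFs (assumed pre-aligned)

study_maf = [minor_allele_frequency(v) for v in study_dosages]
print(study_maf)  # [0.375, 0.125, 0.25]
print(round(pearson_r(study_maf, reference_maf), 3))
```

Comparing folded minor allele frequencies, as the text describes, avoids dependence on which allele is labelled "alt", at the cost of possible mismatches for variants with frequencies close to 0.5.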
For key examples of cancer polygenic risk scores (PGS; breast: PGS313[4]; prostate: PGS269[5], PGS451, PGS400[6]; melanoma: PGS68[7]; colorectal: PGS205[8], PGS252[8-9]), 62-76% of variants passed QC and were generally imputed with high confidence (genotype probability >90% in ≥90% of participants). In our new data, the risk prediction performance of these PGS was comparable to previous studies for prostate, breast and melanoma (area under the ROC curve, AUC: 0.66-0.68, 0.62 and 0.64, respectively), but slightly lower for colorectal cancer (~0.57 vs ~0.62[6]).
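To make the variant-confidence criterion and the performance metric concrete, the minimal Python sketch below checks whether a variant meets the stated threshold (genotype probability >90% in ≥90% of participants), computes a PGS as a weighted sum of effect-allele dosages, and estimates an unadjusted AUC using the Mann-Whitney formulation. The function names and toy numbers are hypothetical; the AUCs reported above may come from models that also include covariates, which this sketch does not attempt.

```python
def passes_confidence_filter(max_genotype_probs, prob_threshold=0.90, frac_threshold=0.90):
    """True if the most likely genotype has probability > prob_threshold
    in at least frac_threshold of participants."""
    n_confident = sum(1 for p in max_genotype_probs if p > prob_threshold)
    return n_confident / len(max_genotype_probs) >= frac_threshold


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages across the scored variants."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Unadjusted AUC: probability that a random case scores higher than a
    random control, counting ties as 0.5 (Mann-Whitney formulation)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example (made-up numbers).
probs = [0.99, 0.95, 0.85, 0.97, 0.99, 0.92, 0.98, 0.96, 0.99, 0.93]
print(passes_confidence_filter(probs))  # True (9 of 10 participants exceed 0.90)
print(polygenic_score([0, 1, 2], [0.10, 0.20, -0.05]))
print(auc([3.0, 2.0], [1.0, 2.0]))  # 0.875
```

In practice, only variants that pass QC would contribute to the weighted sum, which is one reason scores built from 62-76% of their original variants may lose some discrimination.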
In conclusion, we present a major new, high-quality Australian genomic resource that can be readily integrated with longitudinal linked health data from the 45 and Up Study.