Lightning Talk GENEMAPPERS 2026

Genomic Repeat inference from Depth (GRiD): High-Performance VNTR Length Estimation Improves Ancestry-Specific Prediction of Lp(a) (#15)

Zachary Caterer 1,2, Meng Lin 1, Qiang An 3, Ethan Lange 1, Chris Gignoux 1, Misa Graff 3, Christy L Avery 3, Maggie Stanislawski 1
  1. Department of Biomedical Informatics, University of Colorado Anschutz, Aurora, Colorado, United States
  2. University of Colorado Boulder, Boulder, CO, United States
  3. Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States

Lipoprotein(a) [Lp(a)] is a highly atherogenic, pro-thrombotic lipoprotein and a major risk factor for cardiovascular disease for which highly potent therapeutics are under evaluation. Lp(a) consists of a low-density lipoprotein (LDL)-like particle covalently bound to apolipoprotein(a) [apo(a)]. Apo(a)'s size variation, a major driver of Lp(a)'s high heritability (~80%), is largely determined by the kringle IV type 2 (KIV-2) variable number tandem repeat (VNTR), which is composed of two-exon repeat units that together account for approximately 70% of the coding sequence. The VNTR's extreme length and repetitive structure are difficult to resolve and phase with either short- or long-read sequencing technologies. We are developing a computational framework to estimate haplotype-informed KIV-2 VNTR length across diverse populations, with an emphasis on scalability for large epidemiological datasets. In this framework, we integrate multiple forms of genetic ancestry information (e.g., local ancestry and identity-by-descent) to improve phasing accuracy in this complex genomic region, where in-phase variants can modify the effect of VNTR length on Lp(a) levels. We are applying this framework in a consortium of ancestrally diverse studies to capture population-specific variation in VNTR length and to generate more accurate, ancestry-aware polygenic risk scores (PRS) for Lp(a). Our open-source Python package, Genomic Repeat inference from Depth (GRiD: github.com/caterer-z-t/GRiD), provides an accessible, flexible, high-performance solution for modeling the genetic architecture of Lp(a).
In preliminary analyses, GRiD VNTR-length estimates correlated with Lp(a) levels (R² = 0.165; p < 0.001), and we expect PRS to be highly accurate across population subgroups when these estimates are combined with LPA variants (results forthcoming). By uniting VNTR-length inference with population-structure information, GRiD facilitates improved genetic risk prediction in large-scale studies and supports downstream translational analyses of cardiovascular risk.
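
To make the idea of inferring repeat length from depth concrete, the following is a minimal sketch of a generic depth-ratio estimate. It is not GRiD's implementation or API: the function name, its parameters, and the input values are hypothetical and chosen only for illustration. The sketch assumes that reads from every repeat copy in a sample collapse onto the copies present in the reference assembly, so that depth over the reference VNTR, normalized by depth in unique flanking sequence, scales with the sample's total copy number.

```python
from statistics import median


def estimate_total_repeat_copies(vntr_depths, flank_depths, ref_copies, ploidy=2):
    """Estimate the total repeat copy number (summed over haplotypes) from depth.

    Hypothetical helper for illustration only; not part of the GRiD package.

    vntr_depths  -- per-base read depths across the VNTR in the reference
    flank_depths -- per-base read depths in unique flanking sequence
    ref_copies   -- number of repeat copies in the reference assembly
    ploidy       -- number of haplotypes contributing to the flanking depth
    """
    if not vntr_depths or not flank_depths:
        raise ValueError("depth lists must be non-empty")

    # The median is robust to local coverage spikes in the flanking sequence.
    flank_depth = median(flank_depths)
    if flank_depth <= 0:
        raise ValueError("flanking depth must be positive")

    mean_vntr_depth = sum(vntr_depths) / len(vntr_depths)

    # A sample carrying exactly ref_copies on every haplotype gives a ratio of 1.
    depth_ratio = mean_vntr_depth / flank_depth
    return depth_ratio * ref_copies * ploidy


# Made-up depths: VNTR coverage is three times the flanking coverage.
total = estimate_total_repeat_copies(
    vntr_depths=[90, 90, 90],
    flank_depths=[30, 30, 30],
    ref_copies=6,  # illustrative value, not taken from the abstract
)
print(total)  # 36.0
```

An estimate of this kind yields only the sum of repeat copies across both haplotypes. Splitting that total into per-haplotype lengths requires additional information, such as the phasing and ancestry signals described in the abstract.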