Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations
Faculty of Health Sciences, East Africa
Background: Genetic studies of biomedical phenotypes in underrepresented populations identify disproportionate numbers of novel associations. However, current genomics infrastructure, including most genotyping arrays and sequenced reference panels, best serves populations of European descent. A critical step for facilitating genetic studies in underrepresented populations is to ensure that genetic technologies accurately capture variation in all populations. Here, we quantify the accuracy of low-coverage sequencing in diverse African populations.
Results: We sequenced the whole genomes of 91 individuals from the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study to high coverage (>20X); participants were recruited from Ethiopia, Kenya, South Africa, and Uganda. We empirically tested two data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole genome sequencing data. We show that low-coverage sequencing at a depth of ≥4X captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1X) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation, with 4X sequencing detecting 45% of singletons and 95% of common variants identified in high-coverage African whole genomes.
Conclusion: These results indicate that low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, including those that capture variation most common in Europeans and Africans. Low-coverage sequencing effectively identifies novel variation (particularly in underrepresented populations), and presents opportunities to enhance variant discovery at a similar cost to traditional approaches.
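Illustrative note: the Results compare imputed genotypes against deep whole genome sequencing calls by concordance. The abstract does not state which concordance metric was used, so the sketch below assumes non-reference concordance (sites where both call sets are homozygous reference are excluded so that they do not inflate agreement). The function name, the 0/1/2 alternate-allele-count encoding, and the toy genotypes are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def non_reference_concordance(
    truth: Sequence[Optional[int]],
    test: Sequence[Optional[int]],
) -> Optional[float]:
    """Fraction of matching genotypes among sites where at least one
    call set carries a non-reference allele.

    Genotypes are alternate-allele counts (0, 1, or 2); None marks a
    missing call. Returns None when no site qualifies for comparison.
    This metric choice is an assumption, not the study's stated method.
    """
    if len(truth) != len(test):
        raise ValueError("truth and test must cover the same sites")

    matches = 0
    compared = 0
    for truth_gt, test_gt in zip(truth, test):
        # Skip sites with a missing call in either data set.
        if truth_gt is None or test_gt is None:
            continue
        # Skip sites that are homozygous reference in both data sets.
        if truth_gt == 0 and test_gt == 0:
            continue
        compared += 1
        if truth_gt == test_gt:
            matches += 1

    return matches / compared if compared else None


# Toy data for one individual: high-coverage calls versus imputed calls.
truth_calls = [0, 1, 2, 1, 0, 0, None, 2]
imputed_calls = [0, 1, 1, 1, 1, 0, 2, 2]

if __name__ == "__main__":
    print(non_reference_concordance(truth_calls, imputed_calls))
```

In practice such a comparison would typically be run per individual and stratified by allele frequency bin, so that accuracy for rare and common variants can be reported separately for each array or sequencing depth.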
Martin, A. R., Atkinson, E. G., Chapman, S. B., Stroud, R. E., Ashaba, F. K., (2020). Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations. bioRxiv, 1-16.
Available at: https://ecommons.aku.edu/eastafrica_fhs_mc_intern_med/156