Pf8: an open dataset of Plasmodium falciparum genome variation in 33,325 worldwide samples
Abdel Hamid MM., Abdelraheem MH., Acheampong DO., Adam I., Aide P., Ajibaye O., Ali M., Almagro-Garcia J., Amambua-Ngwa A., Amenga-Etego L., Aniebo I., Aninagyei E., Ansah F., Apinjoh TO., Ariani CV., Auburn S., Awandare GA., Balmer A., Bejon P., Boene S., Bwire G., Candrinho B., Chidimatembue A., Chindavongsa K., Comiche K., Conway D., Dara A., Diakite M., Djimde A., Dondorp A., Doumbia S., Drury E., Fanello CA., Ferdig M., Figueroa K., Gamboa D., Golassa L., Gonçalves S., Guindo MDA., Hamaluba M., Hanboonkunupakarn B., Howe K., Hussien M., Imwong M., Ishengoma D., Jeans J., Kabaghe A., Kamuhabwa A., Kindermans J-M., Konate DS., Kwiatkowski DP., Lee C., Lee SK., Lee SJ., Ley B., Llanos-Cuentas A., Marfurt J., Matambisso G., Maude RR., Maude RJ., Mayor A., Mayxay M., Maïga-Ascofaré O., McCann RS., Miles A., Miotto O., Mohamed AO., Morang’a CM., Murie K., Ngasala BE., Nguyen T-N., Nolasco O., Nosten F., Noviyanti R., O'Connor Í., Oboh M., Ochola-Oyier LI., Olufunke Falade C., Olukosi A., Olumide A., Olusola FI., Onyamboko MA., Oriero EC., Oyibo WA., Pannebaker D., Pearson RD., Phiri K., van der Pluijm RW., Price RN., Quang HH., Rajkumar Devaraju V., Randrianarivelojosia M., Ranford-Cartwright L., Rayner JC., Rovira-Vallbona E., Rowlands K., Ruano-Rubio V., Sanchez JF., Saúte F., Shettima S., da Silva C., Simpson VJ., Suddaby S., Takken W., Thu AM., Toure M., Unlu E., Valdivia HO., van Vugt M., Waithira N., Wellems T., Wendler J., White N., Wuendrich Ogidan R.
We describe the Pf8 data resource, the latest MalariaGEN release of curated genome variation data on over 33,000 Plasmodium falciparum samples from 99 partner studies and 122 locations, spanning more than 50 years. This release provides open access to raw sequencing data and genotypes at over 12 million genomic positions. For the first time, it includes copy-number variation (CNV) calls for the drug-resistance-associated genes gch1 and crt. As in Pf7, CNV calls are provided for mdr1 and plasmepsin2/3, along with deletion calls for hrp2 and hrp3, genes associated with rapid diagnostic test failures. The resource also features derived datasets, interactive web applications for exploring patterns of drug resistance and variation in over 5,000 genes, an updated Python package providing methods for accessing and analysing the data, and open-access analysis notebooks that can be used as starting points for further analyses. In addition, example analyses show contrasting profiles of the decline of chloroquine-resistance-associated mutations in Africa, and differences in copy-number variation across 10 distinct sub-populations. To the best of our knowledge, Pf8 is the largest open dataset of genome variation in any eukaryotic species, making it an invaluable foundational resource for understanding evolution, including that of pathogens.