AccuCalc: A Python Package for Accuracy Calculation in GWAS
Language: English
Country: Switzerland
Medium: electronic
Document type: journal articles, grant-supported research
PubMed: 36672864
PubMed Central: PMC9858979
DOI: 10.3390/genes14010123
PII: genes14010123
- Keywords
- GWAS, Manhattan plot, SP2CM, accuracy, causative mutation, Python package
- MeSH
- Genome-Wide Association Study * methods
- Phenotype
- Genome
- Genomics * methods
- Mutation
- Publication type
- Journal Article
- Grant-supported research
The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and thus aims to discover causative mutations (CMs) in the genes underlying that phenotype. However, GWAS discoveries are limited by many factors: they typically identify associated genomic regions without any further ability to compare the viability of candidate genes and actual CMs. Consequently, current methodology is limited in its ability to identify CMs. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy, which we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of this tool, which we named AccuCalc, to extend its use to other species. We enhanced the tool to analyze datasets in which a rare phenotype has a low-frequency distribution, by automatically formatting the synthetic phenotype, and we added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis that accepts any user-defined, species-independent input in variant call format (VCF) or HapMap format (hmp). AccuCalc saves analysis outputs in user-friendly tab-delimited files and also visualizes GWAS results as Manhattan plots accentuated by accuracy. Implemented in Python and publicly available, AccuCalc can be used conveniently to apply the SP2CM strategy in any species.
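The accuracy criterion at the heart of SP2CM can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not a description of AccuCalc's code: the synthetic phenotype is taken to be binary (1 for accessions expected to carry the causative mutation, 0 otherwise), only homozygous calls are scored, and accuracy is taken to be the percentage of genotyped accessions whose allele agrees with the phenotype. The helper names (`hmp_call_to_code`, `variant_accuracy`, `write_accuracy_table`), the treatment of the first listed HapMap allele as the reference, and the toy data are all hypothetical.

```python
"""Minimal SP2CM-style accuracy sketch (hypothetical, not AccuCalc's API).

Assumptions: the synthetic phenotype is binary (1 = expected to carry the
causative mutation, 0 = otherwise); only homozygous calls are scored;
heterozygous and missing calls are skipped; variants are biallelic.
"""
import csv
import io
from typing import Iterable, Optional, Sequence, TextIO, Tuple


def hmp_call_to_code(call: str, alleles: str) -> Optional[int]:
    """Map a HapMap genotype call to 0 (reference) or 1 (alternative).

    Assumes the first allele in the 'alleles' column (e.g. 'A/G') is the
    reference. Single-letter calls such as 'G' are treated as homozygous;
    heterozygous ('AG', 'R') and missing ('NN', 'N') calls return None.
    """
    ref, alt = alleles.split("/")
    if len(call) == 1:
        call = call * 2
    if call == ref * 2:
        return 0
    if call == alt * 2:
        return 1
    return None


def variant_accuracy(genotypes: Sequence[Optional[int]],
                     phenotype: Sequence[int]) -> float:
    """Percentage of genotyped accessions whose allele agrees with the phenotype."""
    if len(genotypes) != len(phenotype):
        raise ValueError("genotypes and phenotype must have the same length")
    called = [(g, p) for g, p in zip(genotypes, phenotype) if g is not None]
    if not called:
        return float("nan")  # no usable calls at this variant
    matches = sum(1 for g, p in called if g == p)
    return 100.0 * matches / len(called)


def write_accuracy_table(
    variants: Iterable[Tuple[str, int, Sequence[Optional[int]]]],
    phenotype: Sequence[int],
    handle: TextIO,
) -> None:
    """Write one tab-delimited row per variant: chrom, pos, accuracy (%)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "pos", "accuracy"])
    for chrom, pos, genotypes in variants:
        writer.writerow([chrom, pos, f"{variant_accuracy(genotypes, phenotype):.1f}"])


if __name__ == "__main__":
    # Toy data: five accessions; the first two carry the synthetic phenotype.
    phenotype = [1, 1, 0, 0, 0]
    alleles = "A/G"
    calls = ["GG", "G", "AA", "AG", "NN"]
    variants = [
        ("Chr01", 1200, [hmp_call_to_code(c, alleles) for c in calls]),
        ("Chr01", 5400, [1, 0, 0, 1, None]),
    ]
    buf = io.StringIO()
    write_accuracy_table(variants, phenotype, buf)
    print(buf.getvalue(), end="")
```

Run as a script, the toy example prints a header plus two rows: the first variant scores 100.0 because every genotyped accession agrees with the phenotype, and the second scores 50.0 because two of its four called accessions agree. Skipping heterozygous and missing calls is only one possible design choice; a real tool may handle them differently.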
Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
Division of Plant Sciences, University of Missouri, Columbia, MO 65201, USA
MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA