The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis
Language: English; Country: England, Great Britain; Medium: electronic
Document type: journal article
Grant support
1920-152-0131-C
United Soybean Board
2220-152-0202
United Soybean Board
PubMed
36899307
PubMed Central
PMC10007842
DOI
10.1186/s12864-023-09161-3
PII: 10.1186/s12864-023-09161-3
- Keywords
- Allele Catalog Pipeline, Allele Catalog Tool, Alleles in Gene, Data Visualization, Variant Calling Pipeline
- MeSH
- Alleles *
- Arabidopsis * genetics
- Data Mining * methods
- Datasets as Topic *
- Gene Frequency
- Genotype
- Glycine max * genetics
- Internet *
- Zea mays * genetics
- Metadata
- Mutation
- Pigmentation genetics
- Genes, Plant genetics
- Software *
- Amino Acid Substitution
- Plant Dormancy genetics
- Data Visualization
- Publication type
- Journal Article
- Substance names
- DOG1 protein, Arabidopsis
BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, WGRS data are nearly impossible to use in research without further configuration. To solve this problem, our research group has developed an interactive Allele Catalog Tool that enables researchers to explore the coding-region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize.

RESULTS: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files. The Allele Catalog pipeline takes the VCF files, performs imputation and functional effect prediction, and assembles the alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files); the accessions in the WGRS datasets were collected from various sources and currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool are data query, visualization of results, categorical filtering, and download functions. Queries are built from user input, and results are returned in tabular form: summary results by categorical description, and genotype results for the alleles of each gene. The categorical information is specific to each species; where available, detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, functional effect classes, and amino-acid changes for each accession. The results can also be downloaded for other research purposes.

CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website (https://soykb.org/SoybeanAlleleCatalogTool/), while the Allele Catalog Tools for Arabidopsis and maize are hosted on the KBCommons website (https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana). Researchers can use this tool to connect variant alleles of genes with the meta-information available for each species.
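
The allele assembly and categorical summary described in the Results can be illustrated with a small sketch. This is not the AlleleCatalog pipeline's code: the accession names, positions, genotype calls, category labels, and function names below are all hypothetical, chosen only to show the idea that accessions sharing the same combination of genotypes across a gene's variant positions carry the same allele, and that each allele can then be summarized by accession category.

```python
from collections import Counter, defaultdict

# Hypothetical variant positions within the coding region of one gene.
POSITIONS = [1045, 1102, 1230]

# Hypothetical genotype calls per accession at POSITIONS, as they might
# look after imputation (no missing calls remain).
GENOTYPES = {
    "Acc1": ("G", "T", "C"),
    "Acc2": ("G", "T", "C"),
    "Acc3": ("A", "T", "C"),
    "Acc4": ("A", "T", "del"),
    "Acc5": ("G", "T", "C"),
}

# Hypothetical categorical description for each accession.
CATEGORY = {
    "Acc1": "Cultivar",
    "Acc2": "Cultivar",
    "Acc3": "Landrace",
    "Acc4": "Wild",
    "Acc5": "Landrace",
}


def assemble_alleles(genotypes):
    """Group accessions that share the same genotype combination."""
    alleles = defaultdict(list)
    for accession, calls in genotypes.items():
        alleles[calls].append(accession)
    return dict(alleles)


def summarize(alleles, category):
    """Count accessions per category for each assembled allele."""
    return {
        calls: Counter(category[acc] for acc in accessions)
        for calls, accessions in alleles.items()
    }


def format_allele(positions, calls):
    """Render an allele as 'position:genotype' pairs for a table row."""
    return " ".join(f"{pos}:{call}" for pos, call in zip(positions, calls))


if __name__ == "__main__":
    summary = summarize(assemble_alleles(GENOTYPES), CATEGORY)
    for calls, counts in summary.items():
        print(format_allele(POSITIONS, calls), dict(counts))
```

Each printed row pairs one allele with its per-category accession counts, which mirrors the summary-by-category view the abstract describes. Functional effect classes and amino-acid changes would be extra columns attached to each position and are omitted here.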
Christopher S Bond Life Sciences Center, University of Missouri Columbia, Columbia, MO, USA
Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czech Republic
Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
Department of Health Management and Informatics, University of Missouri Columbia, Columbia, MO, USA
Division of Plant Science and Technology, University of Missouri Columbia, Columbia, MO, USA
MU Institute for Data Science and Informatics, University of Missouri Columbia, Columbia, MO, USA