The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis

2023 Mar 10;24(1):107. Epub 2023 Mar 10.

Language: English. Country: England, United Kingdom. Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid36899307

Grant support
1920-152-0131-C United Soybean Board
2220-152-0202 United Soybean Board

Links

PubMed 36899307
PubMed Central PMC10007842
DOI 10.1186/s12864-023-09161-3
PII: 10.1186/s12864-023-09161-3

BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, using WGRS data in research without further configuration is nearly impossible. To solve this problem, our research group developed an interactive Allele Catalog Tool that enables researchers to explore the coding-region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize.

RESULTS: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files. The Allele Catalog pipeline takes the VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), for which the WGRS accessions were collected from various sources; the panels currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool are data query, visualization of results, categorical filtering, and download functions. Queries are built from user input, and results are presented in tabular form: summary results by categorical description, and genotype results of the alleles for each gene. The categorical information is specific to each species; where available, detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, functional effect classes, and amino-acid changes for each accession. The results can also be downloaded for other research purposes.

CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website (https://soykb.org/SoybeanAlleleCatalogTool/), while the Allele Catalog Tools for Arabidopsis and maize are hosted on the KBCommons website (https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana and https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays). Researchers can use this tool to connect variant alleles of genes with the meta-information of each species.
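The allele-assembly and summary steps described in the abstract can be illustrated with a minimal Python sketch. It is not the AlleleCatalog implementation: the record layout, the accession and gene names, and the functions `assemble_alleles` and `summarize` are all hypothetical and chosen only for illustration. The sketch assumes variant calls have already been imputed and annotated, then joins each accession's genotypes across a gene's variant positions into one allele and counts accessions per allele within each category.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: one record per (accession, variant position) in a gene.
# The field layout is an assumption, not the AlleleCatalog file format.
# (accession, category, gene, position, genotype, functional effect)
CALLS = [
    ("Acc1", "Cultivar", "GeneA", 101, "A", "Ref"),
    ("Acc1", "Cultivar", "GeneA", 250, "G", "Ref"),
    ("Acc2", "Cultivar", "GeneA", 101, "A", "Ref"),
    ("Acc2", "Cultivar", "GeneA", 250, "T", "missense_variant"),
    ("Acc3", "Landrace", "GeneA", 101, "A", "Ref"),
    ("Acc3", "Landrace", "GeneA", 250, "T", "missense_variant"),
    ("Acc4", "Landrace", "GeneA", 101, "C", "synonymous_variant"),
    ("Acc4", "Landrace", "GeneA", 250, "G", "Ref"),
]


def assemble_alleles(calls):
    """Join each accession's genotypes across a gene's positions into one allele."""
    per_accession = defaultdict(dict)
    categories = {}
    for accession, category, gene, position, genotype, effect in calls:
        per_accession[(gene, accession)][position] = f"{genotype}|{effect}"
        categories[accession] = category
    # An allele is the tuple of genotype|effect strings ordered by position.
    alleles = {
        key: tuple(by_pos[p] for p in sorted(by_pos))
        for key, by_pos in per_accession.items()
    }
    return alleles, categories


def summarize(alleles, categories):
    """Count accessions per (gene, allele), broken down by category."""
    summary = defaultdict(Counter)
    for (gene, accession), allele in alleles.items():
        summary[(gene, allele)][categories[accession]] += 1
    return summary


alleles, categories = assemble_alleles(CALLS)
summary = summarize(alleles, categories)
for (gene, allele), counts in summary.items():
    print(gene, " ".join(allele), dict(counts))
```

Each printed row pairs one allele of a gene with the number of accessions carrying it in each category, which mirrors the summary-by-category and genotype views the abstract describes.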

