Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic-ecollection
Document type: journal articles
PubMed: 37928239
PubMed Central: PMC10623549
DOI: 10.3389/fgene.2023.1251382
PII: 1251382
- Keywords
- Indels, SNPs, copy number variation, genomic variations, phenotypes, promoter, transcription factor, whole genome re-sequencing data
- Publication type
- journal articles MeSH
The rapid growth of sequencing technology and its increasing popularity in biological research have made whole genome re-sequencing (WGRS) data widely available. Large amounts of WGRS data can help bridge the knowledge gap between genomics and phenomics by revealing the genomic variations that can lead to phenotypic changes. These genomic variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we have developed strategies to better utilize and mine WGRS data using efficient data processing scripts, libraries, tools, and frameworks, resulting in an interactive, visualization-enhanced toolset that encompasses both promoter region and copy number variation analysis components. The main capabilities of the GenVarX toolset are easy-to-use interfaces that let users perform queries, visualize data, and interact with the data. Using the input fields on the user interface, users provide a value for each field and submit the information as a query. The data returned on the results page is usually displayed in tabular form. In addition, interactive figures are included in the toolset to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis.
Researchers can access the soybean GenVarX toolset from SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.
Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, United States
Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia
Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States