Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences

2023; 14: 1251382. [Epub] 2023-10-09

Status: PubMed-not-MEDLINE. Language: English. Country: Switzerland. Medium: electronic-eCollection

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid37928239

The rapid growth of sequencing technology and its increasing popularity in biological research have made whole genome re-sequencing (WGRS) data widely available. Large volumes of WGRS data can help bridge the knowledge gap between genomics and phenomics by revealing the genomic variations that can lead to phenotypic changes. These variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation (CNV) data, SNP and indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to make better use of WGRS data and mined the data with efficient data processing scripts, libraries, tools, and frameworks to create the interactive, visualization-enhanced GenVarX toolset, which encompasses both promoter region and copy number variation analysis components. The toolset's main capability is to provide easy-to-use interfaces through which users can run queries, visualize data, and interact with the data. Each input window on the user interface has fields in which users enter values and submit them as a query. The data returned on the results page is usually displayed as a table. In addition, interactive figures are included in the toolset to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis. Researchers can access the soybean GenVarX toolset from SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.

