5 records in PubMed

Article

Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences

Chan, Yen On (MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States)
Biová, Jana (Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia)
Mahmood, Anser (Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States)
Dietz, Nicholas (Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States)
Bilyeu, Kristin (Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States; Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Columbia, MO, United States)
Škrabišová, Mária (Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia)
Joshi, Trupti (MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, United States; Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, United States; Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri-Columbia, Columbia, MO, United States)

Frontiers in genetics. 2023; 14: 1251382. [epub] 20231009

Front Genet
ISSN 1664-8021

The rapid growth of sequencing technology and its increasing popularity in biology-related research have made whole genome re-sequencing (WGRS) data widely available. A large amount of WGRS data can help bridge the knowledge gap between genomics and phenomics by revealing the genomic variations that lead to phenotype changes. These genomic variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to better utilize and mine the WGRS data using efficient data processing scripts, libraries, tools, and frameworks, producing an interactive, visualization-enhanced toolset that encompasses both promoter region and copy number variation analysis components. The main capability of the GenVarX toolset is to provide easy-to-use interfaces through which users can perform queries, visualize data, and interact with the data. Users fill in the input fields on the user interface and submit the information as a query. The data returned on the results page are usually displayed in tabular form. Interactive figures are also included to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis. Researchers can access the soybean GenVarX toolset from SoyKB via https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal via https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.

Keywords
Indels, SNPs, copy number variation, genomic variations, phenotypes, promoter, transcription factor, whole genome re-sequencing data
Publication type
Journal Article

Article

The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis

BMC genomics. 2023 Mar 10; 24 (1): 107. [epub] 20230310

BMC Genomics
ISSN 1471-2164

BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research that uses the WGRS data without further processing is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool that enables researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes the VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), for which the accessions of the WGRS datasets were collected from various sources and currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool are data query, visualization of results, categorical filtering, and download functions. Queries are built from user input, and the results are presented in tabular form: summary results by categorical description and genotype results of the alleles for each gene. The categorical information is specific to each species, and available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, the reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. The results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website ( https://soykb.org/SoybeanAlleleCatalogTool/ ), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website ( https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana ). Researchers can use this tool to connect variant alleles of genes with meta-information of species.
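
The step of assembling alleles for each gene can be illustrated with a minimal sketch. This is not the AlleleCatalog pipeline's code: the function name `assemble_alleles`, the nested-dictionary input layout, and the "?" placeholder for a missing call are assumptions made for illustration only. The idea shown is simply that accessions sharing the same genotypes across all variant positions of a gene carry the same allele.

```python
from collections import defaultdict


def assemble_alleles(genotypes):
    """Group accessions by their combined genotype across one gene.

    genotypes: {accession: {variant_position: genotype}} (hypothetical layout).
    Returns the sorted variant positions and {allele_tuple: [accessions]}.
    """
    # Collect every variant position observed in any accession.
    positions = sorted({pos for calls in genotypes.values() for pos in calls})
    catalog = defaultdict(list)
    for accession, calls in sorted(genotypes.items()):
        # An allele is the ordered tuple of genotypes; "?" marks a missing call.
        allele = tuple(calls.get(pos, "?") for pos in positions)
        catalog[allele].append(accession)
    return positions, dict(catalog)


if __name__ == "__main__":
    example = {
        "acc1": {10: "G", 25: "T"},
        "acc2": {10: "G", 25: "T"},
        "acc3": {10: "A", 25: "T"},
    }
    positions, catalog = assemble_alleles(example)
    for allele, accessions in catalog.items():
        print(dict(zip(positions, allele)), accessions)
```

A real catalog would additionally carry the imputation results, functional effect classes, and amino-acid changes named in the abstract; those are left out here.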

Keywords
Allele Catalog Pipeline, Allele Catalog Tool, Alleles in Gene, Data Visualization, Variant Calling Pipeline
MeSH
Alleles *
Arabidopsis * genetics
Data Mining * methods
Datasets as Topic *
Gene Frequency
Genotype
Glycine max * genetics
Internet *
Zea mays * genetics
Metadata
Mutation
Pigmentation genetics
Genes, Plant genetics
Software *
Amino Acid Substitution
Plant Dormancy genetics
Data Visualization
Publication type
Journal Article
Substance names
DOG1 protein, Arabidopsis

Article

Natural and artificial selection of multiple alleles revealed through genomic analyses

Frontiers in genetics. 2023; 14: 1320652. [epub] 20240108

Front Genet
ISSN 1664-8021

Genome-to-phenome research in agriculture aims to improve crops through in silico predictions. The genome-wide association study (GWAS) is a potent method for identifying genomic loci that underlie important traits. Because GWAS is a statistical method, increasing the sample quantity, data quality, or diversity of the dataset improves its power. For more precise breeding, concrete candidate genes with exact functional variants must be discovered. Many post-GWAS methods have been developed to narrow down the associated genomic regions and, ideally, to predict candidate genes and causative mutations (CMs). Historical natural selection and breeding-related artificial selection both change the frequencies of different alleles of genes that control phenotypes. With more diverse and more extensive GWAS datasets, there is an increased chance of multiple alleles with independent CMs in a single causal gene. This can be caused by the presence of samples from geographically isolated regions that arose during natural or artificial selection. This simple fact complicates GWAS-driven discoveries. Currently, none of the existing association methods addresses this issue or the need to identify multiple alleles and, more specifically, the actual CMs. Therefore, we developed a tool that computes a score for a combination of variant positions in a single candidate gene and, based on the highest score, identifies the best number and combination of CMs. The tool is publicly available as a Python package on GitHub, and we further created a web-based Multiple Alleles discovery (MADis) tool that supports soybean and is hosted in SoyKB (https://soykb.org/SoybeanMADisTool/). We tested and validated the algorithm and demonstrated the use of MADis in a case study of the pod pigmentation L1 gene, which has multiple CMs from natural or artificial selection. Finally, we identified a candidate gene for the pod color L2 locus and predicted the existence of multiple alleles that potentially cause loss of pod pigmentation. In this work, we show how genomic analysis can be employed to explore the natural and artificial selection of multiple alleles and, thus, improve and accelerate crop breeding in agriculture.
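
The idea of scoring combinations of variant positions and keeping the highest-scoring one can be sketched as follows. The abstract does not give the MADis scoring formula, so the score used here is an assumption chosen purely for illustration: the fraction of samples whose phenotype is explained when any alternate allele at the chosen positions is taken to cause the mutant phenotype. The function names, the 0/1 encoding, and the `max_size` limit are likewise hypothetical.

```python
from itertools import combinations


def combo_score(genotypes, phenotypes, positions):
    """Fraction of samples whose phenotype matches the prediction.

    Assumed scoring (not the published MADis formula): a sample is predicted
    to be mutant if it carries the alternate allele (1) at any chosen position.
    genotypes: {sample: {position: 0 or 1}}; phenotypes: {sample: True if mutant}.
    """
    matches = 0
    for sample, observed_mutant in phenotypes.items():
        predicted_mutant = any(genotypes[sample][pos] == 1 for pos in positions)
        matches += predicted_mutant == observed_mutant
    return matches / len(phenotypes)


def best_combination(genotypes, phenotypes, max_size=3):
    """Exhaustively search combinations of up to max_size variant positions."""
    all_positions = sorted(next(iter(genotypes.values())))
    best_score, best_combo = 0.0, ()
    for size in range(1, max_size + 1):
        for combo in combinations(all_positions, size):
            score = combo_score(genotypes, phenotypes, combo)
            # Strict ">" keeps the smallest combination among equal scores.
            if score > best_score:
                best_score, best_combo = score, combo
    return best_score, best_combo


if __name__ == "__main__":
    genotypes = {
        "s1": {100: 1, 200: 0, 300: 0},
        "s2": {100: 0, 200: 1, 300: 0},
        "s3": {100: 0, 200: 0, 300: 0},
        "s4": {100: 0, 200: 0, 300: 1},
    }
    phenotypes = {"s1": True, "s2": True, "s3": False, "s4": False}
    print(best_combination(genotypes, phenotypes))  # (1.0, (100, 200))
```

In this toy panel neither position 100 nor position 200 explains both mutant samples alone, while the pair explains all four samples, which is the multiple-allele situation the abstract describes. Exhaustive search is only practical for a small number of positions within one candidate gene.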

Keywords
GWAS, alleles, breeding, causal gene, causative mutation, genetic variation, soybean
Publication type
Journal Article

Article

AccuCalc: A Python Package for Accuracy Calculation in GWAS

Genes. 2023 Jan 01; 14 (1). [epub] 20230101

Genes (Basel)
ISSN 2073-4425

The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and, thus, aims to discover causative mutations (CMs) in the genes underlying the phenotype. However, GWAS discoveries are limited by many factors and typically identify associated genomic regions without the further ability to compare the viability of candidate genes and actual CMs. The current methodology is therefore limited in its ability to identify CMs. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of the tool, named AccuCalc, which extends its use to other species. We enhanced the tool for the analysis of datasets with a low-frequency distribution of a rare phenotype by automated formatting of a synthetic phenotype and added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis that accepts any user-defined, species-independent Variant Call Format (VCF) or HapMap format (hmp) file as input data. AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also offers visualization of the GWAS results as Manhattan plots accentuated by accuracy. Implemented in Python, AccuCalc is publicly available and, thus, can conveniently be used to apply the SP2CM strategy to any species.
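
To show why an accuracy measure needs care when the phenotype is rare, the following minimal sketch compares two generic measures of genotype-phenotype correspondence for a single variant position. It does not use the AccuCalc API and does not reproduce the package's definitions, which the abstract does not state; `accuracy_metrics`, the 0/1 encoding, and the choice of overall versus class-averaged accuracy are assumptions for illustration.

```python
def accuracy_metrics(genotypes, phenotypes):
    """Return (overall, class_averaged) correspondence for one variant position.

    genotypes, phenotypes: {accession: 0 or 1}, where 0 is the reference
    genotype / wild-type phenotype and 1 is the alternate genotype / mutant
    phenotype (hypothetical encoding, not AccuCalc's input format).
    """
    shared = sorted(set(genotypes) & set(phenotypes))
    if not shared:
        raise ValueError("no accessions with both a genotype and a phenotype")
    per_class = {0: [0, 0], 1: [0, 0]}  # phenotype class -> [matches, total]
    for accession in shared:
        cls = phenotypes[accession]
        per_class[cls][1] += 1
        per_class[cls][0] += genotypes[accession] == cls
    # Overall accuracy: matching accessions over all accessions.
    overall = sum(m for m, _ in per_class.values()) / len(shared)
    # Class-averaged accuracy: mean of the per-phenotype match rates.
    rates = [m / t for m, t in per_class.values() if t]
    class_averaged = sum(rates) / len(rates)
    return overall, class_averaged


if __name__ == "__main__":
    # Eight wild-type and two mutant accessions; the variant is reference in all.
    phenotypes = {f"acc{i}": (1 if i >= 8 else 0) for i in range(10)}
    genotypes = {f"acc{i}": 0 for i in range(10)}
    print(accuracy_metrics(genotypes, phenotypes))  # (0.8, 0.5)
```

An uninformative variant scores 0.8 on overall accuracy simply because the mutant phenotype is rare, while the class-averaged value of 0.5 exposes that it explains none of the mutant accessions.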

Keywords
GWAS, Manhattan plot, SP2CM, accuracy, causative mutation, python package
MeSH
Genome-Wide Association Study * methods
Phenotype
Genome
Genomics * methods
Mutation
Publication type
Journal Article
Research Support (grant-funded work)

Article

A novel Synthetic phenotype association study approach reveals the landscape of association for genomic variants and phenotypes

Journal of advanced research. 2022 Dec; 42: 117-133. [epub] 20220412

J Adv Res
ISSN 2090-1224

INTRODUCTION: Genome-Wide Association Studies (GWAS) identify tagging variants in the genome that are statistically associated with the phenotype because of their linkage disequilibrium (LD) relationship with the causative mutation (CM). When both low-density genotyped accession panels with phenotypes and resequenced data accession panels are available, tagging variants can assist with post-GWAS challenges in CM discovery. OBJECTIVES: Our objective was to identify additional GWAS evaluation criteria to assess correspondence between genomic variants and phenotypes, as well as to enable deeper analysis of the localized landscape of association. METHODS: We used genomic variant positions as Synthetic phenotypes in GWAS, an approach that we named the "Synthetic phenotype association study" (SPAS). The extreme case of SPAS is what we call an "Inverse GWAS", in which we used CM positions of cloned soybean genes. We developed and validated the Accuracy concept as a measure of the correspondence between variant positions and phenotypes. RESULTS: The SPAS approach demonstrated that the genotype status of an associated variant used as a Synthetic phenotype enabled us to explore the relationships between tagging variants and CMs and, further, that utilizing CMs as Synthetic phenotypes in Inverse GWAS illuminated the landscape of association. We implemented the Accuracy calculation for a curated accession panel in an online Accuracy calculation tool (AccuTool) as a resource for gene identification in soybean. We demonstrated our concepts on three examples of soybean cloned genes. As a result of our findings, we devised an enhanced "GWAS to Genes" analysis (Synthetic phenotype to CM strategy, SP2CM). Using SP2CM, we identified a CM for a novel gene. CONCLUSION: The SP2CM strategy utilizing Synthetic phenotypes and the Accuracy calculation of correspondence provides crucial information to assist researchers in CM discovery. The impact of this work is a more effective evaluation of landscapes of GWAS associations.

Keywords
GWAS, Genomics, Genotyping, Phenotyping, Resequencing, Soybean
MeSH
Genome-Wide Association Study *
Phenotype
Genomics *
Genotype
Linkage Disequilibrium
Publication type
Journal Article
Research Support (grant-funded work)
