The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and thus aims to discover causative mutations (CMs) in the genes underlying the phenotype. However, GWAS discoveries are limited by many factors, and the approach typically identifies associated genomic regions without being able to compare the viability of candidate genes and actual CMs. The current methodology is therefore limited in its ability to identify CMs. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of the tool, named AccuCalc, which extends its use to other species. We enhanced the tool for the analysis of datasets with a low-frequency distribution of a rare phenotype by automated formatting of a synthetic phenotype, and we added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis that accepts user-defined, species-independent input data in variant call format (VCF) or HapMap format (hmp). AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also offers visualization of the GWAS results as Manhattan plots accentuated by accuracy. AccuCalc is written in Python and publicly available, and thus can be conveniently used to apply the SP2CM strategy to any species.
- Keywords
- GWAS, Manhattan plot, SP2CM, accuracy, causative mutation, Python package
- MeSH
- Genome-Wide Association Study * methods MeSH
- Phenotype MeSH
- Genome MeSH
- Genomics * methods MeSH
- Mutation MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Phylogenies are a key part of research in many areas of biology. Tools that automate some parts of the process of phylogenetic reconstruction, mainly molecular character matrix assembly, have been developed for the benefit of both specialists in the field of phylogenetics and non-specialists. However, interpreting results, comparing them with previously available phylogenetic hypotheses, and selecting one phylogeny for downstream analyses and discussion still pose difficulties for anyone who is not a specialist in either phylogenetic methods or the particular group under study. RESULTS: Physcraper is a command-line Python program that automates the update of published phylogenies by adding public DNA sequences to the underlying alignments of previously published phylogenies. It also provides a framework for straightforward comparison of published phylogenies with their updated versions by leveraging tools from the Open Tree of Life project to link taxonomic information across databases. The program can be used by non-specialists as a tool to generate phylogenetic hypotheses based on publicly available expert phylogenetic knowledge. Phylogeneticists and taxonomic group specialists will find it useful as a tool to facilitate molecular dataset gathering and comparison of alternative phylogenetic hypotheses (topologies). CONCLUSION: The Physcraper workflow showcases the benefits of doing open science for phylogenetics, encouraging researchers to strive for better scientific sharing practices. Physcraper can be used with any OS and is released under an open-source license. Detailed instructions for installation and usage are available at https://physcraper.readthedocs.io.
- Keywords
- DNA alignment, Gene phylogeny, Gene tree, Interoperability, Multilocus, Open Tree of Life, Open science, Otol, Public database, Reproducibility
- MeSH
- Phylogeny * MeSH
- Publication type
- Journal Article MeSH
We present the Python-based Molecule Builder for ESPResSo (pyMBE), an open-source software application to design custom coarse-grained (CG) models, as well as pre-defined models of polyelectrolytes, peptides, and globular proteins, in the Extensible Simulation Package for Research on Soft Matter (ESPResSo). The Python interface of ESPResSo offers a flexible framework capable of building custom CG models from scratch. However, building CG models from scratch is prone to mistakes, especially for newcomers to the field of CG modeling or for molecules with complex architectures. The pyMBE module builds CG models in ESPResSo using a hierarchical bottom-up approach, providing a robust tool to automate the setup of CG models and helping new users avoid common mistakes. ESPResSo features the constant pH (cpH) and grand-reaction (G-RxMC) methods, which have been designed to study chemical reaction equilibria in macromolecular systems with many reactive species. However, setting up these methods for systems that contain several types of reactive groups is an error-prone task, especially for beginners. The pyMBE module enables the automatic setup of cpH and G-RxMC simulations in ESPResSo, lowering the barrier for newcomers and opening the door to investigating complex systems not yet studied with these methods. To demonstrate some of the applications of pyMBE, we showcase several case studies in which we successfully reproduce previously published simulations of charge-regulating peptides and globular proteins in bulk solution and of weak polyelectrolytes in dialysis. The pyMBE module is publicly available as a GitHub repository (https://github.com/pyMBE-dev/pyMBE), which includes its source code and various sample and test scripts, including the ones that we used to generate the data presented in this article.
- Publication type
- Journal Article MeSH
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.
- MeSH
- Algorithms * MeSH
- Models, Biological MeSH
- History, 20th Century MeSH
- History, 21st Century MeSH
- Linear Models MeSH
- Nonlinear Dynamics MeSH
- Computer Simulation MeSH
- Signal Processing, Computer-Assisted MeSH
- Programming Languages * MeSH
- Software * MeSH
- Computational Biology history methods MeSH
- Check Tag
- History, 20th Century MeSH
- History, 21st Century MeSH
- Publication type
- Journal Article MeSH
- Historical Article MeSH
- Review MeSH
Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.
Computational models of gene regulation help to understand regulatory mechanisms and are extensively used in a wide range of areas, e.g., biotechnology or medicine, with significant benefits. Unfortunately, only a few computational gene regulatory models of whole genomes allow static and dynamic analysis, owing to the lack of sophisticated tools for their reconstruction. Here, we describe Augusta, an open-source Python package for Gene Regulatory Network (GRN) and Boolean Network (BN) inference from high-throughput gene expression data. Augusta can reconstruct genome-wide models suitable for static and dynamic analyses. Augusta uses a unique approach in which the first estimate of a GRN inferred from expression data is further refined by predicting transcription factor binding motifs in the promoters of regulated genes and by incorporating verified interactions obtained from databases. Moreover, the refined GRN is transformed into a draft BN by searching the curated model database and assigning logical rules to the incoming edges of target genes; the draft can be further edited manually because the model is provided in the SBML file format. The approach is applicable even if information about the organism under study is not available in the databases, which is typically the case for non-model organisms, including most microbes. Augusta can be operated from the command line and is thus easy to use for automated prediction of models for various genomes. The Augusta package is freely available at github.com/JanaMus/Augusta. Documentation and tutorials are available at augusta.readthedocs.io.
- Keywords
- Databases, Gene interactions, Mutual information, Python package, Transcription factor binding motifs
- Publication type
- Journal Article MeSH
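The idea of a Boolean network with logical rules on the incoming edges of target genes, as described in the Augusta abstract above, can be illustrated with a minimal sketch. This is not Augusta's API or its SBML output: the gene names (`A`, `B`, `C`), the rules, and the helper functions `step` and `simulate` are hypothetical and chosen only to show synchronous updating, which is one common way to run the dynamic analysis of such a model.

```python
# Minimal synchronous Boolean network sketch.
# Gene names and rules are hypothetical; this is not Augusta's API.

def step(state, rules):
    """Apply every rule to the current state at once (synchronous update)."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}

def simulate(state, rules, max_steps=100):
    """Iterate until a previously visited state recurs or max_steps is hit."""
    trajectory = [dict(state)]
    for _ in range(max_steps):
        nxt = step(trajectory[-1], rules)
        repeated = nxt in trajectory
        trajectory.append(nxt)
        if repeated:
            break
    return trajectory

# Each rule combines the regulators (incoming edges) of one target gene.
RULES = {
    "A": lambda s: not s["C"],        # C represses A
    "B": lambda s: s["A"],            # A activates B
    "C": lambda s: s["A"] and s["B"], # A and B jointly activate C
}

if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(simulate(start, RULES)):
        print(t, s)
```

With these illustrative rules, the trajectory returns to its starting state, so the network settles into a cyclic attractor instead of a fixed point.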
Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and for training new machine learning algorithms. A key prerequisite for training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles. SCIENTIFIC CONTRIBUTION: This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. This pipeline builds on previous work by adding new functionality for curating and correcting annotated libraries by validating structure annotations. Due to the high quality of our software, the reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.
- Keywords
- Library cleaning, Mass spectrometry, Metabolomics, Metadata, Python Package
- Publication type
- Journal Article MeSH
SUMMARY: Protein design requires information about how mutations affect protein stability. Many web-based predictors are available for this purpose, yet comparing them or using them en masse is difficult. Here, we present BenchStab, a console tool/Python package for easy and quick execution of 19 predictors and result collection on a list of mutants. Moreover, the tool is easily extensible with additional predictors. We created an independent dataset derived from the FireProtDB and evaluated 24 different prediction methods. AVAILABILITY AND IMPLEMENTATION: BenchStab is an open-source Python package available at https://github.com/loschmidt/BenchStab with a detailed README and example usage at https://loschmidt.chemi.muni.cz/benchstab. The BenchStab dataset is available on Zenodo: https://zenodo.org/records/10637728.
- MeSH
- Databases, Protein MeSH
- Internet * MeSH
- Proteins chemistry MeSH
- Software * MeSH
- Protein Stability MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Proteins MeSH
MOTIVATION: Molecular dynamics simulation is a very useful but computationally demanding method for studying the dynamics of biomolecular systems. Many enhanced sampling methods have been developed to obtain the desired results in the available computational time. Metadynamics and its variants are common enhanced sampling methods used for this purpose. Metadynamics simulations allow the user to gather large amounts of data, which have to be analyzed to elucidate the properties of the studied system. RESULTS: Here, we present metadynminer.py, a Python package that allows easy and user-friendly analysis and visualization of the results obtained from metadynamics simulations. The built-in functions automate frequent tasks and make the package easy to use for new users, while its many customization options and object-oriented nature allow for integration into specialized data analysis workflows by more advanced users. AVAILABILITY AND IMPLEMENTATION: The "metadynminer.py" Python package is available under the GPL-3.0 license via PyPI and Conda. The development version is available on GitHub along with issue support (https://github.com/Jan8be/metadynminer.py). Documentation, a tutorial, and a Jupyter Notebook (provided through the public mybinder.org service) are available at https://metadynreporter.cz.
- MeSH
- Molecular Dynamics Simulation * MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
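A typical analysis task for metadynamics output is estimating a free energy profile from the deposited bias. The sketch below illustrates the general technique for standard (non-well-tempered) metadynamics along one collective variable, where the free energy estimate is the negative sum of the Gaussian hills. It does not use or reproduce the metadynminer.py API; the function name `free_energy_from_hills` and its arguments are hypothetical, and well-tempered runs would need an additional rescaling that is not shown.

```python
import numpy as np

def free_energy_from_hills(centers, sigmas, heights, grid):
    """Estimate a 1D free energy profile as the negative sum of Gaussian hills.

    Assumes standard metadynamics with one collective variable; all names
    here are illustrative and not taken from any package.
    """
    centers = np.asarray(centers, dtype=float)[:, None]  # shape (n_hills, 1)
    sigmas = np.asarray(sigmas, dtype=float)[:, None]
    heights = np.asarray(heights, dtype=float)[:, None]
    grid = np.asarray(grid, dtype=float)[None, :]        # shape (1, n_grid)

    # Bias potential: sum of all deposited Gaussians evaluated on the grid.
    bias = np.sum(heights * np.exp(-(grid - centers) ** 2 / (2.0 * sigmas ** 2)),
                  axis=0)

    # Free energy estimate is the negative bias, shifted so the minimum is 0.
    fes = -bias
    return fes - fes.min()

if __name__ == "__main__":
    grid = np.linspace(-3.0, 3.0, 7)
    print(free_energy_from_hills([0.0, 0.5], [0.4, 0.4], [1.2, 1.2], grid))
```

Shifting the profile so that its minimum is zero is a presentation choice; only free energy differences are meaningful.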
We describe recent improvements to our method named powder nanobeam diffraction in four-dimensional scanning transmission electron microscopy (4D-STEM/PNBD). The method can turn any SEM equipped with a 2D-array STEM detector into a user-friendly powder electron diffractometer. It reduces a 4D-STEM dataset to a single 2D powder electron diffraction pattern (using our Python package named STEMDIFF; https://pypi.org/project/stemdiff) and then to a 1D radially averaged diffraction profile (using our Python package named EDIFF; https://pypi.org/project/ediff). Moreover, the EDIFF package can compare the final diffractogram with theoretically calculated X-ray diffraction patterns. Both STEMDIFF and EDIFF can be used in the form of simple interactive templates in the Jupyter environment, which makes them accessible to common SEM users. The recent improvements in STEMDIFF and EDIFF (better dataset filtering, parallelization, and a more flexible user interface) enabled us to analyze not only strongly diffracting nanocrystals but also samples with higher absorption and/or lower diffraction power. The final results obtained from 4D-STEM/PNBD datasets of all six samples analyzed in this contribution (two types of Au nanocrystals, GdF3 and TbF3 aggregates, and Fe3O4 nanoclusters with/without organic shell) were shown to be comparable with the results of the classical TEM/SAED method (selected area electron diffraction in TEM).
- Keywords
- 4D-STEM-in-SEM, electron diffraction, nanobeam diffraction, nanocrystals, phase identification, powder diffraction
- Publication type
- Journal Article MeSH
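The reduction of a 2D powder diffraction pattern to a 1D radially averaged profile, mentioned in the 4D-STEM/PNBD abstract above, can be sketched with plain NumPy. This is a generic illustration of radial averaging and not the EDIFF implementation or its API; the function name `radial_profile`, the default choice of the geometric image center, and the one-pixel-wide radial bins are all assumptions.

```python
import numpy as np

def radial_profile(pattern, center=None):
    """Average a 2D pattern over rings of integer radius around a center.

    Illustrative sketch only; the default center is the geometric center
    of the array, which a real pattern may not share.
    """
    pattern = np.asarray(pattern, dtype=float)
    rows, cols = np.indices(pattern.shape)
    if center is None:
        center = ((pattern.shape[0] - 1) / 2.0, (pattern.shape[1] - 1) / 2.0)

    # Distance of every pixel from the center, rounded to the nearest bin.
    radius = np.hypot(rows - center[0], cols - center[1])
    bins = np.rint(radius).astype(int).ravel()

    # Sum of intensities and number of pixels per radial bin.
    sums = np.bincount(bins, weights=pattern.ravel())
    counts = np.bincount(bins)

    # Mean intensity per ring; guard against empty bins.
    return sums / np.maximum(counts, 1)

if __name__ == "__main__":
    rows, cols = np.indices((65, 65))
    ring = np.exp(-(np.hypot(rows - 32, cols - 32) - 15.0) ** 2 / 4.0)
    profile = radial_profile(ring)
    print(int(np.argmax(profile)))  # radial bin with the highest mean intensity
```

In practice the center of the diffraction pattern would have to be located first, since an off-center origin smears the rings across neighboring radial bins.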