Python Package
The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and, thus, aims to discover causative mutations (CM) in the genes underlying the phenotype. However, GWAS discoveries are limited by many factors and typically identify associated genomic regions without the further ability to compare the viability of candidate genes and actual CMs. The current methodology is therefore limited in its ability to identify CMs. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of the tool, named AccuCalc, which extends its use to other species. We enhanced the tool for the analysis of datasets with a low-frequency distribution of a rare phenotype by automated formatting of a synthetic phenotype and added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis that accepts any user-defined, species-independent input data in variant call format (VCF) or HapMap format (hmp). AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also offers visualization of the GWAS results as Manhattan plots accentuated by accuracy. Implemented in Python, AccuCalc is publicly available and can thus be used conveniently to apply the SP2CM strategy to any species.
- MeSH
- genome-wide association study * methods MeSH
- phenotype MeSH
- genome MeSH
- genomics * methods MeSH
- mutation MeSH
- Publication type
- journal articles MeSH
- grant-supported work MeSH
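The abstract above does not define how accuracy is computed, so the following is only a minimal sketch of one plausible reading: the fraction of accessions in which the allele at a variant position agrees with a binary phenotype. The function name `allele_accuracy`, its parameters, and the use of "N" for a missing call are assumptions made for illustration; they are not AccuCalc's API or its exact formula.

```python
from typing import Sequence


def allele_accuracy(genotypes: Sequence[str], phenotypes: Sequence[str],
                    alt_allele: str, mutant_label: str) -> float:
    """Return the fraction of accessions whose allele agrees with the phenotype.

    Hypothetical definition: an accession is a match when it carries the
    alternate allele and shows the mutant phenotype, or carries any other
    allele and does not show it. Missing calls ("N") are skipped.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    matches = 0
    scored = 0
    for allele, phenotype in zip(genotypes, phenotypes):
        if allele == "N":  # assumed missing-call symbol
            continue
        scored += 1
        if (allele == alt_allele) == (phenotype == mutant_label):
            matches += 1
    if scored == 0:
        raise ValueError("no accessions with a genotype call")
    return matches / scored


# Six accessions at one variant position; the fifth has no call.
calls = ["A", "A", "T", "T", "N", "T"]
traits = ["wt", "wt", "mut", "mut", "mut", "wt"]
accuracy = allele_accuracy(calls, traits, alt_allele="T", mutant_label="mut")
```

Under this definition, four of the five scored accessions agree with the phenotype, so `accuracy` is 0.8.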
... Syntax diagrams 467 -- C Conventions for writing programs in Python 469 -- References 472 -- Index ... ... Object, class, instance 38 -- Object 38 -- Class 38 -- Instance 38 -- 1.4 The most important peculiarities of Python ... ... 102 -- Design by contract 103 -- 6.7 Compound statements and indentation 105 -- Advantages and disadvantages [illegible] of Python ... ... ) 355 super([type[, object-or-type]]) 356 -- 23.10 Summary 356 -- 24 Multiple inheritance 357 -- 24.1 Python ... ... 412 -- Weak references 413 -- 27.9 Summary 414 -- 28 Packages 415 -- 28.1 Packages ...
1st electronic edition, 1 online resource (480 pages)
The book covers version 3.9 of Python, the current version at the time of publication. It is intended for readers who occasionally need to write a simple program, and it is also suitable as a companion textbook for programming courses and as a reference manual. Python originated as a language meant to ease newcomers into the world of programming and to let them write simple programs with as little effort as possible.
... What's new in "Dive Into Python 3" 17 -- 0. Installing Python 21 -- 1. ... ... Installing Python 21 -- 0.1. Diving in 23 -- 0.2. Which Python is right for you? ... ... Using the Python Shell 41 -- 0.8. Python editors and IDEs 43 -- 1. ... ... A short digression into multi-file modules 338 -- 16.11. ... ... Adding our software to the Python Package Index 373 -- 16.11. ...
1st electronic edition, 1 online resource (434 pages)
Mark Pilgrim left a lasting mark on the Python community with his book "Dive Into Python", in which he introduced readers to the language's distinctive programming style in an original and memorable way. A few years later he made an even stronger impression with "Dive Into Python 3", which covers the newest version of the language in the same original and entertaining manner. This is a translation of an English guide distributed under a free license. It covers programming in Python 3 from installation through to writing your own programs. Programming experience is assumed.
Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that will perform diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.
Building reliable and robust quantitative structure-property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of their models and facilitate their transferability into practice. In this work, we introduce QSPRpred, a toolkit for analysis of bioactivity data sets and QSPR modelling, which attempts to address the aforementioned challenges. QSPRpred's modular Python API enables users to intuitively describe different parts of a modelling workflow using a plethora of pre-implemented components, but also integrates customized implementations in a "plug-and-play" manner. QSPRpred data sets and models are directly serializable, which means they can be readily reproduced and put into operation after training as the models are saved with all required data pre-processing steps to make predictions on new compounds directly from SMILES strings. The general-purpose character of QSPRpred is also demonstrated by inclusion of support for multi-task and proteochemometric modelling. The package is extensively documented and comes with a large collection of tutorials to help new users. In this paper, we describe all of QSPRpred's functionalities and also conduct a small benchmarking case study to illustrate how different components can be leveraged to compare a diverse set of models. QSPRpred is fully open-source and available at https://github.com/CDDLeiden/QSPRpred. Scientific Contribution: QSPRpred aims to provide a complex, but comprehensive Python API to conduct all tasks encountered in QSPR modelling from data preparation and analysis to model creation and model deployment.
In contrast to similar packages, QSPRpred offers a wider and more exhaustive range of capabilities and integrations with many popular packages that also go beyond QSPR modelling. A significant contribution of QSPRpred is also in its automated and highly standardized serialization scheme, which significantly improves reproducibility and transferability of models.
- Publication type
- journal articles MeSH
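The QSPRpred abstract describes models that are saved together with their pre-processing steps so that predictions can be reproduced from raw input. The sketch below illustrates that general idea with a toy model and the standard-library `json` module only. The class `ScaledLinearModel` and all of its fields are hypothetical and do not reflect QSPRpred's actual API or serialization format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ScaledLinearModel:
    """Toy model that stores its pre-processing (scaling) with its weights."""
    means: List[float]    # per-feature mean used for centering
    scales: List[float]   # per-feature divisor used for scaling
    weights: List[float]  # linear coefficients applied after scaling
    bias: float

    def predict(self, raw_features: List[float]) -> float:
        # Pre-processing is part of the model, so callers pass raw values.
        scaled = [(x - m) / s
                  for x, m, s in zip(raw_features, self.means, self.scales)]
        return sum(w * x for w, x in zip(self.weights, scaled)) + self.bias

    def to_json(self) -> str:
        # Serialize every parameter needed to reproduce a prediction.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ScaledLinearModel":
        return cls(**json.loads(text))


model = ScaledLinearModel(means=[1.0, 2.0], scales=[2.0, 4.0],
                          weights=[0.5, -1.0], bias=0.25)
restored = ScaledLinearModel.from_json(model.to_json())
```

Because the scaling parameters travel with the weights, `restored.predict([3.0, 6.0])` gives the same value as `model.predict([3.0, 6.0])` without any separately stored pre-processing code.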
Cellular force generation and force transmission are of fundamental importance for numerous biological processes and can be studied with the methods of Traction Force Microscopy (TFM) and Monolayer Stress Microscopy. Traction Force Microscopy and Monolayer Stress Microscopy solve the inverse problem of reconstructing cell-matrix tractions and inter- and intra-cellular stresses from the measured cell force-induced deformations of an adhesive substrate with known elasticity. Although several laboratories have developed software for Traction Force Microscopy and Monolayer Stress Microscopy computations, there is currently no software package available that allows non-expert users to perform a full evaluation of such experiments. Here we present pyTFM, a tool to perform Traction Force Microscopy and Monolayer Stress Microscopy on cell patches and cell layers grown in a 2-dimensional environment. pyTFM was optimized for ease-of-use; it is open-source and well documented (hosted at https://pytfm.readthedocs.io/) including usage examples and explanations of the theoretical background. pyTFM can be used as a standalone Python package or as an add-on to the image annotation tool ClickPoints. In combination with the ClickPoints environment, pyTFM allows the user to set all necessary analysis parameters, select regions of interest, examine the input data and intermediary results, and calculate a wide range of parameters describing forces, stresses, and their distribution. In this work, we also thoroughly analyze the accuracy and performance of the Traction Force Microscopy and Monolayer Stress Microscopy algorithms of pyTFM using synthetic and experimental data from epithelial cell patches.
- MeSH
- algorithms MeSH
- physical phenomena MeSH
- microscopy methods MeSH
- Publication type
- journal articles MeSH
- grant-supported work MeSH
BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, the deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In genomics, we have similar challenges (annotation of genomes and identification of functional elements), but we currently lack benchmarks similar to the protein folding competition. RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin regions) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in the repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks . CONCLUSIONS: Deep learning techniques have revolutionized many biological fields, but mainly thanks to carefully curated benchmarks. For the field of genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, an implementation of a simple neural network, and a training framework that can be used as a starting point for future research.
The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries.
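Sequence classification models such as the convolutional baseline mentioned above usually need DNA sequences converted to numeric input first. The following is a minimal sketch of one common encoding, one-hot encoding over the alphabet A, C, G, T, written with the standard library only. The function name `one_hot_encode` and the choice to map any other symbol (such as N) to an all-zero row are assumptions for illustration, not part of the 'genomic-benchmarks' package.

```python
from typing import List

ALPHABET = "ACGT"  # assumed nucleotide order for the encoding


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as one row of four 0/1 values per base.

    Bases outside ALPHABET (for example the ambiguity code N) become
    all-zero rows; this is one convention among several.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


matrix = one_hot_encode("acgn")
```

For the input "acgn", `matrix` has four rows: the first three each contain a single 1 in the column for A, C, and G respectively, and the last row is all zeros.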
Image processing in cryogenic electron tomography (cryoET) is currently in a state similar to that of Single Particle Analysis (SPA) in cryogenic electron microscopy (cryoEM) a few years ago. Its data processing workflows are far from being well defined and the user experience is still not smooth. Moreover, file formats of different software packages and their associated metadata are not standardized, mainly because different packages are developed by different groups, focusing on different steps of the data processing pipeline. The Scipion framework, originally developed for SPA (de la Rosa-Trevín et al., 2016), has a generic Python workflow engine that gives it the versatility to be extended to other fields, as demonstrated for model building (Martínez et al., 2020). In this article, we provide an extension of Scipion based on a set of tomography plugins (referred to as ScipionTomo hereafter), with a similar purpose: to allow users to focus on data processing and analysis instead of having to deal with multiple software installation issues and the inconvenience of switching from one package to another, converting metadata files, managing possible incompatibilities, scripting (writing a simple program in a language that the computer must convert to machine language each time the program is run), and so on. Additionally, having all the software available in an integrated platform allows comparing the results of different algorithms trying to solve the same problem. In this way, the commonalities and differences between estimated parameters shed light on which results can be more trusted than others. ScipionTomo is developed by a collaborative multidisciplinary team composed of Scipion team engineers, structural biologists, and in some cases, the developers whose software packages have been integrated. It is open to anyone in the field willing to contribute to this project.
The result is a framework extension that combines the acquired knowledge of Scipion developers in close collaboration with third-party developers, and the on-demand design of functionalities requested by beta testers applying this solution to actual biological problems.