MOTIVATION: Transposable elements (TEs) in eukaryotes often get inserted into one another, forming sequences that become a complex mixture of full-length elements and their fragments. The reconstruction of full-length elements and the order in which they have been inserted is important for genome and transposon evolution studies. However, the accumulation of mutations and genome rearrangements over evolutionary time makes this process error-prone and decreases the efficiency of software aiming to recover all nested full-length TEs. RESULTS: We created software that uses a greedy recursive algorithm to mine increasingly fragmented copies of full-length LTR retrotransposons in assembled genomes and other sequence data. The software, called TE-greedy-nester, considers not only sequence similarity but also the structure of elements. This new tool was tested on a set of natural and synthetic sequences and its accuracy was compared to similar software. We found TE-greedy-nester to be superior in a number of parameters, namely computation time and full-length TE recovery in highly nested regions. AVAILABILITY AND IMPLEMENTATION: http://gitlab.fi.muni.cz/lexa/nested. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
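The greedy recursive idea behind TE-greedy-nester can be illustrated with a minimal Python sketch. It is not the tool's implementation: the detector below is a hypothetical toy in which an intact element is written as `(` + body + `)`, whereas the real software relies on sequence similarity and element structure. Each round takes the best-scoring intact element (here simply the longest), records its coordinates in the original sequence, excises it, and recurses on the rejoined remainder, so that elements fragmented by later insertions become intact again.

```python
import re
from typing import List, Optional, Tuple


def find_intact(seq: str) -> List[Tuple[int, int]]:
    """Toy stand-in for a real detector (assumption, not the tool's method):
    an intact element is '(' + body + ')' with no brackets inside."""
    return [m.span() for m in re.finditer(r"\([^()]*\)", seq)]


def nest(seq: str, coords: Optional[List[int]] = None,
         round_no: int = 0) -> List[Tuple[int, int, int]]:
    """Return (first, last, round) for every recovered element.

    first/last are inclusive positions in the ORIGINAL sequence;
    round is the recursion round in which the element was recovered.
    """
    if coords is None:
        coords = list(range(len(seq)))  # current position -> original position
    hits = find_intact(seq)
    if not hits:
        return []
    # Greedy step: take the best candidate (toy score = element length).
    start, end = max(hits, key=lambda h: h[1] - h[0])
    found = (coords[start], coords[end - 1], round_no)
    # Excise the element and recurse on the rejoined flanks.
    rest_seq = seq[:start] + seq[end:]
    rest_coords = coords[:start] + coords[end:]
    return [found] + nest(rest_seq, rest_coords, round_no + 1)


if __name__ == "__main__":
    for first, last, round_no in nest("aa(bb(cc)bb)aa(dd)a"):
        print(f"round {round_no}: element at {first}..{last}")
```

On the toy input `aa(bb(cc)bb)aa(dd)a`, the inner `(cc)` is recovered first; once it is excised, the outer element it interrupted is intact and is recovered in the next round, with coordinates that span the inner insertion.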
SUMMARY: ShinySOM offers a user-friendly interface for reproducible, high-throughput analysis of high-dimensional flow and mass cytometry data guided by self-organizing maps. The software implements a FlowSOM-style workflow, with improvements in performance, visualizations and data dissection possibilities. The outputs of the analysis include precise statistical information about the dissected samples, and R-compatible metadata useful for the batch processing of large sample volumes. AVAILABILITY AND IMPLEMENTATION: ShinySOM is free and open-source, available online at gitlab.com/exaexa/ShinySOM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
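As background for the self-organizing map (SOM) step that guides such a workflow, the following is a minimal pure-Python sketch of SOM training and cell-to-node assignment. It is an illustration under stated assumptions, not ShinySOM's or FlowSOM's code: the function names, the linear decay schedules, the starting learning rate of 0.5 and the Gaussian neighbourhood are all choices made for this example.

```python
import math
import random


def nearest_node(weights, vec):
    """Index of the node whose weight vector is closest to vec."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], vec)))


def train_som(data, width, height, epochs=20, seed=0):
    """Train a width x height SOM on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    grid = [(x, y) for x in range(width) for y in range(height)]
    # Initialise every node with a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for vec in rng.sample(data, len(data)):  # shuffled pass over the data
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)  # assumed linear decay
            radius = 0.5 + (max(width, height) / 2.0) * (1.0 - progress)
            bx, by = grid[nearest_node(weights, vec)]
            for i, (x, y) in enumerate(grid):
                # Nodes near the best-matching unit are pulled harder.
                dist2 = (x - bx) ** 2 + (y - by) ** 2
                influence = math.exp(-dist2 / (2.0 * radius ** 2))
                weights[i] = [w + rate * influence * (v - w)
                              for w, v in zip(weights[i], vec)]
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical two-marker events forming two populations.
    events = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    som = train_som(events, width=2, height=2)
    print([nearest_node(som, e) for e in events])
```

In a FlowSOM-style analysis the node assignments would then be grouped further (metaclustering) and inspected visually; that part is omitted here.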
- MeSH
- algorithms * MeSH
- metadata MeSH
- workflow MeSH
- software * MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH
MOTIVATION: G-quadruplexes (G4) are important regulatory non-B DNA structures with therapeutic potential. A tool for rational design of mutations leading to decreased propensity for G4 formation should be useful in studying G4 functions. Although tools exist for G4 prediction, no easily accessible tool for the rational design of G4 mutations has been available. RESULTS: We developed a web-based tool termed G4Killer that is based on the G4Hunter algorithm. This new tool is a platform-independent and user-friendly application to design mutations crippling G4 propensity in a parsimonious way (i.e., keeping the primary sequence as close as possible to the original one). The tool is integrated into our DNA analyzer server and allows for generating mutated DNA sequences having the desired lowered G4Hunter score with minimal mutation steps. AVAILABILITY AND IMPLEMENTATION: The G4Killer web tool can be accessed at: http://bioinformatics.ibp.cz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
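To make the idea of lowering a G4Hunter score with few mutations concrete, here is a minimal Python sketch. It scores a sequence in the G4Hunter manner (each G in a run of n Gs scores min(n, 4), each C scores the negative equivalent, A and T score 0, and the sequence score is the mean) and then greedily applies the single substitution that lowers the absolute score the most. This is an assumption-laden illustration, not G4Killer's code: the function names, the whole-sequence mean (G4Hunter is normally applied in sliding windows), the default target of 1.0 and the choice of `T` as the replacement base are hypothetical, and a greedy search is not guaranteed to find the minimal mutation set.

```python
from itertools import groupby


def g4hunter_scores(seq):
    """Per-base G4Hunter-style scores: G runs positive, C runs negative."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def g4hunter_mean(seq):
    """Mean score over the whole sequence (no sliding window in this sketch)."""
    scores = g4hunter_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def greedy_kill(seq, target=1.0, replacement="T"):
    """Substitute G/C bases one at a time until |score| <= target.

    Returns the mutated sequence and a list of (position, old, new).
    """
    seq = seq.upper()
    mutations = []
    while abs(g4hunter_mean(seq)) > target:
        best = None
        for i, base in enumerate(seq):
            if base not in "GC":
                continue
            candidate = seq[:i] + replacement + seq[i + 1:]
            score = abs(g4hunter_mean(candidate))
            if best is None or score < best[0]:
                best = (score, i, candidate)
        if best is None:  # nothing left to mutate
            break
        mutations.append((best[1], seq[best[1]], replacement))
        seq = best[2]
    return seq, mutations


if __name__ == "__main__":
    telomere = "GGGTTAGGGTTAGGGTTAGGG"
    mutated, steps = greedy_kill(telomere, target=1.0)
    print(g4hunter_mean(telomere), mutated, steps)
```

The sketch tends to hit the middle G of a run first, because splitting a run lowers the score of its neighbours as well as removing one scoring base.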
- MeSH
- algorithms MeSH
- DNA MeSH
- G-quadruplexes * MeSH
- mutation MeSH
- sequence analysis, DNA MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH
SUMMARY: Untargeted liquid chromatography-high-resolution mass spectrometry analysis produces a large number of features which correspond to the potential compounds in the sample that is analyzed. During the data processing, it is necessary to merge features associated with one compound to prevent multiplicities in the data and possible misidentification. The processing tools that are currently employed use complex algorithms to detect redundancies, such as adducts or isotopes. However, most of them are not able to deal with unpredictable adducts and in-source fragments. We introduce a simple open-source R-script CROP based on Pearson pairwise correlations and retention time together with a graphical representation of the correlation network to remove these redundant features. AVAILABILITY AND IMPLEMENTATION: The CROP R-script is available online at www.github.com/rendju/CROP under GNU GPL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
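The combination of pairwise Pearson correlation and retention time can be sketched as follows. This is a minimal Python illustration, not the CROP R-script: the function names, the feature dictionaries, the thresholds (`min_r=0.75`, `rt_window=0.02`), the grouping by connected components and the choice of the most intense feature as representative are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_redundant_features(features, min_r=0.75, rt_window=0.02):
    """Group features that co-elute and correlate across samples.

    features: list of dicts with 'name', 'rt' and 'intensities'.
    Returns a list of (representative_name, [member_names]).
    """
    parent = list(range(len(features)))

    def find(i):  # union-find root lookup with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            fi, fj = features[i], features[j]
            co_eluting = abs(fi["rt"] - fj["rt"]) <= rt_window
            if co_eluting and pearson(fi["intensities"], fj["intensities"]) >= min_r:
                parent[find(j)] = find(i)  # link the two correlation clusters

    groups = {}
    for i, feature in enumerate(features):
        groups.setdefault(find(i), []).append(feature)
    result = []
    for members in groups.values():
        # Keep the most intense feature as the representative of the group.
        top = max(members, key=lambda f: sum(f["intensities"]))
        result.append((top["name"], [f["name"] for f in members]))
    return result


if __name__ == "__main__":
    table = [  # hypothetical feature table: four samples per feature
        {"name": "M+H", "rt": 5.00, "intensities": [100, 200, 300, 400]},
        {"name": "M+Na", "rt": 5.01, "intensities": [10, 21, 29, 41]},
        {"name": "M+1 isotope", "rt": 5.00, "intensities": [11, 20, 33, 39]},
        {"name": "other compound", "rt": 7.50, "intensities": [100, 200, 300, 400]},
        {"name": "unrelated", "rt": 5.01, "intensities": [400, 100, 300, 200]},
    ]
    for representative, members in group_redundant_features(table):
        print(representative, members)
```

Because grouping depends only on co-elution and correlation, this kind of approach needs no list of expected adducts, which is why it can also catch unpredictable adducts and in-source fragments.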
MOTIVATION: A G-quadruplex is a DNA or RNA form in which four guanine-rich regions are held together by base pairing between guanine nucleotides in coordination with potassium ions. G-quadruplexes are increasingly seen as a biologically important component of genomes. Their detection in vivo is problematic; however, sequencing and spectrometric techniques exist for their in vitro detection. We previously devised the pqsfinder algorithm for the identification of potential quadruplex-forming sequences (PQS), implemented it in C++ and published it as an R/Bioconductor package. We looked for ways to optimize pqsfinder for faster and more user-friendly sequence analysis. RESULTS: We identified two weak points where pqsfinder could be optimized. We modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded. To accommodate the needs of a broader range of users, we created a website for submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder. AVAILABILITY AND IMPLEMENTATION: https://pqsfinder.fi.muni.cz, https://bioconductor.org/packages/pqsfinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
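For readers unfamiliar with what a quadruplex-forming sequence looks like, the sketch below searches for the classic canonical motif: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This is a deliberately simpler, swapped-in technique based on a regular expression, not the recursive pqsfinder algorithm described above; the function name and loop-length limits are assumptions for illustration, and only the G-rich strand is searched.

```python
import re

# Four G-runs (>= 3 Gs each) separated by three loops of 1-7 nucleotides.
CANONICAL_PQS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_canonical_pqs(seq):
    """Return (start, end, matched_sequence) for non-overlapping matches.

    start is 0-based and end is exclusive, as in Python slicing.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_PQS.finditer(seq)]


if __name__ == "__main__":
    print(find_canonical_pqs("TTAGGGTTAGGGTTAGGGTTAGGGTTA"))
    print(find_canonical_pqs("ATATATATATAT"))
```

A fixed pattern like this reports one match per region and gives no score, which is the kind of limitation that motivates scoring-based tools.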
- MeSH
- algorithms MeSH
- G-quadruplexes * MeSH
- genome MeSH
- RNA MeSH
- software MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH
SUMMARY: The New E-Resource for Drug Discovery (NERDD) is a quickly expanding web portal focused on the provision of peer-reviewed in silico tools for drug discovery. NERDD currently hosts tools for predicting the sites of metabolism (FAME) and metabolites (GLORY) of small organic molecules, for flagging compounds that are likely to interfere with biological assays (Hit Dexter), and for identifying natural products and natural product derivatives in large compound collections (NP-Scout). Several additional models and components are currently in development. AVAILABILITY AND IMPLEMENTATION: The NERDD web server is available at https://nerdd.zbh.uni-hamburg.de. Most tools are also available as software packages for local installation.
SUMMARY: Structures in PDB tend to contain errors. This is a very serious issue for authors that rely on such potentially problematic data. The community of structural biologists develops validation methods as countermeasures, which are also included in the PDB deposition system. But how are these validation efforts influencing the structure quality of subsequently published data? Which quality aspects are improving, and which remain problematic? We developed ValTrendsDB, a database that provides the results of an extensive exploratory analysis of relationships between quality criteria, size and metadata of biomacromolecules. Key input data are sourced from PDB. The discovered trends are presented via precomputed information-rich plots. ValTrendsDB also supports the visualization of a set of user-defined structures on top of general quality trends. Therefore, ValTrendsDB enables users to see the quality of structures published by a selected author, laboratory or journal, discover quality outliers, etc. ValTrendsDB is updated weekly. AVAILABILITY AND IMPLEMENTATION: Freely accessible at http://ncbr.muni.cz/ValTrendsDB. The web interface was implemented in JavaScript. The database was implemented in C++. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Accurate genotyping of DNA from a single cell is required for applications such as de novo mutation detection, linkage analysis and lineage tracing. However, achieving high-precision genotyping in the single-cell environment is challenging due to the errors caused by whole-genome amplification. Two factors make genotyping from single cells using single nucleotide polymorphism (SNP) arrays challenging: the lack of a comprehensive single-cell dataset with a reference genotype, and the absence of genotyping tools specifically designed to detect noise from the whole-genome amplification step. Algorithms designed for bulk DNA genotyping cause significant data loss when used for single-cell applications. RESULTS: In this study, we have created a resource of 28.7 million SNPs, typed at high confidence from whole-genome amplified DNA from single cells using the Illumina SNP bead array technology. The resource is generated from 104 single cells from two cell lines that are available from the Coriell repository. We used mother-father-proband (trio) information from multiple technical replicates of bulk DNA to establish a high-quality reference genotype for the two cell lines on the SNP array. This enabled us to develop SureTypeSC, a two-stage machine learning algorithm that filters a substantial part of the noise while retaining the majority of the high-quality SNPs. SureTypeSC also provides a simple statistical output to show the confidence of a particular single-cell genotype using Bayesian statistics. AVAILABILITY AND IMPLEMENTATION: The implementation of SureTypeSC in Python and sample data are available in the GitHub repository: https://github.com/puko818/SureTypeSC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Protein tunnels and channels are key transport pathways that allow ligands to pass between proteins' external and internal environments. These functionally important structural features warrant detailed attention. It is difficult to study the ligand binding and unbinding processes experimentally, while molecular dynamics simulations can be time-consuming and computationally demanding. RESULTS: CaverDock is a new software tool for analysing ligand passage through biomolecules. The method uses the optimized docking algorithm of AutoDock Vina for ligand placement and implements a parallel heuristic algorithm to search the space of possible trajectories. Simulations take from minutes to a few hours. Here we describe the implementation of the method and demonstrate CaverDock's usability by: (i) comparison of the results with other available tools, (ii) determination of the robustness with large ensembles of ligands and (iii) the analysis and comparison of the ligand trajectories in engineered tunnels. Thorough testing confirms that CaverDock is applicable for the fast analysis of ligand binding and unbinding in fundamental enzymology and protein engineering. AVAILABILITY AND IMPLEMENTATION: User guide and binaries for Ubuntu are freely available for non-commercial use at https://loschmidt.chemi.muni.cz/caverdock/. The web implementation is available at https://loschmidt.chemi.muni.cz/caverweb/. The source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Objective assessment of bioimage analysis methods is an essential step towards understanding their robustness and parameter sensitivity, calling for the availability of heterogeneous bioimage datasets accompanied by their reference annotations. Because manual annotations are known to be arduous, highly subjective and barely reproducible, numerous simulators have emerged over the past decades, generating synthetic bioimage datasets complemented with inherent reference annotations. However, the installation and configuration of these tools generally constitutes a barrier to their widespread use. RESULTS: We present a modern, modular web-interface, CytoPacq, to facilitate the generation of synthetic benchmark datasets relevant for multi-dimensional cell imaging. CytoPacq offers a user-friendly graphical interface with contextual tooltips and currently allows convenient access to various cell simulation systems of fluorescence microscopy, which have already been recognized and used by the scientific community, in a straightforward and self-contained form. AVAILABILITY AND IMPLEMENTATION: CytoPacq is a publicly available online service running at https://cbia.fi.muni.cz/simulator. More information about it as well as examples of generated bioimage datasets are available directly through the web-interface. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- MeSH
- computer simulation MeSH
- software * MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH