Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA and secondly spread the information from the network nodes with the known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and efficient than the commonly used Monte-Carlo sampling approach that may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than the Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes their reannotation after periodical interaction network updates, for example. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized towards other types of RNA in way that is straightforward.
BACKGROUND: Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, the crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge. RESULTS: We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples. CONCLUSIONS: We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine.
- MeSH
- algoritmy MeSH
- Bayesova věta MeSH
- genové regulační sítě MeSH
- lidé MeSH
- nádory tračníku * MeSH
- systémová biologie metody MeSH
- variabilita počtu kopií segmentů DNA * MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: Boolean networks (BNs) provide an effective modelling formalism for various complex biochemical phenomena. Their long term behaviour is represented by attractors-subsets of the state space towards which the BN eventually converges. These are then typically linked to different biological phenotypes. Depending on various logical parameters, the structure and quality of attractors can undergo a significant change, known as a bifurcation. We present a methodology for analysing bifurcations in asynchronous parametrised Boolean networks. RESULTS: In this paper, we propose a computational framework employing advanced symbolic graph algorithms that enable the analysis of large networks with hundreds of Boolean variables. To visualise the results of this analysis, we developed a novel interactive presentation technique based on decision trees, allowing us to quickly uncover parameters crucial to the changes in the attractor landscape. As a whole, the methodology is implemented in our tool AEON. We evaluate the method's applicability on a complex human cell signalling network describing the activity of type-1 interferons and related molecules interacting with SARS-COV-2 virion. In particular, the analysis focuses on explaining the potential suppressive role of the recently proposed drug molecule GRL0617 on replication of the virus. CONCLUSIONS: The proposed method creates a working analogy to the concept of bifurcation analysis widely used in kinetic modelling to reveal the impact of parameters on the system's stability. The important feature of our tool is its unique capability to work fast with large-scale networks with a relatively large extent of unknown information. The results obtained in the case study are in agreement with the recent biological findings.
- MeSH
- algoritmy MeSH
- aniliny MeSH
- benzamidy MeSH
- COVID-19 * MeSH
- genové regulační sítě * MeSH
- lidé MeSH
- modely genetické MeSH
- naftaleny MeSH
- SARS-CoV-2 MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SVs breakpoints produces slightly better results than the measure based on SVs overlap. This difference is evident in trivial and corrected clustering strategy, but not in constrained clustering strategy. However, constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
- MeSH
- genom lidský * MeSH
- genomika MeSH
- lidé MeSH
- nejistota MeSH
- shluková analýza MeSH
- strukturální variace genomu * MeSH
- vysoce účinné nukleotidové sekvenování MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: The insertion sequence elements (IS elements) represent the smallest and the most abundant mobile elements in prokaryotic genomes. It has been shown that they play a significant role in genome organization and evolution. To better understand their function in the host genome, it is desirable to have an effective detection and annotation tool. This need becomes even more crucial when considering rapid-growing genomic and metagenomic data. The existing tools for IS elements detection and annotation are usually based on comparing sequence similarity with a database of known IS families. Thus, they have limited ability to discover distant and putative novel IS elements. RESULTS: In this paper, we present digIS, a software tool based on profile hidden Markov models assembled from catalytic domains of transposases. It shows a very good performance in detecting known IS elements when tested on datasets with manually curated annotation. The main contribution of digIS is in its ability to detect distant and putative novel IS elements while maintaining a moderate level of false positives. In this category it outperforms existing tools, especially when tested on large datasets of archaeal and bacterial genomes. CONCLUSION: We provide digIS, a software tool using a novel approach based on manually curated profile hidden Markov models, which is able to detect distant and putative novel IS elements. Although digIS can find known IS elements as well, we expect it to be used primarily by scientists interested in finding novel IS elements. The tool is available at https://github.com/janka2012/digIS.
- MeSH
- genom bakteriální genetika MeSH
- genomika MeSH
- lidé MeSH
- prokaryotické buňky * MeSH
- software MeSH
- transpozibilní elementy DNA * genetika MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: Telomeres, nucleoprotein structures comprising short tandem repeats and delimiting the ends of linear eukaryotic chromosomes, play an important role in the maintenance of genome stability. Therefore, the determination of the length of telomeres is of high importance for many studies. Over the last years, new methods for the analysis of the length of telomeres have been developed, including those based on PCR or analysis of NGS data. Despite that, terminal restriction fragment (TRF) method remains the gold standard to this day. However, this method lacks universally accepted and precise tool capable to analyse and statistically evaluate TRF results. RESULTS: To standardize the processing of TRF results, we have developed WALTER, an online toolset allowing rapid, reproducible, and user-friendly analysis including statistical evaluation of the data. Given its web-based nature, it provides an easily accessible way to analyse TRF data without any need to install additional software. CONCLUSIONS: WALTER represents a major upgrade from currently available tools for the image processing of TRF scans. This toolset enables a rapid, highly reproducible, and user-friendly evaluation of almost any TRF scan including in-house statistical evaluation of the data. WALTER platform together with user manual describing the evaluation of TRF scans in detail and presenting tips and troubleshooting, as well as test data to demo the software are available at https://www.ceitec.eu/chromatin-molecular-complexes-jiri-fajkus/rg51/tab?tabId=125#WALTER and the source code at https://github.com/mlyc93/WALTER .
- MeSH
- software * MeSH
- telomery * genetika MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: In biomedicine, infrared thermography is the most promising technique among other conventional methods for revealing the differences in skin temperature, resulting from the irregular temperature dispersion, which is the significant signaling of diseases and disorders in human body. Given the process of detecting emitted thermal radiation of human body temperature by infrared imaging, we, in this study, present the current utility of thermal camera models namely FLIR and SEEK in biomedical applications as an extension of our previous article. RESULTS: The most significant result is the differences between image qualities of the thermograms captured by thermal camera models. In other words, the image quality of the thermal images in FLIR One is higher than SEEK Compact PRO. However, the thermal images of FLIR One are noisier than SEEK Compact PRO since the thermal resolution of FLIR One is 160 × 120 while it is 320 × 240 in SEEK Compact PRO. CONCLUSION: Detecting and revealing the inhomogeneous temperature distribution on the injured toe of the subject, we, in this paper, analyzed the imaging results of two different smartphone-based thermal camera models by making comparison among various thermograms. Utilizing the feasibility of the proposed method for faster and comparative diagnosis in biomedical problems is the main contribution of this study.
- MeSH
- chytrý telefon MeSH
- infračervené záření * MeSH
- lidé MeSH
- noha (od hlezna dolů) fyziologie MeSH
- tělesná teplota MeSH
- termografie přístrojové vybavení metody MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: Because of its non-destructive nature, label-free imaging is an important strategy for studying biological processes. However, routine microscopic techniques like phase contrast or DIC suffer from shadow-cast artifacts making automatic segmentation challenging. The aim of this study was to compare the segmentation efficacy of published steps of segmentation work-flow (image reconstruction, foreground segmentation, cell detection (seed-point extraction) and cell (instance) segmentation) on a dataset of the same cells from multiple contrast microscopic modalities. RESULTS: We built a collection of routines aimed at image segmentation of viable adherent cells grown on the culture dish acquired by phase contrast, differential interference contrast, Hoffman modulation contrast and quantitative phase imaging, and we performed a comprehensive comparison of available segmentation methods applicable for label-free data. We demonstrated that it is crucial to perform the image reconstruction step, enabling the use of segmentation methods originally not applicable on label-free images. Further we compared foreground segmentation methods (thresholding, feature-extraction, level-set, graph-cut, learning-based), seed-point extraction methods (Laplacian of Gaussians, radial symmetry and distance transform, iterative radial voting, maximally stable extremal region and learning-based) and single cell segmentation methods. We validated suitable set of methods for each microscopy modality and published them online. CONCLUSIONS: We demonstrate that image reconstruction step allows the use of segmentation methods not originally intended for label-free imaging. In addition to the comprehensive comparison of methods, raw and reconstructed annotated data and Matlab codes are provided.
- MeSH
- algoritmy MeSH
- frakcionace buněk metody MeSH
- lidé MeSH
- mikroskopie metody MeSH
- počítačové zpracování obrazu MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
- přehledy MeSH
BACKGROUND: Although the etiology of chronic lymphocytic leukemia (CLL), the most common type of adult leukemia, is still unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Primary and 3D structure analysis has been utilised in order to discover indications of antigenic pressure. The latter has been mostly based on the 3D models of the clonotypic B cell receptor immunoglobulin (BcR IG) amino acid sequences. Therefore, their accuracy is directly dependent on the quality of the model construction algorithms and the specific methods used to compare the ensuing models. Thus far, reliable and robust methods that can group the IG 3D models based on their structural characteristics are missing. RESULTS: Here we propose a novel method for clustering a set of proteins based on their 3D structure focusing on 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from the areas of bioinformatics, 3D object recognition and machine learning. The clustering procedure is based on the extraction of 3D descriptors, encoding various properties of the local and global geometrical structure of the proteins. The descriptors are extracted from aligned pairs of proteins. A combination of individual 3D descriptors is also used as an additional method. The comparison of the automatically generated clusters to manual annotation by experts shows an increased accuracy when using the 3D descriptors compared to plain bioinformatics-based comparison. The accuracy is increased even more when using the combination of 3D descriptors. CONCLUSIONS: The experimental results verify that the use of 3D descriptors commonly used for 3D object recognition can be effectively applied to distinguishing structural differences of proteins. The proposed approach can be applied to provide hints for the existence of structural groups in a large set of unannotated BcR IG protein files in both CLL and, by logical extension, other contexts where it is relevant to characterize BcR IG structural similarity. The method does not present any limitations in application and can be extended to other types of proteins.
BACKGROUND: High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. RESULTS: Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. CONCLUSIONS: ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software .