Forward-directed genetic screens are extremely powerful for identifying novel genes involved in a specific biological process, including various chromatin regulatory pathways. However, traditional genetic mapping is time-consuming and costly. Recently, the whole process was revolutionized by the development of mapping-by-sequencing (MBS) protocols. In MBS, the causal mutations and their positions within genes are identified directly by whole-genome sequencing and bioinformatic analysis of a bulk of mutant plants selected from a segregating population on the basis of the mutant phenotype. MBS increases the precision of mapping and reduces its cost. Here, we describe a general protocol and provide practical tips on how to proceed with mapping-by-sequencing, using as an example an Arabidopsis forward-directed genetic screen designed to identify mutants sensitive to a specific type of DNA damage. The described protocol is generally applicable to a wide range of genetic screens in various inbreeding species with a reference genome sequence.
- Keywords
- DNA damage repair, DNA-protein crosslinks, Forward genetics, Genetic mapping, High-throughput sequencing, Mapping-by-sequencing, SNP calling, Zebularine
- MeSH
- Arabidopsis * genetics MeSH
- Phenotype MeSH
- Genome, Plant MeSH
- Chromosome Mapping * methods MeSH
- Mutation MeSH
- Whole Genome Sequencing methods MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing * methods MeSH
- Publication type
- Journal Article MeSH
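The core idea of mapping-by-sequencing in the record above can be illustrated with a minimal sketch. It assumes a recessive mutation, so that in the bulk of mutant plants the SNPs linked to the causal mutation approach an allele frequency of 1 while unlinked SNPs segregate at intermediate frequencies. The function names, the window size, and the read counts are hypothetical and are not taken from the described protocol.

```python
from collections import defaultdict


def alt_allele_frequency(ref_count, alt_count):
    """Fraction of reads supporting the non-reference allele at one SNP."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def window_means(snps, window=1_000_000):
    """Mean allele frequency per (chromosome, window start).

    `snps` is an iterable of (chrom, pos, ref_count, alt_count) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for chrom, pos, ref_count, alt_count in snps:
        key = (chrom, (pos // window) * window)
        sums[key][0] += alt_allele_frequency(ref_count, alt_count)
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def candidate_window(snps, window=1_000_000):
    """Window whose mean allele frequency is highest (closest to fixation)."""
    means = window_means(snps, window)
    return max(means, key=means.get)


# Hypothetical read counts from a sequenced bulk of mutant plants.
snps = [
    ("Chr1", 250_000, 12, 11),
    ("Chr1", 800_000, 10, 14),
    ("Chr3", 5_100_000, 1, 24),
    ("Chr3", 5_600_000, 0, 30),
    ("Chr3", 9_200_000, 9, 15),
]
print(candidate_window(snps))
```

In practice the counts would come from a variant caller, and candidate SNPs inside the peak window would still need to be filtered by their predicted effect on genes.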
Plastids of diatoms and related algae with complex plastids of red algal origin are surrounded by four membranes, which also define the periplastidic compartment (PPC), the space between the second and third membranes. Metabolic reactions as well as cell biological processes take place in the PPC; however, genome-wide predictions of the proteins targeted to this compartment have so far been based on manual annotation work. Using published experimental protein localizations as reference data, we developed the first automatic prediction method for PPC proteins, which we included as a new feature in an updated version of the plastid protein predictor ASAFind. With our method, at least a subset of the PPC proteins can be predicted with high specificity, with an estimate of at least 81 proteins (0.7% of the predicted proteome) targeted to the PPC in the model diatom Phaeodactylum tricornutum. The proportion of PPC proteins varies between species: 180 PPC proteins (1.3% of the predicted proteome) were predicted in the genome of the diatom Thalassiosira pseudonana. The new ASAFind version can also generate a newly designed graphical output that visualizes the contribution of each position in the sequence to the score, and it accepts the output of recent versions of SignalP (5.0) and TargetP (2.0) as input data. Furthermore, we release a script to calculate custom scoring matrices that can be used for predictions in a simplified score cut-off mode. This allows for adjustments of the method to other groups of algae.
- Keywords
- chloroplast, diatoms, evolution, gene transfer, genome annotation, mitochondria, organelle, periplastidic compartment, protein transport, secretory pathway, technical advance
- MeSH
- Algal Proteins * metabolism MeSH
- Plastids * metabolism MeSH
- Proteome MeSH
- Rhodophyta metabolism MeSH
- Diatoms * metabolism genetics MeSH
- Software * MeSH
- Computational Biology * methods MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Algal Proteins * MeSH
- Proteome MeSH
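The abstract of this record mentions custom scoring matrices, per-position contributions to a score, and a score cut-off mode. The sketch below shows one generic way such a position-specific scoring matrix can work. It is not ASAFind's implementation: the uniform background, the pseudocount, the window length, and the training windows are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds_matrix(windows, pseudocount=1.0):
    """Build a position-specific log-odds matrix from equal-length windows.

    The background is assumed uniform over the 20 standard amino acids.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def position_contributions(matrix, window):
    """Score contributed by each position of `window` (standard residues only)."""
    return [matrix[i][aa] for i, aa in enumerate(window)]


def predict(matrix, window, cutoff):
    """Simple cut-off mode: positive if the summed score reaches `cutoff`."""
    return sum(position_contributions(matrix, window)) >= cutoff


# Hypothetical training windows around a cleavage site.
training = ["ASAFAP", "ASAFSP", "GSAFAP", "ASSFAP"]
matrix = build_log_odds_matrix(training)
print(position_contributions(matrix, "ASAFAP"))
print(predict(matrix, "ASAFAP", cutoff=0.0), predict(matrix, "KKKKKK", cutoff=0.0))
```

The per-position list is the kind of quantity a graphical output could plot, and the cut-off would be chosen from reference proteins with known localization.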
BACKGROUND: Current experimental data on RNA interactions remain limited, particularly for non-coding RNAs, many of which have only recently been discovered and operate within complex regulatory networks. Researchers often rely on in-silico interaction detection algorithms, such as TargetScan, which are based on biochemical sequence alignment. However, these algorithms have limited performance. RNA-seq expression data can provide valuable insights into regulatory networks, especially for understudied interactions such as circRNA-miRNA-mRNA. By integrating RNA-seq data with prior interaction networks obtained experimentally or through in-silico predictions, researchers can discover novel interactions, validate existing ones, and improve interaction prediction accuracy. RESULTS: This paper introduces Pi-GMIFS, an extension of the generalized monotone incremental forward stagewise (GMIFS) regression algorithm that incorporates prior knowledge. The algorithm first estimates prior response values through a prior-only regression, interpolates between these prior values and the original data, and then applies the GMIFS method. Our experimental results on circRNA-miRNA-mRNA regulatory interaction networks demonstrate that Pi-GMIFS consistently enhances precision and recall in RNA interaction prediction by leveraging implicit information from bulk RNA-seq expression data, outperforming the initial prior knowledge. CONCLUSION: Pi-GMIFS is a robust algorithm for inferring acyclic interaction networks when the variable ordering is known. Its effectiveness was confirmed through extensive experimental validation. We proved that RNA-seq data of a representative size help infer previously unknown interactions available in TarBase v9 and improve the quality of circRNA disease annotation.
- Keywords
- Bayesian network, Circular RNA, Functional annotation, Penalized regression, Structure inference
- MeSH
- Algorithms MeSH
- Gene Regulatory Networks MeSH
- RNA, Circular * genetics metabolism MeSH
- Humans MeSH
- Linear Models MeSH
- RNA, Messenger * genetics metabolism MeSH
- MicroRNAs * genetics metabolism MeSH
- Sequence Analysis, RNA methods MeSH
- RNA-Seq * methods MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- RNA, Circular * MeSH
- RNA, Messenger * MeSH
- MicroRNAs * MeSH
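The three steps named in the abstract of this record (a prior-only regression, interpolation between the prior response values and the data, and a final stagewise fit) can be sketched as follows. This is one reading of the abstract, not the published Pi-GMIFS code. It uses plain incremental forward stagewise regression for a linear model in place of GMIFS, and the `weight` parameter, the function names, and the toy data are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_stagewise(columns, y, active=None, step=0.01, n_steps=2000):
    """Incremental forward stagewise linear regression.

    `columns` holds predictor columns (assumed centred and comparably scaled).
    Only predictors whose index is in `active` may get non-zero coefficients.
    """
    active = list(range(len(columns))) if active is None else list(active)
    beta = [0.0] * len(columns)
    residual = list(y)
    if not active:
        return beta
    for _ in range(n_steps):
        # Pick the allowed predictor most correlated with the residual.
        j = max(active, key=lambda k: abs(dot(columns[k], residual)))
        corr = dot(columns[j], residual)
        if abs(corr) < 1e-9:
            break
        delta = step if corr > 0 else -step
        beta[j] += delta
        residual = [r - delta * x for r, x in zip(residual, columns[j])]
    return beta


def prior_informed_fit(columns, y, prior_idx, weight=0.5, step=0.01, n_steps=2000):
    """Fit with prior knowledge given as indices of predictors believed relevant."""
    # Step 1: regression restricted to the prior predictors.
    beta_prior = forward_stagewise(columns, y, prior_idx, step, n_steps)
    y_prior = [sum(b * col[i] for b, col in zip(beta_prior, columns))
               for i in range(len(y))]
    # Step 2: interpolate between prior response values and observed data.
    y_mix = [weight * p + (1 - weight) * t for p, t in zip(y_prior, y)]
    # Step 3: stagewise fit on all predictors against the interpolated response.
    return forward_stagewise(columns, y_mix, None, step, n_steps)


# Toy data: three orthogonal predictors, response = 2*x0 + 0.5*x1.
columns = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [2.5, -1.5, 1.5, -2.5]
print(prior_informed_fit(columns, y, prior_idx=[0]))
```

With the prior listing only the first predictor, the coefficient of the second predictor is pulled toward zero (close to 0.25 instead of 0.5) while the first stays close to 2, which shows how the interpolation weight trades off prior knowledge against the data.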
MOTIVATION: Boolean networks are popular dynamical models of cellular processes in systems biology. Their attractors model phenotypes that arise from the interplay of key regulatory subcircuits. A succession diagram (SD) describes this interplay in a discrete analog of Waddington's epigenetic attractor landscape that allows for fast identification of attractors and attractor control strategies. Efficient computational tools for studying SDs are essential for the understanding of Boolean attractor landscapes and connecting them to their biological functions. RESULTS: We present a new approach to SD construction for asynchronously updated Boolean networks, implemented in the biologist's Boolean attractor landscape mapper, biobalm. We compare biobalm to similar tools and find a substantial performance increase in SD construction, attractor identification, and attractor control. We perform the most comprehensive comparative analysis to date of the SD structure in experimentally-validated Boolean models of cell processes and random ensembles. We find that random models (including critical Kauffman networks) have relatively small SDs, indicating simple decision structures. In contrast, nonrandom models from the literature are enriched in extremely large SDs, indicating an abundance of decision points and suggesting the presence of complex Waddington landscapes in nature. AVAILABILITY AND IMPLEMENTATION: The tool biobalm is available online at https://github.com/jcrozum/biobalm. Further data, scripts for testing, analysis, and figure generation are available online at https://github.com/jcrozum/biobalm-analysis and in the reproducibility artefact at https://doi.org/10.5281/zenodo.13854760.
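To make the notion of an attractor under asynchronous update concrete, here is a brute-force sketch that enumerates the state transition graph of a very small Boolean network and returns its terminal strongly connected components. It is exponential in the number of variables and is not how biobalm works (biobalm builds succession diagrams). The rule encoding and the toggle-switch example are assumptions made for illustration.

```python
from itertools import product


def successors(state, rules):
    """Asynchronous update: each successor changes exactly one variable."""
    result = []
    for i, rule in enumerate(rules):
        new_value = rule(state)
        if new_value != state[i]:
            result.append(state[:i] + (new_value,) + state[i + 1:])
    return result


def reachable(state, rules):
    """All states reachable from `state`, including `state` itself."""
    seen = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        for nxt in successors(current, rules):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def attractors(rules):
    """Terminal strongly connected components of the state transition graph."""
    found = set()
    for state in product((False, True), repeat=len(rules)):
        reach = reachable(state, rules)
        # `state` lies in an attractor if everything it reaches can return to it.
        if all(state in reachable(t, rules) for t in reach):
            found.add(frozenset(reach))
    return found


# Hypothetical toggle switch: A and B inhibit each other.
toggle = [lambda s: not s[1], lambda s: not s[0]]
for attractor in attractors(toggle):
    print(sorted(attractor))
```

The toggle switch has two fixed-point attractors, one with only A active and one with only B active, which correspond to the two alternative phenotypes of this small decision circuit.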
BACKGROUND AND OBJECTIVE: Metabolomic interaction networks provide critical insights into the dynamic relationships between metabolites and their regulatory mechanisms. This study introduces MInfer, a novel computational framework that integrates outputs from MetaboAnalyst, a widely used metabolomic analysis tool, with Jacobian analysis to enhance the derivation and interpretation of these networks. METHODS: MInfer combines the comprehensive data processing capabilities of MetaboAnalyst with the mathematical modeling power of Jacobian analysis. This framework was applied to various metabolomic datasets, employing advanced statistical tests to construct interaction networks and identify key metabolic pathways. RESULTS: The application of MInfer revealed significant metabolic pathways and potential regulatory mechanisms across multiple datasets. The framework demonstrated high precision, sensitivity, and specificity in identifying interactions, enabling robust network interpretations. CONCLUSIONS: MInfer enhances the interpretation of metabolomic data by providing detailed interaction networks and uncovering key regulatory insights. This tool holds significant potential for advancing the study of complex biological systems.
- Keywords
- Dynamic relationships, Metabolite interactions, Metabolomic data analysis, Systems biology
- MeSH
- Algorithms MeSH
- Humans MeSH
- Metabolic Networks and Pathways * MeSH
- Metabolomics * methods MeSH
- Software * MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
SUMMARY: The Sequence Read Archive is one of the largest and fastest-growing repositories of sequencing data, containing tens of petabytes of sequenced reads. Its data are used by a wide scientific community, often beyond the primary study that generated them. Such analyses rely on accurate metadata concerning the type of experiment and library, as well as the organism from which the sequenced reads were derived. These metadata are typically entered manually by contributors in an error-prone process, and are frequently incomplete. In addition, easy-to-use computational tools that verify the consistency and completeness of the metadata describing the libraries, and thereby facilitate data reuse, are largely unavailable. Here, we introduce HTSinfer, a Python-based tool to infer metadata directly and solely from bulk RNA-sequencing data generated on Illumina platforms. HTSinfer leverages genome sequence information and diagnostic genes to rapidly and accurately infer the library source and library type, as well as the relative read orientation, 3' adapter sequence and read length statistics. HTSinfer is written in a modular manner, published under a permissive free and open-source license and encourages contributions by the community, enabling easy addition of new functionalities, e.g. for the inference of additional metrics, or the support of different experiment types or sequencing platforms. AVAILABILITY AND IMPLEMENTATION: HTSinfer is released under the Apache License 2.0. The latest code is available via GitHub at https://github.com/zavolanlab/htsinfer, while releases are published on Bioconda. A snapshot of the HTSinfer version described in this article was deposited at Zenodo (DOI: 10.5281/zenodo.13985958).
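Of the metadata listed above, read length statistics are the simplest to derive from the reads alone. The sketch below shows the idea for plain four-line FASTQ records. It does not use or reproduce the HTSinfer API; the function names and the example reads are hypothetical, and wrapped (multi-line) sequences are assumed not to occur.

```python
import statistics


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in four-line FASTQ format."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def length_summary(fastq_lines):
    """Minimum, maximum, mean and most common read length."""
    lengths = list(read_lengths(fastq_lines))
    if not lengths:
        raise ValueError("no reads found")
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "mode": statistics.mode(lengths),
    }


# Hypothetical reads; real input would be streamed from a FASTQ file.
example = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTAC
+
IIIIII
@read3
TTGGCCAA
+
IIIIIIII
""".splitlines()
print(length_summary(example))
```

A spread between the minimum and maximum length, as in this toy input, is the kind of signal that can hint at prior adapter or quality trimming.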
Rare diseases may affect the quality of life of patients and be life-threatening. Therapeutic opportunities are often limited, in part because of the lack of understanding of the molecular mechanisms underlying these diseases. This can be ascribed to the low prevalence of rare diseases and therefore the smaller sample sizes available for research. A way to overcome this is to integrate experimental rare disease data with prior knowledge using network-based methods. Taking this one step further, we hypothesized that combining and analyzing the results from multiple network-based methods could provide data-driven hypotheses of pathogenic mechanisms from multiple perspectives. We analyzed a Huntington's disease transcriptomics dataset using six network-based methods in a collaborative way. These methods either inherently reported enriched annotation terms or their results were fed into enrichment analyses. The resulting significantly enriched Reactome pathways were then summarized using the ontological hierarchy, which allowed the integration and interpretation of outputs from multiple methods. Among the resulting enriched pathways, there are pathways that have been shown previously to be involved in Huntington's disease and pathways whose direct contribution to disease pathogenesis remains unclear and requires further investigation. In summary, our study shows that collaborative network analysis approaches are well-suited to study rare diseases, as they provide hypotheses for pathogenic mechanisms from multiple perspectives. Applying different methods to the same case study can uncover different disease mechanisms that would not be apparent with the application of a single method.
- Keywords
- Collaborative analysis, Huntington’s disease, Network analysis, Rare disease
- MeSH
- Gene Regulatory Networks MeSH
- Huntington Disease * genetics MeSH
- Humans MeSH
- Gene Expression Profiling * methods MeSH
- Transcriptome * MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
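The abstract of this record refers to enrichment analyses of pathways without naming the statistical test. One common choice for such an over-representation analysis is the one-sided hypergeometric test, sketched below. Whether the authors used this test is an assumption, and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes considered in total
    pathway:  number of those genes annotated to the pathway
    selected: number of genes reported by a network method
    overlap:  number of selected genes that belong to the pathway
    """
    upper = min(pathway, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, selected)


# Hypothetical numbers: 40 of 200 selected genes fall in a 300-gene pathway.
print(enrichment_p_value(universe=20_000, pathway=300, selected=200, overlap=40))
```

When many pathways are tested, as with the Reactome hierarchy, the resulting p-values would additionally need a multiple-testing correction before being called significant.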
MOTIVATION: Structure-based methods for detecting protein-ligand binding sites play a crucial role in various domains, from fundamental research to biomedical applications. However, current prediction methodologies often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of the apo (ligand-free) states. This oversight is particularly problematic in the case of cryptic binding sites (CBSs), where holo-based assessment yields unrealistic performance expectations. RESULTS: To advance the development in this domain, we introduce CryptoBench, a benchmark dataset tailored for training and evaluating novel CBS prediction methodologies. CryptoBench is constructed upon a large collection of apo-holo protein pairs, grouped by UniProt ID, clustered by sequence identity, and filtered to contain only structures with substantial structural change in the binding site. CryptoBench comprises 1107 structures with predefined cross-validation splits, making it the most extensive CBS dataset to date. To establish a performance baseline, we measured the predictive power of sequence- and structure-based CBS residue prediction methods using the benchmark. We selected PocketMiner as the state-of-the-art representative of the structure-based methods for CBS detection, and P2Rank, a widely used structure-based method for general binding site prediction that is not specifically tailored for cryptic sites. For sequence-based approaches, we trained a neural network to classify binding residues using protein language model embeddings. Our sequence-based approach outperformed PocketMiner and P2Rank across key metrics, including area under the curve, area under the precision-recall curve, Matthews correlation coefficient, and F1 scores. These results provide baseline benchmark results for future CBS and potentially also non-CBS prediction endeavors, leveraging CryptoBench as the foundational platform for further advancements in the field.
AVAILABILITY AND IMPLEMENTATION: The CryptoBench dataset, including the benchmark model, is available on Open Science Framework at https://osf.io/pz4a9/. The code and tutorial are available in the GitHub repository at https://github.com/skrhakv/CryptoBench/.
- MeSH
- Benchmarking MeSH
- Databases, Protein MeSH
- Protein Conformation MeSH
- Ligands MeSH
- Proteins * chemistry metabolism MeSH
- Software * MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Computational Biology * methods MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Ligands MeSH
- Proteins * MeSH
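Two of the metrics named in the abstract of this record, the F1 score and the Matthews correlation coefficient, can be computed from per-residue binary labels as in the sketch below. The standard formulas are used; the residue labels are hypothetical and do not come from the CryptoBench dataset.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = binding residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical per-residue labels for one protein chain.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

Because binding residues are usually a small minority of all residues, the Matthews correlation coefficient, which also accounts for true negatives, is often reported alongside F1.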
Plant specialized metabolites have diversified vastly over the course of plant evolution, and they are considered key players in complex interactions between plants and their environment. The chemical diversity of these metabolites has been widely explored and utilized in agriculture and crop enhancement, the food industry, and drug development, among other areas. However, the immensity of the plant metabolome can make its exploration challenging. Here we describe a protocol for exploring plant specialized metabolites that combines high-resolution mass spectrometry and computational metabolomics strategies, including molecular networking, identification of structural motifs, as well as prediction of chemical structures and metabolite classes.
- Keywords
- GNPS, MS2LDA, MS2Query, MZmine, Molecular networking, Plant metabolomics, SIRIUS, Specialized metabolites
- MeSH
- Mass Spectrometry * methods MeSH
- Metabolome * MeSH
- Metabolomics * methods MeSH
- Plants * metabolism MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
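Molecular networking, mentioned in the abstract of this record, links features whose fragmentation spectra are similar. The sketch below uses a plain cosine similarity on binned peaks as a simplified stand-in; tools such as GNPS use more elaborate scores, and the bin width, threshold, and spectra here are assumptions made for illustration.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks that fall into the same bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return shared / norm if norm else 0.0


def network_edges(spectra, threshold=0.7):
    """Pairs of spectrum names whose similarity reaches the threshold."""
    names = sorted(spectra)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if cosine_similarity(spectra[x], spectra[y]) >= threshold
    ]


# Hypothetical MS/MS spectra as lists of (m/z, intensity) peaks.
spectra = {
    "feature_1": [(91.05, 100.0), (119.08, 50.0)],
    "feature_2": [(91.06, 90.0), (119.09, 60.0)],
    "feature_3": [(203.10, 100.0)],
}
print(network_edges(spectra))
```

Connected features in such a network are candidates for structurally related metabolites, which is why annotation of one node can help interpret its neighbours.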
Neurotropic pathogens, notably herpesviruses, have been associated with significant neuropsychiatric effects. As a group, these pathogens can exploit molecular mimicry mechanisms to manipulate the host central nervous system to their advantage. Here, we present a systematic computational approach that may ultimately be used to unravel protein-protein interactions and molecular mimicry processes that have not yet been solved experimentally. Toward this end, we validate this approach by replicating a set of pre-existing experimental findings that document the structural and functional similarities shared by the human cytomegalovirus-encoded UL144 glycoprotein and human tumor necrosis factor receptor superfamily member 14 (TNFRSF14). We began with a thorough exploration of the Homo sapiens protein database using the Basic Local Alignment Search Tool (BLASTx) to identify proteins sharing sequence homology with UL144. Subsequently, we used AlphaFold2 to predict the independent three-dimensional structures of UL144 and TNFRSF14. This was followed by a comprehensive structural comparison facilitated by Distance-Matrix Alignment and Foldseek. Finally, we used AlphaFold-multimer and PPIscreenML to elucidate potential protein complexes and confirm the predicted binding activities of both UL144 and TNFRSF14. We then used our in silico approach to replicate the experimental finding that revealed TNFRSF14 binding to both B- and T-lymphocyte attenuator (BTLA) and glycoprotein domain and UL144 binding to BTLA alone. This computational framework offers promise in identifying structural similarities and interactions between pathogen-encoded proteins and their host counterparts. This information will provide valuable insights into the cognitive mechanisms underlying the neuropsychiatric effects of viral infections.
- Keywords
- Bioinformatics, Cognition, Mitochondria, Psychiatry, Virus
- MeSH
- Cognition physiology MeSH
- Humans MeSH
- Molecular Mimicry * MeSH
- Models, Molecular MeSH
- Amino Acid Sequence MeSH
- Protein Binding MeSH
- Viral Proteins metabolism chemistry MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Viral Proteins MeSH