Lipidomics and metabolomics communities comprise various informatics tools; however, software programs handling multimodal mass spectrometry (MS) data with structural annotations guided by the Lipidomics Standards Initiative are limited. Here, we provide MS-DIAL 5 for in-depth lipidome structural elucidation through electron-activated dissociation (EAD)-based tandem MS and determining their molecular localization through MS imaging (MSI) data using a species/tissue-specific lipidome database containing the predicted collision-cross section values. With the optimized EAD settings using 14 eV kinetic energy, the program correctly delineated lipid structures for 96.4% of authentic standards, among which 78.0% had the sn-, OH-, and/or C=C positions correctly assigned at concentrations exceeding 1 μM. We showcased our workflow by annotating the sn- and double-bond positions of eye-specific phosphatidylcholines containing very-long-chain polyunsaturated fatty acids (VLC-PUFAs), characterized as PC n-3-VLC-PUFA/FA. Using MSI data from the eye and n-3-VLC-PUFA-supplemented HeLa cells, we identified glycerol 3-phosphate acyltransferase as an enzyme candidate responsible for incorporating n-3 VLC-PUFAs into the sn1 position of phospholipids in mammalian cells, which was confirmed using EAD-MS/MS and recombinant proteins in a cell-free system. Therefore, the MS-DIAL 5 environment, combined with optimized MS data acquisition methods, facilitates a better understanding of lipid structures and their localization, offering insights into lipid biology.
- MeSH
- data mining * methods MeSH
- phosphatidylcholines metabolism chemistry MeSH
- HeLa cells MeSH
- mass spectrometry methods MeSH
- humans MeSH
- lipidomics * methods MeSH
- lipids chemistry analysis MeSH
- metabolomics methods MeSH
- fatty acids, unsaturated metabolism chemistry MeSH
- software MeSH
- tandem mass spectrometry methods MeSH
- animals MeSH
- Check Tag
- humans MeSH
- animals MeSH
- Publication type
- journal article MeSH
BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, using the WGRS data for research without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool that enables researchers to explore the coding-region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), in which the accessions of the WGRS datasets were collected from various sources and currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are presented in tabular form: summary results by categorical description, and genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. The results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website ( https://soykb.org/SoybeanAlleleCatalogTool/ ), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website ( https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana ). Researchers can use this tool to connect variant alleles of genes with meta-information of species.
- MeSH
- alleles * MeSH
- Arabidopsis * genetics MeSH
- data mining * methods MeSH
- datasets as topic * MeSH
- gene frequency MeSH
- genotype MeSH
- Glycine max * genetics MeSH
- internet * MeSH
- Zea mays * genetics MeSH
- metadata MeSH
- mutation MeSH
- pigmentation genetics MeSH
- genes, plant genetics MeSH
- software * MeSH
- amino acid substitution MeSH
- plant dormancy genetics MeSH
- data visualization MeSH
- Publication type
- journal article MeSH
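The allele-assembly step described in the Allele Catalog abstract above can be illustrated with a minimal sketch. It assumes that an "allele" of a gene is the combination of genotypes an accession carries across that gene's variant positions; the accession names, positions, and calls are invented, and the function `assemble_alleles` is hypothetical rather than part of SnakyVC or AlleleCatalog.

```python
from collections import defaultdict

# Hypothetical genotype calls for one gene: accession -> {position: genotype}.
# Accession names, positions, and calls are made up for illustration.
calls = {
    "acc_A": {1001: "Ref", 1050: "Alt", 1200: "Ref"},
    "acc_B": {1001: "Ref", 1050: "Alt", 1200: "Ref"},
    "acc_C": {1001: "Alt", 1050: "Ref", 1200: "Ref"},
    "acc_D": {1001: "Ref", 1050: "Ref", 1200: "Ref"},
}


def assemble_alleles(calls):
    """Group accessions that share the same genotype at every variant position."""
    positions = sorted({pos for genotypes in calls.values() for pos in genotypes})
    alleles = defaultdict(list)
    for accession, genotypes in sorted(calls.items()):
        # Missing calls are kept as "NA" here; no imputation is attempted.
        key = tuple(genotypes.get(pos, "NA") for pos in positions)
        alleles[key].append(accession)
    return positions, dict(alleles)


positions, alleles = assemble_alleles(calls)
for genotype, accessions in alleles.items():
    # One row per allele: genotype at each position, accession count, accession names.
    print(dict(zip(positions, genotype)), len(accessions), accessions)
```

Each printed row corresponds to one allele of the gene with the accessions that carry it, which is roughly the shape of the tabular genotype results the abstract mentions.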
The intensive care unit (ICU) is a specialized hospital unit where healthcare professionals provide treatment and, later, close follow-up to patients. Estimating mortality in ICU patients is crucial from many viewpoints. The purpose of this study is to classify the status of patients with acute kidney injury (AKI) in the ICU as early mortality, late mortality, or survival by applying the Classification and Regression Trees (CART) algorithm to patient attributes such as blood urea nitrogen, creatinine, serum and urine neutrophil gelatinase-associated lipocalin (NGAL), alkaline phosphatase, lactate dehydrogenase (LDH), gamma-glutamyl transferase, laboratory electrolytes, blood gas, mean arterial pressure, central venous pressure, and demographic details. The study was conducted on 50 patients with AKI who were followed up in the ICU. The study also aims to determine the significance of the relationship between the attributes used by CART to predict mortality and the patients' status, using the Kruskal-Wallis H test. The classification accuracy, sensitivity, and specificity of CART for the tested attributes for the prediction of early mortality, late mortality, and survival of patients were 90.00%, 83.33%, and 91.67%, respectively. Urine NGAL and LDH values on day 7 both differed considerably by patient status according to the Kruskal-Wallis H test.
- MeSH
- acute kidney injury * mortality MeSH
- algorithms MeSH
- biomarkers analysis MeSH
- data mining methods MeSH
- adult MeSH
- classification MeSH
- lactate dehydrogenases analysis MeSH
- humans MeSH
- lipocalin-2 analysis MeSH
- decision support techniques MeSH
- hospital mortality * MeSH
- prognosis MeSH
- decision support systems, management MeSH
- decision trees MeSH
- statistics as topic MeSH
- Check Tag
- adult MeSH
- humans MeSH
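The AKI abstract above reports accuracy, sensitivity, and specificity for a three-class CART model without saying how the per-class figures were derived. One common convention, assumed here, is one-vs-rest evaluation from a confusion matrix. The counts below are invented for illustration and are not the study's data.

```python
# Rows are true classes, columns are predicted classes.
# Counts are invented for illustration and are NOT the study's data.
labels = ["early mortality", "late mortality", "survival"]
confusion = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 1, 9],
]


def accuracy(confusion):
    """Fraction of all cases that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def one_vs_rest_metrics(confusion, k):
    """Return (sensitivity, specificity) for class k against all other classes."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                 # class k cases predicted as another class
    fp = sum(row[k] for row in confusion) - tp  # other classes predicted as class k
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


print(f"accuracy: {accuracy(confusion):.2%}")
for k, name in enumerate(labels):
    sens, spec = one_vs_rest_metrics(confusion, k)
    print(f"{name}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```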
Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that addresses the question of whether a particular variant falls onto an annotated protein domain and directly translates chromosomal coordinates onto protein residues. The tool can perform a multiple-site query in a simple way, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated and mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented into the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com.
- MeSH
- molecular sequence annotation methods MeSH
- data mining methods MeSH
- databases, genetic * MeSH
- data curation methods MeSH
- genetic variation * MeSH
- genome, human genetics MeSH
- genomics methods MeSH
- internet MeSH
- humans MeSH
- protein domains genetics MeSH
- proteins chemistry genetics metabolism MeSH
- computational biology methods MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- research support, non-U.S. gov't MeSH
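The prot2hg abstract above describes mapping protein residue positions back to genome coordinates. A minimal sketch of that idea follows, assuming the coding segments of a transcript are known as 1-based inclusive genomic intervals. The function `residue_to_genomic` and the example coordinates are hypothetical and do not reproduce the resource's actual implementation.

```python
def residue_to_genomic(cds_exons, strand, residue):
    """Map a 1-based protein residue to the genomic positions of its codon.

    cds_exons: list of (start, end) coding segments, 1-based inclusive,
               sorted by ascending genomic coordinate.
    strand:    "+" or "-".
    Returns the three genomic positions in transcript (5' to 3') order.
    """
    # Enumerate every coding base in transcript order.
    # Simple but memory-hungry; acceptable for a sketch.
    if strand == "+":
        coding = [p for start, end in cds_exons for p in range(start, end + 1)]
    elif strand == "-":
        coding = [p for start, end in reversed(cds_exons)
                  for p in range(end, start - 1, -1)]
    else:
        raise ValueError("strand must be '+' or '-'")

    first = 3 * (residue - 1)  # 0-based index of the codon's first base
    if residue < 1 or first + 3 > len(coding):
        raise IndexError("residue lies outside the coding sequence")
    return coding[first:first + 3]


# Invented two-exon gene: 5 + 7 coding bases = 4 codons.
exons = [(100, 104), (200, 206)]
print(residue_to_genomic(exons, "+", 2))  # codon spans the intron
print(residue_to_genomic(exons, "-", 1))
```

A domain annotated as a residue range can then be projected onto the genome by mapping its first and last residues, keeping in mind that the resulting interval may span introns.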
Siderophores represent important microbial virulence factors and infection biomarkers. Their monitoring in fermentation broths, bodily fluids, and tissues should be reproducible. Similar isolation, characterization, and quantitation studies can often have conflicting results, and without proper documentation of sample collection, data processing, and analysis methods, it is difficult to reexamine the data and reconcile these differences. In this Springer Nature Protocol, we present the procedure optimized for ferricrocin/triacetylfusarinine C extraction from biological material as well as for tissue fixation and cryosectioning for optical microscopy and for both elemental and molecular mass spectrometry imaging. Special attention is paid to siderophore data mining from conventional and product ion mass spectra, liquid chromatography, and mass spectrometry imaging datasets, performed here by our free software called CycloBranch.
- MeSH
- Aspergillus fumigatus metabolism MeSH
- biomarkers analysis MeSH
- chromatography, liquid methods MeSH
- data mining methods MeSH
- datasets as topic MeSH
- ferrichrome analogs & derivatives isolation & purification metabolism MeSH
- tissue fixation methods MeSH
- mass spectrometry methods MeSH
- invasive pulmonary aspergillosis diagnosis microbiology MeSH
- cryoultramicrotomy methods MeSH
- rats MeSH
- hydroxamic acids isolation & purification metabolism MeSH
- humans MeSH
- disease models, animal MeSH
- siderophores isolation & purification metabolism MeSH
- software MeSH
- ferric compounds isolation & purification metabolism MeSH
- animals MeSH
- Check Tag
- rats MeSH
- humans MeSH
- animals MeSH
- Publication type
- journal article MeSH
- research support, non-U.S. gov't MeSH
With a rapidly growing amount of biomedical information available only in textual form, there is considerable interest in applying NLP techniques to extract such information from the biomedical literature. Much of the research has paid special attention to extracting information about biomedical named entities. In this paper, we conducted a survey on biomedical named entity recognition and normalization, focusing on gene mention recognition and normalization. We believe this can help researchers to find work of interest to them and interpret their own research.
The study focused on QSAR model interpretation. The goal was to develop a workflow for the identification of molecular fragments in different contexts important for the property modelled. Using a previously established approach, Structural and physicochemical interpretation of QSAR models (SPCI), fragment contributions were calculated and their relative influence on the compounds' properties characterised. Analysis of the distributions of these contributions using Gaussian mixture modelling was performed to identify groups of compounds (clusters) comprising the same fragment, where these fragments had substantially different contributions to the property studied. SMARTSminer was used to detect patterns discriminating groups of compounds from each other, with visual inspection used when the former did not help. The approach was applied to analyse the toxicity, in terms of 40-hour inhibition of growth, of 1984 compounds to Tetrahymena pyriformis. The results showed that the clustering technique correctly identified known toxicophoric patterns: it detected groups of compounds where fragments have a specific molecular context that makes them contribute substantially more to toxicity. The results show the applicability of the interpretation of QSAR models to retrieve reasonable patterns, even from data sets consisting of compounds having different mechanisms of action, something which is difficult to achieve using conventional pattern/data mining approaches.
- MeSH
- antiprotozoal agents chemistry toxicity MeSH
- data mining methods MeSH
- quantitative structure-activity relationship * MeSH
- drug design * MeSH
- molecular docking simulation methods MeSH
- software MeSH
- Tetrahymena drug effects MeSH
- Publication type
- journal article MeSH
- research support, non-U.S. gov't MeSH
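The QSAR abstract above clusters the contributions of a single fragment with Gaussian mixture modelling. The sketch below shows the idea on invented numbers with a plain expectation-maximisation fit in one dimension. Fixing the number of components at two and the function name `fit_gmm_1d` are assumptions; the abstract does not state which implementation or model-selection procedure was used.

```python
import math


def fit_gmm_1d(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # deterministic initialisation
    spread = (hi - lo) / 4 or 1.0
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300   # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, resp


# Invented contributions of one fragment across nine compounds.
contributions = [0.05, 0.10, 0.12, 0.08, 0.15, 1.40, 1.55, 1.60, 1.48]
weights, means, variances, resp = fit_gmm_1d(contributions)
clusters = [0 if r[0] >= r[1] else 1 for r in resp]
print(means, clusters)
```

Compounds assigned to the high-contribution component are the ones whose molecular context would then be inspected for a discriminating substructure pattern.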
Thousands of eukaryote transcriptomes have been generated, mainly to investigate nuclear gene expression, and the amount of available data is constantly increasing. A neglected but promising use of this large amount of data is to assemble organelle genomes. To assess the reliability of this approach, we attempted to reconstruct complete mitochondrial genomes from RNA-Seq experiments of Reticulitermes termite species, for which transcriptomes and conspecific mitogenomes are available. We successfully assembled complete molecules, although a few gaps corresponding to tRNAs had to be filled manually. We also reconstructed, for the first time, the mitogenome of Reticulitermes banyulensis. The accuracy and completeness of mitogenome reconstruction appeared independent of transcriptome size, read length and sequencing design (single/paired end), and using reference genomes from congeneric or intra-familial taxa did not significantly affect the assembly. Transcriptome-derived mitogenomes were found to be highly similar to the conspecific ones obtained from genome sequencing (nucleotide divergence ranging from 0% to 3.5%) and yielded a congruent phylogenetic tree. Reads from contaminants and nuclear transcripts, although slowing down the process, did not result in chimeric sequence reconstruction. We suggest that the described approach has the potential to increase the number of available mitogenomes by exploiting the rapidly increasing number of transcriptomes.
- MeSH
- molecular sequence annotation methods MeSH
- data mining methods MeSH
- phylogeny MeSH
- genome, mitochondrial * MeSH
- Isoptera genetics MeSH
- reproducibility of results MeSH
- base sequence genetics MeSH
- sequence analysis, DNA MeSH
- RNA-seq MeSH
- transcriptome genetics MeSH
- high-throughput nucleotide sequencing MeSH
- animals MeSH
- Check Tag
- animals MeSH
- Publication type
- journal article MeSH
- research support, non-U.S. gov't MeSH
- validation study MeSH
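The mitogenome abstract above reports nucleotide divergence between transcriptome-derived and genome-derived sequences but does not say how it was computed. A minimal sketch, assuming an uncorrected pairwise distance (p-distance) over an existing alignment, is shown below. The function name and the example sequences are invented.

```python
def nucleotide_divergence(seq_a, seq_b):
    """Percent of differing sites between two aligned sequences (p-distance).

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Invented aligned fragments: one mismatch over ten comparable sites.
print(nucleotide_divergence("ACGTACGTAC", "ACGTTCGTAC"))
```

Model-based distances such as Kimura two-parameter correct for multiple substitutions and would give slightly higher values at larger divergences.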
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in-silico BGC identification.