18 citations in PubMed

Most cited article - PubMed ID 32839597

Feature-based molecular networking in the GNPS analysis environment

Nature methods. 2020 Sep ; 17 (9) : 905-908. [epub] 20200824

Nat Methods
ISSN 1548-7105 | 1548-7091
Source

Article

An evaluation methodology for machine learning-based tandem mass spectra similarity prediction

Strobel, Michael (Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA)
Gil-de-la-Fuente, Alberto (Information Technologies Department, Escuela Politécnica Superior, Universidad San Pablo-CEU, CEU Universities, Urbanización Montepríncipe, Boadilla del Monte, 28668, Madrid, Spain)
Zare Shahneh, Mohammad Reza (Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA)
Abiead, Yasin El (Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, 9255 Pharmacy Ln, San Diego, CA, 92093, USA)
Bushuiev, Roman (Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nám. 542/2, Prague, 16000, Czech Republic; Czech Institute of Informatics, Robotics and Cybernetics, Jugoslávských partyzánů 1580/3, Prague, 16000, Czech Republic)
Bushuiev, Anton (Czech Institute of Informatics, Robotics and Cybernetics, Jugoslávských partyzánů 1580/3, Prague, 16000, Czech Republic)
Pluskal, Tomáš (Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nám. 542/2, Prague, 16000, Czech Republic)
Wang, Mingxun (Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA; mingxun.wang@cs.ucr.edu)

BMC bioinformatics. 2025 Jul 11 ; 26 (1) : 174. [epub] 20250711

BMC Bioinformatics
ISSN 1471-2105
Source

BACKGROUND: Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry data (MS/MS) - called molecular networking - organizes and visualizes putatively structurally related compounds. However, a key bottleneck of this approach is the comparison of MS/MS spectra used to identify nearby structural neighbors. Machine learning (ML) approaches have emerged as a promising technique to predict structural similarity from MS/MS that may surpass the current state-of-the-art algorithmic methods. However, the comparison between these different ML methods remains a challenge because there is a lack of standardization to benchmark, evaluate, and compare MS/MS similarity methods, and there are no methods that address data leakage between training and test data in order to analyze model generalizability. RESULT: In this work, we present the creation of a new evaluation methodology using a train/test split that allows for the evaluation of machine learning models at varying degrees of structural similarity between training and test sets. We also introduce a training and evaluation framework that measures prediction accuracy on domain-inspired annotation and retrieval metrics designed to mirror real-world applications. We further show how two alternative training methods that leverage MS specific insights (e.g., similar instrumentation, collision energy, adduct) affect method performance and demonstrate the orthogonality of the proposed metrics. We especially highlight the role that collision energy plays in prediction errors. Finally, we release a continually updated version of our dataset online along with our data cleaning and splitting pipelines for community use. 
CONCLUSION: It is our hope that this benchmark will serve as the basis of development for future machine learning approaches in MS/MS similarity and facilitate comparison between models. We anticipate that the introduced set of evaluation metrics allows for a better reflection of practical performance.

Keywords
Benchmark, Machine learning, Mass spectrometry, Metabolomics, Spectral similarity measure,
MeSH
Algorithms MeSH
Machine Learning * MeSH
Tandem Mass Spectrometry * methods MeSH
Publication type
Journal Article MeSH

Article

Computational metabolomics reveals overlooked chemodiversity of alkaloid scaffolds in Piper fimbriulatum

The Plant journal. 2025 Mar ; 121 (5) : e70086.

Plant J
ISSN 1365-313X | 0960-7412
Source

Plant specialized metabolites play key roles in diverse physiological processes and ecological interactions. Identifying structurally novel metabolites, as well as discovering known compounds in new species, is often crucial for answering broader biological questions. The Piper genus (Piperaceae family) is known for its special phytochemistry and has been extensively studied over the past decades. Here, we investigated the alkaloid diversity of Piper fimbriulatum, a myrmecophytic plant native to Central America, using a metabolomics workflow that combines untargeted LC-MS/MS analysis with a range of recently developed computational tools. Specifically, we leverage open MS/MS spectral libraries and metabolomics data repositories for metabolite annotation, guiding isolation efforts toward structurally new compounds (i.e., dereplication). As a result, we identified several alkaloids belonging to five different classes and isolated one novel seco-benzylisoquinoline alkaloid featuring a linear quaternary amine moiety which we named fimbriulatumine. Notably, many of the identified compounds were never reported in Piperaceae plants. Our findings expand the known alkaloid diversity of this family and demonstrate the value of revisiting well-studied plant families using state-of-the-art computational metabolomics workflows to uncover previously overlooked chemodiversity. To contextualize our findings within a broader biological context, we employed a workflow for automated mining of literature reports of the identified alkaloid scaffolds and mapped the results onto the angiosperm tree of life. By doing so, we highlight the remarkable alkaloid diversity within the Piper genus and provide a framework for generating hypotheses on the biosynthetic evolution of these specialized metabolites. Many of the computational tools and data resources used in this study remain underutilized within the plant science community. 
This manuscript demonstrates their potential through a practical application and aims to promote broader accessibility to untargeted metabolomics approaches.

Keywords
Piper fimbriulatum, Piperaceae, Wikidata, alkaloids, angiosperms, computational metabolomics, mass spectrometry, technical advance,
MeSH
Alkaloids * metabolism chemistry MeSH
Chromatography, Liquid MeSH
Metabolomics * methods MeSH
Piper * metabolism chemistry MeSH
Tandem Mass Spectrometry MeSH
Publication type
Journal Article MeSH
Names of Substances
Alkaloids * MeSH

Article

Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data

Nature protocols. 2025 Jan ; 20 (1) : 92-162. [epub] 20240920

Nat Protoc
ISSN 1750-2799
Source

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Especially users new to statistical analysis struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data clean-up to statistical analysis. All code is shared in the form of Jupyter Notebooks ( https://github.com/Functional-Metabolomics-Lab/FBMN-STATS ). Additionally, the protocol is accompanied by a web application with a graphical user interface ( https://fbmn-statsguide.gnps2.org/ ) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.

MeSH
Chromatography, Liquid methods MeSH
Data Interpretation, Statistical MeSH
Metabolomics * methods MeSH
Software MeSH
Tandem Mass Spectrometry methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH
Research Support, U.S. Gov't, Non-P.H.S. MeSH

Article

Studying Plant Specialized Metabolites Using Computational Metabolomics Strategies

Methods in molecular biology (Clifton, N.J.). 2024 ; 2788 () : 97-136.

Methods Mol Biol
ISSN 1940-6029 | 1064-3745
Source

Plant specialized metabolites have diversified vastly over the course of plant evolution, and they are considered key players in complex interactions between plants and their environment. The chemical diversity of these metabolites has been widely explored and utilized in agriculture and crop enhancement, the food industry, and drug development, among other areas. However, the immensity of the plant metabolome can make its exploration challenging. Here we describe a protocol for exploring plant specialized metabolites that combines high-resolution mass spectrometry and computational metabolomics strategies, including molecular networking, identification of structural motifs, as well as prediction of chemical structures and metabolite classes.

Keywords
GNPS, MS2LDA, MS2Query, MZmine, Molecular networking, Plant metabolomics, SIRIUS, Specialized metabolites,
MeSH
Mass Spectrometry * methods MeSH
Metabolome * MeSH
Metabolomics * methods MeSH
Plants * metabolism MeSH
Computational Biology methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Article

Empirically establishing drug exposure records directly from untargeted metabolomics data

bioRxiv. 2024 Oct 26 ; () : . [epub] 20241026

ISSN 2692-8205
Source

Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action. It enables direct analysis of drug exposure and metabolism from untargeted metabolomics data independent of clinical records. Our library facilitates stratification of individuals in clinical studies based on the empirically detected medications, exemplified by drug-dependent microbiota-derived N-acyl lipid changes in a cohort with human immunodeficiency virus. The GNPS Drug Library holds potential for broader applications in drug discovery and precision medicine.

Publication type
Journal Article MeSH
Preprint MeSH

Article

Reproducible mass spectrometry data processing and compound annotation in MZmine 3

Nature protocols. 2024 Sep ; 19 (9) : 2597-2641. [epub] 20240520

Nat Protoc
ISSN 1750-2799
Source

Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software for the processing of raw spectral data generated by different MS platforms. Examples include liquid chromatography-MS, gas chromatography-MS and MS-imaging. These data might typically be associated with various applications including metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography-(IMS-)MS, gas chromatography-MS and (IMS-)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., list of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate this to take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description for all processing parameters together with instructions/recommendations for their optimization. The main generated outputs are represented by aligned feature tables and fragmentation spectra lists that can be used by other third-party tools for further downstream analysis.

MeSH
Chromatography, Liquid methods MeSH
Mass Spectrometry * methods MeSH
Ion Mobility Spectrometry methods MeSH
Metabolomics methods MeSH
Gas Chromatography-Mass Spectrometry methods MeSH
Reproducibility of Results MeSH
Software * MeSH
Publication type
Journal Article MeSH
Review MeSH

Article

Functional metabolomics of the human scalp: a metabolic niche for Staphylococcus epidermidis

mSystems. 2024 Feb 20 ; 9 (2) : e0035623. [epub] 20240111

ISSN 2379-5077
Source

Although metabolomics data acquisition and analysis technologies have become increasingly sophisticated over the past 5-10 years, deciphering a metabolite's function from a description of its structure and its abundance in a given experimental setting is still a major scientific and intellectual challenge. To point out ways to address this "data to knowledge" challenge, we developed a functional metabolomics strategy that combines state-of-the-art data analysis tools and applied it to a human scalp metabolomics data set: skin swabs from healthy volunteers with normal or oily scalp (Sebumeter score 60-120, n = 33; Sebumeter score > 120, n = 41) were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS), yielding four metabolomics data sets for reversed phase chromatography (C18) or hydrophilic interaction chromatography (HILIC) separation in electrospray ionization (ESI) + or - ionization mode. Following our data analysis strategy, we were able to obtain increasingly comprehensive structural and functional annotations, by applying the Global Natural Products Social Molecular Networking (M. Wang, J. J. Carver, V. V. Phelan, L. M. Sanchez, et al., Nat Biotechnol 34:828-837, 2016, https://doi.org/10.1038/nbt.3597), SIRIUS (K. Dührkop, M. Fleischauer, M. Ludwig, A. A. Aksenov, et al., Nat Methods 16:299-302, 2019, https://doi.org/10.1038/s41592-019-0344-8), and MicrobeMASST (S. Zuffa, R. Schmid, A. Bauermeister, P. W. P. Gomes, et al., bioRxiv:rs.3.rs-3189768, 2023, https://doi.org/10.21203/rs.3.rs-3189768/v1) tools. We finally combined the metabolomics data with a corresponding metagenomic sequencing data set using MMvec (J. T. Morton, A. A. Aksenov, L. F. Nothias, J. R. Foulds, et al., Nat Methods 16:1306-1314, 2019, https://doi.org/10.1038/s41592-019-0616-3), gaining insights into the metabolic niche of one of the most prominent microbes on the human skin, Staphylococcus epidermidis.
IMPORTANCE: Systems biology research on host-associated microbiota focuses on two fundamental questions: which microbes are present and how do they interact with each other, their host, and the broader host environment? Metagenomics provides us with a direct answer to the first part of the question: it unveils the microbial inhabitants, e.g., on our skin, and can provide insight into their functional potential. Yet, it falls short in revealing their active role. Metabolomics shows us the chemical composition of the environment in which microbes thrive and the transformation products they produce. In particular, untargeted metabolomics has the potential to observe a diverse set of metabolites and is thus an ideal complement to metagenomics. However, this potential often remains underexplored due to the low annotation rates in MS-based metabolomics and the necessity for multiple experimental chromatographic and mass spectrometric conditions. Beyond detection, prospecting metabolites' functional role in the host/microbiome metabolome requires identifying the biological processes and entities involved in their production and biotransformations. In the present study of the human scalp, we developed a strategy to achieve comprehensive structural and functional annotation of the metabolites in the human scalp environment, thus diving one step deeper into the interpretation of "omics" data. Leveraging a collection of openly accessible software tools and integrating microbiome data as a source of functional metabolite annotations, we finally identified the specific metabolic niche of Staphylococcus epidermidis, one of the key players of the human skin microbiome.

Keywords
metabolite annotation, metabolomics, multi-omics integration, scalp, skin microbiome,
MeSH
Chromatography, Liquid MeSH
Humans MeSH
Metabolomics methods MeSH
Scalp * MeSH
Staphylococcus epidermidis * MeSH
Tandem Mass Spectrometry MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH

Article

On-tissue dataset-dependent MALDI-TIMS-MS2 bioimaging

Nature communications. 2023 Nov 18 ; 14 (1) : 7495. [epub] 20231118

Nat Commun
ISSN 2041-1723
Source

Trapped ion mobility spectrometry (TIMS) adds an additional separation dimension to mass spectrometry (MS) imaging; however, the lack of fragmentation spectra (MS2) impedes confident compound annotation in spatial metabolomics. Here, we describe spatial ion mobility-scheduled exhaustive fragmentation (SIMSEF), a dataset-dependent acquisition strategy that augments TIMS-MS imaging datasets with MS2 spectra. The fragmentation experiments are systematically distributed across the sample and scheduled for multiple collision energies per precursor ion. Extendable data processing and evaluation workflows are implemented into the open source software MZmine. The workflow and annotation capabilities are demonstrated on rat brain tissue thin sections, measured by matrix-assisted laser desorption/ionisation (MALDI)-TIMS-MS, where SIMSEF enables on-tissue compound annotation through spectral library matching and rule-based lipid annotation within MZmine and maps the (un)known chemical space by molecular networking. The SIMSEF algorithm and data analysis pipelines are open source and modular to provide a community resource.

MeSH
Algorithms MeSH
Ion Mobility Spectrometry * MeSH
Rats MeSH
Metabolomics * methods MeSH
Software MeSH
Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods MeSH
Animals MeSH
Check Tag
Rats MeSH
Animals MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Article

More than just an eagle killer: The freshwater cyanobacterium Aetokthonos hydrillicola produces highly toxic dolastatin derivatives

Proceedings of the National Academy of Sciences of the United States of America. 2023 Oct 03 ; 120 (40) : e2219230120. [epub] 20230926

Proc Natl Acad Sci U S A
ISSN 1091-6490 | 0027-8424
Source

Cyanobacteria are infamous producers of toxins. While the toxic potential of planktonic cyanobacterial blooms is well documented, the ecosystem-level effects of toxigenic benthic and epiphytic cyanobacteria are an understudied threat. The freshwater epiphytic cyanobacterium Aetokthonos hydrillicola has recently been shown to produce the "eagle killer" neurotoxin aetokthonotoxin (AETX) causing the fatal neurological disease vacuolar myelinopathy. The disease affects a wide array of wildlife in the southeastern United States, most notably waterfowl and birds of prey, including the bald eagle. In an assay for cytotoxicity, we found the crude extract of the cyanobacterium to be much more potent than pure AETX, prompting further investigation. Here, we describe the isolation and structure elucidation of the aetokthonostatins (AESTs), linear peptides belonging to the dolastatin compound family, featuring a unique modification of the C-terminal phenylalanine-derived moiety. Using immunofluorescence microscopy and molecular modeling, we confirmed that AEST potently impacts microtubule dynamics and can bind to tubulin in a similar manner to dolastatin 10. We also show that AEST inhibits reproduction of the nematode Caenorhabditis elegans. Bioinformatic analysis revealed the AEST biosynthetic gene cluster encoding a nonribosomal peptide synthetase/polyketide synthase accompanied by a unique tailoring machinery. The biosynthetic activity of a specific N-terminal methyltransferase was confirmed by in vitro biochemical studies, establishing a mechanistic link between the gene cluster and its product.

Keywords
aetokthonostatin, biosynthesis, cyanotoxin, cytotoxicity, dolastatin,
MeSH
Eagles * MeSH
Caenorhabditis elegans MeSH
Ecosystem MeSH
Cyanobacteria * genetics MeSH
Fresh Water MeSH
Animals MeSH
Check Tag
Animals MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Names of Substances
aetokthonotoxin MeSH Browser

Article

Evaluation of Data-Dependent MS/MS Acquisition Parameters for Non-Targeted Metabolomics and Molecular Networking of Environmental Samples: Focus on the Q Exactive Platform

Analytical chemistry. 2023 Aug 29 ; 95 (34) : 12673-12682. [epub] 20230814

Anal Chem
ISSN 1520-6882 | 0003-2700
Source

Non-targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a widely used tool for metabolomics analysis, enabling the detection and annotation of small molecules in complex environmental samples. Data-dependent acquisition (DDA) of product ion spectra is currently one of the most frequently applied data acquisition strategies. The optimization of DDA parameters is central to ensuring high spectral quality, coverage, and number of compound annotations. Here, we evaluated the influence of 10 central DDA settings of the Q Exactive mass spectrometer on natural organic matter samples from ocean, river, and soil environments. After data analysis with classical and feature-based molecular networking using MZmine and GNPS, we compared the total number of network nodes, multivariate clustering, and spectrum quality-related metrics such as annotation and singleton rates, MS/MS placement, and coverage. Our results show that automatic gain control, microscans, mass resolving power, and dynamic exclusion are the most critical parameters, whereas collision energy, TopN, and isolation width had moderate effects, and apex trigger, monoisotopic selection, and isotopic exclusion had minor effects. The insights into the data acquisition ergonomics of the Q Exactive platform presented here can guide new users and provide them with initial method parameters, some of which may also be transferable to other sample types and MS platforms.
