Most cited article - PubMed ID 34158495
Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment
Metabolite identification in non-targeted mass spectrometry-based metabolomics remains a major challenge due to limited spectral library coverage and difficulties in predicting metabolite fragmentation patterns. Here, we introduce Multiplexed Chemical Metabolomics (MCheM), which employs orthogonal post-column derivatization reactions integrated into a unified mass spectrometry data framework. MCheM generates orthogonal structural information that substantially improves metabolite annotation through in silico spectrum matching and open-modification searches, offering a powerful new toolbox for the structure elucidation of unknown metabolites at scale.
- MeSH
- Metabolome * MeSH
- Metabolomics * methods MeSH
- Tandem Mass Spectrometry * methods MeSH
- Publication type
- Journal Article MeSH
Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.
- MeSH
- Data Mining * methods MeSH
- Mass Spectrometry * methods MeSH
- Humans MeSH
- Metabolomics * methods MeSH
- Programming Languages * MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
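The MassQL abstract above describes querying spectra with user-defined mass spectrometry patterns. As a rough illustration of one such pattern (all MS/MS scans that contain a given product ion within an m/z tolerance), here is a minimal Python sketch. It is not MassQL syntax and does not use the MassQL software; the `Spectrum` class, the function names, and the default tolerance are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    """Hypothetical in-memory spectrum record (not a MassQL type)."""
    scan: int
    ms_level: int
    precursor_mz: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def has_product_ion(spectrum: Spectrum, target_mz: float,
                    tolerance_mz: float = 0.01,
                    min_rel_intensity: float = 0.0) -> bool:
    """True if an MS2 spectrum holds a peak near target_mz.

    min_rel_intensity is a fraction of the base peak (0.0 disables it).
    """
    if spectrum.ms_level != 2 or not spectrum.peaks:
        return False
    base_peak = max(intensity for _, intensity in spectrum.peaks)
    if base_peak <= 0:
        return False
    return any(
        abs(mz - target_mz) <= tolerance_mz
        and intensity / base_peak >= min_rel_intensity
        for mz, intensity in spectrum.peaks
    )


def query_ms2_by_product_ion(spectra: List[Spectrum], target_mz: float,
                             tolerance_mz: float = 0.01,
                             min_rel_intensity: float = 0.0) -> List[int]:
    """Return scan numbers of all MS2 spectra matching the pattern."""
    return [
        s.scan for s in spectra
        if has_product_ion(s, target_mz, tolerance_mz, min_rel_intensity)
    ]


if __name__ == "__main__":
    demo = [
        Spectrum(1, 2, 300.0, [(85.03, 100.0), (120.0, 50.0)]),
        Spectrum(2, 2, 310.0, [(90.0, 100.0)]),
        Spectrum(3, 1, 0.0, [(85.03, 10.0)]),  # MS1 scan, ignored
    ]
    print(query_ms2_by_product_ion(demo, 85.03))
```

A declarative query language expresses the same filter as a single statement that can be shared and re-run across datasets, which is the interoperability benefit the abstract emphasizes.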
Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to handle and analyze complex data matrices effectively. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
- MeSH
- Chromatography, Liquid methods MeSH
- Data Interpretation, Statistical MeSH
- Metabolomics * methods MeSH
- Software MeSH
- Tandem Mass Spectrometry methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
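The FBMN statistics abstract above mentions data cleanup and normalization of the feature table. The following minimal sketch shows two steps commonly used for that purpose: blank removal and total ion current (TIC) normalization. It is not taken from the FBMN-STATS notebooks; the function names, the dictionary-based table layout, and the 0.3 blank-to-sample cutoff are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical layout: feature ID -> {column name -> peak area}
FeatureTable = Dict[str, Dict[str, float]]


def remove_blank_features(table: FeatureTable, blank_cols: List[str],
                          sample_cols: List[str],
                          cutoff: float = 0.3) -> FeatureTable:
    """Keep features whose mean blank/sample ratio is below `cutoff`.

    The cutoff value is an assumed example, not a recommendation.
    Only sample columns are carried forward.
    """
    kept: FeatureTable = {}
    for feature, row in table.items():
        blank_mean = mean(row[c] for c in blank_cols)
        sample_mean = mean(row[c] for c in sample_cols)
        if sample_mean > 0 and blank_mean / sample_mean < cutoff:
            kept[feature] = {c: row[c] for c in sample_cols}
    return kept


def tic_normalize(table: FeatureTable,
                  sample_cols: List[str]) -> FeatureTable:
    """Divide each intensity by the summed intensity of its sample."""
    totals = {c: sum(row[c] for row in table.values()) for c in sample_cols}
    return {
        feature: {
            c: (row[c] / totals[c] if totals[c] > 0 else 0.0)
            for c in sample_cols
        }
        for feature, row in table.items()
    }


if __name__ == "__main__":
    raw = {
        "F1": {"blank": 100.0, "s1": 120.0, "s2": 80.0},   # mostly background
        "F2": {"blank": 0.0, "s1": 300.0, "s2": 100.0},
        "F3": {"blank": 10.0, "s1": 100.0, "s2": 300.0},
    }
    cleaned = remove_blank_features(raw, ["blank"], ["s1", "s2"])
    print(tic_normalize(cleaned, ["s1", "s2"]))
```

After TIC normalization every sample column sums to 1, so downstream uni- and multivariate statistics compare relative rather than absolute abundances.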
Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action. It enables direct analysis of drug exposure and metabolism from untargeted metabolomics data independent of clinical records. Our library facilitates stratification of individuals in clinical studies based on the empirically detected medications, exemplified by drug-dependent microbiota-derived N-acyl lipid changes in a cohort with human immunodeficiency virus. The GNPS Drug Library holds potential for broader applications in drug discovery and precision medicine.
- Publication type
- Journal Article MeSH
- Preprint MeSH
Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software for the processing of raw spectral data generated by different MS platforms. Examples include liquid chromatography-MS, gas chromatography-MS and MS-imaging. These data might typically be associated with various applications including metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography-(IMS-)MS, gas chromatography-MS and (IMS-)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., list of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate this to take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description for all processing parameters together with instructions/recommendations for their optimization. The main generated outputs are represented by aligned feature tables and fragmentation spectra lists that can be used by other third-party tools for further downstream analysis.
The annotation of metabolites detected in LC-MS-based untargeted metabolomics studies routinely applies accurate m/z of the intact metabolite (MS1) as well as chromatographic retention time and MS/MS data. Electrospray ionization and transfer of ions through the mass spectrometer can result in the generation of multiple "features" derived from the same metabolite with different m/z values but the same retention time. The complexity of the different charged and neutral adducts, in-source fragments, and charge states has not previously been characterized in depth. In this paper, we report the first large-scale characterization using publicly available data sets derived from different research groups, instrument manufacturers, LC assays, sample types, and ion modes. In total, 271 m/z differences relating to different metabolite feature pairs were reported, and 209 were annotated. The results show a wide range of different features being observed, with only a core of 32 m/z differences reported in >50% of the data sets investigated. No patterns of specific m/z differences were observed in relation to ion mode, instrument manufacturer, LC assay type, or mammalian sample type, although some m/z differences were related to study group (mammal, microbe, plant) and mobile phase composition. The results provide the metabolomics community with recommendations of adducts, in-source fragments, and charge states to apply in metabolite annotation workflows.
- MeSH
- Chromatography, Liquid MeSH
- Spectrometry, Mass, Electrospray Ionization * methods MeSH
- Humans MeSH
- Metabolomics * methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
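The adduct abstract above rests on a simple idea: features that share a retention time but differ in m/z by a characteristic amount likely derive from the same metabolite. The sketch below pairs co-eluting features and checks their m/z difference against three well-known positive-mode differences. The function name, the tolerances, and the short lookup table are assumptions for illustration; the table is not the list of 271 differences reported in the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# A few standard monoisotopic m/z differences between singly charged
# positive-mode ions (illustrative subset only).
KNOWN_DIFFS: Dict[str, float] = {
    "[M+NH4]+ vs [M+H]+": 17.02655,
    "[M+Na]+ vs [M+H]+": 21.98194,
    "[M+K]+ vs [M+H]+": 37.95588,
}

Feature = Tuple[str, float, float]  # (feature ID, m/z, retention time in min)


def annotate_pairs(features: List[Feature], rt_tol: float = 0.05,
                   mz_tol: float = 0.005) -> List[Tuple[str, str, str]]:
    """Label co-eluting feature pairs whose m/z difference is known.

    rt_tol (minutes) and mz_tol (m/z units) are assumed example values.
    """
    hits = []
    for (id_a, mz_a, rt_a), (id_b, mz_b, rt_b) in combinations(features, 2):
        if abs(rt_a - rt_b) > rt_tol:
            continue  # different retention time: treat as unrelated
        delta = abs(mz_a - mz_b)
        for label, expected in KNOWN_DIFFS.items():
            if abs(delta - expected) <= mz_tol:
                hits.append((id_a, id_b, label))
    return hits


if __name__ == "__main__":
    demo = [
        ("f1", 181.0707, 5.00),  # hexose [M+H]+
        ("f2", 203.0526, 5.01),  # hexose [M+Na]+, co-eluting with f1
        ("f3", 203.0526, 9.00),  # same m/z, different retention time
    ]
    print(annotate_pairs(demo))
```

Grouping such pairs before annotation avoids counting one metabolite several times, which is why a recommended list of m/z differences is useful in annotation workflows.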
Although metabolomics data acquisition and analysis technologies have become increasingly sophisticated over the past 5-10 years, deciphering a metabolite's function from a description of its structure and its abundance in a given experimental setting is still a major scientific and intellectual challenge. To point out ways to address this "data to knowledge" challenge, we developed a functional metabolomics strategy that combines state-of-the-art data analysis tools and applied it to a human scalp metabolomics data set: skin swabs from healthy volunteers with normal or oily scalp (Sebumeter score 60-120, n = 33; Sebumeter score > 120, n = 41) were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS), yielding four metabolomics data sets for reversed phase chromatography (C18) or hydrophilic interaction chromatography (HILIC) separation in positive or negative electrospray ionization (ESI) mode. Following our data analysis strategy, we were able to obtain increasingly comprehensive structural and functional annotations, by applying the Global Natural Products Social Molecular Networking (M. Wang, J. J. Carver, V. V. Phelan, L. M. Sanchez, et al., Nat Biotechnol 34:828-837, 2016, https://doi.org/10.1038/nbt.3597), SIRIUS (K. Dührkop, M. Fleischauer, M. Ludwig, A. A. Aksenov, et al., Nat Methods 16:299-302, 2019, https://doi.org/10.1038/s41592-019-0344-8), and MicrobeMASST (S. Zuffa, R. Schmid, A. Bauermeister, P. W. P. Gomes, et al., bioRxiv:rs.3.rs-3189768, 2023, https://doi.org/10.21203/rs.3.rs-3189768/v1) tools. We finally combined the metabolomics data with a corresponding metagenomic sequencing data set using MMvec (J. T. Morton, A. A. Aksenov, L. F. Nothias, J. R. Foulds, et al., Nat Methods 16:1306-1314, 2019, https://doi.org/10.1038/s41592-019-0616-3), gaining insights into the metabolic niche of one of the most prominent microbes on the human skin, Staphylococcus epidermidis.
IMPORTANCE: Systems biology research on host-associated microbiota focuses on two fundamental questions: which microbes are present and how do they interact with each other, their host, and the broader host environment? Metagenomics provides us with a direct answer to the first part of the question: it unveils the microbial inhabitants, e.g., on our skin, and can provide insight into their functional potential. Yet, it falls short in revealing their active role. Metabolomics shows us the chemical composition of the environment in which microbes thrive and the transformation products they produce. In particular, untargeted metabolomics has the potential to observe a diverse set of metabolites and is thus an ideal complement to metagenomics. However, this potential often remains underexplored due to the low annotation rates in MS-based metabolomics and the necessity for multiple experimental chromatographic and mass spectrometric conditions. Beyond detection, prospecting metabolites' functional role in the host/microbiome metabolome requires identifying the biological processes and entities involved in their production and biotransformations. In the present study of the human scalp, we developed a strategy to achieve comprehensive structural and functional annotation of the metabolites in the human scalp environment, thus diving one step deeper into the interpretation of "omics" data. Leveraging a collection of openly accessible software tools and integrating microbiome data as a source of functional metabolite annotations, we finally identified the specific metabolic niche of Staphylococcus epidermidis, one of the key players of the human skin microbiome.
- Keywords
- metabolite annotation, metabolomics, multi-omics integration, scalp, skin microbiome
- MeSH
- Chromatography, Liquid MeSH
- Humans MeSH
- Metabolomics methods MeSH
- Scalp * MeSH
- Staphylococcus epidermidis * MeSH
- Tandem Mass Spectrometry MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Non-targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a widely used tool for metabolomics analysis, enabling the detection and annotation of small molecules in complex environmental samples. Data-dependent acquisition (DDA) of product ion spectra is currently one of the most frequently applied data acquisition strategies. The optimization of DDA parameters is central to ensuring high spectral quality, coverage, and number of compound annotations. Here, we evaluated the influence of 10 central DDA settings of the Q Exactive mass spectrometer on natural organic matter samples from ocean, river, and soil environments. After data analysis with classical and feature-based molecular networking using MZmine and GNPS, we compared the total number of network nodes, multivariate clustering, and spectrum quality-related metrics such as annotation and singleton rates, MS/MS placement, and coverage. Our results show that automatic gain control, microscans, mass resolving power, and dynamic exclusion are the most critical parameters, whereas collision energy, TopN, and isolation width had moderate effects, and apex trigger, monoisotopic selection, and isotopic exclusion had only minor effects. The insights into the data acquisition ergonomics of the Q Exactive platform presented here can guide new users and provide them with initial method parameters, some of which may also be transferable to other sample types and MS platforms.
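Two of the DDA settings named in the abstract above, TopN and dynamic exclusion, interact in a way that a short simulation makes concrete. The sketch below is a deliberately simplified model, not the Q Exactive acquisition logic: the function name, the m/z tolerance, and the list-based exclusion bookkeeping are all assumptions for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]        # (m/z, intensity) from one MS1 survey scan
Exclusion = Tuple[float, float]   # (m/z, time in s when it was fragmented)


def select_precursors(ms1_peaks: List[Peak], top_n: int,
                      exclusion_list: List[Exclusion], now: float,
                      exclusion_s: float, mz_tol: float = 0.01
                      ) -> Tuple[List[float], List[Exclusion]]:
    """Pick up to top_n most intense precursors not under exclusion.

    Returns the chosen m/z values and the updated exclusion list.
    """
    # Drop exclusion entries whose window has expired.
    active = [(mz, t) for mz, t in exclusion_list if now - t < exclusion_s]
    chosen: List[float] = []
    for mz, _ in sorted(ms1_peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in active):
            continue  # fragmented recently: skip to reach weaker ions
        chosen.append(mz)
        active.append((mz, now))
    return chosen, active


if __name__ == "__main__":
    peaks = [(100.0, 1e6), (200.0, 5e5), (300.0, 1e5)]
    excl: List[Exclusion] = []
    for t in (0.0, 5.0, 12.0):  # three cycles with the same survey scan
        picked, excl = select_precursors(peaks, 2, excl, t, 10.0)
        print(t, picked)
```

In this toy model a longer exclusion window pushes selection toward lower-intensity precursors in later cycles, while a shorter one re-samples the most abundant ions, which is the trade-off between coverage and spectral quality that such parameter studies examine.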