A universal language for finding mass spectrometry data patterns
Language: English
Country: United States
Media: print-electronic
Document type: Journal Article
PubMed: 40355727
DOI: 10.1038/s41592-025-02660-z
PII: 10.1038/s41592-025-02660-z
MeSH terms
- Data Mining * methods
- Mass Spectrometry * methods
- Humans
- Metabolomics * methods
- Programming Languages *
- Software *
Check Tag
- Humans
Publication type
- Journal Article
Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.
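To make the idea of a user-defined pattern concrete, here is a minimal Python sketch of one of the simplest patterns: find every MS/MS scan that contains a product ion at a given m/z within a tolerance. It is not the MassQL engine and does not use its API. The `Spectrum` class, the helper functions and the example peaks are all hypothetical and exist only for illustration. The query text in the comment is an assumed example of MassQL syntax and should be checked against the MassQL documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed MassQL-style phrasing of the same pattern (illustrative only,
# not verified against a specific MassQL release):
#   QUERY scaninfo(MS2DATA) WHERE MS2PROD=226.18:TOLERANCEMZ=0.01


@dataclass
class Spectrum:
    """Hypothetical in-memory MS/MS spectrum used only for this sketch."""
    scan: int
    precursor_mz: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def has_product_ion(spectrum: Spectrum, target_mz: float, tol_mz: float) -> bool:
    """Return True if any peak lies within tol_mz of target_mz."""
    return any(abs(mz - target_mz) <= tol_mz for mz, _ in spectrum.peaks)


def find_scans(spectra: List[Spectrum], target_mz: float, tol_mz: float = 0.01) -> List[int]:
    """Return scan numbers of spectra that contain the product ion."""
    return [s.scan for s in spectra if has_product_ion(s, target_mz, tol_mz)]


# Made-up spectra, not taken from any real dataset.
spectra = [
    Spectrum(1, 500.30, [(226.181, 1200.0), (310.10, 300.0)]),
    Spectrum(2, 412.25, [(150.05, 800.0), (226.30, 90.0)]),
    Spectrum(3, 640.40, [(226.175, 50.0)]),
]

matching = find_scans(spectra, target_mz=226.18, tol_mz=0.01)
print(matching)
```

A declarative query language lets the same pattern be written once as text and run unchanged across tools and datasets, whereas a hand-written filter like this one has to be reimplemented for each tool.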
Bioinformatics Group Wageningen University and Research Wageningen the Netherlands
Biologicals and Natural Products Crop Protection R and D Corteva Agrisciences Indianapolis IN USA
BioMolecular Sciences School of Pharmacy University of Mississippi Oxford MS USA
Center for Urban Waters University of Washington Tacoma WA USA
Chemistry and Chemical Biology Northeastern University Boston MA USA
Clinical Biomarkers Laboratory School of Medicine Emory University Atlanta GA USA
College of Pharmacy Sookmyung Women's University Seoul Republic of Korea
College of Pharmacy University of Rhode Island Kingston RI USA
Crop Protection R and D Corteva Agrisciences Indianapolis IN USA
Data Science and Bioinformatics Corteva Agrisciences Dublin OH USA
Department of Biochemistry University of California Riverside Riverside CA USA
Department of Biochemistry University of Johannesburg Johannesburg South Africa
Department of Bioengineering University of California San Diego La Jolla CA USA
Department of BioMolecular Sciences School of Pharmacy University of Mississippi Oxford MS USA
Department of Biotechnology and Biomedicine Technical University of Denmark Kongens Lyngby Denmark
Department of Chemistry and Biochemistry San Diego State University San Diego CA USA
Department of Chemistry and Biochemistry UC Santa Cruz Santa Cruz CA USA
Department of Chemistry and Biochemistry University of Arizona Tucson AZ USA
Department of Chemistry and Biochemistry University of Denver Denver CO USA
Department of Chemistry BMC Science for Life Laboratory Uppsala University Uppsala Sweden
Department of Chemistry Case Western Reserve University Cleveland OH USA
Department of Computer Science University of California Riverside Riverside CA USA
Department of Fundamental Chemistry Institute of Chemistry University of São Paulo São Paulo Brazil
Department of Medicinal Chemistry College of Pharmacy University of Michigan Ann Arbor MI USA
Department of Pharmacy University of Marburg Marburg Germany
Environmental Genomics and Systems Biology Division Lawrence Berkeley National Lab Berkeley CA USA
Faculty of Chemistry Institute of Exact and Natural Science Federal University of Para Belem Brazil
Functional Metabolomics Lab CMFI Cluster of Excellence University of Tuebingen Tuebingen Germany
Institute for Biomedicine Eurac Research Bolzano Italy
Institute of Inorganic and Analytical Chemistry University of Münster Münster Germany
Institute of Pharmaceutical Biology Goethe University Frankfurt Frankfurt Germany
Institute of Pharmaceutical Biology University of Bonn Bonn Germany
Institute of Pharmacy Freie Universität Berlin Berlin Germany
Natural Products Discovery Core Life Sciences Institute University of Michigan Ann Arbor MI USA
Pharmacognosy Department Faculty of Pharmacy Cairo University Cairo Egypt
Pharmacognosy Faculty of Pharmacy Al Azhar University Nasr City Egypt
RIKEN Center for Integrative Medical Sciences Tsurumi ku Japan
RIKEN Center for Sustainable Resource Science Tsurumi ku Japan
School of Chemistry and Biochemistry Georgia Institute of Technology Atlanta GA USA
The Joint Genome Institute Lawrence Berkeley National Lab Berkeley CA USA
West Coast Metabolomics Center University of California Davis Davis CA USA