Author: Wang, Mingxun

11 records in PubMed

Article

An evaluation methodology for machine learning-based tandem mass spectra similarity prediction

Strobel, Michael: Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA
Gil-de-la-Fuente, Alberto: Information Technologies Department, Escuela Politécnica Superior, Universidad San Pablo-CEU, CEU Universities, Urbanización Montepríncipe, Boadilla del Monte, 28668, Madrid, Spain
Zare Shahneh, Mohammad Reza: Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA
Abiead, Yasin El: Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, 9255 Pharmacy Ln, San Diego, CA, 92093, USA
Bushuiev, Roman: Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nám. 542/2, Prague, 16000, Czech Republic; Czech Institute of Informatics, Robotics and Cybernetics, Jugoslávských partyzánů 1580/3, Prague, 16000, Czech Republic
Bushuiev, Anton: Czech Institute of Informatics, Robotics and Cybernetics, Jugoslávských partyzánů 1580/3, Prague, 16000, Czech Republic
Pluskal, Tomáš: Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nám. 542/2, Prague, 16000, Czech Republic
Wang, Mingxun: Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA. mingxun.wang@cs.ucr.edu

BMC bioinformatics. 2025 Jul 11 ; 26 (1) : 174. [epub] 20250711

BMC Bioinformatics
ISSN 1471-2105

BACKGROUND: Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry data (MS/MS) - called molecular networking - organizes and visualizes putatively structurally related compounds. However, a key bottleneck of this approach is the comparison of MS/MS spectra used to identify nearby structural neighbors. Machine learning (ML) approaches have emerged as a promising technique to predict structural similarity from MS/MS that may surpass the current state-of-the-art algorithmic methods. However, the comparison between these different ML methods remains a challenge because there is a lack of standardization to benchmark, evaluate, and compare MS/MS similarity methods, and there are no methods that address data leakage between training and test data in order to analyze model generalizability.
RESULT: In this work, we present the creation of a new evaluation methodology using a train/test split that allows for the evaluation of machine learning models at varying degrees of structural similarity between training and test sets. We also introduce a training and evaluation framework that measures prediction accuracy on domain-inspired annotation and retrieval metrics designed to mirror real-world applications. We further show how two alternative training methods that leverage MS specific insights (e.g., similar instrumentation, collision energy, adduct) affect method performance and demonstrate the orthogonality of the proposed metrics. We especially highlight the role that collision energy plays in prediction errors. Finally, we release a continually updated version of our dataset online along with our data cleaning and splitting pipelines for community use.
CONCLUSION: It is our hope that this benchmark will serve as the basis of development for future machine learning approaches in MS/MS similarity and facilitate comparison between models. We anticipate that the introduced set of evaluation metrics allows for a better reflection of practical performance.

Keywords
Benchmark, Machine learning, Mass spectrometry, Metabolomics, Spectral similarity measure
MeSH
Algorithms MeSH
Machine Learning * MeSH
Tandem Mass Spectrometry * methods MeSH
Publication type
Journal Article MeSH

Article

A universal language for finding mass spectrometry data patterns

Nature methods. 2025 Jun ; 22 (6) : 1247-1254. [epub] 20250512

Nat Methods
ISSN 1548-7105 | 1548-7091

Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.

Article

Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data

Nature protocols. 2025 Jan ; 20 (1) : 92-162. [epub] 20240920

Nat Protoc
ISSN 1750-2799

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Especially users new to statistical analysis struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data clean-up to statistical analysis. All code is shared in the form of Jupyter Notebooks ( https://github.com/Functional-Metabolomics-Lab/FBMN-STATS ). Additionally, the protocol is accompanied by a web application with a graphical user interface ( https://fbmn-statsguide.gnps2.org/ ) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.

MeSH
Chromatography, Liquid methods MeSH
Data Interpretation, Statistical MeSH
Metabolomics * methods MeSH
Software MeSH
Tandem Mass Spectrometry methods MeSH
Publication type
Journal Article MeSH

Article

Empirically establishing drug exposure records directly from untargeted metabolomics data

bioRxiv. 2024 Oct 26. [epub] 20241026

ISSN 2692-8205

Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action. It enables direct analysis of drug exposure and metabolism from untargeted metabolomics data independent of clinical records. Our library facilitates stratification of individuals in clinical studies based on the empirically detected medications, exemplified by drug-dependent microbiota-derived N-acyl lipid changes in a cohort with human immunodeficiency virus. The GNPS Drug Library holds potential for broader applications in drug discovery and precision medicine.

Article

Reproducible MS/MS library cleaning pipeline in matchms

Journal of cheminformatics. 2024 Jul 29 ; 16 (1) : 88. [epub] 20240729

J Cheminform
ISSN 1758-2946

Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles. Scientific contribution: This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. This pipeline builds on previous work by adding new functionality for curating and correcting annotated libraries, by validating structure annotations. Due to the high quality of our software, the reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.

Keywords
Library cleaning, Mass spectrometry, Metabolomics, Metadata, Python Package
Publication type
Journal Article MeSH

Article

plantMASST - Community-driven chemotaxonomic digitization of plants

bioRxiv. 2024 May 14. [epub] 20240514

ISSN 2692-8205

Understanding the distribution of hundreds of thousands of plant metabolites across the plant kingdom presents a challenge. To address this, we curated publicly available LC-MS/MS data from 19,075 plant extracts and developed the plantMASST reference database encompassing 246 botanical families, 1,469 genera, and 2,793 species. This taxonomically focused database facilitates the exploration of plant-derived molecules using tandem mass spectrometry (MS/MS) spectra. This tool will aid in drug discovery, biosynthesis, (chemo)taxonomy, and the evolutionary ecology of herbivore interactions.

Article

Evaluation of Data-Dependent MS/MS Acquisition Parameters for Non-Targeted Metabolomics and Molecular Networking of Environmental Samples: Focus on the Q Exactive Platform

Analytical chemistry. 2023 Aug 29 ; 95 (34) : 12673-12682. [epub] 20230814

Anal Chem
ISSN 1520-6882 | 0003-2700

Non-targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a widely used tool for metabolomics analysis, enabling the detection and annotation of small molecules in complex environmental samples. Data-dependent acquisition (DDA) of product ion spectra is thereby currently one of the most frequently applied data acquisition strategies. The optimization of DDA parameters is central to ensuring high spectral quality, coverage, and number of compound annotations. Here, we evaluated the influence of 10 central DDA settings of the Q Exactive mass spectrometer on natural organic matter samples from ocean, river, and soil environments. After data analysis with classical and feature-based molecular networking using MZmine and GNPS, we compared the total number of network nodes, multivariate clustering, and spectrum quality-related metrics such as annotation and singleton rates, MS/MS placement, and coverage. Our results show that automatic gain control, microscans, mass resolving power, and dynamic exclusion are the most critical parameters, whereas collision energy, TopN, and isolation width had moderate and apex trigger, monoisotopic selection, and isotopic exclusion minor effects. The insights into the data acquisition ergonomics of the Q Exactive platform presented here can guide new users and provide them with initial method parameters, some of which may also be transferable to other sample types and MS platforms.

MeSH
Chromatography, Liquid methods MeSH
Metabolomics * methods MeSH
Tandem Mass Spectrometry * methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Article

Integrative analysis of multimodal mass spectrometry data in MZmine 3

Nature biotechnology. 2023 Apr ; 41 (4) : 447-449.

Nat Biotechnol
ISSN 1546-1696 | 1087-0156

Article

Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment

Nature communications. 2021 Jun 22 ; 12 (1) : 3832. [epub] 20210622

Nat Commun
ISSN 2041-1723

Molecular networking connects mass spectra of molecules based on the similarity of their fragmentation patterns. However, during ionization, molecules commonly form multiple ion species with different fragmentation behavior. As a result, the fragmentation spectra of these ion species often remain unconnected in tandem mass spectrometry-based molecular networks, leading to redundant and disconnected sub-networks of the same compound classes. To overcome this bottleneck, we develop Ion Identity Molecular Networking (IIMN) that integrates chromatographic peak shape correlation analysis into molecular networks to connect and collapse different ion species of the same molecule. The new feature relationships improve network connectivity for structurally related molecules, can be used to reveal unknown ion-ligand complexes, enhance annotation within molecular networks, and facilitate the expansion of spectral reference libraries. IIMN is integrated into various open source feature finding tools and the GNPS environment. Moreover, IIMN-based spectral libraries with a broad coverage of ion species are publicly available.

MeSH
Mass Spectrometry methods MeSH
Internet MeSH
Ions chemistry metabolism MeSH
Metabolic Networks and Pathways * MeSH
Metabolomics methods MeSH
Molecular Structure MeSH
Reproducibility of Results MeSH
Software MeSH
Computational Biology methods MeSH
Animals MeSH
Check Tag
Animals MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH
Research Support, U.S. Gov't, Non-P.H.S. MeSH
Substance names
Ions MeSH

Article

Feature-based molecular networking in the GNPS analysis environment

Nature methods. 2020 Sep ; 17 (9) : 905-908. [epub] 20200824

Nat Methods
ISSN 1548-7105 | 1548-7091

Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.

MeSH
Biological Products chemistry MeSH
Databases, Factual MeSH
Mass Spectrometry * MeSH
Metabolomics methods MeSH
Software MeSH
Computational Biology methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH
Research Support, U.S. Gov't, Non-P.H.S. MeSH
Substance names
Biological Products MeSH
