Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT)
Language English Country Germany Medium print-electronic
Document type Journal Article
Grant support
KFO 5011
Deutsche Forschungsgemeinschaft (DFG)
IDs 322900939, 454024652, 432698239, 445703531
Deutsche Forschungsgemeinschaft (DFG)
DFG-GE2811/3
Deutsche Forschungsgemeinschaft (DFG)
E:med Consortia Fibromap
Bundesministerium für Bildung und Forschung (BMBF)
STOP-FSGS-01GM2202C
Bundesministerium für Bildung und Forschung (BMBF)
No 101001791
EC | ERC | HORIZON EUROPE European Research Council (ERC)
PubMed
38177382
PubMed Central
PMC10883279
DOI
10.1038/s44320-023-00003-8
PII: 10.1038/s44320-023-00003-8
- Keywords
- Clustering, Disease Progression, Multi-scale Analysis, Optimal Transport, Wasserstein Distance
- MeSH
- Algorithms * MeSH
- Genomics * methods MeSH
- Humans MeSH
- Cluster Analysis MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Although clinical applications represent the next challenge in single-cell genomics and digital pathology, we still lack computational methods to analyze single-cell or pathomics data to find sample-level trajectories or clusters associated with diseases. This remains challenging as single-cell/pathomics data are multi-scale, i.e., a sample is represented by clusters of cells/structures, and samples cannot be easily compared with each other. Here we propose PatIent Level analysis with Optimal Transport (PILOT). PILOT uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. This allows us to perform unsupervised analysis at the sample level and uncover trajectories or cellular clusters associated with disease progression. We evaluate PILOT and competing approaches in single-cell genomics or pathomics studies involving various human diseases with up to 600 samples/patients and millions of cells or tissue structures. Our results demonstrate that PILOT detects disease-associated samples from large and complex single-cell or pathomics data. Moreover, PILOT provides a statistical approach to find changes in cell populations, gene expression, and tissue structures related to the trajectories or clusters supporting interpretation of predictions.
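The core idea of the abstract, comparing two samples through an optimal-transport distance, can be illustrated with a small sketch. This is not the PILOT implementation: it assumes each sample is summarised as a vector of cell-cluster proportions, it assumes the ground cost is the normalised Euclidean distance between hypothetical cluster centroids, and it replaces the exact Wasserstein distance with an entropy-regularised (Sinkhorn) approximation so that it runs with the standard library only. All sample names and numbers are invented for illustration.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.05, n_iter=2000):
    """Approximate the optimal-transport cost between histograms a and b.

    Uses Sinkhorn iterations (entropy-regularised OT), which approximate
    the exact Wasserstein distance; smaller `reg` gives a closer match.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport cost of the resulting plan P[i][j] = u[i] * kernel[i][j] * v[j].
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Hypothetical centroids of three cell clusters in a 2-D embedding.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
raw = [[math.dist(p, q) for q in centroids] for p in centroids]
max_cost = max(max(row) for row in raw)
cost = [[c / max_cost for c in row] for row in raw]  # ground cost in [0, 1]

# Hypothetical samples: proportion of cells falling in each cluster.
samples = {
    "control_1": [0.70, 0.20, 0.10],
    "control_2": [0.65, 0.25, 0.10],
    "disease_1": [0.20, 0.30, 0.50],
}

names = list(samples)
distances = {(s, t): sinkhorn_cost(samples[s], samples[t], cost)
             for s in names for t in names}

for s in names:
    print(s, [round(distances[(s, t)], 3) for t in names])
```

The resulting sample-by-sample distance table is the kind of input that unsupervised steps such as clustering or trajectory inference can then operate on. In practice an exact solver from a dedicated optimal-transport library would likely be preferable to this approximation.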
Department of Cardiovascular Sciences University of Leicester Leicester UK
Fondazione Ricerca Molinette Regina Margherita Children's University Hospital Torino Italy
Institute of Experimental Medicine and Systems Biology RWTH Aachen University Aachen Germany
Institute of Pathology RWTH Aachen University Medical School Aachen Germany
John Walls Renal Unit University Hospital of Leicester National Health Service Trust Leicester UK