Comparative evaluation of set-level techniques in predictive classification of gene expression samples
Language: English
Country: Great Britain, England
Medium: electronic
Document type: journal articles, grant-supported work
PubMed: 22759420
PubMed Central: PMC3382436
DOI: 10.1186/1471-2105-13-s10-s15
PII: 1471-2105-13-S10-S15
MeSH terms:
- Algorithms *
- Bayes Theorem
- Decision Trees
- Gene Expression Profiling / methods
- Support Vector Machine
- Artificial Intelligence *
- Computational Biology / methods
Publication type:
- Journal Article
- Grant-supported work
BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than traditional methods that rely on individual genes. The set-level strategy can also be adopted, with similar benefits, in predictive classification tasks accomplished with machine learning algorithms. Initial studies of the predictive performance of set-level classifiers have yielded conflicting results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments.

RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate the expression values of a set's genes into a single feature value, both the singular value decomposition (SVD) method and the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features selected by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step.

CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of components.

AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
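The abstract contrasts two ways of turning a gene set into a single classifier feature: simple arithmetic averaging and the SVD method. The following is a minimal sketch in Python with NumPy, not the study's implementation. It assumes a common SVD recipe: z-score the member genes, then project each sample onto the leading right singular vector of the standardized training submatrix. The standardization and singular vector are fitted on training samples only, and the arbitrary sign of the singular vector is fixed by a convention chosen here. The names `mean_feature` and `SVDFeature`, the sign convention, the gene-set indices and the synthetic data are all hypothetical. SetSig and the Global test are not shown.

```python
import numpy as np


def mean_feature(X_set: np.ndarray) -> np.ndarray:
    """Arithmetic-average aggregation: one value per sample.

    X_set has shape (n_samples, n_genes_in_set).
    """
    return X_set.mean(axis=1)


class SVDFeature:
    """Hypothetical SVD aggregation of one gene set into one feature.

    Genes are z-scored with training statistics; each sample is then
    projected onto the first right singular vector of the standardized
    training submatrix (the leading principal direction in gene space).
    """

    def fit(self, X_train_set: np.ndarray) -> "SVDFeature":
        self.mu_ = X_train_set.mean(axis=0)
        sd = X_train_set.std(axis=0)
        self.sd_ = np.where(sd > 0, sd, 1.0)  # guard against constant genes
        Z = (X_train_set - self.mu_) / self.sd_
        # Rows of vt are right singular vectors; vt[0] pairs with the largest singular value.
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        v1 = vt[0]
        # A singular vector's sign is arbitrary; by assumed convention, orient it
        # so the feature grows with the average standardized expression of the set.
        if v1.sum() < 0:
            v1 = -v1
        self.v1_ = v1
        return self

    def transform(self, X_set: np.ndarray) -> np.ndarray:
        # Reuse the training mean/std so test samples never influence the feature.
        Z = (X_set - self.mu_) / self.sd_
        return Z @ self.v1_


# Usage on synthetic data (illustrative only): 40 samples x 200 genes,
# where the first 10 genes share a latent "pathway activity" factor.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
X = rng.normal(size=(n_samples, n_genes))
latent = rng.normal(size=n_samples)
gene_set = np.arange(10)  # hypothetical gene set
X[:, gene_set] += 2.0 * latent[:, None]  # co-regulated block

train, test = np.arange(30), np.arange(30, 40)
svd = SVDFeature().fit(X[np.ix_(train, gene_set)])
f_test = svd.transform(X[np.ix_(test, gene_set)])
f_mean = mean_feature(X[np.ix_(test, gene_set)])
```

Fitting the standardization and the singular vector on training samples only keeps test samples out of feature construction, which matters when set-level features are assessed by cross-validation. Averaging needs no fitting, but it weights every member gene equally, whereas the SVD projection weights genes by how strongly they follow the set's dominant expression pattern.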