Comparative evaluation of set-level techniques in predictive classification of gene expression samples

BMC Bioinformatics. 2012 Jun 25; 13 Suppl 10: S15. [Epub 2012 Jun 25]

Language: English; Country: Great Britain, England; Medium: electronic

Document type: journal article, research supported by grant

Persistent link: https://www.medvik.cz/link/pmid22759420
Links:

PubMed 22759420
PubMed Central PMC3382436
DOI 10.1186/1471-2105-13-s10-s15
PII: 1471-2105-13-S10-S15
Knihovny.cz E-resources

BACKGROUND: Analysis of gene expression data in terms of a priori defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than traditional methods that rely on individual genes. The set-level strategy can also be adopted, with similar benefits, in predictive classification tasks accomplished with machine learning algorithms. Initial studies of the predictive performance of set-level classifiers have yielded conflicting results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments.

RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate the expressions of a set's genes into a single feature value, both the singular value decomposition (SVD) method and the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step.

CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients.

AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
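The aggregation step named in the results turns the expression values of a gene set's members into one feature value per sample. The sketch below contrasts simple arithmetic averaging with an SVD-based summary. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a samples × genes NumPy matrix, per-gene standardization before the SVD, and a sign convention that aligns the SVD score with the mean of the standardized genes. The names `mean_aggregate` and `svd_aggregate` are hypothetical, and SetSig is not sketched.

```python
import numpy as np


def mean_aggregate(expr, gene_idx):
    """Set-level feature: arithmetic mean of the member genes in each sample.

    expr: samples x genes matrix; gene_idx: column indices of one gene set.
    """
    return np.asarray(expr, dtype=float)[:, gene_idx].mean(axis=1)


def svd_aggregate(expr, gene_idx):
    """Set-level feature: per-sample score on the first singular component
    of the standardized member-gene submatrix (hypothetical helper)."""
    sub = np.asarray(expr, dtype=float)[:, gene_idx]
    sd = sub.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                     # constant genes become all-zero columns
    sub = (sub - sub.mean(axis=0)) / sd   # z-score each gene (assumed preprocessing)
    u, s, _ = np.linalg.svd(sub, full_matrices=False)
    score = u[:, 0] * s[0]                # equals sub @ (first right singular vector)
    # Singular vectors have arbitrary sign; orient the score so that it
    # agrees with the mean of the standardized genes (assumed convention).
    if score @ sub.mean(axis=1) < 0:
        score = -score
    return score


if __name__ == "__main__":
    X = np.array([[1.0, 2.0, 5.0],
                  [2.0, 4.0, 1.0],
                  [3.0, 6.0, 4.0],
                  [4.0, 8.0, 2.0]])       # toy data: 4 samples x 3 genes
    print(mean_aggregate(X, [0, 1]))
    print(svd_aggregate(X, [0, 1]).round(3))
```

Averaging weights every member gene equally, so genes unrelated to the set's dominant pattern dilute the feature; the SVD score instead weights each gene by its loading on the set's dominant co-expression direction.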

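The results also report that the Global test was the best of the compared methods for choosing which gene sets become features, with 10 sets retained. A minimal sketch of that selection step follows, assuming binary class labels coded 0/1. It computes a score statistic of the form Q = (y − μ)ᵀ R (y − μ) / μ₂ with R = X Xᵀ / m, where X is the centered samples × m-genes submatrix of one set and μ₂ = μ(1 − μ) is the Bernoulli variance under the null hypothesis, following the form of Goeman et al.'s Global test. It then ranks sets by the raw statistic. This is a simplification: the Global test reports p-values, which account for the different null distributions of sets of different sizes. The names `global_test_statistic` and `top_k_sets` are hypothetical.

```python
import numpy as np


def global_test_statistic(expr, labels, gene_idx):
    """Score statistic Q = (y - mu)^T R (y - mu) / mu2 with R = X X^T / m.

    expr: samples x genes matrix; labels: 0/1 class label per sample;
    gene_idx: column indices of the genes in one gene set.
    """
    x = np.asarray(expr, dtype=float)[:, gene_idx]
    x = x - x.mean(axis=0)            # center each member gene
    y = np.asarray(labels, dtype=float)
    mu = y.mean()                     # class probability estimated under H0
    mu2 = mu * (1.0 - mu)             # Bernoulli variance under H0
    if mu2 == 0.0:
        raise ValueError("labels must contain both classes")
    m = x.shape[1]
    xr = x.T @ (y - mu)               # per-gene association with the residual
    # (y - mu)^T X X^T (y - mu) equals the squared norm of X^T (y - mu).
    return float(xr @ xr) / (m * mu2)


def top_k_sets(expr, labels, gene_sets, k=10):
    """Names of the k gene sets with the largest statistic (raw-Q ranking)."""
    scores = {name: global_test_statistic(expr, labels, idx)
              for name, idx in gene_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: set "A" tracks the class label, set "B" does not.
    X = np.array([[0.0, 0.0, 1.0, 5.0],
                  [0.0, 0.0, -1.0, 5.0],
                  [1.0, 2.0, 1.0, 5.0],
                  [1.0, 2.0, -1.0, 5.0]])
    y = [0, 0, 1, 1]
    print(top_k_sets(X, y, {"A": [0, 1], "B": [2, 3]}, k=1))  # prints ['A']
```

The selected set names would then be turned into feature columns by an aggregation step such as averaging or SVD before a classifier is trained.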
Latest 20 citations:

Novel gene sets improve set-level classification of prokaryotic gene expression data

2015 Oct 28; 16: 348. [Epub 2015 Oct 28]