Comparative evaluation of set-level techniques in predictive classification of gene expression samples
M. Holec, J. Kléma, F. Zelezný, J. Tolar
Language: English. Country: England, United Kingdom
Document type: journal article; grant-supported work
- NLK
 - BioMedCentral, from 2000-01-12
 - BioMedCentral Open Access, from 2000
 - Directory of Open Access Journals, from 2000
 - Free Medical Journals, from 2000
 - PubMed Central, from 2000
 - Europe PubMed Central, from 2000
 - ProQuest Central, from 2009-01-01
 - Open Access Digital Library, from 2000-07-01
 - Open Access Digital Library, from 2000-01-01
 - Medline Complete (EBSCOhost), from 2000-01-01
 - Health & Medicine (ProQuest), from 2009-01-01
 - ROAD: Directory of Open Access Scholarly Resources, from 2000
 - Springer Nature OA/Free Journals, from 2000-12-01
    
- MeSH
 - Algorithms *
 - Bayes Theorem
 - Decision Trees
 - Gene Expression Profiling / methods
 - Support Vector Machine
 - Artificial Intelligence *
 - Computational Biology / methods
- Publication type
 - Journal Article
 - Research Support, Non-U.S. Gov't
 
BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments.

RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step.

CONCLUSION: Set-level classifiers do not boost predictive accuracy, however, they do achieve competitive accuracy if learned with the right combination of ingredients.

AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
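
The aggregation step mentioned in the abstract, turning the expression values of a set's genes into one feature value per sample, can be illustrated with a minimal Python sketch. It contrasts arithmetic averaging with one common SVD-based formulation: gene-centered expressions are projected onto the leading right singular vector, found here by power iteration. The function names, the centering step, the sign convention, and the toy matrix are assumptions made for illustration. The abstract does not specify the study's exact procedure, and SetSig is not shown.

```python
import math


def mean_feature(expr):
    """Arithmetic average over the set's genes, one value per sample.

    expr: list of samples, each a list of expression values for the
    genes of one gene set (rows = samples, columns = genes).
    """
    return [sum(row) / len(row) for row in expr]


def svd_feature(expr, iterations=200):
    """Project each sample onto the leading right singular vector.

    Assumed formulation (not taken from the paper): center every gene
    across samples, then find the dominant gene-loading vector v by
    power iteration on X^T X and return X v for each sample.
    """
    n_samples, n_genes = len(expr), len(expr[0])
    gene_means = [sum(row[j] for row in expr) / n_samples for j in range(n_genes)]
    centered = [[row[j] - gene_means[j] for j in range(n_genes)] for row in expr]

    # Uniform start vector; power iteration can stall in the rare case
    # where this start is orthogonal to the leading singular vector.
    v = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        new_v = [
            sum(centered[i][j] * scores[i] for i in range(n_samples))
            for j in range(n_genes)
        ]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in new_v))
        if norm == 0.0:  # no variance left to capture
            break
        v = [c / norm for c in new_v]

    # Singular vectors are defined up to sign; make the loadings sum non-negative.
    if sum(v) < 0:
        v = [-c for c in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Toy gene set: 3 samples x 3 genes (made-up numbers).
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
print(mean_feature(toy))  # [2.0, 4.0, 6.0]
print(svd_feature(toy))
```

Averaging weights every gene equally, whereas the SVD projection gives more weight to genes that vary together across samples, which is one plausible reason it could produce a more informative feature.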
- 000
 - 00000naa a2200000 a 4500
 
- 001
 - bmc13024385
 
- 003
 - CZ-PrNML
 
- 005
 - 20170110101941.0
 
- 007
 - ta
 
- 008
 - 130703s2012 enk f 000 0|eng||
 
- 009
 - AR
 
- 024 7_
 - $a 10.1186/1471-2105-13-S10-S15 $2 doi
 
- 035 __
 - $a (PubMed)22759420
 
- 040 __
 - $a ABA008 $b cze $d ABA008 $e AACR2
 
- 041 0_
 - $a eng
 
- 044 __
 - $a enk
 
- 100 1_
 - $a Holec, Matěj $u Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, 166 27, Czech Republic. $7 xx0209624
 
- 245 10
 - $a Comparative evaluation of set-level techniques in predictive classification of gene expression samples / $c M. Holec, J. Kléma, F. Zelezný, J. Tolar,
 
- 520 9_
 - $a BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. CONCLUSION: Set-level classifiers do not boost predictive accuracy, however, they do achieve competitive accuracy if learned with the right combination of ingredients. AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
 
- 650 12
 - $a algoritmy $7 D000465
 
- 650 12
 - $a umělá inteligence $7 D001185
 
- 650 _2
 - $a Bayesova věta $7 D001499
 
- 650 _2
 - $a výpočetní biologie $x metody $7 D019295
 
- 650 _2
 - $a rozhodovací stromy $7 D003663
 
- 650 _2
 - $a stanovení celkové genové exprese $x metody $7 D020869
 
- 650 _2
 - $a support vector machine $7 D060388
 
- 655 _2
 - $a časopisecké články $7 D016428
 
- 655 _2
 - $a práce podpořená grantem $7 D013485
 
- 700 1_
 - $a Kléma, Jiří $u -
 
- 700 1_
 - $a Zelezný, Filip $u -
 
- 700 1_
 - $a Tolar, Jakub $u -
 
- 773 0_
 - $w MED00008167 $t BMC bioinformatics $x 1471-2105 $g Roč. 13 Suppl 10(2012), s. S15
 
- 856 41
 - $u https://pubmed.ncbi.nlm.nih.gov/22759420 $y Pubmed
 
- 910 __
 - $a ABA008 $b sig $c sign $y a $z 0
 
- 990 __
 - $a 20130703 $b ABA008
 
- 991 __
 - $a 20170110102038 $b ABA008
 
- 999 __
 - $a ok $b bmc $g 988065 $s 822765
 
- BAS __
 - $a 3
 
- BAS __
 - $a PreBMC
 
- BMC __
 - $a 2012 $b 13 Suppl 10 $d S15 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167
 
- LZP __
 - $a Pubmed-20130703
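
The field dump above writes each MARC field as a tag with indicators, followed by a run of `$`-prefixed subfields. As a small illustration of that layout, the following Python sketch splits one such subfield line into (code, value) pairs. The function name `parse_subfields` is hypothetical, and the sketch assumes that a `$` followed by a single character and whitespace always starts a new subfield. That holds for the lines shown here, but this is not a general MARC parser.

```python
import re


def parse_subfields(field_text):
    """Split a '$a value $b value' string into (code, value) pairs."""
    # re.split with a capturing group keeps the subfield codes in the result:
    # ['', 'a', 'value ', 'b', 'value']
    parts = re.split(r"\$(\w)\s", field_text.strip())
    codes, values = parts[1::2], parts[2::2]
    return [(code, value.strip()) for code, value in zip(codes, values)]


# Example using the 024 field from the record above.
for code, value in parse_subfields("$a 10.1186/1471-2105-13-S10-S15 $2 doi"):
    print(code, value)
```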