Comparative evaluation of set-level techniques in predictive classification of gene expression samples

M. Holec, J. Kléma, F. Zelezný, J. Tolar

BMC Bioinformatics. 2012; 13 Suppl 10: S15.

Language: English. Country: England, United Kingdom

Document type: journal article, grant-supported work

Persistent link: https://www.medvik.cz/link/bmc13024385

BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments.

RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step.

CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients.

AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
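
The aggregation step mentioned in the abstract (turning the expressions of a gene set's members into one feature value per sample) can be illustrated with a minimal sketch. It contrasts arithmetic averaging with one common reading of SVD aggregation: projecting each sample onto the first right singular vector of the gene-centred submatrix. The function names `mean_aggregate` and `svd_aggregate`, the toy data, the centring step, and the sign convention are all assumptions made for illustration; this is not the authors' implementation, and SetSig is not shown.

```python
import math

def mean_aggregate(expr, gene_set):
    """Arithmetic averaging: one value per sample, the mean over the set's genes."""
    return [sum(sample[g] for g in gene_set) / len(gene_set) for sample in expr]

def svd_aggregate(expr, gene_set, iters=200):
    """Project each sample onto the first right singular vector of the
    gene-centred submatrix, found here by plain power iteration."""
    n, k = len(expr), len(gene_set)
    # Centre each gene across samples.
    means = [sum(s[g] for s in expr) / n for g in gene_set]
    X = [[s[g] - m for g, m in zip(gene_set, means)] for s in expr]
    # Deterministic, non-uniform start vector (an assumption; in rare cases a
    # start orthogonal to the leading direction would need a different seed).
    v = [float(j + 1) for j in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        scores = [sum(x * vj for x, vj in zip(row, v)) for row in X]   # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(k)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance captured along v
            break
        v = [x / norm for x in w]
    # Singular vectors are defined up to sign; fix it for reproducibility.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

# Toy data (hypothetical): three samples, three genes, one two-gene set.
samples = [
    {"g1": 1.0, "g2": 2.0, "g3": 9.0},
    {"g1": 2.0, "g2": 4.0, "g3": 1.0},
    {"g1": 3.0, "g2": 6.0, "g3": 5.0},
]
gene_set = ["g1", "g2"]

mean_features = mean_aggregate(samples, gene_set)  # [1.5, 3.0, 4.5]
svd_features = svd_aggregate(samples, gene_set)
```

In either case the resulting list holds one feature value per sample and would form a single column of the set-level feature matrix passed to a classifier. Unlike the average, the SVD variant weights genes by how strongly they follow the set's dominant expression pattern.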


000      
00000naa a2200000 a 4500
001      
bmc13024385
003      
CZ-PrNML
005      
20170110101941.0
007      
ta
008      
130703s2012 enk f 000 0|eng||
009      
AR
024    7_
$a 10.1186/1471-2105-13-S10-S15 $2 doi
035    __
$a (PubMed)22759420
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Holec, Matěj $u Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, 166 27, Czech Republic. $7 xx0209624
245    10
$a Comparative evaluation of set-level techniques in predictive classification of gene expression samples / $c M. Holec, J. Kléma, F. Zelezný, J. Tolar,
650    12
$a algorithms $7 D000465
650    12
$a artificial intelligence $7 D001185
650    _2
$a Bayes theorem $7 D001499
650    _2
$a computational biology $x methods $7 D019295
650    _2
$a decision trees $7 D003663
650    _2
$a gene expression profiling $x methods $7 D020869
650    _2
$a support vector machine $7 D060388
655    _2
$a journal article $7 D016428
655    _2
$a grant-supported work $7 D013485
700    1_
$a Kléma, Jiří $u -
700    1_
$a Zelezný, Filip $u -
700    1_
$a Tolar, Jakub $u -
773    0_
$w MED00008167 $t BMC bioinformatics $x 1471-2105 $g Vol. 13 Suppl 10 (2012), p. S15
856    41
$u https://pubmed.ncbi.nlm.nih.gov/22759420 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20130703 $b ABA008
991    __
$a 20170110102038 $b ABA008
999    __
$a ok $b bmc $g 988065 $s 822765
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2012 $b 13 Suppl 10 $d S15 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167
LZP    __
$a Pubmed-20130703
