Network-constrained forest for regularized classification of omics data

M. Anděl, J. Kléma, Z. Krejčík

Methods. 2015; 83: 88-97. [pub] 20150411

Language: English. Country: United States

Document type: journal articles, grant-supported work

Persistent link   https://www.medvik.cz/link/bmc16020881

Grant support
NT14539 MZ0 CEP - Central Register of Projects (Centrální evidence projektů)

Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes including complex diseases. Machine learning is frequently used to build such models. However, the models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method in the task of disease classification based on measured mRNA and miRNA profiles complemented by the interaction network composed of the miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome that we are concerned about in the long term. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can straightforwardly be generalized towards arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available as part of the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.

000      
00000naa a2200000 a 4500
001      
bmc16020881
003      
CZ-PrNML
005      
20190605085911.0
007      
ta
008      
160722s2015 xxu f 000 0|eng||
009      
AR
024    7_
$a 10.1016/j.ymeth.2015.04.006 $2 doi
035    __
$a (PubMed)25872185
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxu
100    1_
$a Anděl, Michael $u Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic. Electronic address: andelmi2@fel.cvut.cz. $7 xx0209623
245    10
$a Network-constrained forest for regularized classification of omics data / $c M. Anděl, J. Kléma, Z. Krejčík,
520    9_
$a Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes including complex diseases. Machine learning provides a frequent approach to build such models. However, the models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method in the task of disease classification based on measured mRNA and miRNA profiles complemented by the interaction network composed of the miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome that we are concerned about in the long term. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can straightforwardly be generalized towards arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available in terms of miXGENE system (http://mixgene.felk.cvut.cz), the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
650    _2
$a umělá inteligence $7 D001185
650    _2
$a výpočetní biologie $x metody $7 D019295
650    _2
$a genové regulační sítě $7 D053263
650    _2
$a lidé $7 D006801
650    _2
$a mikro RNA $x genetika $7 D035683
650    _2
$a messenger RNA $x genetika $7 D012333
655    _2
$a časopisecké články $7 D016428
655    _2
$a práce podpořená grantem $7 D013485
700    1_
$a Kléma, Jiří, $u Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic. Electronic address: klema@fel.cvut.cz. $d 1971- $7 ntka172916
700    1_
$a Krejčík, Zdeněk $u Department of Molecular Genetics, Institute of Hematology and Blood Transfusion, U Nemocnice 1, Prague, Czech Republic. Electronic address: zdenek.krejcik@uhkt.cz. $7 xx0125786
773    0_
$w MED00005029 $t Methods (San Diego, Calif.) $x 1095-9130 $g Roč. 83, č. - (2015), s. 88-97
856    41
$u https://pubmed.ncbi.nlm.nih.gov/25872185 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20160722 $b ABA008
991    __
$a 20190605090047 $b ABA008
999    __
$a ok $b bmc $g 1155551 $s 945409
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2015 $b 83 $c - $d 88-97 $e 20150411 $i 1095-9130 $m Methods $n Methods $x MED00005029
GRA    __
$a NT14539 $p MZ0
LZP    __
$a Pubmed-20160722
