Network-constrained forest for regularized classification of omics data
M. Anděl, J. Kléma, Z. Krejčík
Language: English. Country: United States
Document type: journal articles, grant-supported work
Grant support
NT14539
MZ0
CEP - Central Register of Projects
- MeSH
- gene regulatory networks
- humans
- messenger RNA / genetics
- microRNA / genetics
- artificial intelligence
- computational biology / methods
- Check Tag
- humans
- Publication type
- journal articles
- grant-supported work
Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes including complex diseases. Machine learning provides a frequent approach to build such models. However, the models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method in the task of disease classification based on measured mRNA and miRNA profiles complemented by the interaction network composed of the miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome that we are concerned about in the long term. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can straightforwardly be generalized towards arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available in terms of miXGENE system (http://mixgene.felk.cvut.cz), the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
- 000
- 00000naa a2200000 a 4500
- 001
- bmc16020881
- 003
- CZ-PrNML
- 005
- 20190605085911.0
- 007
- ta
- 008
- 160722s2015 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.ymeth.2015.04.006 $2 doi
- 035 __
- $a (PubMed)25872185
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Anděl, Michael $u Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic. Electronic address: andelmi2@fel.cvut.cz. $7 xx0209623
- 245 10
- $a Network-constrained forest for regularized classification of omics data / $c M. Anděl, J. Kléma, Z. Krejčík,
- 650 _2
- $a artificial intelligence $7 D001185
- 650 _2
- $a computational biology $x methods $7 D019295
- 650 _2
- $a gene regulatory networks $7 D053263
- 650 _2
- $a humans $7 D006801
- 650 _2
- $a microRNA $x genetics $7 D035683
- 650 _2
- $a messenger RNA $x genetics $7 D012333
- 655 _2
- $a journal articles $7 D016428
- 655 _2
- $a grant-supported work $7 D013485
- 700 1_
- $a Kléma, Jiří, $u Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic. Electronic address: klema@fel.cvut.cz. $d 1971- $7 ntka172916
- 700 1_
- $a Krejčík, Zdeněk $u Department of Molecular Genetics, Institute of Hematology and Blood Transfusion, U Nemocnice 1, Prague, Czech Republic. Electronic address: zdenek.krejcik@uhkt.cz. $7 xx0125786
- 773 0_
- $w MED00005029 $t Methods (San Diego, Calif.) $x 1095-9130 $g Vol. 83, No. - (2015), pp. 88-97
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/25872185 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20160722 $b ABA008
- 991 __
- $a 20190605090047 $b ABA008
- 999 __
- $a ok $b bmc $g 1155551 $s 945409
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2015 $b 83 $c - $d 88-97 $e 20150411 $i 1095-9130 $m Methods $n Methods $x MED00005029
- GRA __
- $a NT14539 $p MZ0
- LZP __
- $a Pubmed-20160722