Finding semantic patterns in omics data using concept rule learning with an ontology-based refinement operator

2020;13:13. [Epub] 2020-09-01

Status: PubMed-not-MEDLINE. Language: English. Country: United Kingdom, England. Medium: electronic-eCollection

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid32905086

BACKGROUND: Identifying non-trivial and meaningful patterns in omics data is one of the most important biological tasks. Such patterns help to better understand biological systems and to interpret experimental outcomes. A well-established method for explaining such biological data is Gene Set Enrichment Analysis. However, this type of analysis is restricted to a specific type of evaluation. Abstracting from the details, the analyst provides a sorted list of genes together with ontological annotations of the individual genes, and the method outputs the subset of ontological terms enriched in the gene list. Here, in contrast to enrichment analysis, we introduce a new tool/framework that allows for the induction of more complex patterns in 2-dimensional binary omics data. This extension makes it possible to discover and describe semantically coherent biclusters.

RESULTS: We present a new rapid method called sem1R that reveals interpretable hidden rules in omics data. These rules capture semantic differences between two classes: a target class, given as a collection of positive examples, and a non-target class containing negative examples. The method is inspired by the CN2 rule learner and introduces a new refinement operator that exploits prior knowledge in the form of ontologies. In our work, this knowledge serves to create accurate and interpretable rules. The novel refinement operator uses two reduction procedures, Redundant Generalization and Redundant Non-potential, both of which dramatically prune the rule space and consequently speed up the entire process of rule induction in comparison with the traditional refinement operator as presented in CN2.

CONCLUSIONS: The efficiency and effectiveness of the novel refinement operator were tested on three different real gene expression datasets: the Dresden Ovary Dataset, DISC, and m2816. The experiments show that the ontology-based refinement operator speeds up pattern induction drastically. The algorithm is written in C++ and is published as an R package available at http://github.com/fmalinka/sem1r.
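
The abstract names the two reduction procedures but does not define them, so the following Python sketch is only one plausible reading and not the sem1R implementation. It assumes that a rule is a conjunction of ontology terms, that "Redundant Generalization" discards a candidate containing a term together with one of its own ancestors, and that "Redundant Non-potential" discards a candidate covering no positive example. The toy ontology (a tree here, although real ontologies are usually DAGs), the term names, and the helpers `ancestors`, `children`, `covers`, and `refine` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy ontology: child term -> parent term (a tree for simplicity).
PARENT: Dict[str, str] = {
    "cell_cycle": "process",
    "mitosis": "cell_cycle",
    "metabolism": "process",
    "glycolysis": "metabolism",
}
ROOT = "process"


def ancestors(term: str) -> Set[str]:
    """Return all strict ancestors of a term."""
    found: Set[str] = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def children(term: str) -> List[str]:
    """Return the direct children of a term in a stable order."""
    return sorted(c for c, p in PARENT.items() if p == term)


def covers(rule: FrozenSet[str], annotations: Set[str]) -> bool:
    """A rule covers an example if every rule term equals, or is an
    ancestor of, at least one of the example's annotations."""
    closure = set(annotations)
    for a in annotations:
        closure |= ancestors(a)
    return rule <= closure


def refine(rule: FrozenSet[str], positives: List[Set[str]]) -> List[FrozenSet[str]]:
    """One refinement step with the two assumed pruning checks."""
    candidates: Set[FrozenSet[str]] = set()
    # Specialize an existing term by replacing it with one of its children.
    for term in rule:
        for child in children(term):
            candidates.add((rule - {term}) | {child})
    # Extend the conjunction with a most general (top-level) term.
    for top in children(ROOT):
        candidates.add(rule | {top})

    kept = []
    for cand in candidates:
        if cand == rule:
            continue  # nothing was actually specialized
        # Assumed "Redundant Generalization": a term plus its own ancestor
        # constrains coverage no more than the term alone.
        if any(a in ancestors(b) for a in cand for b in cand):
            continue
        # Assumed "Redundant Non-potential": no positive example is covered,
        # so no further specialization of this candidate can cover one either.
        if not any(covers(cand, p) for p in positives):
            continue
        kept.append(cand)
    return sorted(kept, key=sorted)


positives = [{"mitosis"}, {"mitosis", "glycolysis"}]
start = refine(frozenset(), positives)
from_mitosis = refine(frozenset({"mitosis"}), positives)
only_mitosis = refine(frozenset(), [{"mitosis"}])
```

In a CN2-style learner, the surviving candidates would then be scored against the positive and negative examples and the best ones kept in a beam. Both checks shrink the candidate set before any scoring takes place, which is where a speed-up of the kind the abstract reports could come from.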


