Semantic biclustering for finding local, interpretable and predictive expression patterns

2017 Oct 16; 18(Suppl 7):752. [Epub 2017 Oct 16]

Language: English; Country: England, Great Britain; Medium: electronic

Document type: journal article, grant-supported research

Persistent link: https://www.medvik.cz/link/pmid29513193
Links

PubMed 29513193
PubMed Central PMC5657082
DOI 10.1186/s12864-017-4132-5
PII: 10.1186/s12864-017-4132-5
Knihovny.cz E-resources

BACKGROUND: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can be further improved by providing concise characterizations of the genes and situations delimiting the pattern.

RESULTS: We propose a method called semantic biclustering, aimed at detecting interpretable rectangular patterns in binary data matrices. As usual in biclustering, we seek homogeneous submatrices; however, we also require that the included elements can be jointly described in terms of semantic annotations pertaining to both rows (genes) and columns (samples). To find such interpretable biclusters, we explore two strategies. The first endows an existing biclustering algorithm with the semantic ingredients. The second is based on rule and tree learning, as known from machine learning.

CONCLUSIONS: The two alternatives are tested in experiments with two Drosophila melanogaster gene expression datasets. Both strategies are shown to detect sets of compact biclusters with semantic descriptions that also remain largely valid for unseen (testing) data. This desirable generalization is more pronounced in the strategy stemming from conventional biclustering, although it comes at the cost of more complex descriptions (a larger number of ontology terms employed), whereas the alternative strategy yields simpler descriptions.
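The central notion in the abstract, a homogeneous submatrix whose rows and columns are both selected by annotation terms, can be illustrated with a small Python sketch. It is not the authors' method (neither the semantically extended biclustering algorithm nor the rule/tree learner): it simply enumerates every pair of one gene term and one sample term, takes the submatrix that pair selects, and scores it by its fraction of ones. The toy matrix, the term names (`termA`, `early`, ...) and the helper names `extension`, `density` and `rank_descriptions` are all hypothetical.

```python
from itertools import product


def extension(gene_terms, sample_terms, g_term, s_term):
    """Rows annotated with g_term and columns annotated with s_term,
    i.e. the submatrix picked out by a two-term description."""
    rows = [i for i, terms in enumerate(gene_terms) if g_term in terms]
    cols = [j for j, terms in enumerate(sample_terms) if s_term in terms]
    return rows, cols


def density(matrix, rows, cols):
    """Fraction of ones in the submatrix; 1.0 means perfectly homogeneous."""
    if not rows or not cols:
        return 0.0
    ones = sum(matrix[i][j] for i in rows for j in cols)
    return ones / (len(rows) * len(cols))


def rank_descriptions(matrix, gene_terms, sample_terms,
                      min_rows=2, min_cols=2, top=3):
    """Exhaustively score every (gene term, sample term) pair by density."""
    all_gene_terms = sorted(set().union(*gene_terms))
    all_sample_terms = sorted(set().union(*sample_terms))
    scored = []
    for g, s in product(all_gene_terms, all_sample_terms):
        rows, cols = extension(gene_terms, sample_terms, g, s)
        # Skip descriptions that select too small a submatrix.
        if len(rows) >= min_rows and len(cols) >= min_cols:
            scored.append((density(matrix, rows, cols), g, s))
    # Highest density first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top]


# Hypothetical training data: 4 genes x 4 samples, binary expression calls.
train = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 1, 1, 1]]
genes = [{"termA"}, {"termA"}, {"termB"}, {"termB"}]
samples = [{"early"}, {"early"}, {"late"}, {"late"}]

best = rank_descriptions(train, genes, samples, top=2)
print(best)  # [(1.0, 'termA', 'early'), (1.0, 'termB', 'late')]

# A description is stated in terms of annotations rather than row indices,
# so it can be re-applied to genes that were not seen during the search.
test = [[1, 0, 0, 0],
        [0, 0, 1, 1]]
test_genes = [{"termA"}, {"termB"}]
rows, cols = extension(test_genes, samples, "termA", "early")
print(density(test, rows, cols))  # 0.5
```

Exhaustive enumeration of single-term pairs keeps the sketch short. Descriptions built from conjunctions of several ontology terms, which is what the abstract's remark on description complexity refers to, would require a proper search strategy instead.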
