InCHlib - interactive cluster heatmap for web applications
Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic-ecollection
Document type: journal articles
PubMed: 25264459
PubMed Central: PMC4173117
DOI: 10.1186/s13321-014-0044-4
PII: 44
- Keywords
- Big data, Client-side scripting, Cluster heatmap, Data clustering, Exploration, JavaScript library, Scientific visualization, Web integration
- Publication type
- journal articles (MeSH)
BACKGROUND: Hierarchical clustering is an exploratory data analysis method that reveals groups (clusters) of similar objects. The result of hierarchical clustering is a tree structure called a dendrogram, which shows the arrangement of the individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called a 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram, which is displayed on the side of the heatmap.
RESULTS: We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters, or to flexibly modify the heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib are facilitated by the Python utility script inchlib_clust.
CONCLUSIONS: The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
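The relationship between the dendrogram and the row ordering described in the BACKGROUND can be illustrated with a minimal Python sketch. It is not the inchlib_clust implementation and does not use the InCHlib API. It assumes Euclidean distance and average linkage, and the function names `hierarchical_cluster` and `leaf_order`, as well as the toy matrix, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def hierarchical_cluster(rows):
    """Naive average-linkage agglomerative clustering (illustrative, O(n^3)).

    Returns a nested tuple tree: a leaf is a row index (int), an internal
    node is (left_subtree, right_subtree, merge_distance).
    """
    # Each active cluster is stored as (tree, list_of_member_row_indices).
    clusters = [(i, [i]) for i in range(len(rows))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between the members of a and b.
        total = sum(dist(rows[i], rows[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]),
        )
        d = linkage(clusters[x][1], clusters[y][1])
        merged = (
            (clusters[x][0], clusters[y][0], d),
            clusters[x][1] + clusters[y][1],
        )
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


def leaf_order(tree):
    """Left-to-right leaf traversal: the row order used to draw the heatmap."""
    if isinstance(tree, int):
        return [tree]
    left, right, _ = tree
    return leaf_order(left) + leaf_order(right)


# Hypothetical toy data matrix: rows are objects, columns are features.
data = [
    [1.0, 1.1],  # row 0
    [9.0, 9.2],  # row 1
    [1.2, 0.9],  # row 2
    [8.8, 9.1],  # row 3
]

tree = hierarchical_cluster(data)
order = leaf_order(tree)
reordered = [data[i] for i in order]  # rows as they would appear in the heatmap
```

In this toy example the traversal places rows 0 and 2 next to each other, and likewise rows 1 and 3, which is the property the cluster heatmap relies on: reading the dendrogram leaves in order yields a row arrangement in which similar rows are adjacent.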