Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets

2016 Jan 01; 32(1): 9-16. [Epub 2015 Sep 05]

Language: English. Country: Great Britain, England. Medium: print-electronic

Document type: journal article, grant-supported work

Persistent link   https://www.medvik.cz/link/pmid26342231

MOTIVATION: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on protein surfaces. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions.

RESULTS: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible.

AVAILABILITY AND IMPLEMENTATION: Hammock is published under the GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases.

CONTACT: muller@mou.cz

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
