Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets
Language: English; Country: Great Britain, England; Medium: print-electronic
Document type: Journal Article; Research Support, Non-U.S. Gov't
PubMed: 26342231
PubMed Central: PMC4681989
DOI: 10.1093/bioinformatics/btv522
PII: btv522
- MeSH
- Algorithms * MeSH
- Databases, Protein * MeSH
- Epitopes / chemistry MeSH
- Protein Interaction Domains and Motifs * MeSH
- Humans MeSH
- Markov Chains MeSH
- Molecular Sequence Data MeSH
- Antibodies, Monoclonal / chemistry MeSH
- Peptides / chemistry MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Cluster Analysis MeSH
- Software MeSH
- src Homology Domains MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Chemical Substances
- Epitopes MeSH
- Antibodies, Monoclonal MeSH
- Peptides MeSH
MOTIVATION: Proteins often recognize their interaction partners through short linear motifs located in disordered regions on the protein surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of the interacting proteins. Continued development of these methods now allows large-scale screening, which yields vast numbers of peptide sequences that may contain information on multiple protein-protein interactions. Processing such datasets is a complex but essential task for large-scale studies of protein-protein interactions.
RESULTS: The software tool presented in this article rapidly identifies multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources, and generates multiple sequence alignments of the identified clusters. The method was applied to a previously published, smaller dataset containing distinct classes of ligands for SH3 domains, and to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking the epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from the original epitope sequences. An additional test indicates that processing even much larger datasets is computationally feasible.
AVAILABILITY AND IMPLEMENTATION: Hammock is released under the GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases.
CONTACT: muller@mou.cz
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
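The abstract does not spell out Hammock's hidden Markov model clustering procedure. As a rough illustration of the general idea of grouping peptides that share a motif, the Python sketch below uses a deliberately simplified stand-in: a gapless position-specific log-odds profile (what a profile HMM reduces to when insert and delete states are dropped), a uniform amino-acid background and greedy assignment of each peptide to the best-scoring existing cluster. The function names `build_profile`, `score` and `greedy_cluster`, the pseudocount, the threshold and the toy peptides are all hypothetical; this is not Hammock's algorithm or API.

```python
import math
from collections import Counter

# Hypothetical illustration only: NOT Hammock's HMM method, just a gapless
# profile (match states only) with a uniform background and greedy clustering.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(peptides, pseudocount=1.0):
    """Column-wise log-odds scores for equal-length, ungapped peptides."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(len(peptides[0])):
        counts = Counter(p[i] for p in peptides)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, peptide):
    """Sum of per-position log-odds; peptide must match the profile length."""
    if len(peptide) != len(profile):
        raise ValueError("peptide length differs from profile length")
    return sum(column[aa] for column, aa in zip(profile, peptide))


def greedy_cluster(peptides, threshold):
    """Add each peptide to the best-scoring cluster above threshold, else start a new one."""
    clusters = []
    for pep in peptides:
        best, best_score = None, threshold
        for cluster in clusters:
            s = score(build_profile(cluster), pep)
            if s > best_score:
                best, best_score = cluster, s
        if best is None:
            clusters.append([pep])
        else:
            best.append(pep)
    return clusters


if __name__ == "__main__":
    toy = ["RPLPPLP", "RPLPALP", "KPLPPLP", "WDEYWGK", "WDEYWGR"]  # made-up peptides
    print(greedy_cluster(toy, threshold=2.0))
```

Greedy assignment makes the result depend on input order, and rebuilding a profile for every comparison scales poorly, so the sketch only shows the idea of scoring peptides against a cluster's position-specific profile rather than how a tool for massive datasets would do it.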
Faculty of Informatics, Masaryk University, Botanická 68a, 60200 Brno, Czech Republic
RECAMO, Masaryk Memorial Cancer Institute, Žlutý kopec 7, 65653 Brno, Czech Republic