SProt: sphere-based protein structure similarity algorithm
Status: PubMed-not-MEDLINE
Language: English
Country: England, Great Britain
Media: electronic
Document type: Journal Article
PubMed: 22166105
PubMed Central: PMC3289081
DOI: 10.1186/1477-5956-9-s1-s20
PII: 1477-5956-9-S1-S20
BACKGROUND: Similarity search in protein databases is one of the most essential problems in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity poses a major challenge, since the field has no standard definition of optimal structure similarity.

RESULTS: We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity during feature extraction. Its features are based on spherical spatial neighborhoods of amino acids, where similarity can be well defined. On top of these partial local similarities, a global measure assessing the similarity of a pair of protein structures is built. Finally, indexing is applied, making the search process an order of magnitude faster.

CONCLUSIONS: The proposed method outperforms other methods in classification accuracy at the SCOP superfamily and fold levels, while it is at least comparable to the best existing solutions in terms of precision-recall and alignment quality.
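The abstract describes features built from the spherical spatial neighborhood of each amino acid. As a rough illustration of that idea only, the following minimal sketch collects, for every residue, the indices of residues whose representative point lies within a fixed radius. The function name `spherical_neighborhoods`, the 5.0 Å radius, the use of one point per residue, and the toy coordinates are all assumptions made for this example; they are not taken from the SProt paper and do not reproduce its feature extraction or scoring.

```python
import math


def spherical_neighborhoods(coords, radius):
    """For each residue, return the indices of all residues (itself included)
    whose representative point lies within `radius` of it.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    neighborhoods = []
    for center in coords:
        members = [
            j for j, other in enumerate(coords)
            if math.dist(center, other) <= radius
        ]
        neighborhoods.append(members)
    return neighborhoods


# Toy coordinates in angstroms, one point per residue (made-up values).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]

# The radius is an assumed value chosen only to make the toy example readable.
toy_neighborhoods = spherical_neighborhoods(toy_coords, radius=5.0)
print(toy_neighborhoods)
```

The quadratic all-pairs scan keeps the sketch short; a real implementation over whole structures would more likely use a spatial index to find neighbors.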