• This record comes from PubMed

SProt: sphere-based protein structure similarity algorithm

2011 Oct 14; 9 Suppl 1: S20. [epub] 2011 Oct 14

Status PubMed-not-MEDLINE
Language English
Country England, Great Britain
Media electronic

Document type Journal Article

Links

PubMed 22166105
PubMed Central PMC3289081
DOI 10.1186/1477-5956-9-s1-s20
PII: 1477-5956-9-S1-S20

BACKGROUND: Similarity search in protein databases is one of the most essential tasks in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity poses a major challenge because the field has no standard definition of optimal structure similarity.

RESULTS: We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity during feature extraction. Its features are based on spherical spatial neighborhoods of amino acids, within which similarity can be well defined. On top of these partial local similarities, a global measure assessing the similarity of a pair of protein structures is built. Finally, indexing is applied, making the search process an order of magnitude faster.

CONCLUSIONS: The proposed method outperforms other methods in classification accuracy at the SCOP superfamily and fold levels, while it is at least comparable to the best existing solutions in terms of precision-recall and quality of alignment.
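The spherical neighborhoods mentioned in the RESULTS part of the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration only: the function name `spherical_neighborhoods`, the use of one C-alpha coordinate per residue, and the radius value are hypothetical and are not taken from SProt itself.

```python
import math


def spherical_neighborhoods(ca_coords, radius):
    """For each residue, list the indices of residues inside its sphere.

    ca_coords: one (x, y, z) tuple per residue (assumed here to be the
               C-alpha position; the abstract does not specify this).
    radius:    sphere radius in the same units as the coordinates
               (an arbitrary illustrative value, not an SProt parameter).
    """
    neighborhoods = []
    for center in ca_coords:
        # A residue belongs to the sphere if its point lies within
        # `radius` of the central residue; the center itself is included.
        members = [
            j for j, point in enumerate(ca_coords)
            if math.dist(center, point) <= radius
        ]
        neighborhoods.append(members)
    return neighborhoods


# Toy "chain": five points spaced 3.8 units apart on a straight line.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
spheres = spherical_neighborhoods(chain, radius=8.0)
```

In this toy chain the sphere around the first residue contains residues 0 to 2, while the sphere around the middle residue contains all five. A local similarity measure would then compare such neighborhoods between two structures; that comparison, the global measure built on top of it, and the indexing step are not sketched here because the abstract does not describe them in enough detail.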
