Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search
Language: English; Country: Great Britain, England; Media: electronic
Document type: Journal Article, Research Support, Non-U.S. Gov't
PubMed: 22759427
PubMed Central: PMC3382442
DOI: 10.1186/1471-2105-13-s10-s3
PII: 1471-2105-13-S10-S3
MeSH terms:
- Algorithms
- Amino Acids / analysis
- DNA / analysis
- Monte Carlo Method
- Proteins / analysis
- Protein Structure, Secondary
- Models, Theoretical
- Protein Binding
- Computational Biology / methods
Publication type:
- Journal Article
- Research Support, Non-U.S. Gov't
Names of substances:
- Amino Acids
- DNA
- Proteins
We contribute a novel ball-histogram approach to predicting the DNA-binding propensity of proteins. Unlike state-of-the-art methods, which construct an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic Monte Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for predicting DNA-binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Our method also provides interpretable features involving the spatial distributions of selected amino acids.
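The abstract describes the ball-histogram technique only at a high level. The Python sketch below illustrates the general idea under assumptions that this record does not state: each residue is represented by a single 3-D point (for example its C-alpha atom), a template is a list of amino-acid property sets, ball centres are drawn uniformly from the structure's bounding box, and each sampled ball is summarised by how many of its residues satisfy each property. The function names `ball_histogram` and `to_features` and the example property sets are hypothetical. This is not the authors' implementation, and the automatic template search is not shown.

```python
import math
import random
from collections import Counter
from itertools import product


def ball_histogram(coords, residues, template, radius, n_samples=10000, seed=0):
    """Monte Carlo estimate of a normalized ball histogram (hypothetical sketch).

    coords   -- list of (x, y, z) residue positions (assumed: C-alpha atoms)
    residues -- one-letter amino-acid codes, aligned with coords
    template -- list of property sets, e.g. [{"K", "R", "H"}, {"D", "E"}]
    radius   -- ball radius, in the same units as coords
    """
    rng = random.Random(seed)
    # Bounding box of the structure; ball centres are sampled uniformly
    # inside it (an assumption of this sketch).
    lows = [min(c[i] for c in coords) for i in range(3)]
    highs = [max(c[i] for c in coords) for i in range(3)]
    counts = Counter()
    for _ in range(n_samples):
        centre = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        inside = [res for pos, res in zip(coords, residues)
                  if math.dist(pos, centre) <= radius]
        # One count per template property: residues in the ball having it.
        key = tuple(sum(res in prop for res in inside) for prop in template)
        counts[key] += 1
    # Relative frequencies form an empirical distribution over count tuples.
    return {key: n / n_samples for key, n in counts.items()}


def to_features(hist, template_len, max_count):
    """Flatten a histogram into a fixed-length vector over {0..max_count}^k.

    Count tuples with any entry above max_count are dropped (sketch choice).
    """
    return [hist.get(key, 0.0)
            for key in product(range(max_count + 1), repeat=template_len)]
```

With the hypothetical template `[{"K", "R", "H"}, {"D", "E"}]` (positively and negatively charged residues), each histogram entry estimates the probability that a random ball contains a given number of positive and negative residues. Because `to_features` yields vectors of the same length for every protein, they could be passed to a standard classifier.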