Prediction of DNA-binding proteins from relational features

2012 Nov 12; 10(1): 66. [Epub 2012 Nov 12]

Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic.

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid23146001

BACKGROUND: Protein-DNA binding plays an essential role in the biological processing of genetic information. We use relational machine learning to predict the DNA-binding propensity of proteins from their structures. Automatically discovered structural features capture some characteristic spatial configurations of amino acids in proteins.

RESULTS: Prediction based only on structural relational features already achieves results competitive with existing methods based on physicochemical properties on several protein datasets. Predictive performance improves further when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures, which we demonstrate in experiments with zinc finger proteins.

CONCLUSIONS: We introduced a novel approach to DNA-binding propensity prediction using relational machine learning, which could potentially also be used for protein function prediction in general.
