Secondary Structures of Proteins Follow Menzerath-Altmann Law

2022 Jan 29; 23 (3). [epub] 20220129

Language: English. Country: Switzerland. Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid35163493

Grant support
IGA_FF_2021_046 Ministry of Education, Youth and Sports, msmt.cz
LM2018131 ELIXIR CZ Research Infrastructure

This article examines whether the empirical tendency known as the Menzerath-Altmann Law (MAL) holds for protein secondary structures. MAL is related to optimization principles observed in natural languages and in genetic information on chromosomes or in protein domains. The presence of MAL is examined on a non-redundant dataset of 4728 proteins by verifying significant negative correlations and by fitting classical and newly proposed formulas to the observed trend. We conclude that the lengths of secondary structures depend specifically on their number within the protein sequence, possibly following the formula proposed in this paper. This behavior is observed on average but can be avoided in individual proteins, and it is possibly driven by a latent cost function. The data suggest that MAL could provide a useful guiding principle in protein design.
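The two checks named in the abstract (a negative correlation and a formula fit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses Spearman rank correlation and the truncated classical form y = a * x^b of the Menzerath-Altmann formula y = a * x^b * e^(-c*x), fitted by least squares in log-log space. The newly proposed formula is not given in this record and is not reproduced here. The toy data and all function names are hypothetical.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fit_power_law(x, y):
    """Fit y = a * x**b by linear least squares on (log x, log y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical toy data (NOT from the paper):
# x = number of secondary structures in a protein,
# y = mean length of those structures.
counts = [1, 2, 4, 8, 16]
mean_lengths = [20.0 * n ** -0.5 for n in counts]

rho = spearman(counts, mean_lengths)          # negative if MAL-like
a, b = fit_power_law(counts, mean_lengths)    # b < 0 if MAL-like
print(f"rho={rho:.3f}, a={a:.3f}, b={b:.3f}")
```

A negative correlation together with a negative fitted exponent b is the pattern MAL predicts: the more parts a whole has, the shorter those parts tend to be. On real data the full three-parameter form would need a nonlinear fit, and competing formulas would be compared with a criterion such as AIC.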
