Secondary Structures of Proteins Follow Menzerath-Altmann Law
Language: English. Country: Switzerland. Medium: electronic
Document type: journal article
Grant support
IGA_FF_2021_046
Ministry of Education, Youth and Sports, msmt.cz
LM2018131
ELIXIR CZ Research Infrastructure
PubMed
35163493
PubMed Central
PMC8836146
DOI
10.3390/ijms23031569
PII: ijms23031569
- Keywords
- Menzerath–Altmann law, empirical law, formula fitting, proteins, quantitative linguistics, secondary structures
- MeSH
- Algorithms (MeSH)
- Databases, Protein (MeSH)
- Models, Molecular * (MeSH)
- Proteins / chemistry (MeSH)
- Protein Structure, Secondary (MeSH)
- Statistics as Topic (MeSH)
- Subcellular Fractions / metabolism (MeSH)
- Publication type
- Journal Article (MeSH)
- Substance names
- Proteins (MeSH)
This article examines whether the empirical tendency known as the Menzerath-Altmann Law (MAL) holds for protein secondary structures. MAL is related to optimization principles observed in natural languages, in genetic information on chromosomes, and in protein domains. Its presence is examined on a non-redundant dataset of 4728 proteins by checking for significant negative correlations and by fitting both classical and newly proposed formulas to the observed trend. We conclude that the lengths of secondary structures depend on the number of secondary structures in the protein sequence, and that this dependence possibly follows the formula proposed in this paper. The behavior is observed on average; individual proteins can deviate from it, and it is possibly driven by a latent cost function. The data suggest that MAL could provide a useful guiding principle in protein design.
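The two checks named in the abstract (a negative correlation, and a formula fitted to the trend) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data below are synthetic, the function names are hypothetical, the fitted model is only the commonly cited truncated form of MAL, y = a·x^b, and significance testing is omitted. The formula newly proposed in the paper is not reproduced here.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fit_power_law(x, y):
    """Least-squares fit of log y = log a + b * log x; returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic, illustrative data (NOT from the paper): number of secondary
# structures per protein and the mean length of those structures.
counts = [1, 2, 4, 8, 16, 32]
mean_lengths = [12.0 * n ** -0.3 for n in counts]

rho = spearman(counts, mean_lengths)
a, b = fit_power_law(counts, mean_lengths)
print(f"Spearman rho = {rho:.3f}, fitted a = {a:.3f}, b = {b:.3f}")
```

A negative rank correlation together with a negative fitted exponent b is the pattern MAL predicts: the more parts a construct has, the shorter those parts are on average.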
Department of Cell Biology, Charles University, 128 43 Prague 2, Czech Republic
Department of General Linguistics, Palacky University, 771 00 Olomouc, Czech Republic
Department of Psychology, Palacky University, 771 00 Olomouc, Czech Republic