AggreProt: a web server for predicting and engineering aggregation prone regions in proteins

2024 Jul 05; 52(W1): W159-W169.

Language: English. Country: England, United Kingdom. Medium: print

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid38801076

Grant support
857560 Horizon 2020 Framework Programme
FW03010208 Technology Agency of the Czech Republic
CETOCOEN EXCELLENCE CZ.02.1.01/0.0/0.0/17_043/0009 Ministry of Education
LX22NPO5 European Union - Next Generation EU

Recombinant proteins play pivotal roles in numerous applications, including as industrial biocatalysts and therapeutics. Despite recent progress in computational protein structure prediction, protein solubility and reduced aggregation propensity remain challenging attributes to design. Identifying aggregation-prone regions (APRs) is essential for understanding misfolding diseases and for designing efficient protein-based technologies, and as such has a great socio-economic impact. Here, we introduce AggreProt, a user-friendly web server that uses an ensemble of deep neural networks to automatically predict APRs in protein sequences. Trained on experimentally evaluated hexapeptides, AggreProt performs comparably to or outperforms state-of-the-art algorithms on two independent benchmark datasets. The server provides per-residue aggregation profiles, along with information on solvent accessibility and transmembrane propensity, in an intuitive interface with interactive sequence and structure viewers for comprehensive analysis. We demonstrate AggreProt's efficacy in predicting differential aggregation behaviours in proteins on several use cases, which emphasize its potential for guiding protein engineering strategies towards decreased aggregation propensity and improved solubility. The web server is freely available at https://loschmidt.chemi.muni.cz/aggreprot/.
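To illustrate how hexapeptide-level predictions could be turned into a per-residue aggregation profile, the following is a minimal sketch and not AggreProt's actual implementation. It assumes a sliding window of six residues, an ensemble whose member scores are simply averaged, and a per-residue value computed as the mean over all windows covering that residue. The function names `toy_score` and `per_residue_profile` are hypothetical, and `toy_score` is a crude hydrophobicity stand-in for a trained neural network.

```python
from typing import Callable, List, Sequence

WINDOW = 6  # hexapeptide length, as mentioned in the abstract


def toy_score(hexapeptide: str) -> float:
    """Hypothetical stand-in for one trained model.

    Returns the fraction of hydrophobic residues in the window.
    This is an illustrative assumption, not a real aggregation predictor.
    """
    hydrophobic = set("AVILMFWY")
    return sum(aa in hydrophobic for aa in hexapeptide) / len(hexapeptide)


def per_residue_profile(
    sequence: str,
    scorers: Sequence[Callable[[str], float]],
    window: int = WINDOW,
) -> List[float]:
    """Average ensemble window scores onto each residue they cover."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")

    totals = [0.0] * len(sequence)
    counts = [0] * len(sequence)

    for start in range(len(sequence) - window + 1):
        peptide = sequence[start:start + window]
        # Ensemble prediction for this window: plain mean over all scorers.
        score = sum(scorer(peptide) for scorer in scorers) / len(scorers)
        # Spread the window score over every residue inside the window.
        for i in range(start, start + window):
            totals[i] += score
            counts[i] += 1

    # Every residue is covered by at least one window, so counts are non-zero.
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    profile = per_residue_profile("VVVVVVKKKKKK", [toy_score])
    for residue, value in zip("VVVVVVKKKKKK", profile):
        print(residue, round(value, 3))
```

In this toy example the profile falls from the hydrophobic N-terminal stretch towards the charged C-terminal stretch. A real predictor would replace `toy_score` with trained models and would likely apply a calibrated threshold to call APRs.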


