SoluProtMutDB: A manually curated database of protein solubility changes upon mutations

2022; 20: 6339-6347. Epub 2022 Nov 09.

Status: PubMed-not-MEDLINE. Language: English. Country: Netherlands. Medium: electronic-ecollection.

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid36420168
Links

PubMed 36420168
PubMed Central PMC9678803
DOI 10.1016/j.csbj.2022.11.009
PII: S2001-0370(22)00502-5

Protein solubility is an attractive engineering target, primarily because of its relation to yields in protein production and manufacturing. Moreover, better knowledge of the effects of mutations on protein solubility could connect several serious human diseases with protein aggregation. However, our understanding of the structural determinants of protein solubility is limited, and the available data have mostly been scattered across the literature. Here, we present SoluProtMutDB, the first database containing data on protein solubility changes upon mutations. The database holds 33,000 measurements of 17,000 protein variants in 103 different proteins. It can serve as an essential source of information for researchers designing improved protein variants or developing machine learning tools to predict the effects of mutations on solubility. The database comprises all previously published solubility datasets and thousands of new data points from recent publications, including deep mutational scanning experiments. It also records many of the available experimental conditions known to affect protein solubility. The datasets have been manually curated with substantial corrections, improving their suitability for machine learning applications. The database is available at loschmidt.chemi.muni.cz/soluprotmutdb.

Latest 20 citations:

AggreProt: a web server for predicting and engineering aggregation prone regions in proteins

2024 Jul 05; 52 (W1): W159-W169.

Machine Learning-Guided Protein Engineering

2023 Nov 03; 13 (21): 13863-13895. Epub 2023 Oct 13.
