CoLiDe: Combinatorial Library Design tool for probing protein sequence space
Language: English; Country: England, Great Britain; Medium: print
Document type: journal article, research supported by grant
PubMed: 32956450
PubMed Central: PMC8088326
DOI: 10.1093/bioinformatics/btaa804
PII: 5909645
- MeSH
- Algorithms *
- Gene Library
- Protein Engineering
- Proteins * genetics
- Amino Acid Sequence
- Software
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Substance names
- Proteins *
MOTIVATION: Current techniques of protein engineering focus mostly on re-designing small targeted regions or defined structural scaffolds rather than on constructing combinatorial libraries of versatile compositions and lengths. This is a missed opportunity, because combinatorial libraries are emerging as a vital source of novel functional proteins and are of interest in diverse research areas. RESULTS: Here, we present a computational tool for Combinatorial Library Design (CoLiDe) that offers precise control over protein sequence composition, length and diversity. The algorithm uses an evolutionary approach to design degenerate DNA templates that encode the desired combinatorial libraries. We demonstrate its performance and precision using four different input alphabet distributions at different sequence lengths. In addition, we present a model design and an experimental pipeline for protein library expression and purification, providing a proof of concept that our protocol can be used to prepare purified protein library samples of up to 10^11–10^12 unique sequences. CoLiDe offers a composition-centric approach to protein design that can be applied to diverse functional phenomena. AVAILABILITY AND IMPLEMENTATION: CoLiDe is implemented in Python and freely available at https://github.com/voracva1/CoLiDe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
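The abstract describes an evolutionary search for degenerate DNA templates whose encoded amino-acid composition matches a target. The sketch below illustrates that general idea only. It is not CoLiDe's implementation or API (see the GitHub repository for that), and every name in it is hypothetical. Its assumptions are that degenerate codons are written in IUPAC notation with equimolar base mixtures, that a template is an equal mixture of its codons, that fitness is the total variation distance to the target composition, and that the search is a simple elitist, mutation-only loop.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code; codon index = 16*first + 4*second + third over "TCAG".
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC degenerate nucleotide codes, assumed here to be equimolar mixtures.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def codon_distribution(degenerate_codon):
    """Amino-acid probabilities ('*' = stop) encoded by one degenerate codon."""
    choices = [IUPAC[symbol] for symbol in degenerate_codon]
    weight = 1.0
    for options in choices:
        weight /= len(options)  # every concrete codon is equally likely
    dist = {}
    for bases in product(*choices):
        aa = CODON_TABLE["".join(bases)]
        dist[aa] = dist.get(aa, 0.0) + weight
    return dist


def template_distribution(template):
    """Composition of a template modelled as an equal mix of its codons."""
    total = {}
    for codon in template:
        for aa, p in codon_distribution(codon).items():
            total[aa] = total.get(aa, 0.0) + p / len(template)
    return total


def distance(dist, target):
    """Total variation distance; stop ('*') mass absent from the target adds to it."""
    keys = set(dist) | set(target)
    return 0.5 * sum(abs(dist.get(k, 0.0) - target.get(k, 0.0)) for k in keys)


def mutate(template, rng):
    """Replace one IUPAC symbol in one randomly chosen codon."""
    codons = list(template)
    i = rng.randrange(len(codons))
    codon = list(codons[i])
    codon[rng.randrange(3)] = rng.choice(list(IUPAC))
    codons[i] = "".join(codon)
    return tuple(codons)


def evolve(target, n_codons=2, pop_size=40, generations=200, seed=0):
    """Elitist, mutation-only evolutionary search (hypothetical parameters)."""
    rng = random.Random(seed)
    symbols = list(IUPAC)
    population = [
        tuple("".join(rng.choice(symbols) for _ in range(3)) for _ in range(n_codons))
        for _ in range(pop_size)
    ]

    def score(t):
        return distance(template_distribution(t), target)

    history = []  # best score per generation; non-increasing thanks to elitism
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=score), history


if __name__ == "__main__":
    best, history = evolve({"A": 0.5, "G": 0.5}, n_codons=1)
    print(best, history[-1])
```

As a sanity check on the model, `GCN` encodes only alanine, while the common `NNK` codon carries a 1/32 stop-codon (TAG) fraction. A real library design would also need to weight codons by their actual synthesis ratios and handle sequence length, which this sketch deliberately omits.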
Department of Biochemistry Faculty of Science Charles University 128 00 Prague 2 Czech Republic
Department of Cell Biology Faculty of Science Charles University Biocev Prague Czech Republic
Earth Life Science Institute Tokyo Institute of Technology Tokyo 1528550 Japan