CoLiDe: Combinatorial Library Design tool for probing protein sequence space

Bioinformatics. 2021 May 01; 37(4): 482-489.

Language: English. Country: England, United Kingdom. Medium: print

Document type: journal article, grant-supported work

Persistent link: https://www.medvik.cz/link/pmid32956450

MOTIVATION: Current protein engineering techniques focus mostly on redesigning small targeted regions or defined structural scaffolds rather than constructing combinatorial libraries of versatile compositions and lengths. This is a missed opportunity, because combinatorial libraries are emerging as a vital source of novel functional proteins and are of interest in diverse research areas.

RESULTS: Here, we present a computational tool for Combinatorial Library Design (CoLiDe) offering precise control over protein sequence composition, length and diversity. The algorithm uses an evolutionary approach to propose degenerate DNA templates that encode the combinatorial library. We demonstrate its performance and precision using four different input alphabet distributions at different sequence lengths. In addition, a model design and experimental pipeline for protein library expression and purification is presented, providing a proof of concept that our protocol can be used to prepare purified protein library samples of up to 10^11 to 10^12 unique sequences. CoLiDe presents a composition-centric approach to protein design towards different functional phenomena.

AVAILABILITY AND IMPLEMENTATION: CoLiDe is implemented in Python and freely available at https://github.com/voracva1/CoLiDe.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
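To make the idea of fitting degenerate DNA templates to a target amino acid composition concrete, the following is a minimal Python sketch. It is not CoLiDe's implementation: the function names (`codon_distribution`, `library_distribution`, `distance`, `evolve`), the squared-error objective, the equimolar-mixture assumption and the simple (1+1) mutate-and-keep scheme are all illustrative assumptions. Only the IUPAC ambiguity codes and the standard genetic code are taken as given.

```python
import itertools
import random
from collections import Counter

# IUPAC nucleotide ambiguity codes (standard).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def codon_distribution(degenerate_codon):
    """Amino acid frequencies of one degenerate codon (assumes equimolar bases)."""
    pools = [IUPAC[b] for b in degenerate_codon.upper()]
    counts = Counter(CODON_TABLE["".join(c)] for c in itertools.product(*pools))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def library_distribution(template):
    """Average amino acid composition over a list of degenerate codons."""
    total = Counter()
    for codon in template:
        for aa, p in codon_distribution(codon).items():
            total[aa] += p / len(template)
    return dict(total)


def distance(observed, target):
    """Squared error between two compositions (hypothetical objective)."""
    keys = set(observed) | set(target)
    return sum((observed.get(k, 0.0) - target.get(k, 0.0)) ** 2 for k in keys)


def evolve(target, length, generations=2000, seed=0):
    """(1+1) evolutionary search: mutate one base, keep the child if not worse."""
    rng = random.Random(seed)
    symbols = sorted(IUPAC)
    best = ["NNK"] * length
    best_score = distance(library_distribution(best), target)
    for _ in range(generations):
        candidate = list(best)
        pos = rng.randrange(length)
        codon = list(candidate[pos])
        codon[rng.randrange(3)] = rng.choice(symbols)
        candidate[pos] = "".join(codon)
        score = distance(library_distribution(candidate), target)
        if score <= best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    # Hypothetical target: an equal mix of Ala, Gly, Asp and Val.
    target = {"A": 0.25, "G": 0.25, "D": 0.25, "V": 0.25}
    template, score = evolve(target, length=4)
    print(template, score)
```

As a sanity check on the encoding step, `codon_distribution("NNK")` spans 32 codons that cover all 20 amino acids plus a single stop codon (TAG), and `codon_distribution("GNC")` encodes exactly Val, Ala, Asp and Gly in equal proportions. A real design tool would also need to handle constraints this sketch ignores, such as stop-codon penalties, per-position targets and library diversity.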


