Automated Phylogenetic Analysis Using Best Reciprocal BLAST
Language: English; Country: United States; Medium: print
Document type: journal article, grant-supported research
Grant support
203134/Z/16/Z
Wellcome Trust - United Kingdom
- Keywords
- Automation, Drug discovery, Evolution, Homology, Ortholog, Phylogenetics, Sequence searching
- MeSH
- Phylogeny * MeSH
- Evolution, Molecular MeSH
- Publication type
- Journal Article MeSH
- Grant-supported research MeSH
Reconstruction of the evolutionary history of specific protein-coding genes is an essential component of the biological sciences toolkit. It relies on identifying, across a range of taxa, both orthologs (genes in different organisms related by vertical descent from a common ancestor and usually presumed to have the same or similar function) and paralogs (genes in the same organism related by descent from a single ancestral gene, which may or may not retain the same or similar function). Beyond the reconstruction of evolutionary histories, ortholog identification is important for protein expression, modeling for drug discovery programs, identification of critical residues, and other studies. Here we describe an automated system for searching for orthologs and paralogs in eukaryotic organisms. Unlike manual methods, the system is fast and requires minimal user input while remaining highly configurable.
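The abstract does not describe the implementation, so the following is only a minimal, generic sketch of the best reciprocal hit criterion named in the title, not the authors' system. It assumes two sets of BLAST results in the standard 12-column tab-separated layout (query ID in column 1, subject ID in column 2, bit score in column 12), one from searching proteome A against proteome B and one from the reverse search. The function names and the sequence IDs in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            # Skip blank lines, comments, or rows that are not 12-column tabular.
            continue
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep the first hit seen for a query, then replace it only on a
        # strictly higher bit score.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[str], reverse: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)   # proteome A searched against proteome B
    rev = best_hits(reverse)   # proteome B searched against proteome A
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```

For example, if hypothetical sequence `A1` has `B1` as its top-scoring hit and `B1` in turn has `A1` as its top-scoring hit, the pair `("A1", "B1")` is returned as a candidate ortholog pair. A sequence whose best hit points back to a different query is excluded, and such cases would typically need phylogenetic follow-up to distinguish orthologs from paralogs.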