Automated Phylogenetic Analysis Using Best Reciprocal BLAST

Language: English; Country: United States; Medium: print

Document type: journal article, grant-supported work

Persistent link   https://www.medvik.cz/link/pmid34313983

Grant support
203134/Z/16/Z Wellcome Trust - United Kingdom

Reconstructing the evolutionary history of specific protein-coding genes is an essential component of the biological sciences toolkit. It relies on identifying orthologs (genes in different organisms related by vertical descent from a common ancestor, usually presumed to have the same or a similar function) and paralogs (genes in the same organism related by descent from a single ancestral gene, which may or may not retain the same or a similar function) across a range of taxa. Beyond its role in reconstructing evolutionary histories, ortholog identification is important for protein expression, modeling for drug discovery programs, identification of critical residues, and other studies. Here we describe an automated system for searching for orthologs and paralogs in eukaryotic organisms. Unlike manual methods, the system is fast and requires minimal user input while remaining highly configurable.
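The best reciprocal BLAST criterion named in the title can be illustrated with a minimal sketch. This is not the authors' system: it assumes the BLAST searches have already been run in both directions and reduced to (query, subject, bit score) tuples, it assumes bit score is the ranking criterion, and it breaks ties by keeping the first hit seen. The function names and sequence identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit reduced to (query_id, subject_id, bit_score).
# Assumption: the caller has already parsed the BLAST output into this shape.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict '>' keeps the first hit seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[Hit], reverse: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)   # genome A proteins searched against genome B
    rev = best_hits(reverse)   # genome B proteins searched against genome A
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical hits for illustration only.
forward = [
    ("A1", "B1", 250.0),
    ("A1", "B2", 90.0),
    ("A2", "B2", 180.0),
    ("A3", "B1", 120.0),
]
reverse = [
    ("B1", "A1", 245.0),
    ("B2", "A2", 175.0),
    ("B2", "A1", 95.0),
]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

In this toy data, A3 finds B1 as its best hit, but B1's best hit is A1, so A3 is not reported as a candidate ortholog of B1. A sequence that fails the reciprocal test in this way is a natural candidate for closer inspection as a possible paralog, which is where downstream alignment and tree building become useful.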


