Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life
Language: English
Country: United States of America
Medium: electronic-ecollection
Document type: journal article
Grant support
R35 GM122592 (NIGMS NIH HHS, United States)
F32 GM135998 (NIGMS NIH HHS, United States)
R35 GM148244 (NIGMS NIH HHS, United States)
T32 HG000044 (NHGRI NIH HHS, United States)
R35 GM118165 (NIGMS NIH HHS, United States)
R35 GM137834 (NIGMS NIH HHS, United States)
K99 GM137041 (NIGMS NIH HHS, United States)
PubMed: 39024225
PubMed Central: PMC11257246
DOI: 10.1371/journal.pbio.3002697
PII: PBIOLOGY-D-23-02916
- MeSH
- Drosophilidae * genetics, classification
- Phylogeny *
- Genome, Insect *
- Genomics * methods
- Sequence Analysis, DNA methods
- High-Throughput Nucleotide Sequencing methods
- Animals
- Check Tag
- Animals
- Publication type
- Journal Article
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser-studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group.
Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
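The abstract summarizes assembly quality with three standard metrics: contig N50 (contiguity), percent complete BUSCOs (completeness), and consensus QV (accuracy). As an illustration only, the minimal Python sketch below computes each metric from its usual definition and checks the result against the cutoffs quoted above. The function names, contig lengths, BUSCO counts, and error rate are hypothetical and made up for this example; this is not the study's pipeline. In practice, completeness is reported by BUSCO itself, and QV is usually estimated by dedicated assembly-evaluation tools rather than from a known error count.

```python
import math

# Illustrative sketch only: standard definitions of the three assembly-quality
# metrics quoted in the abstract. Not the study's pipeline; all function names
# and input values here are hypothetical.


def contig_n50(lengths):
    """N50: the contig length at which the running total of contigs,
    sorted longest first, first reaches half of the assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    if any(length < 0 for length in lengths):
        raise ValueError("contig lengths must be non-negative")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # walk from the longest contig down
        running += length
        if 2 * running >= total:  # integer test for "at least half", no float rounding
            return length


def busco_completeness(single_copy, duplicated, total_buscos):
    """Percent complete BUSCOs: (complete single-copy + complete duplicated) / total searched."""
    return 100.0 * (single_copy + duplicated) / total_buscos


def qv_to_error_rate(qv):
    """Consensus QV is Phred-scaled: QV = -10 * log10(per-base error rate)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse of qv_to_error_rate."""
    return -10 * math.log10(error_rate)


def meets_abstract_cutoffs(n50, completeness_pct, qv):
    """Hypothetical check against the cutoffs quoted in the abstract:
    contig N50 > 1 Mb, > 98% complete BUSCOs, and > QV40."""
    return n50 > 1_000_000 and completeness_pct > 98.0 and qv > 40.0


if __name__ == "__main__":
    contigs = [4_000_000, 3_000_000, 2_000_000, 1_000_000]  # made-up contig lengths (bp)
    n50 = contig_n50(contigs)
    completeness = busco_completeness(3200, 30, 3285)  # made-up BUSCO counts
    qv = error_rate_to_qv(5e-5)  # made-up per-base error rate
    print(n50)  # 3000000
    print(round(completeness, 2))  # 98.33
    print(meets_abstract_cutoffs(n50, completeness, qv))  # True
```

Because QV is a Phred-scaled quantity, each 10-point increase means a tenfold lower error rate, so QV40 corresponds to about one erroneous base per 10,000 (99.99% accuracy).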
Baylor College of Medicine, Houston, Texas, United States of America
CZ Biohub Investigator, San Francisco, California, United States of America
Daintree Rainforest Observatory, James Cook University, Townsville, Australia
Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
Department of Biological Sciences, Hokkaido University, Sapporo, Japan
Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
Department of Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
Department of Biology, Stanford University, Stanford, California, United States of America
Department of Complexity Science and Engineering, The University of Tokyo, Tokyo, Japan
Department of Developmental Biology, Stanford University, Stanford, California, United States of America
Department of Entomology, Cornell University, Ithaca, New York, United States of America
Department of Zoology, The University of British Columbia, Vancouver, Canada
Hokkaido University Museum, Hokkaido University, Sapporo, Japan
Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
Institute of Entomology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
Pacific Biosciences Research Center, University of Hawai'i Mānoa, Hawaii, United States of America
School of Environmental and Natural Sciences, Bangor University, Bangor, United Kingdom
School of Life Sciences, University of Nevada Las Vegas, Las Vegas, Nevada, United States of America