A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses
Language: English
Country: Great Britain, England
Media: print-electronic
Document type: Journal Article, Review
Grant support
- P20 GM103449: NIGMS NIH HHS, United States
- EXC2124: Deutsche Forschungsgemeinschaft (DFG)
- MOBTP198: European Regional Development Fund and the programme Mobilitas Pluss
- Genome Canada and Ontario Genomics
- U24 CA248454: NCI NIH HHS, United States
- 21-17749S: Grantová Agentura České Republiky
PubMed: 37548515
PubMed Central: PMC10847385
DOI: 10.1111/1755-0998.13847
Keywords
- amplicon data analysis, bioinformatics, environmental DNA, metabarcoding, pipeline, review
MeSH (* = major topic)
- Data Analysis
- Archaea genetics classification
- Bacteria genetics classification
- DNA, Environmental genetics
- Eukaryota genetics classification
- Metagenomics methods
- Software*
- DNA Barcoding, Taxonomic* methods
- Computational Biology* methods
- High-Throughput Nucleotide Sequencing methods
Publication type
- Journal Article
- Review
Names of Substances
- DNA, Environmental
Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxon identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several workflows are designed to process the same type of amplicon sequencing data, which makes choosing among the plethora of existing pipelines somewhat puzzling. Each pipeline, however, has its own specific philosophy, strengths and limitations, which should be weighed against the aims of the specific study and the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines, with the goal of helping users select a pipeline for their metabarcoding projects.
Agroécologie, INRAE, Institut Agro, Univ. Bourgogne Franche-Comté, Dijon, France
Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, Germany
Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
Earlham Institute, Norwich Research Park, Norfolk, UK
GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet-Tolosan, France
Gut Microbes and Health, Quadram Institute Bioscience, Norfolk, UK
INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France
INRAE, SIGENAE, Jouy-en-Josas, France
Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Québec, Canada
Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
Quantitative Biology Center, University of Tübingen, Tübingen, Germany
School of Biological Sciences, University of Reading, Reading, UK
UK Centre for Ecology and Hydrology, Oxfordshire, UK
Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, Italy
Vermont Biomedical Research Network, University of Vermont, Burlington, Vermont, USA
Zachary Gold, NOAA Pacific Marine Environmental Laboratory, Seattle, Washington, USA
Albanese D, Fontana P, De Filippo C, Cavalieri D, & Donati C (2015). MICCA: a complete and accurate software for taxonomic profiling of metagenomic data. Scientific Reports, 5(1), 1–7. 10.1038/srep09743 PubMed DOI PMC
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, & Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25(17), 3389–3402. 10.1093/nar/25.17.3389 PubMed DOI PMC
Andújar C, Creedy TJ, Arribas P, López H, Salces-Castellano A, Pérez-Delgado AJ, Vogler AP, & Emerson BC (2021). Validated removal of nuclear pseudogenes and sequencing artefacts from mitochondrial metabarcode data. Molecular Ecology Resources, 21(6), 1772–1787. 10.1111/1755-0998.13337 PubMed DOI
Anslan S, & Tedersoo L (2015). Performance of cytochrome c oxidase subunit I (COI), ribosomal DNA Large Subunit (LSU) and Internal Transcribed Spacer 2 (ITS2) in DNA barcoding of Collembola. European Journal of Soil Biology, 69, 1–7. 10.1016/j.ejsobi.2015.04.001 DOI
Anslan S, Bahram M, Hiiesalu I, & Tedersoo L (2017). PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data. Molecular Ecology Resources, 17(6), e234–e240. 10.1111/1755-0998.12692 PubMed DOI
Anslan S, Mikryukov V, Armolaitis K, Ankuda J, Lazdina D, Makovskis K, … & Tedersoo L (2021). Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ, 9, e12254. 10.7717/peerj.12254 PubMed DOI PMC
Anslan S, Nilsson RH, Wurzbacher C, Baldrian P, Tedersoo Leho, & Bahram M (2018). Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding. MycoKeys, (39), 29–40. 10.3897/mycokeys.39.28109 PubMed DOI PMC
Ansorge R, Birolo G, James SA, & Telatin A (2021). Dadaist2: a toolkit to automate and simplify statistical analysis and plotting of metabarcoding experiments. International journal of molecular sciences, 22(10), 5309. 10.3390/ijms22105309 PubMed DOI PMC
Antich A, Palacin C, Wangensteen OS, & Turon X (2021). To denoise or to cluster, that is not the question: Optimizing pipelines for COI metabarcoding and metaphylogeography. BMC Bioinformatics, 22(1), 177. 10.1186/s12859-021-04115-6. PubMed DOI PMC
Antich A, Palacín C, Turon X, & Wangensteen OS (2022). DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets. PeerJ, 10, e12758. 10.7717/peerj.12758 PubMed DOI PMC
Asbun AA, Besseling MA, Balzano S, van Bleijswijk JDL, Witte HJ, Villanueva L, & Engelmann JC (2020). Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results. In Frontiers in Genetics (Vol. 11). 10.3389/fgene.2020.489357 PubMed DOI PMC
Bai J, Jhaney I, & Wells J (2019). Developing a reproducible microbiome data analysis pipeline using the Amazon web services cloud for a cancer research group: proof-of-concept study. JMIR medical informatics, 7(4), e14667. 10.2196/14667 PubMed DOI PMC
Bailet B, Apothéloz-Perret-Gentil L, Baričević A, Chonova T, Franc A, Frigerio JM, … & Kahlert M (2020). Diatom DNA metabarcoding for ecological assessment: Comparison among bioinformatics pipelines used in six European countries reveals the need for standardization. Science of the Total Environment, 745, 140948. 10.1016/j.scitotenv.2020.140948 PubMed DOI
Baloğlu B, Chen Z, Elbrecht V, Braukmann T, MacDonald S, & Steinke D (2021). A workflow for accurate metabarcoding using nanopore MinION sequencing. Methods in Ecology and Evolution, 12(5), 794–804. 10.1111/2041-210X.13561 DOI
Baltrušis P, Halvarsson P, & Höglund J (2022). Estimation of the impact of three different bioinformatic pipelines on sheep nemabiome analysis. Parasites & Vectors, 15(1), 1–12. 10.1186/s13071-022-05399-0 PubMed DOI PMC
Banchi E, Ametrano CG, Greco S, Stanković D, Muggia L, & Pallavicini A (2020). PLANiTS: A curated sequence reference dataset for plant ITS DNA metabarcoding. Database, 2020(baz155). 10.1093/database/baz155 PubMed DOI PMC
Ben-David T, Melamed S, Gerson U, & Morin S (2007). ITS2 sequences as barcodes for identifying and analyzing spider mites (Acari: Tetranychidae). Experimental and Applied Acarology, 41(3), 169–181. 10.1186/s13071-022-05399-0 PubMed DOI
Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, … & Nilsson RH (2013). Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods Ecol Evol. 2013; 4 (10): 914–9. 10.1111/2041-210X.12073 DOI
Bernard M, Rué O, Mariadassou M, & Pascal G (2021). FROGS: a powerful tool to analyse the diversity of fungi with special management of internal transcribed spacers. Briefings in Bioinformatics, 22(6). 10.1093/bib/bbab318 PubMed DOI
Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, … & Gregory Caporaso J (2018). Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome, 6(1), 1–17. 10.1186/s40168-018-0470-z PubMed DOI PMC
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, … Caporaso JG (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology, 37(8), 852–857. 10.1038/s41587-019-0209-9 PubMed DOI PMC
Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, & Coissac E (2016). obitools: a unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16(1), 176–182. 10.1111/1755-0998.12428 PubMed DOI
Brandt MI, Trouche B, Quintric L, Günther B, Wincker P, Poulain J, & Arnaud‐Haond S (2021). Bioinformatic pipelines combining denoising and clustering tools allow for more comprehensive prokaryotic and eukaryotic metabarcoding. Molecular Ecology Resources, 21(6), 1904–1921. 10.1111/1755-0998.13398 PubMed DOI
Brown SP, Veach AM, Rigdon-Huss AR, Grond K, Lickteig SK, Lothamer K, … & Jumpponen A (2015). Scraping the bottom of the barrel: are rare high throughput sequences artifacts?. fungal ecology, 13, 221–225.
Bruce K, Blackman RC, Bourlat SJ, Hellström M, Bakker J, Bista I, … & Deiner K (2021). A practical guide to DNA-based methods for biodiversity assessment. 10.3897/ab.e68634 DOI
Buchner D, Macher T-H, & Leese F (2022). APSCALE: advanced pipeline for simple yet comprehensive analyses of DNA metabarcoding data. Bioinformatics , 38(20), 4817–4819. 10.1093/bioinformatics/btac588 PubMed DOI PMC
Callahan BJ, Grinevich D, Thakur S, Balamotis MA, & Yehezkel TB (2021). Ultra-accurate microbial amplicon sequencing with synthetic long reads. Microbiome, 9(1), 130. 10.1186/s40168-021-01072-3 PubMed DOI PMC
Callahan BJ, McMurdie PJ, & Holmes SP (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal, 11(12), 2639–2643. 10.1038/ismej.2017.119 PubMed DOI PMC
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, & Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583. 10.1038/nmeth.3869 PubMed DOI PMC
Callahan BJ, Wong J, Heiner C, Oh S, Theriot CM, Gulati AS, … & Dougherty MK (2019). High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution. Nucleic acids research, 47(18), e103. 10.1093/nar/gkz569 PubMed DOI PMC
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, … & Knight R (2010). QIIME allows analysis of high-throughput community sequencing data. Nature methods, 7(5), 335–336. 10.1038/nmeth.f.303 PubMed DOI PMC
Carlsen T, Aas AB, Lindner D, Vrålstad T, Schumacher T, & Kauserud H (2012). Don’t make a mista (g) ke: is tag switching an overlooked source of error in amplicon pyrosequencing studies?. Fungal Ecology, 5(6), 747–749.
Carøe C, & Bohmann K (2020). Tagsteady: a metabarcoding library preparation protocol to avoid false assignment of sequences to samples. Molecular Ecology Resources, 20(6), 1620–1631. 10.1111/1755-0998.13227 PubMed DOI
Castaño C, Berlin A, Brandström Durling M, Ihrmark K, Lindahl BD, Stenlid J, … & Olson Å (2020). Optimized metabarcoding with Pacific Biosciences enables semi‐quantitative analysis of fungal communities. New Phytologist, 228(3). 10.1111/nph.16731 PubMed DOI
CBOL Plant Working Group 1, Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, … & Little DP (2009). A DNA barcode for land plants. Proceedings of the National Academy of Sciences, 106(31), 12794–12797. 10.1073/pnas.0905845106 PubMed DOI PMC
Chen S, Zhou Y, Chen Y, & Gu J (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884–i890. 10.1093/bioinformatics/bty560 PubMed DOI PMC
Community, G. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic acids research, 50(W1), W345–W351. 10.1093/nar/gkac247 PubMed DOI PMC
Compson ZG, McClenaghan B, Singer GA, Fahner NA, & Hajibabaei M (2020). Metabarcoding from microbes to mammals: comprehensive bioassessment on a global scale. Frontiers in Ecology and Evolution, 8, 581835. 10.3389/fevo.2020.581835 DOI
Copeland M, Soh J, Puca A, Manning M, & Gollob D (2015). Microsoft azure. New York, NY, USA:: Apress, 3–26.
Couton M, Baud A, Daguin‐Thiébaut C, Corre E, Comtet T, & Viard F (2021). High‐throughput sequencing on preservative ethanol is effective at jointly examining infraspecific and taxonomic diversity, although bioinformatics pipelines do not perform equally. Ecology and evolution, 11(10), 5533–5546. 10.1002/ece3.7453 PubMed DOI PMC
Creedy TJ, Andujar C, Meramveliotakis E, Noguerales V, Overcast I, Papadopoulou A, … & Arribas P (2022). Coming of age for COI metabarcoding of whole organism community DNA: towards bioinformatic harmonisation. Molecular Ecology Resources, 22(3), 847–861. 10.1111/1755-0998.13502 PubMed DOI PMC
Curd EE, Gold Z, Kandlikar GS, Gomer J, Ogden M, O’Connell T, … & Meyer RS (2019). Anacapa Toolkit: An environmental DNA toolkit for processing multilocus metabarcode datasets. Methods in Ecology and Evolution, 10(9), 1469–1475. 10.1111/2041-210X.13214 DOI
De Santiago A, Pereira TJ, Mincks SL, & Bik HM (2022). Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies. Environmental DNA, 4(2), 363–384. 10.1002/edn3.255 DOI
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, & Notredame C (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. 10.1038/nbt.3820 PubMed DOI
Djemiel C, Dequiedt S, Karimi B, Cottin A, Girier T, El Djoudi Y, Wincker P, Lelièvre M, Mondy S, Chemidlin Prévost-Bouré N, Maron P-A, Ranjard L, & Terrat S (2020). BIOCOM-PIPE: a new user-friendly metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S and 23S rRNA gene amplicons. BMC Bioinformatics, 21(1), 492. 10.1186/s12859-020-03829-3 PubMed DOI PMC
Djemiel C, Plassard D, Terrat S, Crouzet O, Sauze J, Mondy S, … & Maron PA (2020). μgreen-db: a reference database for the 23S rRNA gene of eukaryotic plastids and cyanobacteria. Scientific reports, 10(1), 1–11. 10.1038/s41598-020-62555-1 PubMed DOI PMC
Durling MB, Clemmensen KE, Stenlid J, & Lindahl B (2011). SCATA-An efficient bioinformatic pipeline for species identification and quantification after high-throughput sequencing of tagged amplicons.
Edgar RC (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. 10.1093/bioinformatics/btq461 PubMed DOI
Edgar RC (2016). SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences. biorxiv, 074161. 10.1101/074161 DOI
Edgar RC (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. BioRxiv, 081257. 10.1101/081257 DOI
Edgar RC (2017). Accuracy of microbial community diversity estimated by closed-and open-reference OTUs. PeerJ, 5, e3889. 10.7717/peerj.3889 PubMed DOI PMC
Edgar RC (2018). Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences. PeerJ, 6, e4652. 10.7717/peerj.4652 PubMed DOI PMC
Edgar RC (2018). UNCROSS2: identification of cross-talk in 16S rRNA OTU tables. BioRxiv, 400762. 10.1101/400762 DOI
Edgar RC, & Flyvbjerg H (2015). Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31(21), 3476–3482. 10.1093/bioinformatics/btv401 PubMed DOI
Edgar RC, Haas BJ, Clemente JC, Quince C, & Knight R (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27(16), 2194–2200. 10.1093/bioinformatics/btr381 PubMed DOI PMC
Elbrecht V, Taberlet P, Dejean T, Valentini A, Usseglio-Polatera P, Beisel JN, … & Leese F (2016). Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ, 4, e1966. 10.7717/peerj.1966 PubMed DOI PMC
Escudié F, Auer L, Bernard M, Mariadassou M, Cauquil L, Vidal K, Maman S, Hernandez-Raquet G, Combes S, Pascal G. FROGS: Find, Rapidly, OTUs with Galaxy Solution. Bioinformatics. 2018. Apr 15;34(8):1287–1294. 10.1093/bioinformatics/btx791 PubMed DOI
Frøslev TG, Kjøller R, Bruun HH, Ejrnæs R, Brunbjerg AK, Pietroni C, & Hansen AJ (2017). Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature communications, 8(1), 1–11. 10.1038/s41467-017-01312-x PubMed DOI PMC
Furneaux B, Bahram M, Rosling A, Yorou NS, & Ryberg M (2021). Long‐and short‐read metabarcoding technologies reveal similar spatiotemporal structures in fungal communities. Molecular Ecology Resources, 21(6), 1833–1849. 10.1111/1755-0998.13387 PubMed DOI
Glassman SI, & Martiny JB (2018). Broadscale ecological patterns are robust to use of exact sequence variants versus operational taxonomic units. MSphere, 3(4), e00148–18. 10.1128/mSphere.00148-18 PubMed DOI PMC
Gold Z, Curd EE, Goodwin KD, Choi ES, Frable BW, Thompson AR, … & Barber PH (2021). Improving metabarcoding taxonomic assignment: A case study of fishes in a large marine ecosystem. Molecular ecology resources, 21(7), 2546–2564. 10.1111/1755-0998.13450 PubMed DOI
González A, Dubut V, Corse E, Mekdad R, Dechatre T, Castet U, … & Meglécz E (2023). VTAM: A robust pipeline for validating metabarcoding data using controls. Computational and Structural Biotechnology Journal. 10.1016/j.csbj.2023.01.034 PubMed DOI PMC
Gweon HS, Oliver A, Taylor J, Booth T, Gibbs M, Read DS, Griffiths RI, & Schonrogge K (2015). PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform. Methods in Ecology and Evolution / British Ecological Society, 6(8), 973–980. 10.1111/2041-210X.12399 PubMed DOI PMC
Hajibabaei M, Shokralla S, Zhou X, Singer GA, & Baird DJ (2011). Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos. PLoS one, 6(4), e17497. 10.1371/journal.pone.0017497 PubMed DOI PMC
Harrison JP, Chronopoulou PM, Salonen IS, Jilbert T, & Koho KA (2021). 16S and 18S rRNA gene metabarcoding provide congruent information on the responses of sediment communities to eutrophication. Frontiers in Marine Science, 8, 708716. 10.3389/fmars.2021.708716 DOI
Hebert Paul DN, et al. “Biological identifications through DNA barcodes.” Proceedings of the Royal Society of London. Series B: Biological Sciences 270.1512 (2003): 313–321. 10.1098/rspb.2002.2218 PubMed DOI PMC
Heeger F, Bourne EC, Baschien C, Yurkov A, Bunk B, Spröer C, … & Monaghan MT (2018). Long‐read DNA metabarcoding of ribosomal RNA in the analysis of fungi from aquatic environments. Molecular Ecology Resources, 18(6), 1500–1514. 10.1111/1755-0998.12937 PubMed DOI
Hildebrand F, Tadeo R, Voigt AY, Bork P, & Raes J (2014). LotuS: an efficient and user-friendly OTU processing pipeline. Microbiome, 2(1), 1–7. 10.1186/2049-2618-2-30 PubMed DOI PMC
Hleap JS, Littlefair JE, Steinke D, Hebert PD, & Cristescu ME (2021). Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Molecular Ecology Resources, 21(7), 2190–2203. 10.1111/1755-0998.13407 PubMed DOI
Hupfauf S, Etemadi M, Juárez MF-D, Gómez-Brandón M, Insam H, & Podmirseg SM (2020). CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis. In PLOS ONE (Vol. 15, Issue 12, p. e0243241). 10.1371/journal.pone.0243241 PubMed DOI PMC
Huse SM, Welch DM, Morrison HG, & Sogin ML (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental microbiology, 12(7), 1889–1898. 10.1111/j.1462-2920.2010.02193.x PubMed DOI PMC
Huson DH, Auch AF, Qi J, & Schuster SC (2007). MEGAN analysis of metagenomic data. Genome Research, 17(3), 377–386. 10.1101/gr.5969107 PubMed DOI PMC
Hussain A, & Aleem M (2018). GoCJ: Google cloud jobs dataset for distributed and cloud computing infrastructures. Data, 3(4), 38. 10.3390/data3040038 DOI
Kaehler BD, Bokulich NA, McDonald D, Knight R, Caporaso JG, & Huttley GA (2019). Species abundance information improves sequence taxonomy classification accuracy. Nature communications, 10(1), 4643. 10.1038/s41467-019-12669-6 PubMed DOI PMC
Kang W, Anslan S, Börner N, Schwarz A, Schmidt R, Künzel S, … & Schwalb A (2021). Diatom metabarcoding and microscopic analyses from sediment samples at Lake Nam Co, Tibet: The effect of sample-size and bioinformatics on the identified communities. Ecological Indicators, 121, 107070. 10.1016/j.ecolind.2020.107070 DOI
Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J, Gonzalez A, Kosciolek T, McCall L-I, McDonald D, Melnik AV, Morton JT, Navas J, Quinn RA, Sanders JG, Swafford AD, Thompson LR, Tripathi A, Xu ZZ, … Dorrestein PC (2018). Best practices for analysing microbiomes. Nature Reviews. Microbiology, 16(7), 410–422. 10.1038/s41579-018-0029-9 PubMed DOI
Koster J, & Rahmann S (2012). Snakemake--a scalable bioinformatics workflow engine. In Bioinformatics (Vol. 28, Issue 19, pp. 2520–2522). 10.1093/bioinformatics/bts480 PubMed DOI
Kurtzer GM, Sochat V, & Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PloS one, 12(5), e0177459. 10.1371/journal.pone.0177459 PubMed DOI PMC
Laehnemann D, Borkhardt A, & McHardy AC (2016). Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction. Briefings in bioinformatics, 17(1), 154–179. 10.1093/bib/bbv029 PubMed DOI PMC
Lear G, Dickie I, Banks J, Boyer S, Buckley HL, Buckley TR, … & Holdaway R (2018). Methods for the extraction, storage, amplification and sequencing of DNA from environmental samples. New Zealand Journal of Ecology, 42(1), 10–50A. 10.20417/nzjecol.42.9 DOI
Lindgreen S (2012). AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC research notes, 5(1), 1–7. 10.1186/1756-0500-5-337 PubMed DOI PMC
Liu J, & Zhang H (2021). Combining multiple markers in environmental DNA metabarcoding to assess deep-sea benthic biodiversity. Frontiers in Marine Science, 8, 684955. 10.3389/fmars.2021.684955 DOI
Loos D, Zhang L, Beemelmanns C, Kurzai O, & Panagiotou G (2021). DAnIEL: A User-Friendly Web Server for Fungal ITS Amplicon Sequencing Data. Frontiers in Microbiology, 12, 720513. 10.3389/fmicb.2021.720513 PubMed DOI PMC
Mahé F, Czech L, Stamatakis A, Quince C, de Vargas C, Dunthorn M, & Rognes T (2022). Swarm v3: towards tera-scale amplicon clustering. Bioinformatics, 38(1), 267–269. 10.1093/bioinformatics/btab493 PubMed DOI PMC
Marquina D, Esparza‐Salas R, Roslin T, & Ronquist F (2019). Establishing arthropod community composition using metabarcoding: Surprising inconsistencies between soil samples and preservative ethanol and homogenate from Malaise trap catches. Molecular ecology resources, 19(6), 1516–1530. 10.1111/1755-0998.13071 PubMed DOI PMC
Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. journal, 17(1), 10–12. 10.14806/ej.17.1.200 DOI
Mathon L, Valentini A, Guérin PE, Normandeau E, Noel C, Lionnet C, … & Manel S (2021). Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification. Molecular Ecology Resources, 21(7), 2565–2579. 10.1111/1755-0998.13430 PubMed DOI
McGee KM, Robinson CV, & Hajibabaei M (2019). Gaps in DNA-based biomonitoring across the globe. Frontiers in Ecology and Evolution, 7, 337. 10.3389/fevo.2019.00337 DOI
Mikryukov V, Anslan S, Tedersoo L NextITS: a pipeline for metabarcoding fungi and other eukaryotes with full-length ITS sequenced with PacBio. https://github.com/vmikk/NextITS
Minerovic AD, Potapova MG, Sales CM, Price JR, & Enache MD (2020). 18S-V9 DNA metabarcoding detects the effect of water-quality impairment on stream biofilm eukaryotic assemblages. Ecological Indicators, 113, 106225. 10.1016/j.ecolind.2020.106225 DOI
Miya M, Gotoh RO, & Sado T (2020). MiFish metabarcoding: a high-throughput approach for simultaneous detection of multiple fish species from environmental DNA and other samples. Fisheries Science, 86(6), 939–970. 10.1007/s12562-020-01461-x DOI
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, & Köster J (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. 10.12688/f1000research.29032.2 PubMed DOI PMC
Mousavi‐Derazmahalleh M, Stott A, Lines R, Peverley G, Nester G, Simpson T, … & Christophersen CT (2021). eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity. Molecular Ecology Resources, 21(5), 1697–1704. 10.1111/1755-0998.13356 PubMed DOI
Nearing JT, Douglas GM, Comeau AM, & Langille MG (2018). Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches. PeerJ, 6, e5364. 10.7717/peerj.5364 PubMed DOI PMC
Nilsson RH, Anslan S, Bahram M, Wurzbacher C, Baldrian P, & Tedersoo L (2019). Mycobiome diversity: high-throughput sequencing and identification of fungi. Nature Reviews. Microbiology, 17(2), 95–109. 10.1038/s41579-018-0116-y PubMed DOI
Nilsson RH, Wurzbacher C, Bahram M, Coimbra VR, Larsson E, Tedersoo L, … & Abarenkov K (2016). Top 50 most wanted fungi. MycoKeys, (12), 29–40. 10.3897/mycokeys.12.7553 DOI
Özkurt E, Fritscher J, Soranzo N, Ng DYK, Davey RP, Bahram M, & Hildebrand F (2022). LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis. Microbiome, 10(1), 176. 10.1186/s40168-022-01365-1 PubMed DOI PMC
Palmer JM, Jusino MA, Banik MT, & Lindner DL (2018). Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data. PeerJ, 6, e4925. 10.7717/peerj.4925 PubMed DOI PMC
Pauvert C, Buee M, Laval V, Edel-Hermann V, Fauchery L, Gautier A, … & Vacher C (2019). Bioinformatics matters: The accuracy of plant and soil fungal community data is highly dependent on the metabarcoding pipeline. Fungal Ecology, 41, 23–33. 10.1016/j.funeco.2019.03.005 DOI
Plummer E, Twin J, Bulach DM, Garland SM, & Tabrizi SN (2015). A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data. Journal of Proteomics & Bioinformatics, 8(12), 283–291. 10.3389/fmicb.2020.01262 DOI
Pollock J, Glendinning L, Wisedchanwet T, & Watson M (2018). The Madness of Microbiome: Attempting To Find Consensus “Best Practice” for 16S Microbiome Studies. Applied and Environmental Microbiology, 84(7). 10.1128/AEM.02627-17 PubMed DOI PMC
Porter TM, & Hajibabaei M (2018). Automated high throughput animal CO1 metabarcode classification. Scientific Reports, 8(1), 4226. 10.1038/s41598-018-22505-4 PubMed DOI PMC
Porter TM, & Hajibabaei M (2020). Putting COI metabarcoding in context: The utility of exact sequence variants (ESVs) in biodiversity analysis. Frontiers in Ecology and Evolution, 8, 248. 10.3389/fevo.2020.00248 DOI
Porter TM, & Hajibabaei M (2021). Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC bioinformatics, 22(1), 1–20. 10.1186/s12859-021-04180-x PubMed DOI PMC
Porter TM, & Hajibabaei M (2022). MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PloS One, 17(9), e0274260. 10.1371/journal.pone.0274260 PubMed DOI PMC
Prodan A, Tremaroli V, Brolin H, Zwinderman AH, Nieuwdorp M, & Levin E (2020). Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing. PLoS One, 15(1), e0227434. 10.1371/journal.pone.0227434 PubMed DOI PMC
Ratnasingham S, & Hebert PD (2007). BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular ecology notes, 7(3), 355–364. 10.1111/j.1471-8286.2007.01678.x PubMed DOI PMC
Reeder J, & Knight R (2009). The’rare biosphere’: a reality check. Nature methods, 6(9), 636–637. 10.1038/nmeth0909-636 PubMed DOI
Reitmeier S, Hitch TC, Treichel N, Fikas N, Hausmann B, Ramer-Tait AE, … & Clavel T (2021). Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling. ISME Communications, 1(1), 1–12. 10.1038/s43705-021-00033-z PubMed DOI PMC
Richardson RT, Bengtsson‐Palme J, & Johnson RM (2017). Evaluating and optimizing the performance of software commonly used for the taxonomic classification of DNA metabarcoding sequence data. Molecular Ecology Resources, 17(4), 760–769. 10.1111/1755-0998.12628 PubMed DOI
Rimet F, Gusev E, Kahlert M, Kelly MG, Kulikovskiy M, Maltsev Y, … & Bouchez A (2019). Diat. barcode, an open-access curated barcode library for diatoms. Scientific Reports, 9(1), 15116. 10.1038/s41598-019-51500-6 PubMed DOI PMC
Rivers AR, Weber KC, Gardner TG, Liu S, & Armstrong SD (2018). ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis. F1000Research, 7. 10.12688/f1000research.15704.1 PubMed DOI PMC
Rodriguez‐Martinez S, Klaminder J, Morlock MA, Dalén L, & Huang DT (2022). The topological nature of tag jumping in environmental DNA metabarcoding studies. Molecular Ecology Resources. 10.1111/1755-0998.13745 PubMed DOI
Rognes T, Flouri T, Nichols B, Quince C, & Mahé F (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4, e2584. 10.7717/peerj.2584 PubMed DOI PMC
Rosen GL, Reichenberger ER, & Rosenfeld AM (2011). NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics, 27(1), 127–129. 10.1093/bioinformatics/btq619 PubMed DOI PMC
Sato M, Sugaya N, Murakami H, Imaizumi A, Aburatani S, Akutsu T, & Horimoto K (2004). Remote homolog detection by match-node profile in hidden Markov model. In Callaos N, Horimoto K, Chen J, & Chan AKS (Eds.), 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol Vii, Proceedings: Applications of Informatics and Cybernetics in Science and Engineering (pp. 27–34). Int Inst Informatics & Systemics. http://www.webofscience.com/wos/alldb/full-record/WOS:000227682900005
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, … & Weber CF (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and environmental microbiology, 75(23), 7537–7541. 10.1128/AEM.01541-09 PubMed DOI PMC
Schnell IB, Bohmann K, & Gilbert MTP (2015). Tag jumps illuminated–reducing sequence‐to‐sample misidentifications in metabarcoding studies. Molecular ecology resources, 15(6), 1289–1303. 10.1111/1755-0998.12402 PubMed DOI
Singer GAC, Fahner NA, Barnes JG, McCarthy A, & Hajibabaei M (2019). Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: a case study of eDNA metabarcoding seawater. Scientific reports, 9(1), 5991. 10.1038/s41598-019-42455-9 PubMed DOI PMC
Song H, Buhay JE, Whiting MF, & Crandall KA (2008). Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences of the United States of America, 105(36), 13486–13491. 10.1073/pnas.0803076105 PubMed DOI PMC
Staats M, Arulandhu AJ, Gravendeel B, Holst-Jensen A, Scholtens I, Peelen T, Prins TW, & Kok E (2016). Advances in DNA metabarcoding for food and wildlife forensic species identification. Analytical and Bioanalytical Chemistry, 408(17), 4615–4630. 10.1007/s00216-016-9595-8 PubMed DOI PMC
Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, & Kleindienst S (2020). Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline. Frontiers in Microbiology, 11, 550420. 10.3389/fmicb.2020.550420
Taberlet P, Bonin A, Zinger L, & Coissac E (2018). Environmental DNA: For biodiversity research and monitoring. Oxford University Press. 10.1093/oso/9780198767220.001.0001
Taberlet P, Coissac E, Hajibabaei M, & Rieseberg LH (2012). Environmental DNA. Molecular Ecology, 21(8), 1789–1793. 10.1111/j.1365-294X.2012.05542.x
Taberlet P, Coissac E, Pompanon F, Brochmann C, & Willerslev E (2012). Towards next‐generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21(8), 2045–2050. 10.1111/j.1365-294X.2012.05470.x
Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, … & Willerslev E (2007). Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research, 35(3), e14. 10.1093/nar/gkl938
Tedersoo L, & Anslan S (2019). Towards PacBio‐based pan‐eukaryote metabarcoding using full‐length ITS sequences. Environmental Microbiology Reports, 11(5), 659–668. 10.1111/1758-2229.12776
Tedersoo L, Albertsen M, Anslan S, & Callahan B (2021). Perspectives and benefits of high-throughput long-read sequencing in microbial ecology. Applied and Environmental Microbiology, 87(17), e00626-21. 10.1128/AEM.00626-21
Tedersoo L, Bahram M, Zinger L, Nilsson RH, Kennedy PG, Yang T, … & Mikryukov V (2022). Best practices in metabarcoding of fungi: From experimental design to results. Molecular Ecology, 31(10), 2769–2795. 10.1111/mec.16460
Terrat S, Djemiel C, Journay C, Karimi B, Dequiedt S, Horrigue W, … & Ranjard L (2020). ReClustOR: a re‐clustering tool using an open‐reference method that improves operational taxonomic unit definition. Methods in Ecology and Evolution, 11(1), 168–180. 10.1111/2041-210X.13316
Thompson LR, Anderson SR, Den Uyl PA, Patin NV, Lim SJ, Sanderson G, & Goodwin KD (2022). Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake. GigaScience, 11, giac066. 10.1093/gigascience/giac066
Thomsen PF, & Sigsgaard EE (2019). Environmental DNA metabarcoding of wild flowers reveals diverse communities of terrestrial arthropods. Ecology and Evolution, 9(4), 1665–1679. 10.1002/ece3.4809
Vasar M, Davison J, Neuenkamp L, Sepp S-K, Young JPW, Moora M, & Öpik M (2021). User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences. Molecular Ecology Resources, 21(4), 1380–1392. 10.1111/1755-0998.13340
Větrovský T, Baldrian P, & Morais D (2018). SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses. Bioinformatics, 34(13), 2292–2294. 10.1093/bioinformatics/bty071
Vu D, Nilsson RH, & Verkley GJM (2022). Dnabarcoder: An open-source software package for analysing and predicting DNA sequence similarity cutoffs for fungal sequence identification. Molecular Ecology Resources. 10.1111/1755-0998.13651
Wang Q, Garrity GM, Tiedje JM, & Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267. 10.1128/AEM.00062-07
Weigand H, Beermann AJ, Čiampor F, Costa FO, Csabai Z, Duarte S, … & Ekrem T (2019). DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Science of the Total Environment, 678, 499–524. 10.1016/j.scitotenv.2019.04.247
Westfall KM, Therriault TW, & Abbott CL (2020). A new approach to molecular biosurveillance of invasive species using DNA metabarcoding. Global Change Biology, 26(2), 1012–1022. 10.1111/gcb.14886
Wratten L, Wilm A, & Göke J (2021). Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nature Methods, 18(10), 1161–1168. 10.1038/s41592-021-01254-9
Zafeiropoulos H, Gargan L, Hintikka S, Pavloudi C, & Carlsson J (2021). The Dark mAtteR iNvestigator (DARN) tool: getting to know the known unknowns in COI amplicon data. Metabarcoding and Metagenomics, 5, e69657. 10.3897/mbmg.5.69657
Zafeiropoulos H, Viet HQ, Vasileiadou K, Potirakis A, Arvanitidis C, Topalis P, … & Pafilis E (2020). PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes. GigaScience, 9(3), giaa022. 10.1093/gigascience/giaa022
Zinger L, Lionnet C, Benoiston AS, Donald J, Mercier C, & Boyer F (2021). metabaR: an R package for the evaluation and improvement of DNA metabarcoding data quality. Methods in Ecology and Evolution, 12(4), 586–592. 10.1111/2041-210X.13552