Most cited article - PubMed ID 30271256
Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding
Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.
- Keywords
- amplicon data analysis, bioinformatics, environmental DNA, metabarcoding, pipeline, review
- MeSH
- Data Analysis MeSH
- Archaea genetics classification MeSH
- Bacteria genetics classification MeSH
- DNA, Environmental genetics MeSH
- Eukaryota genetics classification MeSH
- Metagenomics methods MeSH
- Software * MeSH
- DNA Barcoding, Taxonomic * methods MeSH
- Computational Biology * methods MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Names of Substances
- DNA, Environmental MeSH
Fungi are key players in vital ecosystem services, spanning carbon cycling, decomposition, symbiotic associations with cultivated and wild plants and pathogenicity. The high importance of fungi in ecosystem processes contrasts with the incompleteness of our understanding of the patterns of fungal biogeography and the environmental factors that drive those patterns. To reduce this gap of knowledge, we collected and validated data published on the composition of soil fungal communities in terrestrial environments including soil and plant-associated habitats and made them publicly accessible through a user interface at https://globalfungi.com. The GlobalFungi database contains over 600 million observations of fungal sequences across > 17 000 samples with geographical locations and additional metadata contained in 178 original studies with millions of unique nucleotide sequences (sequence variants) of the fungal internal transcribed spacers (ITS) 1 and 2 representing fungal species and genera. The study represents the most comprehensive atlas of global fungal distribution, and it is framed in such a way that third-party data addition is possible.
Recent DNA-based studies have shown that the built environment is surprisingly rich in fungi. These indoor fungi - whether transient visitors or more persistent residents - may hold clues to the rising levels of human allergies and other medical and building-related health problems observed globally. The taxonomic identity of these fungi is crucial in such pursuits. Molecular identification of the built mycobiome is no trivial undertaking, however, given the large number of unidentified, misidentified, and technically compromised fungal sequences in public sequence databases. In addition, the sequence metadata required to make informed taxonomic decisions - such as country and host/substrate of collection - are often lacking even from reference and ex-type sequences. Here we report on a taxonomic annotation workshop (April 10-11, 2017) organized at the James Hutton Institute/University of Aberdeen (UK) to facilitate reproducible studies of the built mycobiome. The 32 participants went through public fungal ITS barcode sequences related to the built mycobiome for taxonomic and nomenclatural correctness, technical quality, and metadata availability. A total of 19,508 changes - including 4,783 name changes, 14,121 metadata annotations, and the removal of 99 technically compromised sequences - were implemented in the UNITE database for molecular identification of fungi (https://unite.ut.ee/) and shared with a range of other databases and downstream resources. Among the genera that saw the largest number of changes were Penicillium, Talaromyces, Cladosporium, Acremonium, and Alternaria, all of them of significant importance in both culture-based and culture-independent surveys of the built environment.
- Keywords
- Indoor mycobiome, built environment, fungi, metadata, molecular identification, open data, sequence annotation, systematics, taxonomy
- Publication type
- Journal Article MeSH