Most cited article - PubMed ID 19182786
The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set at three average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that assembling with multiple k-mer sizes simultaneously yields the highest number of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
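The coverage levels named in the abstract (10×, 5× and 2×) can be related to read counts with the standard approximation coverage = reads × read length / genome size. The sketch below is illustrative only: the abstract does not say how the three coverage levels were produced, so the downsampling step, the function names and all numbers are assumptions.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Approximate average depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep when downsampling to a lower coverage."""
    if target_cov > current_cov:
        raise ValueError("cannot downsample to a higher coverage")
    return target_cov / current_cov


# Hypothetical library: 20 million reads of 150 bp on a 300 Mb genome.
full_cov = mean_coverage(20_000_000, 150, 300_000_000)

# Fractions of reads needed to imitate 5x and 2x data sets (assumed design).
fractions = {target: keep_fraction(full_cov, target) for target in (5, 2)}
```

Here `full_cov` works out to 10×, so keeping half of the reads approximates a 5× data set and keeping one fifth approximates 2×.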
- Keywords
- secapr, de novo assembly, loci extraction, low-coverage whole genome sequencing, target sequence capture
- MeSH
- Phylogeny MeSH
- Genome * MeSH
- Whole Genome Sequencing MeSH
- Computational Biology * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
With the rapid advancement of sequencing technologies, next generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. There is therefore an urgent need for best practices in cancer mutation detection using NGS and for standard reference data sets that allow accuracy and reproducibility to be measured systematically across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES data sets from well-characterized reference samples represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.
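As a rough illustration of what "accuracy" and "reproducibility" can mean for somatic mutation calls, the sketch below scores a call set against a truth set (precision, recall, F1) and compares two call sets with a Jaccard index. The abstract does not specify the consortium's metrics, so these choices, the function names and the toy variants are all assumptions.

```python
def accuracy_metrics(calls, truth):
    """Precision, recall and F1 of a call set against a reference truth set."""
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def jaccard(a, b):
    """Overlap of two call sets: |a and b| / |a or b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical variants keyed as (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"),
         ("chr2", 75, "C", "T"), ("chr3", 900, "T", "G")}
site_a = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"),
          ("chr2", 75, "C", "T"), ("chr5", 10, "A", "G")}
site_b = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"),
          ("chr3", 900, "T", "G")}

metrics_a = accuracy_metrics(site_a, truth)
reproducibility = jaccard(site_a, site_b)
```

In this toy case `site_a` finds three of four true variants and makes one false call, and the two sites share two of the five variants called by either.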
- MeSH
- Benchmarking MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Precision Medicine MeSH
- Humans MeSH
- Cell Line, Tumor MeSH
- Neoplasms / genetics MeSH
- Whole Genome Sequencing * MeSH
- Exome Sequencing * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
The Asiatic wild dog (Cuon alpinus), restricted today largely to South and Southeast Asia, was widespread throughout Eurasia and even reached North America during the Pleistocene. Like many other species, it suffered from a huge range loss towards the end of the Pleistocene and went extinct in most of its former distribution. The fossil record of the dhole is scattered and the identification of fossils can be complicated by an overlap in size and a high morphological similarity between dholes and other canid species. We generated almost complete mitochondrial genomes for six putative dhole fossils from Europe. By using three lines of evidence, i.e., the number of reads mapping to various canid mitochondrial genomes, the evaluation and quantification of the mapping evenness along the reference genomes and phylogenetic analysis, we were able to identify two out of six samples as dhole, whereas four samples represent wolf fossils. This highlights the contribution genetic data can make when trying to identify the species affiliation of fossil specimens. The ancient dhole sequences are highly divergent when compared to modern dhole sequences, but the scarcity of dhole data for comparison impedes a more extensive analysis.
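One way to quantify the "mapping evenness" mentioned above is to summarise per-position read depth along a reference mitogenome. The sketch below reports breadth of coverage and the coefficient of variation of depth; the abstract does not state which statistic the authors used, so this metric, the function name and the toy depth vectors are assumptions.

```python
from statistics import mean, pstdev


def evenness_summary(depths):
    """Summarise how evenly reads cover a reference from per-position depths."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > 0)
    avg = mean(depths)
    # Coefficient of variation: 0 means perfectly even coverage.
    cv = pstdev(depths) / avg if avg > 0 else float("inf")
    return {"breadth": covered / len(depths), "mean_depth": avg, "cv": cv}


# Toy depth vectors with the same mean depth (10) but different evenness.
even = evenness_summary([10, 11, 9, 10, 10, 10])
patchy = evenness_summary([60, 0, 0, 0, 0, 0])
```

Both toy references receive the same number of mapped bases, but the patchy one concentrates them at a single position. Under this assumed metric, such a pattern would hint that reads map only to a conserved region of a wrong reference species.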
- Keywords
- Cuon alpinus, ancient DNA, canids, dhole, hybridisation capture, mitogenome
- MeSH
- Canidae / anatomy & histology / classification / genetics MeSH
- Phylogeny * MeSH
- Genome, Mitochondrial MeSH
- Hybridization, Genetic MeSH
- Animal Migration MeSH
- DNA, Mitochondrial MeSH
- DNA, Ancient * MeSH
- Fossils MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Europe MeSH
- Substance names
- DNA, Mitochondrial MeSH
- DNA, Ancient * MeSH
High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing effort on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing coverage. One common approach that enriches loci before sequencing is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth. Moreover, it has proven to produce powerful, large multi-locus DNA sequence datasets suitable for phylogenetic analyses. However, target capture requires careful considerations, which may greatly affect the success of experiments. Here we provide a simple flowchart for designing phylogenomic target capture experiments. We discuss necessary decisions from the identification of target loci to the final bioinformatic processing of sequence data. We outline challenges and solutions related to the taxonomic scope, sample quality, and available genomic resources of target capture projects. We hope this review will serve as a useful roadmap for designing and carrying out successful phylogenetic target capture studies.
- Keywords
- Hyb-Seq, Illumina, NGS, anchored enrichment, bait, high throughput sequencing, molecular phylogenetics, probe
- Publication type
- Journal Article MeSH
- Review MeSH
BACKGROUND: Genomic selection (GS) can offer unprecedented gains, in terms of cost efficiency and generation turnover, to forest tree selective breeding, especially for late-expressing and low-heritability traits. Here, we used: 1) exome capture as a genotyping platform for 1372 Douglas-fir trees representing 37 full-sib families growing on three sites in British Columbia, Canada and 2) estimated breeding values (EBVs) and deregressed estimated breeding values (DEBVs) for height growth and wood density as phenotypes, representing models with (EBVs) and without (DEBVs) pedigree structure. Ridge regression best linear unbiased predictor (RR-BLUP) and generalized ridge regression (GRR) were used to assess predictive accuracies over space (within site, cross-site, multi-site, and multi-site to single site) and time (age-age/trait-trait). RESULTS: The RR-BLUP and GRR models produced similar predictive accuracies across the studied traits. Within-site GS prediction accuracies with models trained on EBVs were high (RR-BLUP: 0.79-0.91 and GRR: 0.80-0.91), and were generally similar to the multi-site (RR-BLUP: 0.83-0.91, GRR: 0.83-0.91) and multi-site to single-site predictive accuracies (RR-BLUP: 0.79-0.92, GRR: 0.79-0.92). Cross-site predictions were surprisingly high, with predictive accuracies within a similar range (RR-BLUP: 0.79-0.92, GRR: 0.78-0.91). Height at 12 years was deemed the earliest acceptable age at which accurate predictions can be made concerning future height (age-age) and wood density (trait-trait). Using DEBVs reduced the accuracies of all cross-validation procedures dramatically, indicating that the models were tracking pedigree (family means) rather than linkage disequilibrium (LD) between markers and quantitative trait loci (QTL). CONCLUSIONS: While the prediction accuracies of the GS models were high, the main driving force was pedigree tracking rather than LD. It is likely that many more markers are needed to increase the chance of capturing the LD between causal genes and markers.
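The ridge-regression idea behind RR-BLUP can be sketched as follows: fit marker effects β from (XᵀX + λI)β = Xᵀy on a training set, predict the validation set, and report predictive accuracy as the Pearson correlation between predicted and observed values. This is a minimal sketch, not the study's implementation: RR-BLUP estimates the shrinkage from variance components, whereas here λ is simply fixed, and the genotypes, phenotypes and function names are hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Fit marker effects on centred markers and centred phenotypes."""
    p = len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    mu = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * (yi - mu) for r, yi in zip(Xc, y)) for i in range(p)]
    return mu, col_means, solve(xtx, xty)


def predict(X, mu, col_means, beta):
    """Predict phenotypes from genotypes using the fitted marker effects."""
    return [mu + sum(b * (x - c) for b, x, c in zip(beta, row, col_means))
            for row in X]


def pearson(a, b):
    """Pearson correlation, used here as the predictive accuracy."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical genotypes (0/1/2 allele counts at three markers) and phenotypes.
train_X = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [0, 2, 1], [1, 1, 1], [2, 0, 1]]
train_y = [10.5, 11.0, 12.5, 11.0, 11.5, 12.0]
valid_X = [[2, 2, 0], [0, 0, 2], [1, 2, 0]]
valid_y = [13.0, 10.0, 12.0]

mu, col_means, beta = ridge_fit(train_X, train_y, lam=1.0)
accuracy = pearson(predict(valid_X, mu, col_means, beta), valid_y)
```

In the cross-site setting described in the abstract, the training and validation sets would come from different sites; in the within-site setting they would be folds of the same site.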
- Keywords
- Douglas-fir, Exome capture, Full-sib families, Genomic selection, Genotype x environment interaction, Predictive model
- MeSH
- Wood / chemistry / genetics MeSH
- Exome * MeSH
- Genomics MeSH
- Genotype MeSH
- Linear Models MeSH
- Quantitative Trait Loci MeSH
- Models, Genetic * MeSH
- Pseudotsuga / genetics / growth & development MeSH
- Selection, Genetic * MeSH
- Plant Breeding * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH