pipeline
BACKGROUND: High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. RESULTS: Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. CONCLUSIONS: ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software .
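The benchmarking metrics named in the abstract above (precision, recall and F-measure) can be illustrated with a minimal sketch. This is not ToTem's code: the `benchmark` function, the variant tuple layout and the example calls are assumptions made for illustration, and the cross-validation penalty that ToTem applies is not reproduced here.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def benchmark(called: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Compare one pipeline's calls against a truth set of known variants."""
    tp = len(called & truth)   # variants called and present in the truth set
    fp = len(called - truth)   # variants called but absent from the truth set
    fn = len(truth - called)   # true variants the pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure as the harmonic mean of precision and recall
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Made-up example data: four true variants, four calls, three of them correct.
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 50, "C", "T"), ("chr2", 75, "T", "G")}
called = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
          ("chr2", 50, "C", "T"), ("chr3", 10, "A", "G")}
scores = benchmark(called, truth)
```

Running `benchmark` once per candidate pipeline setting and sorting the resulting dictionaries by the metric that matters most to the user is one simple way to compare settings.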
Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing, being the most promising approach for the accurate identification of rare variants in complex DNA samples. This approach has application in multiple areas, including cancer diagnostics, thus demanding dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline that efficiently handles all caveats of UMI-based analysis to obtain high-fidelity mutation profiles and call ultra-rare variants. Using an extensive set of benchmark datasets including gold-standard biological samples with known variant frequencies, cell-free DNA from tumor patient blood samples and publicly available UMI-encoded datasets we demonstrate that our method is both robust and efficient in calling rare variants. The versatility of our software is supported by accurate results obtained for both tumor DNA and viral RNA samples in datasets prepared using three different UMI-based protocols.
- MeSH
- Databases, Genetic MeSH
- Humans MeSH
- Biomarkers, Tumor blood genetics MeSH
- Neoplasms genetics MeSH
- RNA, Viral genetics MeSH
- Sequence Analysis, DNA methods MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
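The MAGERI abstract does not describe its algorithm, so the following is only a generic sketch of the idea behind UMI-based error correction: reads sharing a UMI are assumed to derive from one original molecule, and a per-position majority vote suppresses errors that occur in a minority of reads. The function name `umi_consensus`, the `min_reads` threshold and the example reads are hypothetical, and the reads are assumed to be pre-aligned to the same start position.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def umi_consensus(reads: Iterable[Tuple[str, str]], min_reads: int = 2) -> Dict[str, str]:
    """Group (umi, sequence) pairs by UMI and build a majority-vote consensus."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus: Dict[str, str] = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to correct errors; skip this molecule
        length = min(len(s) for s in seqs)  # vote only where every read has a base
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


# Made-up reads: the third read of UMI "AACG" carries a single error at position 3.
reads = [
    ("AACG", "ACGTAC"), ("AACG", "ACGTAC"), ("AACG", "ACGAAC"),
    ("TTGC", "ACGAAC"), ("TTGC", "ACGAAC"), ("TTGC", "ACGAAC"),
    ("GGGA", "ACGTTT"),
]
molecules = umi_consensus(reads)
```

In this sketch a variant is kept only when it is supported by the consensus of a whole UMI group, which is the intuition behind calling rare variants from UMI-tagged data; real pipelines additionally handle UMI sequencing errors and base qualities.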
INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could possibly allow taking pre-emptive measures and thus prevent the development of CS. METHODS: We mainly focus on techniques for the imputation of missing data by generating a pipeline for imputation and comparing the performance of various multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of the gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction. RESULTS: We achieved good classification performance thanks to data cleaning and imputation (cross-validated mean area under the curve 0.805) without hyperparameter optimization. CONCLUSION: We believe our pre-processing pipeline would prove helpful also for other classification and regression experiments.
- Publication type
- Journal Article MeSH
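Of the imputation algorithms compared in the abstract above, k-nearest neighbours is the easiest to sketch. The version below is a minimal pure-Python illustration, not the authors' pipeline: the function names, the distance rescaling for partially observed rows and the toy data are assumptions.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled to full width."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # rows with no feature in common cannot be compared
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean of that feature in the k nearest rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which feature j was actually observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:  # a value with no usable donor stays missing
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data: the first patient lacks the third measurement.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 50.0],
]
filled = knn_impute(rows, k=2)
```

Here the two rows closest to the incomplete one contribute 3.0 and 5.0, so the gap is filled with their mean; the distant fourth row is ignored. In practice features would be standardized first so that no single variable dominates the distance.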
The serious spread of antibiotic-resistant Staphylococcus aureus strains is alarming. This is reflected by the measures governments and health-related bodies are offering to ease antibiotic drug development. Finding new active agents, preferably with a novel mechanism of action, or even finding new targets for drug development is essential. In this review, we summarize the current status of novel antistaphylococcal agents undergoing clinical trials. We mainly discuss antistaphylococcal small molecules and peptides in the text with a special focus on their chemistry, while antistaphylococcal immunotherapies (antibodies) are mentioned in a summative table. This review shall serve as a summary that influences future synthetic efforts in the antistaphylococcal development field.
BACKGROUND: Sex chromosomes represent a genomic region which, to some extent, differs between the sexes of a single species. Reliable high-throughput methods for the detection of sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door for the identification of unique sequences or for searching for nucleotide polymorphisms between datasets. A combination of classical genetic segregation analysis with RNA-Seq data can present an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring. RESULTS: We present a pipeline for the detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, tracking of nucleotide polymorphisms is carried out using a cross of preferably distant populations. For this reason, only 4 datasets are needed: reads from high-throughput sequencing platforms for the parent generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Given the resource-intensive nature of the computation, servers with high capacity are a requirement. Therefore, in order to keep this pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are present in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As an output of the pipeline, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. At the same time, a BAM file with identified genes and alignment of reads is also provided. Thus, polymorphisms following the segregation pattern can be easily visualized, which significantly enhances primer design and subsequent steps of wet-lab verification. CONCLUSIONS: Our pipeline presents a simple and freely accessible software tool for the identification of sex chromosome-linked genes in species without an existing reference genome. Based on a combination of genetic crosses and RNA-Seq data, we have designed a high-throughput, cost-effective approach for a broad community of scientists focused on sex chromosome structure and evolution.
- MeSH
- Genetic Markers genetics MeSH
- Genome, Human MeSH
- Genes, X-Linked * MeSH
- Genes, Y-Linked * MeSH
- Polymorphism, Single Nucleotide genetics MeSH
- Humans MeSH
- Polymerase Chain Reaction MeSH
- RNA genetics MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
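The segregation-pattern idea in the abstract above, which needs only the four datasets (mother, father, female progeny, male progeny), can be sketched for the simplest case. The rule below flags a Y-linked candidate when an allele is seen in the father and male progeny but in neither the mother nor the female progeny. The abstract does not state the pipeline's actual rules or how relevance is scored, so the function names, the ranking by number of supporting SNPs and the contig names are all assumptions.

```python
from typing import Dict, List, Set, Tuple

# Alleles observed in each of the four datasets at one SNP position of a contig.
Site = Dict[str, Set[str]]


def is_y_linked_candidate(site: Site) -> bool:
    """True if some allele occurs in both male datasets and in neither female dataset."""
    male_only = (site["father"] & site["male_progeny"]) \
        - site["mother"] - site["female_progeny"]
    return bool(male_only)


def rank_contigs(sites_by_contig: Dict[str, List[Site]]) -> List[Tuple[str, int]]:
    """Rank contigs by the number of SNPs that follow the male-only pattern."""
    counts = {
        contig: sum(is_y_linked_candidate(s) for s in sites)
        for contig, sites in sites_by_contig.items()
    }
    supported = [(contig, n) for contig, n in counts.items() if n > 0]
    # Most supporting SNPs first; contig name breaks ties for a stable order.
    return sorted(supported, key=lambda item: (-item[1], item[0]))


def site(mother: str, father: str, female: str, male: str) -> Site:
    """Helper for the example: each argument is a string of observed alleles."""
    return {"mother": set(mother), "father": set(father),
            "female_progeny": set(female), "male_progeny": set(male)}


# Made-up contigs: A has two male-only SNPs, B has none, C has one.
sites_by_contig = {
    "contig_A": [site("A", "AG", "A", "AG"), site("C", "CT", "C", "CT")],
    "contig_B": [site("AG", "AG", "AG", "AG")],
    "contig_C": [site("T", "TC", "T", "TC"), site("G", "G", "G", "G")],
}
ranking = rank_contigs(sites_by_contig)
```

X-linked patterns (for example, a paternal allele transmitted to all daughters and no sons) would need additional rules of the same shape.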
BACKGROUND: Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols. RESULTS: We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table. CONCLUSIONS: The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped ('dark matter') reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.
- MeSH
- Genetic Variation MeSH
- Genomics methods MeSH
- Internet * MeSH
- Humans MeSH
- Microbiota genetics MeSH
- Software * MeSH
- Viruses genetics MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
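The abstract above mentions alpha diversity measures computed from virus abundance tables but does not say which ones. As a hedged illustration, the sketch below computes the Shannon index, one common alpha diversity measure, from per-sample read counts; the function name and the sample data are hypothetical and not taken from Vipie.

```python
import math
from typing import Dict


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total  # relative abundance of this taxon in the sample
            h -= p * math.log(p)
    return h


# Made-up abundance table: mapped read counts per virus for two samples.
samples = {
    "sample_even": {"virus_A": 50, "virus_B": 50},
    "sample_skewed": {"virus_A": 99, "virus_B": 1},
}
diversity = {name: shannon_index(counts) for name, counts in samples.items()}
```

The evenly split sample scores ln 2, the maximum for two taxa, while the skewed sample scores much lower, which is the kind of contrast a per-group diversity chart would display.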
Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, the accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
- MeSH
- Daphnia parasitology MeSH
- Genetic Variation * MeSH
- Mesomycetozoea classification genetics MeSH
- DNA, Ribosomal Spacer chemistry genetics MeSH
- Sequence Analysis, DNA * MeSH
- Software MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Evaluation Study MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
Celiac disease (CD) is a chronic gastrointestinal autoimmune disease that occurs in approximately 0.5-1% of the U.S. population. At present, there is neither a cure nor a pharmacologic therapy available for patients with CD. The only treatment option is strict adherence to a gluten-free diet. However, even complete adherence may not be enough, as some foods might be cross-contaminated with gluten. The purpose of this study was to analyze and describe different therapy options currently being investigated for potential use in patients with CD. An analysis of ClinicalTrials.gov was performed with the search term “Celiac Disease”. The search returned 192 results; however, only 8 were pharmacologic treatment options. The pharmacologic therapies located were TIMP-GLIA, ALV003, AMG714, pancrelipase, Nexvax2, RO5459072, Hu-Mik Beta-1, and Necator americanus (Na). Given the lack of pharmacologic treatment options currently available, further research is needed to create new therapeutic options for patients with CD, with the goal of ultimately curing them.
- MeSH
- Celiac Disease * drug therapy MeSH
- Clinical Trials as Topic MeSH
- Humans MeSH
- Drug Development MeSH
- Check Tag
- Humans MeSH
Horizontal gene transfer (HGT) is a key driver in the evolution of bacterial genomes. The acquisition of genes mediated by HGT may enable bacteria to adapt to ever-changing environmental conditions. Long-term application of antibiotics in intensive agriculture is associated with the dissemination of antibiotic resistance genes among bacteria with the consequences causing public health concern. Commensal farm-animal-associated gut microbiota are considered the reservoir of the resistance genes. Therefore, in this study, we identified known and not-yet characterized mobilized genes originating from chicken and porcine fecal samples using our innovative pipeline followed by network analysis to provide appropriate visualization to support proper interpretation.
- MeSH
- Anti-Bacterial Agents MeSH
- Bacteria genetics MeSH
- Genes, Bacterial MeSH
- Genome, Bacterial MeSH
- Microbiota * MeSH
- Swine MeSH
- Gene Transfer, Horizontal * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
INTRODUCTION: Recurrent or primary advanced metastatic cervical cancer (R/M CC) has a poor prognosis with a 5-year-survival rate of 16.5%, demanding novel and improved therapies for the treatment of these patients. The first-line standard of care for R/M CC now benefits from the addition of the immune checkpoint inhibitor, pembrolizumab, to platinum-based chemotherapy with paclitaxel and bevacizumab. Additionally, new options for second-line treatment have become available in recent years. AREAS COVERED: Here, we review current investigational drugs and discuss their relative targets, efficacies, and potential within the R/M CC treatment landscape. This review will focus on recently published data and key ongoing clinical trials in patients with R/M CC, covering multiple modes of action, including immunotherapies, antibody-drug conjugates, and tyrosine kinase inhibitors. We searched clinicaltrials.gov for ongoing trials and pubmed.ncbi.nih.gov for recently published trial data, as well as recent years' proceedings from the annual conferences of the American Society of Clinical Oncology (ASCO), European Society for Medical Oncology (ESMO), European Society of Gynaecological Oncology (ESGO), and the International Gynecologic Cancer Society (IGCS). EXPERT OPINION: Therapeutics currently attracting attention include novel immune checkpoint inhibitors, therapeutic vaccinations, antibody-drug conjugates, such as tisotumab vedotin, tyrosine kinase inhibitors targeting HER2, and multitarget synergistic combinations.