The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. The resulting benchmarking metrics are a valuable resource for researchers selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
- MeSH
  - Data Analysis
  - Benchmarking *
  - Proteins
  - Proteomics *
  - Workflow
  - Software
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
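The limited overlap in quantified proteins reported for WOMBAT-P can be summarized with simple set statistics. The following is a minimal sketch that is not part of WOMBAT-P: it assumes each workflow's result has already been reduced to a set of protein accessions, and the workflow names and accessions are hypothetical placeholders. It computes the pairwise Jaccard index and the proteins quantified by every workflow.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_report(
    results: dict[str, set[str]],
) -> tuple[dict[tuple[str, str], float], set[str]]:
    """Return pairwise Jaccard indices and the proteins quantified by all workflows."""
    pairwise = {
        (x, y): jaccard(results[x], results[y])
        for x, y in combinations(sorted(results), 2)
    }
    shared = set.intersection(*results.values()) if results else set()
    return pairwise, shared


# Hypothetical outputs: protein accessions quantified by three workflows.
quantified = {
    "workflow_A": {"P01", "P02", "P03", "P04"},
    "workflow_B": {"P02", "P03", "P05"},
    "workflow_C": {"P03", "P04", "P05", "P06"},
}

pairwise, shared = overlap_report(quantified)
for (x, y), j in pairwise.items():
    print(f"{x} vs {y}: Jaccard = {j:.2f}")
print("Quantified by all workflows:", sorted(shared))
```

The Jaccard index only captures agreement on which proteins are reported; agreement on the quantities themselves would need a separate measure on the shared set.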
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, has an aneuploid genome, and is enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and via targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they not only minimize potential biases from technologies, assays and informatics when setting up a sequencing pipeline but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
- MeSH
  - Benchmarking *
  - Datasets as Topic
  - Humans
  - Mutation
  - DNA Mutational Analysis / standards
  - Cell Line, Tumor
  - Breast Neoplasms / genetics
  - Reference Standards
  - Reproducibility of Results
  - Whole Genome Sequencing / standards
  - High-Throughput Nucleotide Sequencing / standards
  - Germ Cells
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
  - Research Support, N.I.H., Extramural
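Reference call sets such as these are typically used by comparing a pipeline's calls against them. The sketch below is a minimal illustration under stated assumptions: variants are (chromosome, position, ref, alt) records, exact matching is adequate (real comparisons usually normalize variant representation first, which this sketch does not do), and high-confidence regions use the same 1-based inclusive coordinates as the variants. All variants and regions are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, same convention as the regions below
    ref: str
    alt: str


def restrict(variants: set[Variant], regions: list[tuple[str, int, int]]) -> set[Variant]:
    """Keep variants inside any (chrom, start, end) region.

    Regions are treated as 1-based and inclusive (an assumption; BED files are 0-based)."""
    return {
        v
        for v in variants
        if any(v.chrom == c and start <= v.pos <= end for c, start, end in regions)
    }


def benchmark_calls(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Exact-match comparison of a call set against a reference call set."""
    tp = len(calls & truth)  # calls present in the reference set
    fp = len(calls - truth)  # calls absent from the reference set
    fn = len(truth - calls)  # reference variants that were not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference call set, pipeline calls and high-confidence region.
truth = {Variant("chr1", 150, "A", "G"), Variant("chr1", 300, "C", "T"),
         Variant("chr1", 450, "G", "A"), Variant("chr1", 5000, "T", "C")}
calls = {Variant("chr1", 150, "A", "G"), Variant("chr1", 300, "C", "T"),
         Variant("chr1", 700, "A", "C"), Variant("chr1", 5000, "T", "C")}
high_confidence = [("chr1", 100, 1000)]

all_sites = benchmark_calls(calls, truth)
confident = benchmark_calls(restrict(calls, high_confidence),
                            restrict(truth, high_confidence))
print(all_sites)
print(confident)
```

Restricting both sets to the same high-confidence regions before scoring keeps calls outside those regions from being counted as errors simply because the reference is silent there.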
Microscale thermophoresis (MST) and the closely related Temperature Related Intensity Change (TRIC) are terms for a recently developed biophysical measurement technique used to quantify biomolecular interactions with the capillary-based NanoTemper Monolith and the multiwell plate-based Dianthus instruments. Although this technique has been used extensively within the scientific community owing to its low sample consumption, ease of use, and broad applicability, MST/TRIC has not enjoyed the unambiguous acceptance from biophysicists afforded to other biophysical techniques such as isothermal titration calorimetry (ITC) or surface plasmon resonance (SPR). This might be attributed to several factors: various, not fully understood effects contribute to the signal; the technique is licensed to a single instrument developer, NanoTemper Technologies; and its reliability and reproducibility have never been tested independently and systematically. Thus, a working group of ARBRE-MOBIEU set up a benchmark study on MST/TRIC to assess this technique as a method to characterize biomolecular interactions. Here we present the results of this study, in which 32 scientific groups within Europe and two groups from the US carried out experiments on 40 Monolith instruments, using a standard operating procedure and centrally prepared samples. A protein-small molecule interaction, a newly developed protein-protein interaction system and a pure dye were used as test systems. We characterized the instrument properties and evaluated instrument performance, reproducibility, the effect of different analysis tools, and the influence of the experimenter during data analysis, and thereby the overall reliability of this method.
- MeSH
  - Benchmarking *
  - Calorimetry
  - Laboratories *
  - Reproducibility of Results
  - Temperature
- Publication type
  - Journal Article
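The abstract does not state the fitting model, but MST/TRIC dose-response data for a 1:1 interaction are commonly analyzed with the quadratic (ligand-depletion) binding equation. The sketch below assumes simple 1:1 binding and a signal that is a linear mix of the unbound and bound states; a coarse grid search over Kd stands in for the nonlinear least-squares fitting that analysis software typically performs. The target concentration, dilution series, signal levels and Kd are hypothetical.

```python
import math


def fraction_bound(target: float, ligand: float, kd: float) -> float:
    """Fraction of target in complex for 1:1 binding, accounting for ligand depletion.

    Solves [TL]^2 - (T + L + Kd)[TL] + T*L = 0 for the physically meaningful root.
    All concentrations must be in the same unit (e.g. µM)."""
    s = target + ligand + kd
    complex_conc = (s - math.sqrt(max(s * s - 4.0 * target * ligand, 0.0))) / 2.0
    return complex_conc / target


def fit_kd(target, ligands, signals, kd_grid):
    """Grid search over Kd. For each candidate Kd, the unbound and bound signal
    levels come from ordinary least squares, because the model
    signal = f_unbound + (f_bound - f_unbound) * fraction_bound is linear in them."""
    n = len(ligands)
    mean_y = sum(signals) / n
    best = None
    for kd in kd_grid:
        x = [fraction_bound(target, lig, kd) for lig in ligands]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # flat curve: amplitude not identifiable for this Kd
        delta = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals)) / sxx
        f_unbound = mean_y - delta * mean_x
        sse = sum((f_unbound + delta * xi - yi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, f_unbound, f_unbound + delta)
    if best is None:
        raise ValueError("no Kd in the grid produced a non-flat binding curve")
    _, kd, f_unbound, f_bound = best
    return kd, f_unbound, f_bound


# Hypothetical titration: 50 nM target, 16-point two-fold ligand dilution series (µM).
target = 0.05
ligands = [0.001 * 2 ** i for i in range(16)]
# Noiseless synthetic signal with Kd = 1 µM, unbound level 800 and bound level 790.
signals = [800.0 - 10.0 * fraction_bound(target, lig, 1.0) for lig in ligands]

kd_grid = [10 ** (k / 50) for k in range(-200, 201)]  # 1e-4 to 1e4 µM
kd_fit, f_u, f_b = fit_kd(target, ligands, signals, kd_grid)
print(f"Kd ≈ {kd_fit:.3g} µM, F_unbound ≈ {f_u:.1f}, F_bound ≈ {f_b:.1f}")
```

Because the signal is linear in the two plateau values, only Kd has to be searched; with real, noisy data the grid would be refined or replaced by a proper optimizer.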
With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need for best practices in cancer mutation detection using NGS and for standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using 16 library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets from well-characterized reference samples represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines and for cancer genomics studies.
- MeSH
  - Benchmarking
  - Genome, Human *
  - Genomics
  - Precision Medicine
  - Humans
  - Cell Line, Tumor
  - Neoplasms / genetics
  - Whole Genome Sequencing *
  - Exome Sequencing *
  - Computational Biology
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Dataset
  - Research Support, Non-U.S. Gov't
  - Research Support, N.I.H., Extramural
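One simple way to express cross-site or cross-platform reproducibility of somatic calls is to count, for each mutation, how many replicate call sets detect it. The sketch below is an illustration rather than the consortium's actual metric: it assumes each replicate's somatic calls are available as a set of mutation identifiers, and the replicate names and identifiers are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def detection_counts(replicates: dict[str, set[str]]) -> Counter:
    """Number of replicate call sets in which each somatic mutation was detected."""
    counts: Counter = Counter()
    for calls in replicates.values():
        counts.update(calls)  # each set contributes at most 1 per mutation
    return counts


def fraction_reproducible(replicates: dict[str, set[str]], k: int) -> float:
    """Fraction of all called mutations that were detected by at least k replicates."""
    counts = detection_counts(replicates)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c >= k) / len(counts)


# Hypothetical somatic calls from three site/platform/assay combinations.
replicates = {
    "site1_WGS": {"chr1:1000:C>T", "chr2:2000:G>A", "chr3:3000:A>G"},
    "site2_WGS": {"chr1:1000:C>T", "chr2:2000:G>A", "chr4:4000:T>C"},
    "site3_WES": {"chr1:1000:C>T", "chr3:3000:A>G", "chr5:5000:G>T"},
}

for k in (1, 2, 3):
    print(f"detected by >= {k} replicates: {fraction_reproducible(replicates, k):.2f}")
```

Mutations found by only one replicate are candidates for site- or platform-specific artifacts, which is why the tail of this distribution is usually inspected alongside accuracy against a reference set.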
The semiempirical quantum mechanical (SQM) methods used in drug design are commonly parametrized and tested on data sets of systems that may not be representative models of drug-biomolecule interactions in terms of either size or chemical composition. This is addressed here with a new benchmark data set, PLF547, derived from protein-ligand complexes and consisting of complexes of ligands with protein fragments (such as amino acid side chains), with interaction energies based on MP2-F12 and DLPNO-CCSD(T) calculations. From these, composite benchmark interaction energies are also built for complexes of the ligand with the complete active site of the protein (PLA15 data set). These data sets are used to test multiple SQM methods with corrections for noncovalent interactions; the role of the solvation model in the calculations is tested as well.
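Testing approximate methods against benchmark interaction energies usually comes down to a few error statistics. The sketch below assumes paired lists of method and reference interaction energies in the same unit; the numbers are made up and are not PLF547 or PLA15 values, and the choice of statistics is an assumption rather than the exact set used in the study.

```python
from __future__ import annotations

import math


def error_statistics(method: list[float], reference: list[float]) -> dict[str, float]:
    """Error statistics of approximate interaction energies against benchmark values.

    Errors are defined as method - reference; both lists must use the same unit,
    and reference values must be nonzero for the relative error."""
    if len(method) != len(reference) or not method:
        raise ValueError("need two non-empty lists of equal length")
    errors = [m - r for m, r in zip(method, reference)]
    n = len(errors)
    return {
        "MSE": sum(errors) / n,                            # mean signed error
        "MAE": sum(abs(e) for e in errors) / n,            # mean absolute error
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),  # root-mean-square error
        "relMAE_%": 100.0 * sum(abs(e / r) for e, r in zip(errors, reference)) / n,
    }


# Hypothetical interaction energies (kcal/mol) for three complexes.
reference = [-10.0, -5.0, -20.0]
sqm = [-9.0, -6.0, -22.0]
stats = error_statistics(sqm, reference)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reporting a relative error alongside absolute errors matters when complexes of very different size are mixed, as with fragment complexes versus complete active sites.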
BACKGROUND: A minority of European countries have participated in international comparisons with high-level data on lung cancer. However, the nature and extent of data collection across the continent remain unknown, and without accurate data collection it is not possible to compare practice and set benchmarks to which lung cancer services can aspire. METHODS: Using an established network of lung cancer specialists in 37 European countries, we distributed a survey in December 2014. The results describe current practice in each country at that time (early 2015). The results were compiled and then verified with co-authors over the following months. RESULTS: Thirty-five completed surveys were received, describing a range of current practice in lung cancer data collection. Thirty countries have data collection at the national level, but this is not the case in Albania, Bosnia-Herzegovina, Italy, Spain and Switzerland. Data collection ranged from paper records with no survival analysis to well-established electronic databases with links to census data and survival analyses. CONCLUSION: Using a network of committed clinicians, we have gathered validated comparative data showing differences in data collection mechanisms across Europe. We have identified the need to develop a well-designed dataset that acknowledges what is feasible within each country while aspiring to collect high-quality data for clinical research.
- MeSH
  - Databases, Factual / statistics & numerical data
  - Medical Oncology / methods / statistics & numerical data
  - Humans
  - Lung Neoplasms / diagnosis / therapy
  - Data Collection / methods / statistics & numerical data
- Check Tag
  - Humans
- Publication type
  - Journal Article
- Geographicals
  - Europe
A new database of nucleic acid base trimers has been developed that includes 141 geometries and stabilization energies obtained at the RI-MP2 level of theory with the TZVPP basis set. Compared with previously compiled biologically oriented databases, this new database includes considerably more complicated structures: the intermolecular interactions in the trimers are quite heterogeneous and, in particular, include simultaneous hydrogen bonding and stacking, similar to the situation in actual biopolymers. Validation against these benchmark data is therefore a more demanding task for approximate models, since correct descriptions of all energy terms are unlikely to be achieved through fortuitous cancellation of systematic errors. The density functionals TPSS (both with and without an empirical dispersion term), PWB6K, M05-2X, and BH&H, and the self-consistent charge density functional tight binding method augmented with an empirical dispersion term (SCC-DFTB-D) were assessed for their ability to compute structures and energies accurately. The best reproduction of the RI-MP2 stabilization energies corrected for basis set superposition error (BSSE) was achieved by the TPSS functional (TZVPP basis set) combined with empirical dispersion; removing the dispersion correction significantly degrades performance. The M05-2X and PWB6K functionals performed very well in reproducing the RI-MP2 geometries but systematically and moderately underestimated the magnitude of base stacking interactions. The SCC-DFTB-D method predicts geometries in fair agreement with RI-MP2; given its computational efficiency, it is a good option for initial scanning of analogous biopolymeric potential energy surfaces. BH&H gives geometries of quality comparable to the other functionals but significantly overestimates interaction energies other than stacking.
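The abstract refers to BSSE-corrected stabilization energies without stating the correction scheme. For orientation only, one common choice is the Boys-Bernardi counterpoise correction. Written for a trimer ABC, with every fragment energy evaluated at its trimer geometry in the full trimer basis (an assumption: the authors may use a different counterpoise variant, and monomer deformation energies are ignored here), the interaction energy is

```latex
\Delta E^{\mathrm{CP}}_{ABC} = E^{ABC}_{ABC} - E^{ABC}_{A} - E^{ABC}_{B} - E^{ABC}_{C}
```

where the subscript names the system whose energy is computed and the superscript names the basis set used, here always that of the whole trimer. With this sign convention, a negative value indicates a bound trimer.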
Non-target analysis (NTA) employing high-resolution mass spectrometry is a commonly applied approach for detecting novel chemicals of emerging concern in complex environmental samples. NTA typically produces large, information-rich datasets that require computer-aided (ideally automated) strategies for their processing and interpretation. Such strategies, however, raise the challenge of reproducibility between and within different processing workflows. An effective strategy to mitigate such problems is to conduct inter-laboratory studies (ILS) with the aim of evaluating different workflows and agreeing on harmonized/standardized quality control procedures. Here we present the data generated during such an ILS. This study was organized through the NORMAN Network and included 21 participants from 11 countries. A set of samples based on passive sampling of drinking water before and after treatment was shipped to all participating laboratories for analysis using one predefined method and one locally (i.e., in-house) developed method. The generated data represent a valuable resource (i.e., a benchmark) for future development of algorithms and workflows for NTA experiments.
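Comparing processing workflows in an inter-laboratory study of this kind requires deciding when two reported features are the same compound signal. The sketch below assumes each laboratory reports features as (m/z, retention time) pairs and uses hypothetical tolerances of 5 ppm and 0.2 min, which are not values from the study; retention times from different methods or instruments would in practice need alignment first. It greedily pairs features one-to-one between two laboratories.

```python
from __future__ import annotations

from typing import NamedTuple


class Feature(NamedTuple):
    mz: float
    rt: float  # retention time in minutes


def ppm_diff(a: float, b: float) -> float:
    """Absolute m/z difference in parts per million, relative to b."""
    return abs(a - b) / b * 1e6


def match_features(lab_a: list[Feature], lab_b: list[Feature],
                   ppm_tol: float = 5.0, rt_tol: float = 0.2) -> list[tuple[int, int]]:
    """Greedy one-to-one matching: each feature of lab_a is paired with the unused
    lab_b feature of smallest ppm error that lies within both tolerances."""
    used: set[int] = set()
    matches: list[tuple[int, int]] = []
    for i, fa in enumerate(lab_a):
        best_j, best_err = None, None
        for j, fb in enumerate(lab_b):
            if j in used:
                continue
            err = ppm_diff(fa.mz, fb.mz)
            if err <= ppm_tol and abs(fa.rt - fb.rt) <= rt_tol:
                if best_err is None or err < best_err:
                    best_j, best_err = j, err
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches


# Hypothetical feature lists from two laboratories.
lab_a = [Feature(301.1410, 5.12), Feature(415.2115, 8.40), Feature(250.0000, 3.00)]
lab_b = [Feature(301.1420, 5.20), Feature(415.2200, 8.41), Feature(250.0010, 3.05)]

matches = match_features(lab_a, lab_b)
print(matches)
print(f"fraction of lab A features matched: {len(matches) / len(lab_a):.2f}")
```

Greedy matching is simple but order-dependent; a global assignment (for example, minimizing total error over all pairs) is a common alternative when feature lists are dense.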
... The area’s benchmark text, completely revised and updated -- In the twenty years since publication of ... ... the first edition of The Statistical Analysis of Failure Time Data, researchers have produced a library ... ... Analysis of Correlated Failure Time Data -- With its comprehensive survey of the field and resources ... ... for students and researchers, The Statistical Analysis of Failure Time Data remains the benchmark text ... ... A: Some Sets of Data 378 -- Appendix B: Supporting Technical Material 396 -- Bibliography 404 -- Author ...
Wiley series in probability and statistics
2nd ed. xiii, 439 p.
- Keywords
  - Data analysis
  - Statistical analysis
  - Regression
- Conspectus
  - Statistics
- NML Fields
  - statistics
  - health statistics
OBJECTIVE: This study aimed to identify diets with improved nutrient quality and environmental impact within the boundaries of dietary practices. DESIGN: We used Data Envelopment Analysis to benchmark diets for improved adherence to food-based dietary guidelines (FBDG). We then optimised these diets for dietary preferences, nutrient quality and environmental impact. Diets were evaluated using the Nutrient Rich Diet score (NRD15.3), diet-related greenhouse gas emission (GHGE) and a diet similarity index that quantified the proportion of food intake that remained similar as compared with the observed diet. SETTING: National dietary surveys of four European countries (Denmark, Czech Republic, Italy and France). SUBJECTS: Approximately 6500 adults, aged 18-64 years. RESULTS: When dietary preferences were prioritised, NRD15.3 was ~6 % higher, GHGE was ~4 % lower and ~85 % of food intake remained similar. This diet had higher amounts of fruit, vegetables and whole grains than the observed diet. When nutrient quality was prioritised, NRD15.3 was ~16 % higher, GHGE was ~3 % lower and ~72 % of food intake remained similar. This diet had higher amounts of legumes and fish and lower amounts of sweetened and alcoholic beverages. Finally, when environmental impact was prioritised, NRD15.3 was ~9 % higher, GHGE was ~21 % lower and ~73 % of food intake remained similar. In this diet, red and processed meat partly shifted to either eggs, poultry, fish or dairy. CONCLUSIONS: Benchmark modelling can generate diets with improved adherence to FBDG within the boundaries of dietary practices, but fully maximising health and minimising GHGE cannot be achieved simultaneously.
- MeSH
  - Benchmarking *
  - Diet / standards
  - Adult
  - Energy Intake
  - Humans
  - Carbon Footprint *
  - Nutrition Surveys
- Check Tag
  - Adult
  - Humans
  - Male
  - Female
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
- Geographicals
  - Czech Republic
  - Europe
  - France
  - Italy
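The diet abstract describes its similarity index only as the proportion of food intake that remained similar to the observed diet. The sketch below uses one plausible reading of that definition, which is an assumption and not necessarily the authors' formula: for each food group the retained amount is min(observed, modelled), and the index is the summed retained amount divided by total observed intake. A helper for relative change, of the kind reported for NRD15.3 and GHGE, is included. Food groups, gram amounts and emission values are hypothetical.

```python
from __future__ import annotations


def similarity_index(observed: dict[str, float], modelled: dict[str, float]) -> float:
    """Share of observed intake (by weight) retained in the modelled diet.

    Hypothetical definition: for each food group the retained amount is
    min(observed, modelled); food groups added by the model do not count."""
    total = sum(observed.values())
    if total <= 0:
        raise ValueError("observed intake must be positive")
    retained = sum(min(grams, modelled.get(food, 0.0)) for food, grams in observed.items())
    return retained / total


def percent_change(before: float, after: float) -> float:
    """Relative change in percent, e.g. for a diet score or for GHGE."""
    return (after - before) / before * 100.0


# Hypothetical daily intakes (g/day) by food group.
observed = {"red_meat": 100.0, "vegetables": 200.0, "fruit": 100.0,
            "sweetened_beverages": 100.0}
modelled = {"red_meat": 50.0, "vegetables": 250.0, "fruit": 150.0,
            "sweetened_beverages": 50.0, "legumes": 30.0}

print(f"similarity: {similarity_index(observed, modelled):.0%}")
# Hypothetical GHGE in kg CO2-eq/day, observed versus modelled diet.
print(f"GHGE change: {percent_change(5.0, 3.95):+.1f} %")
```

Under this reading, increases in a food group never lower the index; only reductions of foods already eaten count as departures from the observed diet.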