Query: large-scale datasets
Microscopic examination plays a significant role in the initial screening for a variety of hematological, as well as non-hematological, diagnoses. Microscopic blood smear examination, which is considered a key diagnostic technique, is still performed manually in current clinical practice; this is not only time-consuming but can also lead to human error. Although automated and semi-automated systems have been developed in recent years, their high purchase and maintenance costs make them unaffordable for many medical institutions. Even though much research has been conducted lately to explore more accurate and feasible solutions, most researchers have had to deal with a lack of medical data. To address the lack of large-scale databases in this field, we created a high-resolution dataset containing a total of 16027 annotated white blood cells. Moreover, the dataset covers 9 types of white blood cells in total, including clinically significant pathological findings. Because we used high-quality acquisition equipment, the dataset provides some of the highest-quality images of blood cells available, with an approximate resolution of 42 pixels per 1 μm.
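At the stated approximate resolution of 42 pixels per 1 μm, converting image measurements to physical units is a single division (and a division by the squared factor for areas). The Python sketch below assumes a uniform scale of exactly 42 px/μm; the helper names are illustrative and not part of any tooling released with the dataset.

```python
# Minimal sketch, assuming a uniform scale of exactly 42 pixels per micrometre
# (the dataset description gives this only as an approximate resolution).
PIXELS_PER_UM = 42.0


def px_to_um(length_px: float, px_per_um: float = PIXELS_PER_UM) -> float:
    """Convert a length measured in pixels to micrometres."""
    return length_px / px_per_um


def px_area_to_um2(area_px: float, px_per_um: float = PIXELS_PER_UM) -> float:
    """Convert an area in square pixels to square micrometres."""
    # Areas scale with the square of the linear scale factor.
    return area_px / (px_per_um ** 2)


if __name__ == "__main__":
    # A hypothetical cell whose diameter spans 504 px is about 12 um across.
    print(px_to_um(504))
```

For example, `px_to_um(504)` returns `12.0`, and an area of 1764 square pixels corresponds to about 1 μm².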
- MeSH
- Leukocytes * cytology pathology MeSH
- Humans MeSH
- Microscopy MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
Vital mitochondrial DNA (mtDNA) populations exist in cells and may consist of heteroplasmic mixtures of mtDNA types. The evolution of these heteroplasmic populations through development, ageing, and generations is central to genetic diseases, but is poorly understood in mammals. Here we dissect these population dynamics using a dataset of unprecedented size and temporal span, comprising 1947 single-cell oocyte and 899 somatic measurements of heteroplasmy change throughout lifetimes and generations in two genetically distinct mouse models. We provide a novel and detailed quantitative characterisation of the linear increase in heteroplasmy variance throughout mammalian life courses in oocytes and pups. We find that differences in mean heteroplasmy are induced between generations, and that the heteroplasmy levels of germline and somatic precursors diverge early in development, with a haplotype-specific direction of segregation. We develop stochastic theory predicting the implications of these dynamics for ageing and disease manifestation and discuss its application to human mtDNA dynamics.
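One common way to quantify the spread of heteroplasmy across cells independently of its mean is the normalised variance V'(h) = Var(h) / (E[h] * (1 - E[h])); a linear increase over time then appears as a straight-line trend of V' against age. The sketch below assumes this statistic and an ordinary least-squares fit; the abstract does not state the exact estimator or model used, and the data are hypothetical.

```python
from statistics import mean, pvariance


def normalised_variance(h):
    """Normalised heteroplasmy variance V'(h) = Var(h) / (E[h] * (1 - E[h]))."""
    m = mean(h)
    if not 0.0 < m < 1.0:
        raise ValueError("mean heteroplasmy must lie strictly between 0 and 1")
    return pvariance(h, mu=m) / (m * (1.0 - m))


def linear_fit(x, y):
    """Ordinary least-squares intercept and slope for y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Hypothetical single-cell heteroplasmy fractions grouped by age (days).
    samples_by_age = {
        10: [0.45, 0.55],
        40: [0.40, 0.60],
        90: [0.35, 0.65],
    }
    ages = sorted(samples_by_age)
    v_norm = [normalised_variance(samples_by_age[a]) for a in ages]
    intercept, slope = linear_fit(ages, v_norm)
    print(f"V' per age: {v_norm}; slope = {slope:.4f} per day")
```

With these made-up groups, V' is about 0.01, 0.04, and 0.09, which lie on a line with slope 0.001 per day. Normalising by E[h](1 - E[h]) lets groups with different mean heteroplasmy be compared on one scale.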
- MeSH
- Datasets as Topic MeSH
- Genome, Mitochondrial genetics MeSH
- Haplotypes genetics MeSH
- DNA, Mitochondrial genetics MeSH
- Mitochondria metabolism MeSH
- Models, Animal MeSH
- Mice, Inbred C57BL MeSH
- Mice MeSH
- Oocytes cytology immunology MeSH
- DNA Copy Number Variations genetics MeSH
- Age Factors MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Female MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Pancreatic ductal adenocarcinoma (PDAC), the most deadly solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible due to the low prevalence and potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening; however, identification of PDAC using non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy via non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection in a multicenter validation involving 6,239 patients across 10 centers, outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification, and achieves a sensitivity of 92.9% and specificity of 99.9% for lesion detection in a real-world multi-scenario validation consisting of 20,530 consecutive patients. Notably, PANDA applied to non-contrast CT shows non-inferiority to radiology reports (using contrast-enhanced CT) in the differentiation of common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.
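The detection metrics quoted above can be computed from a model's outputs in a few lines: sensitivity and specificity from thresholded predictions, and AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation). This is a generic sketch with hypothetical labels, scores, and threshold, not PANDA's evaluation code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Labels are 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0]            # hypothetical: 1 = lesion present
    scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # hypothetical model outputs
    preds = [int(s >= 0.45) for s in scores]  # hypothetical operating threshold
    print(auc(labels, scores), sensitivity_specificity(labels, preds))
```

The pairwise loop is quadratic, which is fine for illustration; for cohorts of tens of thousands of patients, a rank-based computation of the same statistic is preferable.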
- MeSH
- Deep Learning * MeSH
- Carcinoma, Pancreatic Ductal * diagnostic imaging pathology MeSH
- Humans MeSH
- Pancreatic Neoplasms * diagnostic imaging pathology MeSH
- Pancreas diagnostic imaging pathology MeSH
- Tomography, X-Ray Computed MeSH
- Retrospective Studies MeSH
- Artificial Intelligence MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Multicenter Study MeSH
A phylogenetic tree at the species level is still far off for highly diverse insect orders, including the Coleoptera, but the taxonomic breadth of public sequence databases is growing. In addition, new types of data may contribute to increasing taxon coverage, such as metagenomic shotgun sequencing for assembly of mitogenomes from bulk specimen samples. The current study explores the application of these techniques for large-scale efforts to build the tree of Coleoptera. We used shotgun data from 17 different ecological and taxonomic datasets (5 unpublished) to assemble a total of 1942 mitogenome contigs of >3000 bp. These sequences were combined into a single dataset together with all mitochondrial data available at GenBank, in addition to nuclear markers widely used in molecular phylogenetics. The resulting matrix of nearly 16,000 species with two or more loci produced trees (RAxML) showing overall congruence with the Linnaean taxonomy at hierarchical levels from suborders to genera. We tested the role of full-length mitogenomes in stabilizing the tree from GenBank data, as mitogenomes might link terminals with non-overlapping gene representation. However, the mitogenome data were only partly useful in this respect, presumably because of the purely automated approach to assembly and gene delimitation, but improvements may be possible in the future by using multiple assemblers and manual curation. In conclusion, the combination of data mining and metagenomic sequencing of bulk samples provided the largest phylogenetic tree of Coleoptera to date, which represents a summary of existing phylogenetic knowledge and a defensible tree of great utility, in particular for studies at the intra-familial level, despite some shortcomings in resolving basal nodes.
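The data-assembly criteria described above (mitogenome contigs longer than 3000 bp; species represented by two or more loci) amount to simple filters, followed by concatenation of per-locus alignments into a supermatrix, with missing loci padded as gaps, before tree inference. The Python sketch below assumes the per-locus sequences are already aligned; the data structures, function names, and species labels are hypothetical, and the study's actual pipeline may differ.

```python
from typing import Dict, List

MIN_CONTIG_BP = 3000  # contigs must be longer than this (">3000 bp")
MIN_LOCI = 2          # species need two or more loci to enter the matrix


def filter_contigs(contigs: Dict[str, str], min_bp: int = MIN_CONTIG_BP) -> Dict[str, str]:
    """Keep assembled contigs strictly longer than min_bp."""
    return {cid: seq for cid, seq in contigs.items() if len(seq) > min_bp}


def species_for_matrix(loci_by_species: Dict[str, Dict[str, str]],
                       min_loci: int = MIN_LOCI) -> List[str]:
    """Species represented by at least min_loci loci, sorted by name."""
    return sorted(sp for sp, loci in loci_by_species.items() if len(loci) >= min_loci)


def concatenate(loci_by_species: Dict[str, Dict[str, str]], species: List[str],
                locus_lengths: Dict[str, int]) -> Dict[str, str]:
    """Build one supermatrix row per species; missing loci are filled with gaps."""
    rows = {}
    for sp in species:
        parts = []
        for locus in sorted(locus_lengths):
            seq = loci_by_species[sp].get(locus)
            parts.append(seq if seq is not None else "-" * locus_lengths[locus])
        rows[sp] = "".join(parts)
    return rows
```

Padding missing loci with gaps is what allows species with non-overlapping gene sampling to sit in the same matrix, which is also why linking terminals through full mitogenomes was of interest.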
- MeSH
- Algorithms MeSH
- Coleoptera classification genetics MeSH
- Databases, Genetic MeSH
- Phylogeny * MeSH
- Metagenomics * MeSH
- Mitochondria genetics MeSH
- Base Sequence MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Transcriptome sequencing (RNA-seq) is widely used to detect gene rearrangements and quantitate gene expression in acute lymphoblastic leukemia (ALL), but its utility and accuracy in identifying copy number variations (CNVs) have not been well described. CNV information inferred from RNA-seq can be highly informative to guide disease classification and risk stratification in ALL due to the high incidence of aneuploid subtypes within this disease. Here we describe RNAseqCNV, a method to detect large-scale CNVs from RNA-seq data. We used models based on normalized gene expression and minor allele frequency to classify arm-level CNVs with high accuracy in ALL (99.1% overall and 98.3% for non-diploid chromosome arms), and the models were further validated with excellent performance in acute myeloid leukemia (accuracy 99.8% overall and 99.4% for non-diploid chromosome arms). RNAseqCNV outperforms alternative RNA-seq-based algorithms in calling CNVs in the ALL dataset, especially in samples with a high proportion of CNVs. The CNV calls were highly concordant with DNA-based CNV results and more reliable than conventional cytogenetic karyotypes. RNAseqCNV provides a method to robustly identify copy number alterations in the absence of DNA-based analyses, further enhancing the utility of RNA-seq to classify ALL subtypes.
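The abstract combines two signals per chromosome arm: normalized gene expression (a single-copy gain should raise expression by roughly log2(3/2) ≈ 0.58 on a log2 scale, and a single-copy loss should lower it by about 1) and the minor allele frequency (MAF) of heterozygous variants, which sits near 0.5 for a balanced diploid arm and drops toward 1/3 after a single-copy gain. The threshold rule below is only a toy illustration of how these signals separate; it is not RNAseqCNV's model, and every cut-off is hypothetical.

```python
import math
from statistics import median
from typing import Sequence

# Hypothetical cut-offs; a real classifier would be trained and validated.
GAIN_LOG2 = 0.25
LOSS_LOG2 = -0.4
BALANCED_MAF = 0.4


def call_arm(expr_ratios: Sequence[float], het_mafs: Sequence[float]) -> str:
    """Toy arm-level copy-number call.

    expr_ratios: per-gene expression of the sample divided by a diploid reference.
    het_mafs: minor allele frequencies of heterozygous SNVs on the arm.
    """
    log2_med = median(math.log2(r) for r in expr_ratios)
    if log2_med >= GAIN_LOG2:
        return "gain"
    if log2_med <= LOSS_LOG2:
        return "loss"
    # No dosage shift but skewed alleles, e.g. copy-neutral loss of heterozygosity.
    if het_mafs and median(het_mafs) < BALANCED_MAF:
        return "allelic_imbalance"
    return "diploid"
```

Using the median of per-gene values keeps a few highly variable genes from dominating an arm's call; the MAF check catches allelic changes that expression alone cannot see.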
- MeSH
- Algorithms MeSH
- Karyotyping MeSH
- Humans MeSH
- Transcriptome Sequencing MeSH
- DNA Copy Number Variations * genetics MeSH
- High-Throughput Nucleotide Sequencing * methods MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Left-right asymmetry is an important organizing feature of the healthy brain that may be altered in schizophrenia, but most studies have used relatively small samples and heterogeneous approaches, resulting in equivocal findings. We carried out the largest case-control study of structural brain asymmetries in schizophrenia, with MRI data from 5,080 affected individuals and 6,015 controls across 46 datasets, using a single image analysis protocol. Asymmetry indexes were calculated for global and regional cortical thickness, surface area, and subcortical volume measures. Differences of asymmetry were calculated between affected individuals and controls per dataset, and effect sizes were meta-analyzed across datasets. Small average case-control differences were observed for thickness asymmetries of the rostral anterior cingulate and the middle temporal gyrus, both driven by thinner left-hemispheric cortices in schizophrenia. Analyses of these asymmetries with respect to the use of antipsychotic medication and other clinical variables did not show any significant associations. Assessment of age- and sex-specific effects revealed a stronger average leftward asymmetry of pallidum volume between older cases and controls. Case-control differences in a multivariate context were assessed in a subset of the data (N = 2,029), which revealed that 7% of the variance across all structural asymmetries was explained by case-control status. Subtle case-control differences of brain macrostructural asymmetry may reflect differences at the molecular, cytoarchitectonic, or circuit levels that have functional relevance for the disorder. Reduced left middle temporal cortical thickness is consistent with altered left-hemisphere language network organization in schizophrenia.
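The per-dataset workflow described above (an asymmetry index per structure, a case-control effect size per dataset, then pooling across datasets) can be sketched as follows. The asymmetry index (L - R) / ((L + R) / 2) is a common convention and is assumed here; the effect size is Cohen's d with its usual large-sample variance, and pooling uses fixed-effect inverse-variance weights as a simplification. The study's exact formulas and meta-analytic model (for example, random effects) may differ.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def asymmetry_index(left: float, right: float) -> float:
    """(L - R) / ((L + R) / 2); positive values indicate leftward asymmetry."""
    return (left - right) / ((left + right) / 2.0)


def cohens_d(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * stdev(cases) ** 2 + (n2 - 1) * stdev(controls) ** 2) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / math.sqrt(pooled_var)


def d_variance(d: float, n1: int, n2: int) -> float:
    """Approximate sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))


def fixed_effect(effects: List[float], variances: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))
```

In use, one would compute `asymmetry_index` per participant and structure, `cohens_d` and `d_variance` per dataset, and then pass the per-dataset effects and variances to `fixed_effect`. Larger datasets have smaller variances and therefore receive more weight.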
- MeSH
- Functional Laterality MeSH
- Humans MeSH
- Magnetic Resonance Imaging methods MeSH
- Brain diagnostic imaging MeSH
- Cerebral Cortex MeSH
- Schizophrenia * diagnostic imaging MeSH
- Case-Control Studies MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
Renal cell carcinoma (RCC) accounts for 2.2% of all cancer cases; however, prognostic or predictive RCC biomarkers at the protein level are largely missing. To support proteomics research on localized and metastatic RCC, we introduce a new library of targeted mass spectrometry assays for accurate protein quantification in malignant and normal kidney tissue. Aliquots of 86 initially localized RCC, 75 metastatic RCC, and 17 adjacent non-cancerous fresh-frozen tissue lysates were trypsin-digested, pooled, and fractionated using hydrophilic chromatography. The fractions were analyzed by LC-MS/MS on a Q Exactive HF-X mass spectrometer in data-dependent acquisition (DDA) mode. The resulting spectral library contains 77,817 peptides representing 7960 protein groups (FDR = 1%). Further, we confirm the applicability of this library on four RCC datasets measured in data-independent acquisition (DIA) mode, demonstrating specific quantification of a substantially larger part of the RCC proteome, depending on the LC-MS/MS instrumentation. The impact of the library's sample specificity on the results of targeted DIA data extraction was demonstrated by parallel analyses of two datasets with two pan-human libraries. The new RCC-specific library has the potential to contribute to a better understanding of RCC development at the molecular level, leading to new diagnostic and therapeutic targets.
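The library is reported at FDR = 1%, but the abstract does not describe how the FDR was estimated. A widely used approach in shotgun proteomics is target-decoy competition: spectra are also searched against reversed or shuffled decoy sequences, the FDR at a score threshold is estimated as decoys / targets at or above it, and the q-value of a match is the minimum FDR at which it would be accepted. The sketch assumes this approach with hypothetical scores; it is not the software used in the study.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); higher score = better match


def q_values(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    FDR at each threshold is estimated as decoys / targets at or above it;
    the q-value is the minimum FDR at that threshold or any more permissive one.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accepted_targets(psms: List[Psm], fdr: float = 0.01) -> List[float]:
    """Scores of target matches accepted at the given FDR (e.g. 1%)."""
    return [s for s, d, q in q_values(psms) if not d and q <= fdr]
```

The same filtering logic is typically applied separately at the peptide and protein-group levels, since error rates do not carry over directly between them.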
Recent developments in high-throughput sequencing (HTS) technologies, also called next-generation sequencing (NGS), and in bioinformatics have drastically changed research on viral pathogens and spurred growing interest in the field of virus diagnostics. However, the reliability of HTS-based virus detection protocols must be evaluated before adopting them for diagnostics. Many different bioinformatics algorithms aimed at detecting viruses in HTS data have been reported, but little attention has been paid thus far to their sensitivity and reliability for diagnostic purposes. Therefore, we compared the ability of 21 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 12 plant viruses through a double-blind large-scale performance test using 10 datasets of 21- to 24-nucleotide small RNA (sRNA) sequences from three different infected plants. The sensitivity of virus detection ranged between 35% and 100% among participants, with a marked negative effect when sequencing depth decreased. The false-positive detection rate was very low and mainly related to the identification of host genome-integrated viral sequences or misinterpretation of the results. Reproducibility was high (91.6%). This work revealed the key influence of bioinformatics strategies on the sensitive detection of viruses in HTS sRNA datasets and, more specifically, (i) the difficulty in detecting viral agents when they are novel or their sRNA abundance is low, (ii) the influence of key parameters at both the assembly and annotation steps, (iii) the importance of the completeness of reference sequence databases, and (iv) the significant level of scientific expertise needed when interpreting pipeline results. Overall, this work underlines key parameters and proposes recommendations for reliable sRNA-based detection of known and unknown viruses.
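Two factors highlighted above, the 21- to 24-nt size range of sRNAs and the dependence on reference databases, can be illustrated with a toy screen. The screen keeps reads of sRNA length, then counts reads that share at least one exact k-mer with a known viral reference on either strand. This is a hypothetical sketch; the participating laboratories used their own pipelines built around assembly and annotation steps, and the k-mer size here is arbitrary.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def filter_srna(reads: Iterable[str], min_len: int = 21, max_len: int = 24) -> List[str]:
    """Keep reads within the sRNA size range (21-24 nt by default)."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_hitting_reference(reads: Iterable[str], reference: str, k: int = 15) -> int:
    """Count reads sharing at least one exact k-mer with the reference (either strand)."""
    ref_kmers = kmer_set(reference, k) | kmer_set(revcomp(reference), k)
    return sum(1 for r in reads if kmer_set(r.upper(), k) & ref_kmers)
```

A virus absent from the reference set produces no hits at all, and fewer reads (lower sequencing depth) means fewer chances to hit a low-abundance virus. Both effects mirror the limitations reported in the study.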
Several molecular clonality assays have been developed to assess canine B cell proliferations. These assays were based on different sequence data, used different assay designs, and employed different testing strategies. This has resulted in a complex body of literature and complicates evidence-based selection of primer sets. In addition, further refinement of primer sets is difficult because it is unknown how well current primer sets cover the expressed sequence repertoire. The objectives of this study were 1) to provide an overview of published IGH clonality assays that highlights key differences in assay design and testing strategy and 2) to propose a novel method for optimizing primer sets that leverages large-scale sequencing data. A review of previously published assays highlighted confounding factors that hamper a direct comparison of performance metrics between studies. These findings illustrate the need for a multi-institutional effort to harmonize veterinary clonality testing. A novel in silico analysis of primer sequences using a large dataset of expressed sequences identified shortfalls of existing primer sets and was used to guide primer optimization. Three optimized primer sets were tested and yielded qualitative sensitivity values between 80% and 90%. The qualitative sensitivity ranged from 1% to over 50% and was dependent on the size of the neoplastic clone and the sample DNA used. These findings illustrate that high-throughput sequencing data can be a useful tool to guide primer design and optimization. This strategy could be applied to other antigen receptor loci or species to further improve veterinary clonality assays.
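The in silico analysis described above asks, for each primer set, what fraction of expressed IGH sequences it could plausibly amplify. In the minimal sketch below, a primer counts as binding a sequence if some window of the sequence differs from the primer at no more than a set number of positions and matches exactly over the primer's last few 3' bases. The mismatch and 3'-end thresholds are hypothetical, sequences are assumed to be given in the primer's orientation, and this is not the study's actual scoring method.

```python
from typing import Iterable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def primer_binds(primer: str, template: str, max_mismatches: int = 2,
                 exact_3prime: int = 3) -> bool:
    """True if any primer-length window of template is a near match to the primer.

    The last exact_3prime bases (the primer's 3' end) must match exactly,
    since 3'-terminal mismatches tend to impair polymerase extension.
    """
    k = len(primer)
    for i in range(len(template) - k + 1):
        window = template[i:i + k]
        if exact_3prime > 0 and window[-exact_3prime:] != primer[-exact_3prime:]:
            continue
        if hamming(primer, window) <= max_mismatches:
            return True
    return False


def set_coverage(primers: Iterable[str], sequences: Sequence[str], **kwargs) -> float:
    """Fraction of sequences bound by at least one primer in the set."""
    primers = list(primers)
    hits = sum(1 for s in sequences if any(primer_binds(p, s, **kwargs) for p in primers))
    return hits / len(sequences)
```

Running `set_coverage` for each candidate set over a large repertoire, and listing the sequences no primer binds, exposes the kind of coverage shortfalls the study used to guide optimization.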
- MeSH
- B-Lymphocytes cytology MeSH
- Clone Cells * MeSH
- DNA Primers * MeSH
- Dogs genetics immunology MeSH
- Immunoglobulin Heavy Chains genetics MeSH
- Animals MeSH
- Check Tag
- Dogs genetics immunology MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Review MeSH
MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to the limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle better suited to analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences: 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. The reconstructed consensus sequences, as well as the assessment of overall divergence of k-mer spectra, revealed high similarity between the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used to compare CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
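The alignment-free workflow described above can be sketched in a few functions. The sketch counts k-mers across a set of reads and normalizes the counts to frequencies. It then compares two datasets' spectra and reconstructs a monomer consensus by greedily extending the most frequent k-mer with the most frequent overlapping k-mer. The abstract does not specify the divergence measure or the reconstruction algorithm; cosine similarity and greedy extension are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count all overlapping k-mers across the input sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def to_frequencies(counts: Counter) -> Dict[str, float]:
    """Normalize k-mer counts to relative frequencies summing to 1."""
    total = sum(counts.values())
    return {km: n / total for km, n in counts.items()}


def cosine_similarity(f1: Dict[str, float], f2: Dict[str, float]) -> float:
    """1.0 for identical spectra, 0.0 for spectra sharing no k-mers."""
    dot = sum(v * f2.get(km, 0.0) for km, v in f1.items())
    n1 = math.sqrt(sum(v * v for v in f1.values()))
    n2 = math.sqrt(sum(v * v for v in f2.values()))
    return dot / (n1 * n2)


def greedy_consensus(counts: Counter, k: int, length: int) -> str:
    """Extend the most frequent k-mer base by base via the most frequent (k-1)-overlap."""
    if k < 2:
        raise ValueError("k must be at least 2")
    seq = counts.most_common(1)[0][0]
    while len(seq) < length:
        suffix = seq[-(k - 1):]
        best_count, best_base = max((counts.get(suffix + b, 0), b) for b in "ACGT")
        if best_count == 0:
            break  # no overlapping k-mer observed; stop extending
        seq += best_base
    return seq
```

Because a tandem repeat has no fixed start, the reconstructed consensus may be any rotation of the monomer. Comparing `to_frequencies` outputs for the ChIP-derived and whole-genome read sets would yield a single similarity score, and per-chromosome sets could be compared the same way.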
- MeSH
- Centromere genetics MeSH
- Chromosomes, Plant genetics MeSH
- DNA, Plant genetics MeSH
- Conserved Sequence genetics MeSH
- Molecular Sequence Data MeSH
- Oryza genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA methods MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH