large‐scale datasets
Microscopic examination plays a significant role in the initial screening for a variety of hematological, as well as non-hematological, diagnoses. Microscopic blood smear examination, which is considered a key diagnostic technique, is still performed manually in current clinical practice, which is not only time-consuming but can also lead to human error. Although automated and semi-automated systems have been developed in recent years, their high purchase and maintenance costs make them unaffordable for many medical institutions. Even though much research has been conducted lately to explore more accurate and feasible solutions, most researchers have had to deal with a lack of medical data. To address the lack of large-scale databases in this field, we created a high-resolution dataset containing a total of 16027 annotated white blood cells. Moreover, the dataset covers 9 types of white blood cells in total, including clinically significant pathological findings. Since we used high-quality acquisition equipment, the dataset provides some of the highest-quality images of blood cells, achieving an approximate resolution of 42 pixels per 1 μm.
- MeSH
- Leukocytes * cytology pathology MeSH
- Humans MeSH
- Microscopy MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
A phylogenetic tree at the species level is still far off for highly diverse insect orders, including the Coleoptera, but the taxonomic breadth of public sequence databases is growing. In addition, new types of data may contribute to increasing taxon coverage, such as metagenomic shotgun sequencing for assembly of mitogenomes from bulk specimen samples. The current study explores the application of these techniques for large-scale efforts to build the tree of Coleoptera. We used shotgun data from 17 different ecological and taxonomic datasets (5 unpublished) to assemble a total of 1942 mitogenome contigs of >3000 bp. These sequences were combined into a single dataset together with all mitochondrial data available at GenBank, in addition to nuclear markers widely used in molecular phylogenetics. The resulting matrix of nearly 16,000 species with two or more loci produced trees (RAxML) showing overall congruence with the Linnaean taxonomy at hierarchical levels from suborders to genera. We tested the role of full-length mitogenomes in stabilizing the tree from GenBank data, as mitogenomes might link terminals with non-overlapping gene representation. However, the mitogenome data were only partly useful in this respect, presumably because of the purely automated approach to assembly and gene delimitation, but improvements in future may be possible by using multiple assemblers and manual curation. In conclusion, the combination of data mining and metagenomic sequencing of bulk samples provided the largest phylogenetic tree of Coleoptera to date, which represents a summary of existing phylogenetic knowledge and a defensible tree of great utility, in particular for studies at the intra-familial level, despite some shortcomings for resolving basal nodes.
- MeSH
- Algorithms MeSH
- Coleoptera classification genetics MeSH
- Databases, Genetic MeSH
- Phylogeny * MeSH
- Metagenomics * MeSH
- Mitochondria genetics MeSH
- Base Sequence MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Transcriptome sequencing (RNA-seq) is widely used to detect gene rearrangements and quantitate gene expression in acute lymphoblastic leukemia (ALL), but its utility and accuracy in identifying copy number variations (CNVs) have not been well described. CNV information inferred from RNA-seq can be highly informative to guide disease classification and risk stratification in ALL due to the high incidence of aneuploid subtypes within this disease. Here we describe RNAseqCNV, a method to detect large-scale CNVs from RNA-seq data. We used models based on normalized gene expression and minor allele frequency to classify arm-level CNVs with high accuracy in ALL (99.1% overall and 98.3% for non-diploid chromosome arms, respectively), and the models were further validated with excellent performance in acute myeloid leukemia (accuracy 99.8% overall and 99.4% for non-diploid chromosome arms). RNAseqCNV outperforms alternative RNA-seq-based algorithms in calling CNVs in the ALL dataset, especially in samples with a high proportion of CNVs. The CNV calls were highly concordant with DNA-based CNV results and more reliable than conventional cytogenetic-based karyotypes. RNAseqCNV provides a method to robustly identify copy number alterations in the absence of DNA-based analyses, further enhancing the utility of RNA-seq to classify ALL subtypes.
- MeSH
- Algorithms MeSH
- Karyotyping MeSH
- Humans MeSH
- RNA-Seq MeSH
- DNA Copy Number Variations * genetics MeSH
- High-Throughput Nucleotide Sequencing * methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Renal cell carcinoma (RCC) represents 2.2% of all cancer incidences; however, prognostic or predictive RCC biomarkers at the protein level are largely missing. To support proteomics research of localized and metastatic RCC, we introduce a new library of targeted mass spectrometry assays for accurate protein quantification in malignant and normal kidney tissue. Aliquots of 86 initially localized RCC, 75 metastatic RCC and 17 adjacent non-cancerous fresh frozen tissue lysates were trypsin digested, pooled, and fractionated using hydrophilic chromatography. The fractions were analyzed using LC-MS/MS on a QExactive HF-X mass spectrometer in data-dependent acquisition (DDA) mode. The resulting spectral library contains 77,817 peptides representing 7960 protein groups (FDR = 1%). Further, we confirm the applicability of this library on four RCC datasets measured in data-independent acquisition (DIA) mode, demonstrating a specific quantification of a substantially increased part of the RCC proteome, depending on LC-MS/MS instrumentation. The impact of the sample specificity of the library on the results of targeted DIA data extraction was demonstrated by parallel analyses of two datasets with two pan-human libraries. The new RCC-specific library has the potential to contribute to a better understanding of RCC development at the molecular level, leading to new diagnostic and therapeutic targets.
Several molecular clonality assays have been developed to assess canine B cell proliferations. These assays were based on different sequence data, utilized different assay designs and employed different testing strategies. This has resulted in a complex body of literature and complicates evidence-based selection of primer sets. In addition, further refinement of primer sets is difficult because it is unknown how well current primer sets cover the expressed sequence repertoire. The objectives of this study were 1) to provide an overview of published IGH clonality assays that highlights key differences in assay design and testing strategy and 2) to propose a novel method for optimizing primer sets that leverages large-scale sequencing data. A review of previously published assays highlighted confounding factors that hamper a direct comparison of performance metrics between studies. These findings illustrate the need for a multi-institutional effort to harmonize veterinary clonality testing. A novel in silico analysis of primer sequences using a large dataset of expressed sequences identified shortfalls of existing primer sets and was used to guide primer optimization. Three optimized primer sets were tested and yielded qualitative sensitivity values between 80% and 90%. The quantitative sensitivity ranged from 1% to over 50% and was dependent on the size of the neoplastic clone and the sample DNA used. These findings illustrate that high-throughput sequencing data can be a useful tool to guide primer design and optimization. This strategy could be applied to other antigen receptor loci or species to further improve veterinary clonality assays.
- MeSH
- B-Lymphocytes cytology MeSH
- Clone Cells * MeSH
- DNA Primers * MeSH
- Dogs genetics immunology MeSH
- Immunoglobulin Heavy Chains genetics MeSH
- Animals MeSH
- Check Tag
- Dogs genetics immunology MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
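As a rough illustration of the alignment-free idea described in this abstract, the Python sketch below counts overlapping k-mers in a set of reads and compares two frequency spectra. It is not the authors' implementation: the function names, the decision to skip k-mers with ambiguous bases, and the divergence measure (half the L1 distance between relative frequencies) are all assumptions chosen for illustration.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def kmer_spectrum(reads, k):
    """Count every overlapping k-mer across an iterable of DNA reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers containing ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= DNA_ALPHABET:
                counts[kmer] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw k-mer counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(freq_a, freq_b):
    """Half the L1 distance between two spectra: 0 = identical, 1 = disjoint.

    This measure is an illustrative choice, not the one used in the study.
    """
    kmers = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) for x in kmers)


if __name__ == "__main__":
    # Toy reads, purely hypothetical; real input would be sequencing reads.
    set_a = relative_frequencies(kmer_spectrum(["ACGTACGTAC", "CGTACGTA"], 4))
    set_b = relative_frequencies(kmer_spectrum(["ACGTACGAAC", "CGAACGTA"], 4))
    print(spectrum_distance(set_a, set_b))
```

Because no alignment is needed, the cost grows linearly with the total read length, which is why this kind of statistic scales to large read sets.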
- MeSH
- Centromere genetics MeSH
- Chromosomes, Plant genetics MeSH
- DNA, Plant genetics MeSH
- Conserved Sequence genetics MeSH
- Molecular Sequence Data MeSH
- Oryza genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
MOTIVATION: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on protein surfaces. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. RESULTS: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. AVAILABILITY AND IMPLEMENTATION: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. CONTACT: muller@mou.cz SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- MeSH
- Algorithms * MeSH
- Databases, Protein * MeSH
- Epitopes chemistry MeSH
- Protein Interaction Domains and Motifs * MeSH
- Humans MeSH
- Markov Chains MeSH
- Molecular Sequence Data MeSH
- Antibodies, Monoclonal chemistry MeSH
- Peptides chemistry MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Cluster Analysis MeSH
- Software MeSH
- src Homology Domains MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Vital mitochondrial DNA (mtDNA) populations exist in cells and may consist of heteroplasmic mixtures of mtDNA types. The evolution of these heteroplasmic populations through development, ageing, and generations is central to genetic diseases, but is poorly understood in mammals. Here we dissect these population dynamics using a dataset of unprecedented size and temporal span, comprising 1947 single-cell oocyte and 899 somatic measurements of heteroplasmy change throughout lifetimes and generations in two genetically distinct mouse models. We provide a novel and detailed quantitative characterisation of the linear increase in heteroplasmy variance throughout mammalian life courses in oocytes and pups. We find that differences in mean heteroplasmy are induced between generations, and that the heteroplasmy of germline and somatic precursors diverges early in development, with a haplotype-specific direction of segregation. We develop stochastic theory predicting the implications of these dynamics for ageing and disease manifestation and discuss its application to human mtDNA dynamics.
- MeSH
- Datasets as Topic MeSH
- Genome, Mitochondrial genetics MeSH
- Haplotypes genetics MeSH
- DNA, Mitochondrial genetics MeSH
- Mitochondria metabolism MeSH
- Models, Animal MeSH
- Mice, Inbred C57BL MeSH
- Mice MeSH
- Oocytes cytology immunology MeSH
- DNA Copy Number Variations genetics MeSH
- Age Factors MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Female MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
INTRODUCTION: Despite the widespread use of the Movement Assessment Battery for Children, 2nd edition (MABC-2), little is known about the sensitivity or specificity of the individual items to detect probable Developmental Coordination Disorder (p-DCD). This study examined which specific MABC-2 items were most sensitive for identifying children with p-DCD and which items would predict p-DCD. METHODS: Based on a large dataset including European and African children aged 3-16 years (n = 4916 typically developing (TD; 49.6 % boys); n = 822 p-DCD (53.1 % boys)), Hedges' g was calculated to establish the standardized mean difference (SMD) between p-DCD and TD. SMDs were considered substantial when absolute values were at or above 1.4. Sensitivity and specificity of the raw MABC-2 item scores predicting p-DCD/TD per age band (AB) were established with logistic regression analysis. RESULTS: AB1: Children with p-DCD performed substantially worse on threading beads (SMD: -1.61) and jumping on mats (SMD: 1.61). By combining all items and the country of origin, the sensitivity was 61.7 % and specificity 98.6 %. AB2: Walking heel-to-toe forwards (SMD: 1.65) was substantially poorer in p-DCD. By combining all items and the country of origin, the sensitivity was 79.0 % and specificity 97.6 %. AB3: Catching a ball with the preferred (SMD: 1.8) or non-preferred (SMD: 1.61) hand and walking heel-to-toe backwards (SMD: 1.78) were substantially poorer in p-DCD. All items combined resulted in a sensitivity of 94.4 % and specificity of 99.6 %. CONCLUSION: Not all MABC-2 items are equally sensitive in distinguishing between the performances of children with p-DCD and TD children. Despite the good specificity, the sensitivity was only moderate in AB1-2, the age at which children learn culturally influenced motor skills.
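The standardized mean difference reported in this abstract is Hedges' g. A minimal Python sketch of the standard textbook formula follows (Cohen's d with a pooled standard deviation, multiplied by the usual approximate small-sample correction). The function name and the toy scores are hypothetical, and the study's own software and any additional adjustments are not known from the abstract.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples.

    g = J * (mean_a - mean_b) / s_pooled, with the approximate correction
    J = 1 - 3 / (4 * (n_a + n_b) - 9).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; g is undefined")

    cohens_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return cohens_d * correction


if __name__ == "__main__":
    # Toy item scores, purely hypothetical.
    p_dcd_scores = [2, 4, 6]
    td_scores = [1, 3, 5]
    print(hedges_g(p_dcd_scores, td_scores))
```

The sign of g depends only on which group is passed first, so a negative value for one item and positive values for others can reflect the scoring direction of the item (for example time versus number of successes) rather than a different clinical meaning.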
- MeSH
- Child MeSH
- Humans MeSH
- Logistic Models MeSH
- Adolescent MeSH
- Motor Skills MeSH
- Movement MeSH
- Motor Skills Disorders * diagnosis MeSH
- Child, Preschool MeSH
- Sensitivity and Specificity * MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Africa MeSH
- Europe MeSH
Pancreatic ductal adenocarcinoma (PDAC), the most deadly solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible due to the low prevalence and potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening; however, identification of PDAC using non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy via non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection in a multicenter validation involving 6,239 patients across 10 centers, outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification, and achieves a sensitivity of 92.9% and specificity of 99.9% for lesion detection in a real-world multi-scenario validation consisting of 20,530 consecutive patients. Notably, PANDA utilized with non-contrast CT shows non-inferiority to radiology reports (using contrast-enhanced CT) in the differentiation of common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.
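For readers less familiar with the screening metrics quoted in this abstract, the Python sketch below shows how sensitivity and specificity follow from confusion-matrix counts. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true lesions that are detected.
    specificity = TN / (TN + FP): share of lesion-free cases correctly cleared.
    """
    if min(tp, fn, tn, fp) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    sens, spec = sensitivity_specificity(tp=90, fn=10, tn=950, fp=50)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

At low disease prevalence, even a small drop in specificity produces many false positives in absolute terms, which is the concern the abstract raises about single-test screening of asymptomatic individuals.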
- MeSH
- Deep Learning * MeSH
- Carcinoma, Pancreatic Ductal * diagnostic imaging pathology MeSH
- Humans MeSH
- Pancreatic Neoplasms * diagnostic imaging pathology MeSH
- Pancreas diagnostic imaging pathology MeSH
- Tomography, X-Ray Computed MeSH
- Retrospective Studies MeSH
- Artificial Intelligence MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Multicenter Study MeSH