Query: large-scale datasets

53 hits in Medvik

Online article

A high-resolution large-scale dataset of pathological and normal white blood cells

... To address the lack of large-scale databases in this field, we created a high-resolution dataset containing ...

Authors: Bodzas, Alexandra; Kodytek, Pavel; Zidek, Jan
Affiliation (all authors): Department of Cybernetics and Biomedical Engineering, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Ostrava, Czech Republic. Contact: alexandra.bodzas@vsb.cz

Scientific data. 2023 ; 10 (1) : 466. [pub] 20230719

Sci Data
ISSN 2052-4463

Microscopic examination plays a significant role in the initial screening for a variety of hematological, as well as non-hematological, diagnoses. Microscopic blood smear examination, which is considered a key diagnostic technique, is still performed manually in current clinical practice, which is not only time-consuming but can also lead to human error. Although automated and semi-automated systems have been developed in recent years, their high purchase and maintenance costs make them unaffordable for many medical institutions. Even though much research has been conducted lately to explore more accurate and feasible solutions, most researchers have had to deal with a lack of medical data. To address the lack of large-scale databases in this field, we created a high-resolution dataset containing a total of 16027 annotated white blood cells. Moreover, the dataset covers 9 types of white blood cells in total, including clinically significant pathological findings. Since we used high-quality acquisition equipment, the dataset provides some of the highest-quality images of blood cells available, achieving an approximate resolution of 42 pixels per 1 μm.

Article

The contribution of mitochondrial metagenomics to large-scale data mining and phylogenetic analysis of Coleoptera

... The current study explores the application of these techniques for large-scale efforts to build the tree ...

Molecular phylogenetics and evolution. 2018 ; 128 (-) : 1-11. [pub] 20180725

Mol Phylogenet Evol
ISSN 1095-9513

A phylogenetic tree at the species level is still far off for highly diverse insect orders, including the Coleoptera, but the taxonomic breadth of public sequence databases is growing. In addition, new types of data may contribute to increasing taxon coverage, such as metagenomic shotgun sequencing for assembly of mitogenomes from bulk specimen samples. The current study explores the application of these techniques for large-scale efforts to build the tree of Coleoptera. We used shotgun data from 17 different ecological and taxonomic datasets (5 unpublished) to assemble a total of 1942 mitogenome contigs of >3000 bp. These sequences were combined into a single dataset together with all mitochondrial data available at GenBank, in addition to nuclear markers widely used in molecular phylogenetics. The resulting matrix of nearly 16,000 species with two or more loci produced trees (RAxML) showing overall congruence with the Linnaean taxonomy at hierarchical levels from suborders to genera. We tested the role of full-length mitogenomes in stabilizing the tree from GenBank data, as mitogenomes might link terminals with non-overlapping gene representation. However, the mitogenome data were only partly useful in this respect, presumably because of the purely automated approach to assembly and gene delimitation, but improvements in future may be possible by using multiple assemblers and manual curation. In conclusion, the combination of data mining and metagenomic sequencing of bulk samples provided the largest phylogenetic tree of Coleoptera to date, which represents a summary of existing phylogenetic knowledge and a defensible tree of great utility, in particular for studies at the intra-familial level, despite some shortcomings for resolving basal nodes.

Online article

RNAseqCNV: analysis of large-scale copy number variations from RNA-seq data

... Here we describe RNAseqCNV, a method to detect large scale CNVs from RNA-seq data. ...

Leukemia. 2022 ; 36 (6) : 1492-1498. [pub] 20220329

ISSN 1476-5551

Transcriptome sequencing (RNA-seq) is widely used to detect gene rearrangements and quantitate gene expression in acute lymphoblastic leukemia (ALL), but its utility and accuracy in identifying copy number variations (CNVs) has not been well described. CNV information inferred from RNA-seq can be highly informative to guide disease classification and risk stratification in ALL due to the high incidence of aneuploid subtypes within this disease. Here we describe RNAseqCNV, a method to detect large scale CNVs from RNA-seq data. We used models based on normalized gene expression and minor allele frequency to classify arm level CNVs with high accuracy in ALL (99.1% overall and 98.3% for non-diploid chromosome arms, respectively), and the models were further validated with excellent performance in acute myeloid leukemia (accuracy 99.8% overall and 99.4% for non-diploid chromosome arms). RNAseqCNV outperforms alternative RNA-seq based algorithms in calling CNVs in the ALL dataset, especially in samples with a high proportion of CNVs. The CNV calls were highly concordant with DNA-based CNV results and more reliable than conventional cytogenetic-based karyotypes. RNAseqCNV provides a method to robustly identify copy number alterations in the absence of DNA-based analyses, further enhancing the utility of RNA-seq to classify ALL subtype.

MeSH
Algorithms MeSH
Karyotyping MeSH
Humans MeSH
RNA-Seq MeSH
DNA Copy Number Variations * genetics MeSH
High-Throughput Nucleotide Sequencing * methods MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH

Online article

A large-scale assay library for targeted protein quantification in renal cell carcinoma tissues

... Further, we confirm applicability of this library on four RCC datasets measured in data-independent acquisition ...

Proteomics. 2022 ; 22 (7) : e2100228. [pub] 20211222

ISSN 1615-9861

Renal cell carcinoma (RCC) represents 2.2% of all cancer incidences; however, prognostic or predictive RCC biomarkers at the protein level are largely missing. To support proteomics research of localized and metastatic RCC, we introduce a new library of targeted mass spectrometry assays for accurate protein quantification in malignant and normal kidney tissue. Aliquots of 86 initially localized RCC, 75 metastatic RCC and 17 adjacent non-cancerous fresh frozen tissue lysates were trypsin digested, pooled, and fractionated using hydrophilic chromatography. The fractions were analyzed using LC-MS/MS on a QExactive HF-X mass spectrometer in data-dependent acquisition (DDA) mode. The resulting spectral library contains 77,817 peptides representing 7960 protein groups (FDR = 1%). Further, we confirm the applicability of this library on four RCC datasets measured in data-independent acquisition (DIA) mode, demonstrating specific quantification of a substantially increased part of the RCC proteome, depending on LC-MS/MS instrumentation. The impact of the sample specificity of the library on the results of targeted DIA data extraction was demonstrated by parallel analyses of two datasets using two pan-human libraries. The new RCC-specific library has the potential to contribute to a better understanding of RCC development at the molecular level, leading to new diagnostic and therapeutic targets.

Online article

A review of canine B cell clonality assays and primer set optimization using large-scale repertoire data

... design and testing strategy and 2) to propose a novel method for optimizing primer sets that leverages large-scale ...

Veterinary immunology and immunopathology. 2019 ; 209 (-) : 45-52. [pub] 20190125

Vet Immunol Immunopathol
ISSN 1873-2534

Several molecular clonality assays have been developed to assess canine B cell proliferations. These assays were based on different sequence data, utilized different assay designs and employed different testing strategies. This has resulted in a complex body of literature and complicates evidence-based selection of primer sets. In addition, further refinement of primer sets is difficult because it is unknown how well current primer sets cover the expressed sequence repertoire. The objectives of this study were 1) to provide an overview of published IGH clonality assays that highlights key differences in assay design and testing strategy and 2) to propose a novel method for optimizing primer sets that leverages large-scale sequencing data. A review of previously published assays highlighted confounding factors that hamper a direct comparison of performance metrics between studies. These findings illustrate the need for a multi-institutional effort to harmonize veterinary clonality testing. A novel in silico analysis of primer sequences using a large dataset of expressed sequences identified shortfalls of existing primer sets and was used to guide primer optimization. Three optimized primer sets were tested and yielded qualitative sensitivity values between 80% and 90%. The quantitative sensitivity ranged from 1% to over 50% and was dependent on the size of the neoplastic clone and the sample DNA used. These findings illustrate that high-throughput sequencing data can be a useful tool to guide primer design and optimization. This strategy could be applied to other antigen receptor loci or species to further improve veterinary clonality assays.

MeSH
B-Lymphocytes cytology MeSH
Clone Cells * MeSH
DNA Primers * MeSH
Dogs genetics immunology MeSH
Immunoglobulin Heavy Chains genetics MeSH
Animals MeSH
Check Tag
Dogs genetics immunology MeSH
Animals MeSH
Publication type
Journal Article MeSH
Review MeSH

Online article

Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

... well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets ...

Bioinformatics. 2010 ; 26 (17) : 2101-8. [pub] 20100708

ISSN 1367-4811

MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
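
As an illustration of the alignment-free idea mentioned in this abstract, a k-mer frequency spectrum can be computed by sliding a window of length k over each read and normalizing the counts. The following is a minimal sketch only, not the authors' implementation: the function name `kmer_spectrum`, the choice to skip windows containing non-ACGT characters, and the toy reads are all assumptions made for the example.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return relative k-mer frequencies over a collection of reads.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are skipped by assumption.
    """
    valid = set("ACGT")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of length k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize so that the frequencies sum to 1.
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Toy reads, not real CentO sequence.
    spectrum = kmer_spectrum(["ACGTAC", "ACNGT"], k=2)
    for kmer, freq in sorted(spectrum.items()):
        print(kmer, round(freq, 3))
```

Spectra computed this way for two read sets can then be compared k-mer by k-mer; which divergence measure the study used is not stated in the abstract, so none is assumed here.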

MeSH
Centromere genetics MeSH
Chromosomes, Plant genetics MeSH
DNA, Plant genetics MeSH
Conserved Sequence genetics MeSH
Molecular Sequence Data MeSH
Oryza genetics MeSH
DNA, Satellite genetics MeSH
Base Sequence MeSH
Sequence Analysis, DNA methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, U.S. Gov't, Non-P.H.S. MeSH

Online article

Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets

... Continued development of these methods allows for large-scale screening, resulting in vast amounts of ...

Bioinformatics. 2016 ; 32 (1) : 9-16. [pub] 20150905

ISSN 1367-4811

MOTIVATION: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins' surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. RESULTS: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. AVAILABILITY AND IMPLEMENTATION: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. CONTACT: muller@mou.cz SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

MeSH
Algorithms * MeSH
Databases, Protein * MeSH
Epitopes chemistry MeSH
Protein Interaction Domains and Motifs * MeSH
Humans MeSH
Markov Chains MeSH
Molecular Sequence Data MeSH
Antibodies, Monoclonal chemistry MeSH
Peptides chemistry MeSH
Amino Acid Sequence MeSH
Sequence Alignment MeSH
Cluster Analysis MeSH
Software MeSH
src Homology Domains MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Online article

Large-scale genetic analysis reveals mammalian mtDNA heteroplasmy dynamics and variance increase through lifetimes and generations

... Here we dissect these population dynamics using a dataset of unprecedented size and temporal span, comprising ...

Nature communications. 2018 ; 9 (1) : 2488. [pub] 20180627

Nat Commun
ISSN 2041-1723

Vital mitochondrial DNA (mtDNA) populations exist in cells and may consist of heteroplasmic mixtures of mtDNA types. The evolution of these heteroplasmic populations through development, ageing, and generations is central to genetic diseases, but is poorly understood in mammals. Here we dissect these population dynamics using a dataset of unprecedented size and temporal span, comprising 1947 single-cell oocyte and 899 somatic measurements of heteroplasmy change throughout lifetimes and generations in two genetically distinct mouse models. We provide a novel and detailed quantitative characterisation of the linear increase in heteroplasmy variance throughout mammalian life courses in oocytes and pups. We find that differences in mean heteroplasmy are induced between generations, and the heteroplasmy of germline and somatic precursors diverge early in development, with a haplotype-specific direction of segregation. We develop stochastic theory predicting the implications of these dynamics for ageing and disease manifestation and discuss its application to human mtDNA dynamics.

MeSH
Datasets as Topic MeSH
Genome, Mitochondrial genetics MeSH
Haplotypes genetics MeSH
DNA, Mitochondrial genetics MeSH
Mitochondria metabolism MeSH
Models, Animal MeSH
Mice, Inbred C57BL MeSH
Mice MeSH
Oocytes cytology immunology MeSH
DNA Copy Number Variations genetics MeSH
Age Factors MeSH
Animals MeSH
Check Tag
Mice MeSH
Female MeSH
Animals MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Article

Which items of the movement assessment battery for children are most sensitive for identifying children with probable developmental coordination disorder? Results from a large-scale study

... METHODS: Based on a large dataset including European and African children aged 3-16 years (n = 4916, ...

Research in developmental disabilities. 2025 ; 157 (-) : 104904. [pub] 20250108

Res Dev Disabil
ISSN 1873-3379

INTRODUCTION: Despite the widespread use of the Movement Assessment Battery for Children, 2nd edition (MABC-2), little is known about the sensitivity or specificity of the individual items to detect probable Developmental Coordination Disorder (p-DCD). This study examined which specific MABC-2 items were most sensitive for identifying children with p-DCD and which items would predict p-DCD. METHODS: Based on a large dataset including European and African children aged 3-16 years (n = 4916 typically developing (TD, 49.6 % boys); n = 822 p-DCD (53.1 % boys)), Hedges' g was calculated to establish the standardized mean difference (SMD) between p-DCD/TD. SMDs were considered substantial when their absolute values were at or above 1.4. Sensitivity and specificity of the raw MABC-2 item scores predicting p-DCD/TD per age band (AB) were established with logistic regression analysis. RESULTS: AB1: Children with p-DCD performed substantially poorer on threading beads (SMD: -1.61) and jumping on mats (SMD: 1.61). By combining all items and the country of origin, the sensitivity was 61.7 % and specificity 98.6 %. AB2: Walking heel-to-toe forwards (SMD: 1.65) was substantially poorer in p-DCD. By combining all items and the country of origin, the sensitivity was 79.0 % and specificity 97.6 %. AB3: Catching a ball with the preferred (SMD: 1.8) or non-preferred (SMD: 1.61) hand and walking heel-to-toe backwards (SMD: 1.78) were substantially poorer in p-DCD. All items combined resulted in a sensitivity of 94.4 % and specificity of 99.6 %. CONCLUSION: Not all MABC-2 items are equally sensitive in distinguishing between the performances of children with p-DCD and TD children. Despite the good specificity, the sensitivity was only moderate in AB1-2, the age at which children learn culturally influenced motor skills.
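
For readers unfamiliar with the effect size named in this abstract, Hedges' g is the difference between two group means divided by the pooled standard deviation, multiplied by a small-sample correction factor. The sketch below uses the common approximation 1 - 3 / (4(n1 + n2) - 9) for that correction; whether the study used this exact form is an assumption, and the function name `hedges_g` and the toy scores are hypothetical.

```python
import math


def hedges_g(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) with the
    usual small-sample correction. Illustrative helper, not study code."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (denominator n - 1).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across both groups.
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    cohens_d = (mean_a - mean_b) / pooled_sd
    # Approximate correction factor for small samples (assumed form).
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return cohens_d * correction


if __name__ == "__main__":
    # Toy item scores, not data from the study.
    print(hedges_g([2, 4, 6], [1, 3, 5]))
```

The sign of the result depends only on which group is passed first, so an item can show a negative or positive SMD of the same magnitude depending on the direction of scoring.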

Article

Large-scale pancreatic cancer detection via non-contrast CT and deep learning

... Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale ...

Nature medicine. 2023 ; 29 (12) : 3033-3043. [pub] 20231120

Nat Med
ISSN 1546-170X

Pancreatic ductal adenocarcinoma (PDAC), the most deadly solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible due to the low prevalence and potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening; however, identification of PDAC using non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy via non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection in a multicenter validation involving 6,239 patients across 10 centers, outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification, and achieves a sensitivity of 92.9% and specificity of 99.9% for lesion detection in a real-world multi-scenario validation consisting of 20,530 consecutive patients. Notably, PANDA utilized with non-contrast CT shows non-inferiority to radiology reports (using contrast-enhanced CT) in the differentiation of common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.
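
The sensitivity and specificity figures quoted in this abstract follow the standard definitions: sensitivity is the share of true cases that are flagged, and specificity is the share of non-cases that are correctly left unflagged. The sketch below shows the arithmetic only; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute (sensitivity, specificity) from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    tn: true negatives, fp: false positives.
    """
    sensitivity = tp / (tp + fn)  # fraction of actual positives detected
    specificity = tn / (tn + fp)  # fraction of actual negatives cleared
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts chosen for easy arithmetic.
    sens, spec = sensitivity_specificity(tp=90, fn=10, tn=990, fp=10)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With a rare condition, even a small drop in specificity produces many false positives in absolute terms, which is why the abstract stresses specificity when discussing screening.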

MeSH
Deep Learning * MeSH
Carcinoma, Pancreatic Ductal * diagnostic imaging pathology MeSH
Humans MeSH
Pancreatic Neoplasms * diagnostic imaging pathology MeSH
Pancreas diagnostic imaging pathology MeSH
Tomography, X-Ray Computed MeSH
Retrospective Studies MeSH
Artificial Intelligence MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Multicenter Study MeSH
