Data clustering problems
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
- MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Humans MeSH
- Uncertainty MeSH
- Cluster Analysis MeSH
- Genomic Structural Variation * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
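The abstract above contrasts a breakpoint-distance dissimilarity with an overlap-based one. The following is a minimal sketch of how two such measures could be computed for a pair of same-type SVs on one chromosome. The `SV` record, both function names, and the exact formulas (maximum breakpoint offset; one minus reciprocal overlap) are illustrative assumptions, not the definitions used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: two breakpoints on one chromosome."""
    start: int  # first breakpoint (1-based, inclusive)
    end: int    # second breakpoint (1-based, inclusive)


def breakpoint_distance(a: SV, b: SV) -> int:
    """Assumed measure: the larger of the two breakpoint offsets, in bp."""
    return max(abs(a.start - b.start), abs(a.end - b.end))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed measure: 1 minus the reciprocal overlap fraction.

    The shared interval length is divided by the longer of the two SVs,
    so the value is 0.0 for identical calls and 1.0 for disjoint ones.
    """
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 1.0
    longer = max(a.end - a.start + 1, b.end - b.start + 1)
    return 1.0 - shared / longer


if __name__ == "__main__":
    a, b = SV(1000, 2000), SV(1050, 2100)
    print(breakpoint_distance(a, b))
    print(overlap_dissimilarity(a, b))
```

Either function could serve as the pairwise dissimilarity passed to a clustering routine. The breakpoint measure is expressed in base pairs, while the overlap measure is scale-free, so the two need different merge thresholds.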
The paper presents an application of a clustering technique inspired by ant colony metaheuristics. The paper addresses the problem of long-term (Holter) electrocardiogram data processing. Long-term recording produces a huge amount of biomedical data, which must be preprocessed prior to its presentation to the specialist. The paper also discusses relevant aspects improving the robustness, stability and convergence criteria of the method. The method is compared with well-known clustering techniques (both classical and nature-inspired), first on a known dataset and then on real ECG data records from the MIT-BIH database, where it outperforms the standard methods. Electrocardiogram data clustering can effectively reduce the amount of data presented to the cardiologist: cardiac arrhythmia and significant morphology changes in the ECG can be visually emphasized in a reasonable time. The final evaluation of the ECG recording must still be made by an expert.
- MeSH
- Algorithms MeSH
- Biomimetics methods MeSH
- Behavior, Animal MeSH
- Diagnosis, Computer-Assisted methods MeSH
- Electrocardiography, Ambulatory methods MeSH
- Financing, Organized MeSH
- Ants physiology MeSH
- Humans MeSH
- Signal Processing, Computer-Assisted MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated methods MeSH
- Sensitivity and Specificity MeSH
- Cluster Analysis MeSH
- Arrhythmias, Cardiac diagnosis physiopathology MeSH
- Heart Rate MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Evaluation Study MeSH
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
- MeSH
- Databases, Protein * MeSH
- Protein Conformation MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Data Accuracy MeSH
- User-Computer Interface MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic Names
- Europe MeSH
BACKGROUND: Emotional and behavioural problems (EBP) are the most common mental health issues during adolescence, and their incidence has increased in recent years. The system of care for adolescents with EBP is known to have several problems, making the provision of care less than optimal, and attention needs to be given to potential improvements. We, therefore, aimed to examine what needs to be done to improve the system of care for adolescents with EBP and to assess the urgency and feasibility of the proposed measures from the perspective of care providers. METHODS: We used concept mapping, a participatory mixed-method research approach based on qualitative data collection and quantitative data analysis. A total of 33 stakeholders from 17 institutions participated in our study, including psychologists, pedagogues for children with special needs, teachers, educational counsellors, social workers and child psychiatrists. RESULTS: Respondents identified 43 ideas for improving the system of care for adolescents with EBP, grouped into 5 clusters related to increasing the competencies of care providers, changes at schools and school systems, support for existing services, transparency of the care system in institutions and public administration, and the adjustment of legislative conditions. The most urgent and feasible proposals were related to the support of awareness-raising activities on the topic of EBP, the creation of effective screening tools for the identification of EBP in adolescents, strengthening the role of parents in the process of care, comprehensive work with the family, creation of multidisciplinary support teams and intersectoral cooperation. CONCLUSIONS: Measures which are more accessible and responsive to the pitfalls of the care system, together with those strengthening the role of families and schools, have greater potential for improvements which are in favour of adolescents with EBP.
Care providers should be invited more often and much more involved in the discussion and the co-creation of measures to improve the system of care for adolescents with EBP.
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.
- MeSH
- Humans MeSH
- Population Groups MeSH
- Randomized Controlled Trials as Topic * MeSH
- Cluster Analysis * MeSH
- Models, Statistical MeSH
- Research Design * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Review MeSH
... WHY COLLECT INJURY DATA USING A COMMUNITY SURVEY? 4 -- 2.1 What is an injury? ... ... DATA COLLECTION 25 -- 5.1 Survey data elements 25 -- 5.1.1 Core and expanded data sets 25 -- 5.1.2 Using ... ... elements 31 -- 5.4 Expanded data elements 43 -- 5.5 Additional expanded data elements 62 -- 5.5.1 Socioeconomic ... ... DATA ENTRY AND ANALYSIS 75 -- 9.1 Data entry 75 -- 9.2 Statistical data analysis 76 -- 9.2.1 Descriptive ... ... data analysis 76 -- 9.2.2 Cross-tabulations 77 -- 9.2.3 More advanced forms of data analysis 78 -- 9.3 ...
140 p. : tables ; 30 cm
- MeSH
- Manuals as Topic MeSH
- Wounds and Injuries epidemiology MeSH
- Data Collection methods MeSH
- Publication Type
- Handbook MeSH
- Conspectus
- Public health and hygiene
- NLK Subject Fields
- traumatology
- medical informatics
- public health
- NLK Publication Type
- WHO publications
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.
- MeSH
- Humans MeSH
- Randomized Controlled Trials as Topic methods MeSH
- Cluster Analysis * MeSH
- Models, Statistical MeSH
- Sample Size MeSH
- Research Design * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Review MeSH
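The abstract above notes that clustering must be accounted for in the design of a GRT. As a minimal sketch of what that means for sample size, the code below uses the standard design effect for equal-sized groups, 1 + (m - 1) × ICC, where m is the group size and ICC the intraclass correlation. The function names and the example numbers are illustrative assumptions and are not taken from the review.

```python
import math


def design_effect(group_size: float, icc: float) -> float:
    """Variance inflation from randomizing groups instead of individuals.

    Assumes equal group sizes; icc is the intraclass correlation.
    """
    return 1.0 + (group_size - 1.0) * icc


def inflated_sample_size(n_individual: int, group_size: float, icc: float) -> int:
    """Total participants needed once clustering is taken into account.

    n_individual is the size an individually randomized trial would need.
    """
    return math.ceil(n_individual * design_effect(group_size, icc))


if __name__ == "__main__":
    # Hypothetical inputs: 400 participants needed without clustering,
    # groups of 21 members, intraclass correlation of 0.05.
    print(design_effect(21, 0.05))
    print(inflated_sample_size(400, 21, 0.05))
```

Even a small intraclass correlation can inflate the required sample noticeably when groups are large, which is one reason the group size and the number of groups are usually planned together.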
Many cases of rapid evolutionary radiations in plant and animal lineages are known; however, phylogenetic relationships among these lineages have been difficult for systematists to resolve. Increasing amounts of genomic data have been sequentially applied in an attempt to resolve these radiations, dissecting their evolutionary patterns into a series of bifurcating events. Here we explore one such rapid radiation in the tropical plant order Zingiberales (the bananas and relatives), which includes eight families, approximately 110 genera, and more than 2600 species. One clade, the "Ginger families", including (Costaceae + Zingiberaceae) (Marantaceae + Cannaceae), has been well-resolved and well-supported in all previous studies. However, well-supported reconstructions among the "Banana families" (Musaceae, Heliconiaceae, Lowiaceae, Strelitziaceae), which most likely diverged about 90 Mya, have been difficult to confirm. Supported with anatomical, morphological, single locus, and genome-wide data, nearly every possible phylogenetic placement has been proposed for these families. In an attempt to resolve this complex evolutionary event, hybridization-based target enrichment was used to obtain sequences from up to 378 putatively orthologous low-copy nuclear genes (all ≥ 960 bp). Individual gene trees recovered multiple topologies among the early divergent lineages, with varying levels of support for these relationships. One topology of the "Banana families" (Musaceae (Heliconiaceae (Lowiaceae + Strelitziaceae))), which has not been suggested until now, was almost consistently recovered in all multilocus analyses of the nuclear dataset (concatenated - ExaML, coalescent - ASTRAL and ASTRID, supertree - MRL, and Bayesian concordance - BUCKy). Nevertheless, the multiple topologies recovered among these lineages suggest that even large amounts of genomic data might not be able to fully resolve relationships at this phylogenetic depth.
This lack of well-supported resolution could suggest methodological problems (i.e., violation of model assumptions in both concatenated and coalescent analyses) or more likely reflect an evolutionary history shaped by an explosive, rapid, and nearly simultaneous polychotomous radiation in this group of plants towards the end of the Cretaceous, perhaps driven by vertebrate pollinator selection.
- MeSH
- Bayes Theorem MeSH
- Cell Nucleus genetics MeSH
- Databases, Genetic MeSH
- Phylogeny * MeSH
- Genomics * MeSH
- Open Reading Frames genetics MeSH
- Tropical Climate * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Zingiberales classification genetics MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
Based on molecular data, three major clades have been recognized within Bilateria: Deuterostomia, Ecdysozoa, and Spiralia. Within Spiralia, small-sized and simply organized animals such as flatworms, gastrotrichs, and gnathostomulids have recently been grouped together as Platyzoa. However, the representation of putative platyzoans was low in the respective molecular phylogenetic studies, in terms of both taxon number and sequence data. Furthermore, increased substitution rates in platyzoan taxa raised the possibility that monophyletic Platyzoa represents an artifact due to long-branch attraction. In order to overcome such problems, we employed a phylogenomic approach, thereby substantially increasing 1) the number of sampled species within Platyzoa and 2) species-specific sequence coverage in data sets of up to 82,162 amino acid positions. Using established and new measures (long-branch score), we disentangled phylogenetic signal from misleading effects such as long-branch attraction. In doing so, our phylogenomic analyses did not recover a monophyletic origin of platyzoan taxa that, instead, appeared paraphyletic with respect to the other spiralians. Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa and all other spiralians represent a monophyletic group, which we name Platytrochozoa. Platyzoan paraphyly suggests that the last common ancestor of Spiralia was a simple-bodied organism lacking coelomic cavities, segmentation, and complex brain structures, and that more complex animals such as annelids evolved from such a simply organized ancestor. This conclusion contradicts alternative evolutionary scenarios proposing an annelid-like ancestor of Bilateria and Spiralia and several independent events of secondary reduction.
- MeSH
- Helminths classification genetics MeSH
- Phylogeny MeSH
- Genome, Helminth MeSH
- Genomics methods MeSH
- Evolution, Molecular MeSH
- Platyhelminths classification genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH