The Aquila Optimizer (AO) is a newly proposed, highly capable metaheuristic algorithm based on the hunting and search behavior of the Aquila bird. However, the AO faces some challenges when dealing with high-dimensional optimization problems due to its narrow exploration capabilities and a tendency to converge prematurely to local optima, which can decrease its performance in complex scenarios. This paper presents a modified form of the previously proposed AO, the Locality Opposition-Based Learning Aquila Optimizer (LOBLAO), aimed at resolving such issues and improving the performance of tasks related to global optimization and data clustering in particular. The proposed LOBLAO incorporates two key advancements: the Opposition-Based Learning (OBL) strategy, which enhances solution diversity and balances exploration and exploitation, and the Mutation Search Strategy (MSS), which mitigates the risk of local optima and ensures robust exploration of the search space. Comprehensive experiments on benchmark test functions and data clustering problems demonstrate the efficacy of LOBLAO. The results reveal that LOBLAO outperforms the original AO and several state-of-the-art optimization algorithms, showcasing superior performance in tackling high-dimensional datasets. In particular, LOBLAO achieved the best average ranking of 1.625 across multiple clustering problems, underscoring its robustness and versatility. These findings highlight the significant potential of LOBLAO to solve diverse and challenging optimization problems, establishing it as a valuable tool for researchers and practitioners.
- Keywords
- Aquila optimizer, Data clustering problems, Meta-heuristics optimization algorithms, Opposition-based learning, Optimization problems
- Publication type
- Journal Article MeSH
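The opposition-based learning idea named in the abstract above can be illustrated with a minimal sketch. It uses the standard definition of an opposite point, `lower + upper - x` per dimension, and keeps whichever of the candidate and its opposite scores better. The `sphere` objective, the bounds, and the function names are assumptions made for illustration; the abstract does not specify how LOBLAO embeds OBL or the Mutation Search Strategy, so this is not the authors' implementation.

```python
def opposite(solution, lower, upper):
    """Opposite point in a box-bounded space: lower[i] + upper[i] - x[i]."""
    return [lo + hi - x for x, lo, hi in zip(solution, lower, upper)]


def sphere(x):
    """Toy objective (an assumption for this sketch): sum of squares."""
    return sum(v * v for v in x)


def obl_step(population, lower, upper, objective):
    """For each candidate, keep the better of it and its opposite (minimization)."""
    kept = []
    for candidate in population:
        mirrored = opposite(candidate, lower, upper)
        kept.append(min(candidate, mirrored, key=objective))
    return kept


if __name__ == "__main__":
    lower, upper = [-10.0, -10.0], [5.0, 5.0]
    population = [[-8.0, -7.0], [-1.0, 1.0]]
    # The first candidate is replaced by its opposite [3.0, 2.0];
    # the second is already better than its opposite and is kept.
    print(obl_step(population, lower, upper, sphere))
```

Evaluating the opposite point costs one extra objective call per candidate, which is the usual trade-off when OBL is used to widen the explored region.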
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
- Keywords
- Breakpoints uncertainty problem, Constrained clustering, Mendelian inheritance error, Structural variants, Whole genome sequencing
- MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Humans MeSH
- Uncertainty MeSH
- Cluster Analysis MeSH
- Genomic Structural Variation * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
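The two kinds of dissimilarity compared in the abstract above (breakpoint distance and overlap) can be sketched on SVs represented as `(start, end)` intervals. The concrete formulas below (maximum breakpoint displacement, and one minus the Jaccard overlap), the thresholds, and the single-linkage grouping are assumptions chosen for illustration. The abstract does not give the exact definitions, and the trivial, corrected, and constrained strategies it mentions are not reproduced here.

```python
def breakpoint_distance(a, b):
    """Largest displacement between matching breakpoints of two (start, end) SVs."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def overlap_dissimilarity(a, b):
    """One minus the Jaccard overlap of two intervals (assumes start < end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return 1.0 - inter / union


def cluster(svs, dissimilarity, threshold):
    """Single-linkage grouping: SVs joined by any chain of pairs within threshold."""
    parent = list(range(len(svs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(svs)):
        for j in range(i + 1, len(svs)):
            if dissimilarity(svs[i], svs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, sv in enumerate(svs):
        groups.setdefault(find(i), []).append(sv)
    return sorted(groups.values())


if __name__ == "__main__":
    svs = [(100, 500), (110, 505), (1000, 2000), (1400, 2000)]
    print(cluster(svs, breakpoint_distance, 50))     # 3 clusters
    print(cluster(svs, overlap_dissimilarity, 0.5))  # 2 clusters
```

On this toy input the last two SVs share an end breakpoint but differ by 400 bp at the start, so they are merged under the overlap measure and kept apart under the breakpoint measure, which shows how the choice of measure can change the merged call set.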
Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment that enables completing sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment allows collecting data about trainees' interactions with the environment, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing them makes it possible to assess the trainees and provide feedback that helps them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable for capturing timing information and tool usage frequency. Clustering underlined that many trainees often face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
- Keywords
- Cybersecurity education, Data science, Educational data mining, Learning analytics, Security training
- Publication type
- Journal Article MeSH
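As a rough illustration of pattern mining on command-line logs like those described in the abstract above, the sketch below counts which consecutive tool pairs appear in at least a given number of trainee sessions. The session data, the `frequent_pairs` name, and the choice of consecutive pairs with session-level support are all hypothetical; the abstract does not state which pattern-mining algorithm or features the authors used.

```python
from collections import Counter


def frequent_pairs(sessions, min_support):
    """Return consecutive tool pairs found in at least min_support sessions.

    Each session is a list of raw command strings; only the first token
    (the tool name) of each command is used.
    """
    support = Counter()
    for commands in sessions:
        tools = [c.split()[0] for c in commands if c.strip()]
        # A set, so each pair counts once per session regardless of repeats.
        support.update(set(zip(tools, tools[1:])))
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical logs from three trainees.
    sessions = [
        ["nmap -sV 10.0.0.5", "ssh user@10.0.0.5", "ls -la"],
        ["nmap 10.0.0.5", "nmap -p 22 10.0.0.5", "ssh root@10.0.0.5"],
        ["ls", "nmap -A 10.0.0.5", "ssh admin@10.0.0.5"],
    ]
    print(frequent_pairs(sessions, min_support=2))  # {('nmap', 'ssh'): 3}
```

Counting support per session rather than per occurrence keeps one very active trainee from dominating the result.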
The paper presents an application of a clustering technique inspired by ant colony metaheuristics. The paper addresses the problem of long-term (Holter) electrocardiogram data processing. Long-term recording produces a huge amount of biomedical data, which must be preprocessed prior to its presentation to the specialist. The paper also discusses relevant aspects improving the robustness, stability and convergence criteria of the method. The method is compared with well-known clustering techniques (both classical and nature-inspired), first on a known dataset and then on real ECG data records from the MIT-BIH database, and it outperforms the standard methods. Electrocardiogram data clustering can effectively reduce the amount of data presented to the cardiologist: cardiac arrhythmia and significant morphology changes in the ECG can be visually emphasized in a reasonable time. The final evaluation of the ECG recording must still be made by an expert.
- MeSH
- Algorithms MeSH
- Biomimetics methods MeSH
- Behavior, Animal MeSH
- Diagnosis, Computer-Assisted methods MeSH
- Electrocardiography, Ambulatory methods MeSH
- Ants physiology MeSH
- Humans MeSH
- Signal Processing, Computer-Assisted * MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated methods MeSH
- Sensitivity and Specificity MeSH
- Cluster Analysis * MeSH
- Arrhythmias, Cardiac diagnosis physiopathology MeSH
- Heart Rate * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Evaluation Study MeSH
- Research Support, Non-U.S. Gov't MeSH
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
- MeSH
- Databases, Protein * MeSH
- Protein Conformation MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Data Accuracy MeSH
- User-Computer Interface MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Europe MeSH
BACKGROUND: Emotional and behavioural problems (EBP) are the most common mental health issues during adolescence, and their incidence has increased in recent years. The system of care for adolescents with EBP is known to have several problems, making the provision of care less than optimal, and attention needs to be given to potential improvements. We, therefore, aimed to examine what needs to be done to improve the system of care for adolescents with EBP and to assess the urgency and feasibility of the proposed measures from the perspective of care providers. METHODS: We used concept mapping, a participatory mixed-method research approach based on qualitative data collection and quantitative data analysis. A total of 33 stakeholders from 17 institutions participated in our study, including psychologists, pedagogues for children with special needs, teachers, educational counsellors, social workers and child psychiatrists. RESULTS: Respondents identified 43 ideas for improving the system of care for adolescents with EBP, grouped into 5 clusters related to increasing the competencies of care providers, changes at schools and school systems, support for existing services, transparency of the care system in institutions and public administration, and the adjustment of legislative conditions. The most urgent and feasible proposals were related to the support of awareness-raising activities on the topic of EBP, the creation of effective screening tools for the identification of EBP in adolescents, strengthening the role of parents in the process of care, comprehensive work with the family, creation of multidisciplinary support teams and intersectoral cooperation. CONCLUSIONS: Measures which are more accessible and responsive to the pitfalls of the care system, together with those strengthening the role of families and schools, have greater potential for improvements which are in favour of adolescents with EBP. Care providers should be invited more often and much more involved in the discussion and the co-creation of measures to improve the system of care for adolescents with EBP.
- Keywords
- Adolescents, Care providers, Children, Concept mapping, Emotional and behavioural problems, System of care
- MeSH
- Child MeSH
- Emotions MeSH
- Humans MeSH
- Adolescent MeSH
- Problem Behavior * psychology MeSH
- Parents psychology MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Publication type
- Journal Article MeSH
To solve recurring problems in drug discovery, matched molecular pair (MMP) analysis is used to understand relationships between chemical structure and function. For the MMP analysis of large data sets (>10,000 compounds), available tools lack flexible search and visualization functionality and require computational expertise. Here, we present Matcher, an open-source application for MMP analysis, with novel search algorithms and fully automated querying-to-visualization that requires no programming expertise. Matcher enables unprecedented control over the search and clustering of MMP transformations based on both variable fragment and constant environment structure, which is critical for separating data that are relevant to a given problem from data that are not. Users can exert such control through a built-in chemical sketcher and with a few mouse clicks can navigate between resulting MMP transformations, statistics, property distribution graphs, and structures with raw experimental data, for confident and accelerated decision making. Matcher can be used with any collection of structure/property data; here, we demonstrate usage with a public ChEMBL data set of about 20,000 small molecules with CYP3A4 and/or hERG inhibition data. Users can reproduce all examples demonstrated herein via unique links within Matcher's interface, a functionality that anyone can use to preserve and share their own analyses. Matcher and all its dependencies are open-source, can be used for free, and are available with containerized deployment from code at https://github.com/Merck/Matcher. Matcher makes large structure/property data sets more transparent than ever before and accelerates the data-driven solution of common problems in drug discovery.
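The core idea of MMP analysis described above, pairing compounds that share a constant part and differ in one variable fragment, can be sketched on pre-fragmented toy data. Everything here is an assumption for illustration: the record layout, the scaffold and fragment labels, the property values, and the function names. It does not use or represent Matcher's search algorithms or interface, and real workflows need a cheminformatics toolkit to fragment structures first.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def matched_pairs(records):
    """Yield (fragment_from, fragment_to, delta) for compounds sharing a constant part.

    records: iterable of (compound_id, constant, fragment, value).
    Fragments are ordered alphabetically so each transformation has one direction.
    """
    by_constant = defaultdict(list)
    for _cid, constant, fragment, value in records:
        by_constant[constant].append((fragment, value))
    for members in by_constant.values():
        for (f1, v1), (f2, v2) in combinations(sorted(members), 2):
            if f1 != f2:
                yield f1, f2, v2 - v1


def summarize(records):
    """Map each transformation to (number of pairs, mean property change)."""
    deltas = defaultdict(list)
    for f1, f2, delta in matched_pairs(records):
        deltas[(f1, f2)].append(delta)
    return {t: (len(ds), mean(ds)) for t, ds in deltas.items()}


if __name__ == "__main__":
    # Hypothetical data: two scaffolds, each with an H and an F analogue.
    records = [
        ("cpd1", "scaffold_A", "H", 5.0),
        ("cpd2", "scaffold_A", "F", 5.5),
        ("cpd3", "scaffold_B", "H", 6.0),
        ("cpd4", "scaffold_B", "F", 6.25),
    ]
    print(summarize(records))  # {('F', 'H'): (2, -0.375)}
```

Grouping by the constant part first keeps the number of pair comparisons proportional to the size of each series rather than to the whole data set.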
Disease registries will often contain the addresses of cases included in the registry. If the registry includes information on all cases, or deaths, occurring in a defined geographical area and time period, and if there is a postcode/zip code or map reference for each case, it is possible to carry out a variety of different types of geographical analysis that may give clues to the aetiology of the disease. For such analyses it will usually also be necessary to have population data for the region covered by the registry and for separate sub-regions within it. In this paper we review types of analysis that may be applied to such data and give references to examples of applications and the statistical methods used. These include, first, methods of presenting incidence rates, and particularly the use of maps; of particular concern is the development of methods for presenting data that take into account the problems of rates calculated for small populations and which may therefore happen to be high or low simply by chance. Secondly, we consider the analysis of "clustering" and "clusters" of cases of disease. These problems have been the subject of considerable methodological development in recent years. Analyses of clustering address the question of whether there is a general tendency for there to be aggregations of cases or areas of high incidence; the analysis of clusters is concerned with problems of detecting specific locations where there are unusual aggregations of cases. The third type of problem considered here is whether there are, within the registry region, aetiological factors that vary geographically with consequent variations in disease incidence in different sub-regions. Where there is geographical variation it may be possible to use regression analysis to relate such variation to factors such as socio-economic status or levels of some environmental hazard. Finally, we consider the problem of determining whether disease rates in certain areas may be related to distance from the source of some potential causative agent.
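The small-population problem mentioned above can be made concrete with a short sketch: two areas with the same observed-to-expected ratio can differ sharply in how easily chance alone explains the excess. The sketch assumes case counts follow a Poisson distribution with the expected count as its mean, which is a common modelling assumption rather than something the abstract prescribes; the counts and function names are hypothetical.

```python
import math


def incidence_ratio(observed, expected):
    """Observed cases divided by the number expected from reference rates."""
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical areas with the same ratio of 2.0 but different sizes.
    for name, obs, exp in [("small area", 2, 1.0), ("large area", 40, 20.0)]:
        ratio = incidence_ratio(obs, exp)
        p = poisson_upper_tail(obs, exp)
        print(f"{name}: ratio={ratio:.1f}, P(X >= {obs}) = {p:.5f}")
```

In the small area, two or more cases arise by chance roughly a quarter of the time, so a doubled rate there is weak evidence, whereas the same ratio in the large area is very unlikely under the null.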
Wireless Sensor Networks (WSNs) can be defined as a cluster of sensors with a restricted power supply deployed in a specific area to gather environmental data. One of the most challenging areas of research is to design energy-efficient data gathering algorithms in large-scale WSNs, as each sensor node, in general, has limited energy resources. A review of the literature shows that, with regard to energy saving, clustering-based techniques for data gathering are quite effective. Moreover, cluster head (CH) optimization is an NP-hard (non-deterministic polynomial-time hard) problem. Both the lifespan of the network and its energy efficiency are improved by choosing the optimal path in routing. The technique put forth in this paper is based on multi swarm optimization (MSO) (i.e., multi-PSO) together with Tabu search (TS) techniques. The proposed system selects efficient CHs, which improves routing optimization and extends the life of the network. The obtained results show that the MSO-Tabu approach has a 14%, 5%, 11%, and 4% higher number of clusters and a 20%, 6%, 14%, and 6% lower average packet loss rate as compared to a genetic algorithm (GA), differential evolution (DE), Tabu, and MSO based clustering, respectively. Moreover, the MSO-Tabu approach has 136%, 36%, 136%, and 38% higher lifetime computation, and 22%, 16%, 51%, and 12% higher average dissipated energy. Thus, the study's outcome shows that the proposed MSO-Tabu is efficient, as it increases the number of clusters formed, the average energy dissipated, and the lifetime computation, and decreases mean packet loss and end-to-end delay.
- Keywords
- cluster head (CH), energy consumption, metaheuristics, particle swarm optimization (PSO), wireless energy transfer
- Publication type
- Journal Article MeSH
In recent years, most countries around the world have struggled with the consequences of budget cuts in health expenditure, obliging them to utilize their resources efficiently. In this context, performance evaluation facilitates the decision-making process in improving the efficiency of the healthcare system. However, performance evaluation in many sectors, including healthcare systems, is on the one hand a challenging issue and on the other hand a useful tool for decision-making aimed at optimizing the use of resources. This study proposes a new methodology comprising two well-known analytical approaches: (i) data envelopment analysis (DEA) to measure the efficiencies and (ii) data science to complement the DEA model in providing insightful recommendations for strategic decision making on productivity enhancement. The suggested method is a first attempt to combine two DEA extensions: flexible measure and cross-efficiency. We develop a pair of benevolent and aggressive scenarios aiming at evaluating cross-efficiency in the presence of flexible measures. Next, we perform data mining cluster analysis to create groups of homogeneous countries. Organizing the data in similar groups facilitates identifying a set of benchmarks that perform similarly in terms of operating conditions. By comparing the benchmark set with poorly performing countries, we can obtain attainable goals for performance enhancement, which will help policymakers act on them strategically. A case study of healthcare systems in 120 countries is taken as an example to illustrate the potential application of our new method.
- Keywords
- Clustering, Cross-efficiency, Data envelopment analysis, Data science, Flexible measure, Healthcare
- MeSH
- Resource Allocation methods MeSH
- Global Health MeSH
- Efficiency, Organizational * MeSH
- Humans MeSH
- Delivery of Health Care * methods organization & administration MeSH
- Decision Making MeSH
- Cluster Analysis MeSH
- Models, Statistical * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
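The cross-efficiency notion named in the abstract above can be sketched as follows: each unit's input and output weights are used to score every unit, and a unit's cross-efficiency is the average of the scores it receives. In practice the weights come from solving a DEA linear program per unit, with benevolent or aggressive secondary goals choosing among optimal solutions. Here the weights are simply given as hypothetical numbers, and the flexible-measure extension and the clustering step are not modelled.

```python
def cross_efficiency(inputs, outputs, in_weights, out_weights):
    """Return (matrix, scores) for n units.

    matrix[k][j] is the efficiency of unit j evaluated with unit k's weights:
    weighted outputs of j divided by weighted inputs of j.
    scores[j] is the mean of column j (average peer evaluation, self included).
    """
    n = len(inputs)

    def ratio(k, j):
        weighted_out = sum(u * y for u, y in zip(out_weights[k], outputs[j]))
        weighted_in = sum(v * x for v, x in zip(in_weights[k], inputs[j]))
        return weighted_out / weighted_in

    matrix = [[ratio(k, j) for j in range(n)] for k in range(n)]
    scores = [sum(matrix[k][j] for k in range(n)) / n for j in range(n)]
    return matrix, scores


if __name__ == "__main__":
    # Hypothetical data: two units, two inputs, one output.
    inputs = [[2.0, 4.0], [4.0, 2.0]]
    outputs = [[1.0], [1.0]]
    # Assumed weights; each unit emphasizes the input it uses least.
    in_weights = [[0.5, 0.0], [0.0, 0.5]]
    out_weights = [[1.0], [1.0]]
    matrix, scores = cross_efficiency(inputs, outputs, in_weights, out_weights)
    print(matrix)  # [[1.0, 0.5], [0.5, 1.0]]
    print(scores)  # [0.75, 0.75]
```

Both units look fully efficient under their own weights but score 0.75 once peer evaluations are averaged in, which is the ranking refinement that cross-efficiency is meant to provide.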