Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.
- MeSH
- Data Mining * methods MeSH
- Mass Spectrometry * methods MeSH
- Humans MeSH
- Metabolomics * methods MeSH
- Programming Languages * MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
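The idea of querying spectra for a user-defined pattern, as described in the MassQL abstract above, can be illustrated with a minimal sketch. This is plain Python, not MassQL syntax and not the MassQL software: the `Spectrum` class, the function names, the peak values, and the 0.1 m/z tolerance are all hypothetical, chosen only to show one kind of pattern (an MS/MS scan containing a given product ion).

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical in-memory spectrum; real data would come from a file reader."""
    scan: int
    ms_level: int
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs


def has_product_ion(spectrum, target_mz, tolerance_mz=0.1):
    """Return True if any peak lies within tolerance_mz of target_mz."""
    return any(abs(mz - target_mz) <= tolerance_mz for mz, _ in spectrum.peaks)


def query_ms2_product_ion(spectra, target_mz, tolerance_mz=0.1):
    """Return scan numbers of MS2 spectra that contain the product ion."""
    return [
        s.scan
        for s in spectra
        if s.ms_level == 2 and has_product_ion(s, target_mz, tolerance_mz)
    ]


# Invented toy data, for illustration only.
spectra = [
    Spectrum(1, 1, [(300.10, 1e5)]),
    Spectrum(2, 2, [(85.03, 2e3), (226.18, 9e3)]),
    Spectrum(3, 2, [(85.03, 1e3), (150.05, 4e3)]),
    Spectrum(4, 2, [(226.25, 5e3)]),
]

matching_scans = query_ms2_product_ion(spectra, target_mz=226.18)
print(matching_scans)  # [2, 4]
```

A query language expresses the same filter declaratively, so the pattern can be shared and rerun across tools without rewriting code like this.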
BACKGROUND: Online weight loss information is commonly sought by internet users, and it may impact their health decisions and behaviors. Previous studies examined a limited number of Google search queries and relied on manual approaches to retrieve online weight loss websites. OBJECTIVE: This study aimed to identify and describe the characteristics of the top weight loss websites on Google. METHODS: This study gathered 432 Google search queries collected from Google autocomplete suggestions, "People Also Ask" featured questions, and Google Trends data. A data-mining software tool was developed to retrieve the search results automatically, setting English and the United States as the default criteria for language and location, respectively. Domain classification and evaluation technologies were used to categorize the websites according to their content and determine their risk of cyberattack. In addition, the top 5 most frequent websites in nonadvertising (ie, nonsponsored) search results were inspected for quality. RESULTS: The results revealed that the top 5 nonadvertising websites were healthline.com, webmd.com, verywellfit.com, mayoclinic.org, and womenshealthmag.com. All provided accuracy statements and author credentials. The domain categorization taxonomy yielded a total of 101 unique categories. After grouping the websites that appeared fewer than 5 times, the most frequent categories involved "Health" (104/623, 16.69%), "Personal Pages and Blogs" (91/623, 14.61%), "Nutrition and Diet" (48/623, 7.7%), and "Exercise" (34/623, 5.46%). The risk of being a victim of a cyberattack was low. CONCLUSIONS: The findings suggested that while quality information is accessible, users may still encounter less reliable content among various online resources. Therefore, better tools and methods are needed to guide users toward trustworthy weight loss information.
- Keywords
- Google, consumer health informatics, cyberattack risk, data mining, digital health, information seeking, internet search, online health information, website analysis, weight loss
- MeSH
- Data Mining * methods MeSH
- Weight Loss * MeSH
- Internet * MeSH
- Humans MeSH
- Search Engine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- United States MeSH
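The category shares reported in the weight loss abstract above follow directly from the counts and the denominator of 623 given there. A short arithmetic check, using only the figures from the abstract (the variable names are illustrative):

```python
# Counts and denominator as reported in the abstract.
category_counts = {
    "Health": 104,
    "Personal Pages and Blogs": 91,
    "Nutrition and Diet": 48,
    "Exercise": 34,
}
TOTAL = 623

# Share of each category as a percentage, rounded to two decimals.
shares = {
    name: round(100 * count / TOTAL, 2)
    for name, count in category_counts.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded values agree with the percentages stated in the abstract (16.69%, 14.61%, 7.7%, and 5.46%).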
Lipidomics and metabolomics communities comprise various informatics tools; however, software programs handling multimodal mass spectrometry (MS) data with structural annotations guided by the Lipidomics Standards Initiative are limited. Here, we provide MS-DIAL 5 for in-depth lipidome structural elucidation through electron-activated dissociation (EAD)-based tandem MS and determining their molecular localization through MS imaging (MSI) data using a species/tissue-specific lipidome database containing the predicted collision-cross section values. With the optimized EAD settings using 14 eV kinetic energy, the program correctly delineated lipid structures for 96.4% of authentic standards, among which 78.0% had the sn-, OH-, and/or C=C positions correctly assigned at concentrations exceeding 1 μM. We showcased our workflow by annotating the sn- and double-bond positions of eye-specific phosphatidylcholines containing very-long-chain polyunsaturated fatty acids (VLC-PUFAs), characterized as PC n-3-VLC-PUFA/FA. Using MSI data from the eye and n-3-VLC-PUFA-supplemented HeLa cells, we identified glycerol 3-phosphate acyltransferase as an enzyme candidate responsible for incorporating n-3 VLC-PUFAs into the sn1 position of phospholipids in mammalian cells, which was confirmed using EAD-MS/MS and recombinant proteins in a cell-free system. Therefore, the MS-DIAL 5 environment, combined with optimized MS data acquisition methods, facilitates a better understanding of lipid structures and their localization, offering insights into lipid biology.
- MeSH
- Data Mining * methods MeSH
- Phosphatidylcholines metabolism chemistry MeSH
- HeLa Cells MeSH
- Mass Spectrometry methods MeSH
- Humans MeSH
- Lipidomics * methods MeSH
- Lipids chemistry analysis MeSH
- Metabolomics methods MeSH
- Fatty Acids, Unsaturated metabolism chemistry MeSH
- Software MeSH
- Tandem Mass Spectrometry methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Phosphatidylcholines MeSH
- Lipids MeSH
- Fatty Acids, Unsaturated MeSH
This work presents an automated data-mining model for age-at-death estimation based on 3D scans of the auricular surface of the pelvic bone. The study is based on a multi-population sample of 688 individuals (males and females) originating from one Asian and five European identified osteological collections. Our method requires no expert knowledge and achieves accuracy similar to that of traditional subjective methods. Apart from data acquisition, the whole procedure of pre-processing, feature extraction and age estimation is fully automated and implemented as a computer program. This program is part of a freely available web-based software tool called CoxAGE3D, available at https://coxage3d.fit.cvut.cz/. Our age-at-death estimation method is suitable for use on individuals with known or unknown population affinity and provides a moderate correlation between the estimated age and the actual age (Pearson's correlation coefficient of 0.56) and a mean absolute error of 12.4 years.
- Keywords
- 3D surface analysis, Adult age-at-death estimation, Auricular surface, Automated analysis, Data mining methods
- MeSH
- Data Mining MeSH
- Facies MeSH
- Humans MeSH
- Face MeSH
- Pelvic Bones * diagnostic imaging MeSH
- Software MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
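The two evaluation measures quoted in the age-at-death abstract above, mean absolute error and Pearson's correlation coefficient, can be computed as in the sketch below. The ages are invented purely for illustration and are not data from the study; the function names are likewise hypothetical.

```python
import math


def mean_absolute_error(actual, estimated):
    """Average absolute difference between actual and estimated values."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented example ages in years (not study data).
actual_age = [25, 40, 55, 70, 85]
estimated_age = [35, 38, 60, 58, 70]

mae = mean_absolute_error(actual_age, estimated_age)
r = pearson_r(actual_age, estimated_age)
print(f"MAE = {mae:.1f} years, r = {r:.2f}")
```

The mean absolute error is expressed in years and measures the typical size of the estimation error, whereas the correlation coefficient is unitless and measures how well the estimates track the ordering of the true ages, which is why the abstract reports both.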
Owing to popular successes (e.g., ChatGPT), artificial intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, and efforts to protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, halt and reverse land degradation, and halt biodiversity loss. AI is ubiquitous in the life sciences today. Topics range widely and include machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, temporal and spatial representation and inference, and methodological aspects of explainable AI (XAI) with applications in biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can use this directly as a guideline for developing their papers.
- Keywords
- Artificial Intelligence, Biotechnology, Deep Learning, Digital Transformation, Machine Learning
- MeSH
- Biotechnology MeSH
- Data Mining MeSH
- Ecosystem * MeSH
- Artificial Intelligence * MeSH
- Knowledge Bases MeSH
- Publication type
- Editorial MeSH
BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research utilizing the WGRS data without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool to enable researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), in which the accessions of the WGRS datasets were collected from various sources, currently representing over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are built from user input, and results are presented in tabular form: summary results by categorical description and genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession.
Besides that, the results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website ( https://soykb.org/SoybeanAlleleCatalogTool/ ), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website ( https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana ). Researchers can use this tool to connect variant alleles of genes with meta-information of species.
- Keywords
- Allele Catalog Pipeline, Allele Catalog Tool, Alleles in Gene, Data Visualization, Variant Calling Pipeline
- MeSH
- Alleles * MeSH
- Arabidopsis * genetics MeSH
- Data Mining * methods MeSH
- Datasets as Topic * MeSH
- Gene Frequency MeSH
- Genotype MeSH
- Glycine max * genetics MeSH
- Internet * MeSH
- Zea mays * genetics MeSH
- Metadata MeSH
- Mutation MeSH
- Pigmentation genetics MeSH
- Genes, Plant genetics MeSH
- Software * MeSH
- Amino Acid Substitution MeSH
- Plant Dormancy genetics MeSH
- Data Visualization MeSH
- Publication type
- Journal Article MeSH
- Substance names
- DOG1 protein, Arabidopsis MeSH Browser
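The allele assembly step mentioned in the Allele Catalog abstract above (combining per-variant genotypes into one allele per gene and accession) might look roughly like the sketch below. This is not the AlleleCatalog pipeline: the record layout, gene and accession names, and the `position:base` allele notation are assumptions made for illustration, and imputation and functional effect prediction are omitted.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed variant records (not real VCF parsing).
variants = [
    {"gene": "GeneA", "pos": 101, "genotypes": {"acc1": "A", "acc2": "G", "acc3": "G"}},
    {"gene": "GeneA", "pos": 250, "genotypes": {"acc1": "C", "acc2": "T", "acc3": "T"}},
    {"gene": "GeneB", "pos": 77, "genotypes": {"acc1": "T", "acc2": "T", "acc3": "C"}},
]


def assemble_alleles(records):
    """Join each accession's calls within a gene, ordered by position."""
    calls = defaultdict(lambda: defaultdict(list))
    for rec in sorted(records, key=lambda r: (r["gene"], r["pos"])):
        for accession, base in rec["genotypes"].items():
            calls[rec["gene"]][accession].append(f"{rec['pos']}:{base}")
    return {
        gene: {acc: "|".join(parts) for acc, parts in per_acc.items()}
        for gene, per_acc in calls.items()
    }


def count_alleles(alleles):
    """Count how many accessions carry each assembled allele, per gene."""
    return {gene: Counter(per_acc.values()) for gene, per_acc in alleles.items()}


alleles = assemble_alleles(variants)
allele_counts = count_alleles(alleles)
print(alleles["GeneA"]["acc2"])      # 101:G|250:T
print(dict(allele_counts["GeneA"]))  # {'101:A|250:C': 1, '101:G|250:T': 2}
```

Grouping by gene first keeps the output close to the tabular view the abstract describes, where each gene lists its alleles alongside the accessions that carry them.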
Age-at-death estimation of adult skeletal remains is a key part of biological profile estimation, yet it remains problematic for several reasons. One of them may be the subjective nature of the evaluation of age-related changes, or the fact that the human eye is unable to detect all the relevant surface changes. We have several aims: (1) to validate already existing computer models for age estimation; (2) to propose our own expert system based on computational approaches to eliminate the factor of subjectivity and to use the full potential of surface changes on an articulation area; and (3) to determine the age range over which the pubic symphysis is useful for age estimation. A sample of 483 3D representations of the pubic symphyseal surfaces from the ossa coxae of adult individuals coming from four European (two from Portugal, one from Switzerland and Greece) and one Asian (Thailand) identified skeletal collections was used. A validation of published algorithms showed very high error in our dataset: the Mean Absolute Error (MAE) ranged from 16.2 to 25.1 years. Two completely new approaches were proposed in this paper: SASS (Simple Automated Symphyseal Surface-based) and AANNESS (Advanced Automated Neural Network-grounded Extended Symphyseal Surface-based), whose MAE values are 11.7 and 10.6 years, respectively. The proposed models offer objective age estimates with low estimation error (compared to traditional visual methods) and are able to estimate age-at-death using the pubic symphysis across the entire adult age range.
- MeSH
- Data Mining MeSH
- Adult MeSH
- Humans MeSH
- Forensic Anthropology methods MeSH
- Pubic Symphysis * MeSH
- Age Determination by Skeleton methods MeSH
- Imaging, Three-Dimensional MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Human-nature relationships are an important aspect of leisure research. Previous studies have also reported that nature-related activities have health benefits. In this study, we surveyed US-American birdwatchers at two time points during the COVID pandemic (independent samples). At the beginning of the COVID pandemic in spring 2020, we analyzed their comments with an AI sentiment analysis. Approximately one year later (winter 2020/21), during the second wave, the study was repeated, and a second data set was analyzed. Here we show that the sentiments became more negative as the pandemic continued. This is an important result because it shows that, despite the positive impact of nature on mental health, sentiments became more negative as the pandemic persisted.
- Keywords
- COVID-19, birding, birdwatching, sentiment analysis
- MeSH
- COVID-19 * MeSH
- Humans MeSH
- Pandemics MeSH
- Attitude MeSH
- Sentiment Analysis MeSH
- SARS-CoV-2 MeSH
- Social Media * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- United States epidemiology MeSH
Developments in information technology have affected all areas of modern life and, in particular, facilitated the growth of globalisation in commerce and communication. In the drugs field, this means that both drugs discourse and drug markets have become increasingly digitally enabled. In response, new methods are being developed that attempt to research and monitor the digital environment. In this commentary we present three case studies of innovative approaches to software-automated data mining of the digital environment, together with the related challenges: (i) an e-shop finder to detect e-shops offering new psychoactive substances, (ii) scraping of forum data from online discussion boards, and (iii) automated sentiment analysis of discussions in online discussion boards. We conclude that the work presented brings opportunities in terms of leveraging data for developing a more timely and granular understanding of the various aspects of drug-use phenomena in the digital environment. In particular, combining the number of e-shops, discussion posts, and sentiments regarding particular substances could be used for ad hoc risk assessments as well as longitudinal drug monitoring, and could indicate "online popularity". The main challenges of digital data mining involve data representativity and ethical considerations.
- Keywords
- Data mining, E-shops, Internet-based drug forums, Natural language processing, Online surveillance, Psychoactive substances, Text mining
- MeSH
- Data Mining MeSH
- Pharmaceutical Preparations * MeSH
- Humans MeSH
- Drug Monitoring MeSH
- Commerce MeSH
- Substance-Related Disorders * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Pharmaceutical Preparations * MeSH
BACKGROUND: The beginning of the coronavirus disease (COVID-19) epidemic dates back to December 31, 2019, when the first cases were reported in the People's Republic of China. In the Czech Republic, the first three cases of infection with the novel coronavirus were confirmed on March 1, 2020. The joint effort of state authorities and researchers gave rise to a unique team, which combines methodical knowledge of real-world processes with the know-how needed for effective processing, analysis, and online visualization of data. OBJECTIVE: Due to an urgent need for a tool that presents important reports based on valid data sources, a team of government experts and researchers focused on the design and development of a web app intended to provide a regularly updated overview of COVID-19 epidemiology in the Czech Republic to the general population. METHODS: The Cross-Industry Standard Process for Data Mining (CRISP-DM) model was chosen for the complex solution of analytical processing and visualization of data that provides validated information on the COVID-19 epidemic across the Czech Republic. Great emphasis was placed on understanding and correctly implementing all six steps of the process (business understanding, data understanding, data preparation, modelling, evaluation, and deployment), including the infrastructure of a nationwide information system; the methodological setting of communication channels between all involved stakeholders; and data collection, processing, analysis, validation, and visualization. RESULTS: The web-based overview of the current spread of COVID-19 in the Czech Republic has been developed as an online platform providing a set of outputs in the form of tables, graphs, and maps intended for the general public. On March 12, 2020, the first version of the web portal, containing fourteen overviews divided into five topical sections, was released. 
The web portal's primary objective is to publish a well-arranged visualization and clear explanation of basic information consisting of the overall numbers of performed tests, confirmed cases of COVID-19, COVID-19-related deaths, the daily and cumulative overviews of people with a positive COVID-19 case, performed tests, location and country of infection of people with a positive COVID-19 case, hospitalizations of patients with COVID-19, and distribution of personal protective equipment. CONCLUSIONS: The online interactive overview of the current spread of COVID-19 in the Czech Republic was launched on March 11, 2020, and has immediately become the primary communication channel employed by the health care sector to present the current situation regarding the COVID-19 epidemic. This complex reporting of the COVID-19 epidemic in the Czech Republic also shows an effective way to interconnect knowledge held by various specialists, such as regional and national methodology experts (who report positive cases of the disease on a daily basis), with knowledge held by developers of central registries, analysts, developers of web apps, and leaders in the health care sector.
- Keywords
- COVID-19, CRISP-DM, Czech Republic, app, coronavirus disease, data mining, epidemiological overview, epidemiology, health data, interactive reporting, modeling, public health, virus, web app
- MeSH
- Betacoronavirus * MeSH
- COVID-19 MeSH
- Data Mining MeSH
- Internet MeSH
- Coronavirus Infections epidemiology MeSH
- Humans MeSH
- Pandemics MeSH
- SARS-CoV-2 MeSH
- Software MeSH
- Pneumonia, Viral epidemiology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Czech Republic epidemiology MeSH