The visual evaluation of data from screening and optimization experiments during the development of new analytical methods is time-consuming and introduces the risk of subjectivity. This study presents a novel approach to processing such data, based on factor analysis of mixed data and hierarchical clustering, two multivariate techniques implemented here in the R programming language. The methodology is demonstrated on the early-stage screening and optimization of the chromatographic separation of 15 structurally diverse drugs that affect the central nervous system, using a custom R script. The presented exploratory approach enabled the identification of key parameters affecting the separation and significantly reduced the time required to evaluate the comprehensive dataset from the screening experiments. Based on the results of the data analysis, the optimal combination of stationary phase and mobile phase composition was selected, considering the retention, overall resolution, and peak shape of the compounds. Additionally, compounds sensitive to changes in the selected chromatographic conditions were identified. To complement the R script, a web-based application, ChromaFAMDeX, has been developed to offer an intuitive interface that makes the statistical methods used more accessible. The R script and a link to the standalone application accompany the publication, enabling replication and adaptation of the methodology.
- Keywords
- Factor analysis of mixed data, Hierarchical clustering, Liquid chromatography, Optimization, R Language
- MeSH
- Chromatography, Liquid methods MeSH
- Multivariate Analysis MeSH
- Programming Languages * MeSH
- Software * MeSH
- Chromatography, High Pressure Liquid methods MeSH
- Publication type
- Journal Article MeSH
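The hierarchical clustering step named in the abstract above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, it is not the authors' script, and it omits the factor analysis of mixed data entirely. The average-linkage rule, the function names, and the six (retention factor, resolution) pairs are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical (retention factor, resolution) pairs for six screening runs.
runs = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 4.8), (5.2, 5.1), (4.9, 5.0)]
groups = sorted(agglomerate(runs, 2))
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

Each run index ends up in one of two groups, which mirrors how clustering can separate screening conditions with similar chromatographic behaviour without visual inspection of every chromatogram.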
Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.
- MeSH
- Data Mining * methods MeSH
- Mass Spectrometry * methods MeSH
- Humans MeSH
- Metabolomics * methods MeSH
- Programming Languages * MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
This paper, situated in the domain of information sciences, applied informatics and biomedical engineering, proposes methods for the automated detection of similarities between two particular virtual learning environments - virtual patients at Akutne.cz and the OPTIMED curriculum management system - in order to support the clinically oriented stages of medical and healthcare studies. For this purpose, the authors used large amounts of text-based data collected by the system for mapping medical curricula and by the system for virtual patient authoring and delivery. The proposed text-mining algorithm for the automated detection of links between content entities of these systems has been successfully implemented by means of a web-based toolbox.
- Keywords
- OPTIMED, R programming language, akutne.cz, medical curriculum, text similarity, virtual patient
- MeSH
- Algorithms MeSH
- Curriculum * MeSH
- Humans MeSH
- Patient Simulation * MeSH
- Software * MeSH
- Education, Medical * MeSH
- Learning MeSH
- Virtual Reality MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
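The abstract above does not state which similarity measure the text-mining algorithm uses. As one common possibility, the sketch below computes cosine similarity over raw term counts. The measure, the tokenizer, the function names, and the three sample texts are assumptions, not the authors' implementation.

```python
import re
from collections import Counter
from math import sqrt


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical content entities: one virtual patient and two curriculum units.
virtual_patient = "Patient with acute chest pain and suspected myocardial infarction"
unit_cardiology = "Diagnosis of acute myocardial infarction in patients with chest pain"
unit_dermatology = "Classification of common skin lesions"

print(cosine_similarity(virtual_patient, unit_cardiology))
print(cosine_similarity(virtual_patient, unit_dermatology))
```

Ranking curriculum units by such a score for each virtual patient is one way links between the two systems could be proposed automatically. A production pipeline would typically add stop-word removal, stemming, and term weighting.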
Citizen science projects store an enormous amount of information about species distribution, diversity and characteristics. Researchers are now beginning to make use of this rich collection of data. However, access to these databases is not always straightforward. Apart from the largest international projects, citizen science repositories often lack dedicated Application Programming Interfaces (APIs) to connect them to scientific computing environments. Thus, it is necessary to develop simple routines that allow researchers to take advantage of the information collected by smaller citizen science projects, for instance by programming specific packages that connect them to popular scientific environments (such as R). Here, we present rAvis, an R package to connect R users with Proyecto AVIS (http://proyectoavis.com), a Spanish citizen science project with more than 82,000 bird observation records. We develop several functions to explore the database, to plot the geographic distribution of species occurrences, and to generate custom queries to the database about species occurrences (number of individuals, distribution, etc.) and birdwatcher observations (number of species recorded by each collaborator, UTMs visited, etc.). This new R package will allow scientists to access this database and to exploit the information generated by Spanish birdwatchers over the last 40 years.
- MeSH
- Biodiversity MeSH
- Time Factors MeSH
- Databases, Factual * MeSH
- Volunteers MeSH
- Ecology methods MeSH
- Internet MeSH
- Population Dynamics MeSH
- Programming Languages MeSH
- Birds * MeSH
- Software * MeSH
- Computational Biology methods MeSH
- Research Design MeSH
- Geography MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Spain MeSH
BACKGROUND: Immunoglobulins (IG) and T cell receptors (TR) play a key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for deep profiling of T cell receptor repertoires. However, specialised software is required for the rational analysis of the massive data generated by next-generation sequencing. RESULTS: Here we introduce tcR, a new R package that provides a platform for the advanced analysis of T cell receptor repertoires, including diversity measures, identification of shared T cell receptor sequences, computation of gene usage statistics and other widely used methods. The tool has proven its utility in recent research studies. CONCLUSIONS: tcR is an R package for the advanced analysis of T cell receptor repertoires after the extraction of primary TR sequences from raw sequencing reads. The stable version can be installed directly from The Comprehensive R Archive Network (http://cran.r-project.org/mirrors.html). The source code and development version are available at tcR GitHub (http://imminfo.github.io/tcr/) along with the full documentation and typical usage examples.
- MeSH
- Immunoglobulins genetics MeSH
- Humans MeSH
- Programming Languages MeSH
- Receptors, Antigen, T-Cell genetics immunology MeSH
- Sequence Analysis, DNA methods MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Immunoglobulins MeSH
- Receptors, Antigen, T-Cell MeSH
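Two of the analyses listed in the abstract above, diversity measures and identification of shared sequences, can be sketched in a few lines. This is Python, not the tcR package, and it does not reproduce tcR's functions or their names. The choice of Shannon entropy and the inverse Simpson index, as well as the CDR3 strings and read counts, are assumptions for illustration.

```python
from math import log


def shannon_entropy(counts):
    """Shannon entropy (natural log) of clonotype frequencies."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index: the effective number of clonotypes."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)


# Hypothetical repertoires: CDR3 amino acid sequence -> read count.
sample_a = {"CASSLGQAYEQYF": 25, "CASSPGTGNTEAFF": 25, "CASSQDRGYGYTF": 25, "CASRTGESYEQYF": 25}
sample_b = {"CASSLGQAYEQYF": 90, "CASSIRSSYEQYF": 10}

# Shared clonotypes are the sequences present in both samples.
shared = sorted(set(sample_a) & set(sample_b))

print(inverse_simpson(list(sample_a.values())))  # 4.0
print(shannon_entropy(list(sample_b.values())))
print(shared)  # ['CASSLGQAYEQYF']
```

A perfectly even repertoire of four clonotypes has an inverse Simpson index of 4, while a repertoire dominated by one expanded clone scores close to 1. Both functions assume at least one non-zero count.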
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.
- MeSH
- Algorithms * MeSH
- Models, Biological MeSH
- History, 20th Century MeSH
- History, 21st Century MeSH
- Linear Models MeSH
- Nonlinear Dynamics MeSH
- Computer Simulation MeSH
- Signal Processing, Computer-Assisted MeSH
- Programming Languages * MeSH
- Software * MeSH
- Computational Biology history methods MeSH
- Check Tag
- History, 20th Century MeSH
- History, 21st Century MeSH
- Publication type
- Journal Article MeSH
- Historical Article MeSH
- Review MeSH
Intact (whole) cell MALDI TOF mass spectrometry has been a commonly used tool in clinical microbiology for several decades. Recently, it has been introduced to the analysis of eukaryotic cells, including cancer and stem cells. Besides targeted metabolomic and proteomic applications, intact cell MALDI TOF mass spectrometry provides sufficient sensitivity and specificity to discriminate cell types, isogenous cell lines or even metabolic states. This makes intact cell MALDI TOF mass spectrometry a promising tool for quality control in advanced cell cultures, with the potential to reveal batch-to-batch variation, aberrant clones, or unwanted shifts in cell phenotype. However, cellular alterations induced by a change in the expression of a single gene have not yet been addressed by intact cell mass spectrometry. In this work we used a well-characterized human ovarian cancer cell line, SKOV3, with silenced expression of the tumor suppressor candidate 3 gene (TUSC3). TUSC3 is involved in co-translational N-glycosylation of proteins, with a well-known global impact on cell phenotype. Altogether, this experimental design represents a highly suitable model for the optimization of intact cell mass spectrometry and the analysis of spectral data. Here we investigated five machine learning algorithms (k-nearest neighbors, decision tree, random forest, partial least squares discrimination, and artificial neural network) and optimized their performance either in pure populations or in two-component mixtures composed of cells with normal or silenced expression of TUSC3. All five algorithms reached an accuracy of over 90% and were able to reveal even subtle changes in mass spectra corresponding to alterations of TUSC3 expression.
In summary, we demonstrate that spectral fingerprints generated by intact cell MALDI-TOF mass spectrometry coupled to a machine learning classifier can reveal minute changes induced by alteration of a single gene, and therefore contribute to the portfolio of quality control applications in routine cell and tissue cultures.
- Keywords
- Bioinformatics, Biotyping, Cell culture, Intact cell MALDI TOF MS, Machine learning, Quality control, R programming language, TUSC3
- Publication type
- Journal Article MeSH
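Of the five classifiers listed in the abstract above, k-nearest neighbors is the simplest to sketch. The example below is in Python rather than the R named in the keywords, and it is not the authors' pipeline. The two-peak intensity vectors, the class labels, and the choice of k = 3 are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label a query spectrum by majority vote among its k nearest training spectra."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized intensities of two discriminating peaks per spectrum.
train_x = [(0.90, 0.10), (1.00, 0.20), (0.80, 0.15),
           (0.20, 0.90), (0.10, 1.00), (0.15, 0.80)]
train_y = ["control", "control", "control",
           "silenced", "silenced", "silenced"]

print(knn_predict(train_x, train_y, (0.85, 0.20)))  # control
print(knn_predict(train_x, train_y, (0.20, 0.95)))  # silenced
```

Real spectral fingerprints contain hundreds of peaks, so preprocessing such as baseline correction, normalization, and peak alignment would normally precede classification.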
MOTIVATION: Genome analysis has become one of the most important tools for understanding the complex process of carcinogenesis. With the increasing resolution of CGH arrays, there is a growing demand for computationally efficient algorithms that can detect aberrations effectively even in very noisy data. RESULTS: We developed a simple, non-parametric and computationally efficient technique for CGH array analysis that uses the median absolute deviation for breakpoint detection and median smoothing for pre-processing. The resulting algorithm has the potential to outperform any single smoothing approach as well as several recently proposed segmentation techniques. We demonstrate its performance on simulated and real datasets in comparison with three other methods for array CGH analysis. IMPLEMENTATION: Our approach is implemented in the R language and environment for statistical computing (version 2.6.1 for Windows, R-project, 2007). The code is available at: http://www.iba.muni.cz/~budinska/msmad.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
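The two ingredients named in the abstract above, median smoothing and the median absolute deviation (MAD), can be combined in a minimal sketch. This is Python, not the authors' R code, and the exact way the two steps are combined is an assumption. The window size, the threshold of 5 MADs, the function names, and the simulated log-ratios are all hypothetical.

```python
from statistics import median


def running_median(values, half_window):
    """Median smoothing: replace each value by the median of its neighbourhood."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        out.append(median(values[lo:hi]))
    return out


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def find_breakpoints(values, half_window=2, threshold=5.0):
    """Flag positions where the smoothed signal steps by more than threshold * MAD."""
    smoothed = running_median(values, half_window)
    steps = [smoothed[i] - smoothed[i - 1] for i in range(1, len(smoothed))]
    centre, scale = median(steps), mad(steps)
    # steps[k] is the change arriving at position k + 1. If the MAD is zero,
    # any step that differs from the median step is flagged.
    return [k + 1 for k, s in enumerate(steps) if abs(s - centre) > threshold * scale]


# Simulated log2 ratios: eight normal probes followed by eight gained probes.
log_ratios = [0.02, -0.03, 0.01, 0.04, -0.02, 0.00, 0.03, -0.01,
              1.01, 0.97, 1.03, 0.99, 1.02, 0.98, 1.00, 1.04]
print(find_breakpoints(log_ratios))  # [8]
```

The median filter suppresses probe-level noise while keeping the step edge sharp, and the MAD gives a noise scale that is not inflated by the step itself. This is why the pair is attractive for noisy array data.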
Data visualization is a pivotal component of a structural biologist's arsenal. The Mol* Viewer makes molecular visualizations available to broader audiences via most web browsers. While Mol* provides a wide range of functionality, it has a steep learning curve and is only available via a JavaScript interface. To enhance the accessibility and usability of web-based molecular visualization, we introduce MolViewSpec (molstar.org/mol-view-spec), a standardized approach for defining molecular visualizations that decouples the definition of complex molecular scenes from their rendering. Scene definition can include references to commonly used structural, volumetric, and annotation data formats together with a description of how the data should be visualized and paired with optional annotations specifying colors, labels, measurements, and custom 3D geometries. Developed as an open standard, this solution paves the way for broader interoperability and support across different programming languages and molecular viewers, enabling more streamlined, standardized, and reproducible visual molecular analyses. MolViewSpec is freely available as a Mol* extension and a standalone Python package.
- MeSH
- Internet MeSH
- Computer Graphics * MeSH
- Software * MeSH
- User-Computer Interface MeSH
- Publication type
- Journal Article MeSH
Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and its Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.e., energy, gradient, and Hessian computations as well as molecular properties such as atomic charges and vibrational frequency analysis. Both standard users and power users benefit from adopting these APIs, as they lower the language barrier of input styles and enable a standard layout of variables and data. These designs allow end-to-end interoperable programming of complex computations and provide best-practice options by default.
- Publication type
- Journal Article MeSH