Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on 2D descriptors, important information contained in the spatial structure of molecules is lost. The major problem in constructing models with 3D descriptors is the choice of a putative bioactive conformation, which affects predictive performance. The multi-instance (MI) learning approach, which considers multiple conformations during model training, could be a reasonable solution to this problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach, in which each molecule is encoded either by 2D descriptors computed for the corresponding molecular graph or by 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
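The abstract does not say which multi-instance algorithms were implemented. As a hedged illustration of how an MI model can both predict activity from a bag of conformers and point to a plausible bioactive conformation, the minimal sketch below assumes attention-based pooling (one common deep MI technique, not necessarily the authors' choice) written in plain Python. Each molecule is a bag of conformer descriptor vectors; each conformer h gets a score w · tanh(V h), the scores are softmax-normalized into attention weights, and the weighted mean of the conformer vectors is passed to a linear regressor. The function names, the parameters V, w, beta and the toy descriptors are all hypothetical; in a real model the parameters would be learned from training data, not set by hand.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_predict(
    bag: Sequence[Sequence[float]],  # one descriptor vector per conformer
    V: Sequence[Sequence[float]],    # hidden projection, one row per hidden unit
    w: Sequence[float],              # attention scoring vector
    beta: Sequence[float],           # linear regression weights
    bias: float,
) -> Tuple[float, List[float]]:
    """Return (predicted activity, attention weight of each conformer)."""
    # Score each conformer h as w . tanh(V h)
    scores = [dot(w, [math.tanh(dot(row, h)) for row in V]) for h in bag]
    weights = softmax(scores)
    # Bag embedding: attention-weighted mean of the conformer descriptors
    dim = len(bag[0])
    embedding = [sum(a * h[j] for a, h in zip(weights, bag)) for j in range(dim)]
    return dot(beta, embedding) + bias, weights


# Hand-set toy parameters (hypothetical; normally learned during training)
bag = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
beta = [0.5, 0.5]
activity, weights = attention_mil_predict(bag, V, w, beta, bias=0.0)
best = max(range(len(weights)), key=weights.__getitem__)
print(best)  # 1
```

Here the second conformer receives the largest attention weight, so it would be reported as the putative bioactive conformation. Mean or max pooling over per-conformer predictions are simpler, conventional alternatives that give up this interpretability.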
Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for modeling catalyst enantioselectivity, we propose to use the multi-instance learning approach, which accounts for multiple catalyst conformers and requires neither conformer selection nor spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction (CGR), encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in modeling the enantioselectivity of homogeneous and phase-transfer reactions and compared with state-of-the-art approaches.
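One way to read the reaction encoding described above is sketched below, assuming that each instance in the bag pairs one catalyst conformer with the transformation descriptors shared by the whole reaction (per-conformer concatenation is an assumption here). The function name and the toy descriptor values are hypothetical; real inputs would be pmapper descriptor vectors for the conformers and fragment descriptors of the condensed graph of reaction.

```python
from typing import List, Sequence


def encode_reaction_bag(
    catalyst_conformers: Sequence[Sequence[float]],  # 3D descriptors, one per conformer
    transformation: Sequence[float],                 # 2D CGR fragment descriptors
) -> List[List[float]]:
    """Build one multi-instance bag for a catalyzed reaction (hypothetical helper).

    Each instance concatenates one catalyst conformer's 3D descriptor vector
    with the reaction's 2D descriptor vector, which is shared by all instances.
    """
    if not catalyst_conformers:
        raise ValueError("a catalyst needs at least one conformer")
    width = len(catalyst_conformers[0])
    if any(len(c) != width for c in catalyst_conformers):
        raise ValueError("all conformers must have the same descriptor length")
    return [list(c) + list(transformation) for c in catalyst_conformers]


# Toy values (hypothetical): two conformers, three 3D descriptors, two CGR fragments
conformers = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]
cgr = [3.0, 0.0]
bag = encode_reaction_bag(conformers, cgr)
print(bag)  # [[1.0, 0.0, 2.0, 3.0, 0.0], [0.0, 1.0, 2.0, 3.0, 0.0]]
```

Within a bag the instances differ only in their catalyst part, so any preference an MI learner shows among instances is a preference among catalyst conformers.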
- MeSH
- Catalysis * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The most widely used QSAR approaches are mainly based on a 2D molecular representation, which ignores the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose in a justified way. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models require conformers to be pre-aligned, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study, we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding the cluster membership of its conformers. In our study we used Random Forest, but this representation can be combined with any machine learning method. We compared this approach with conventional 2D and 3D approaches on multiple data sets and investigated the sensitivity of the proposed approach to its tuning parameters: the number of conformers and the number of clusters.
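The conformer-clustering representation described above can be sketched as follows. The abstract does not name the clustering algorithm, so plain k-means (Lloyd's algorithm, seeded with the first k points for reproducibility) is an assumption, as are the toy two-dimensional vectors standing in for chirality-aware 3D pharmacophore descriptors and all function names. Conformers of every training compound are pooled and clustered; a compound's bit j is then set if any of its conformers falls into cluster j, and the resulting bit string can be passed to Random Forest or any other learner.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest(p: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def cluster_bits(conformers: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Bit j is 1 if any conformer of the molecule falls into cluster j."""
    bits = [0] * len(centroids)
    for c in conformers:
        bits[nearest(c, centroids)] = 1
    return bits


# Toy training set (hypothetical descriptors standing in for 3D pharmacophore vectors)
train: Dict[str, List[List[float]]] = {
    "mol_A": [[0.0, 0.1], [0.2, 0.0]],
    "mol_B": [[5.0, 5.1], [0.1, 0.1]],
    "mol_C": [[5.2, 4.9]],
}
all_conformers = [c for confs in train.values() for c in confs]
centroids = kmeans(all_conformers, k=2)
fingerprints = {name: cluster_bits(confs, centroids) for name, confs in train.items()}
print(fingerprints)  # {'mol_A': [1, 0], 'mol_B': [1, 1], 'mol_C': [0, 1]}
```

mol_B has conformers in both clusters, so both of its bits are set. A new compound would be encoded by assigning its conformers to the centroids learned from the training set; the number of clusters k is one of the tuning parameters the abstract mentions.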
It is now possible to generate large volumes of high-quality images of biomolecules at near-atomic resolution and in near-native states using cryogenic electron microscopy/electron tomography (Cryo-EM/ET). However, the precise annotation of structures like filaments and membranes remains a major barrier to applying these methods at high throughput. To address this, we present TARDIS (Transformer-based Rapid Dimensionless Instance Segmentation), a machine-learning framework for fast and accurate annotation of micrographs and tomograms. TARDIS combines deep learning for semantic segmentation with a novel geometric model for precise instance segmentation of various macromolecules. We develop pre-trained models within TARDIS for segmenting microtubules and membranes; these models demonstrate high accuracy across multiple modalities and resolutions and enabled the segmentation of over 13,000 tomograms from the CZI Cryo-Electron Tomography data portal. As a modular framework, TARDIS can be extended to new structures and imaging modalities with minimal modification. TARDIS is open-source, freely available at https://github.com/SMLC-NYSBC/TARDIS, and accelerates the analysis of high-resolution biomolecular structural imaging data.
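The abstract describes a two-stage design: semantic segmentation followed by a geometric model that splits the result into instances. The sketch below is a deliberately naive stand-in, not the TARDIS geometric model: it converts a binary semantic mask into a point cloud of foreground pixels and groups points into instances by single-linkage clustering with a distance cutoff (union-find). The function names and the max_dist parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]


def mask_to_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """Convert a binary semantic mask into a 2D point cloud of foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def group_points(points: List[Point], max_dist: float) -> List[int]:
    """Single-linkage grouping: points within max_dist end up in one instance."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)
    # Relabel union-find roots as consecutive instance ids 0, 1, 2, ...
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


mask = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
points = mask_to_points(mask)
labels = group_points(points, max_dist=1.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

This baseline merges touching or crossing filaments into a single instance, and its all-pairs loop is quadratic in the number of points, so it only illustrates the mask-to-point-cloud-to-instance pipeline.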
- Keywords
- CNN, Cryo-EM/ET, DIST, Filaments, Instance Segmentation, Membranes, Microtubules, Point Cloud, Segmentation, Semantic Segmentation, TARDIS, TEM EM/ET
- Publication type
- Journal Article MeSH
- Preprint MeSH
As a prevalent neurodevelopmental disorder, attention-deficit hyperactivity disorder (ADHD) impairs learning and memory capacity, and so far no treatment option with long-term efficacy is available. Alterations in gene regulation and in synapse-related proteins influence learning and memory capacity; nevertheless, the mechanism regulating synapse-related protein synthesis in ADHD is still unclear. LncRNAs have been found to participate in gene regulation in multiple disorders. For instance, the lncRNA Metastasis Associated Lung Adenocarcinoma Transcript 1 (MALAT1) has an essential regulatory function in numerous psychiatric diseases. However, how MALAT1 influences synapse-related protein synthesis in ADHD remains largely unknown. Here, we found that MALAT1 was decreased in the hippocampal tissue of spontaneously hypertensive rats (SHRs) compared with the standard controls, Wistar Kyoto (WKY) rats. Subsequent experiments revealed that MALAT1 enhanced the expression of neurexin 1 (NRXN1), which promoted the expression of the synapse-related genes SYN1, PSD95, and GAP43. Bioinformatic analyses then predicted that miR-141-3p and miR-200a-3p, microRNAs belonging to the miR-200 family and sharing the same seed sequence, could interact with MALAT1 and NRXN1 mRNA, which was further confirmed by luciferase reporter assays. Finally, rescue experiments indicated that MALAT1 influenced the expression of NRXN1 by sponging miR-141-3p/200a-3p. All data verified our hypothesis that MALAT1 regulates the synapse-related proteins SYN1, PSD95, and GAP43 through the MALAT1-miR-141-3p/200a-3p-NRXN1 axis in ADHD. Our research underscores a novel role of MALAT1 in the pathogenesis of impaired learning and memory capacity in ADHD and may shed light on the development of diagnostic biomarkers and more effective therapeutic interventions for individuals with ADHD.
- MeSH
- Attention Deficit Disorder with Hyperactivity * genetics MeSH
- Rats MeSH
- MicroRNAs * genetics metabolism MeSH
- Rats, Inbred WKY MeSH
- Gene Expression Regulation MeSH
- RNA, Long Noncoding * genetics metabolism MeSH
- Animals MeSH
- Check Tag
- Rats MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- MicroRNAs * MeSH
- Mirn141 microRNA, rat MeSH
- RNA, Long Noncoding * MeSH
Feature selection is a critical component of machine learning and data mining that addresses challenges such as irrelevant, noisy, and redundant features in large-scale data, which often lead to the curse of dimensionality. This study employs a K-nearest neighbour (KNN) wrapper to implement feature selection with six nature-inspired algorithms inspired by human behaviour and by mammals. Evaluated on six real-world datasets, the study compares the performance of these algorithms in terms of accuracy, feature count, fitness, convergence and computational cost. The findings underscore the efficacy of the Human Learning Optimization, Poor and Rich Optimization and Grey Wolf Optimizer algorithms across multiple performance metrics. For instance, for mean fitness, Human Learning Optimization outperforms the others, followed by Poor and Rich Optimization and Harmony Search. The study suggests the potential of human-inspired algorithms, particularly Poor and Rich Optimization, for robust feature selection without compromising classification accuracy.
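A KNN wrapper scores a candidate feature subset by evaluating the classifier on only the selected features. The minimal sketch below assumes a binary mask per candidate subset, leave-one-out KNN error, and the fitness α·error + (1 - α)·(selected/total) with α = 0.99, a weighting common in wrapper feature-selection studies but not stated in the abstract. The nature-inspired optimizers themselves (Human Learning Optimization, Poor and Rich Optimization, Grey Wolf Optimizer and the others) are not implemented; each would search over masks to minimize a function like this one. All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[List[float]], train_y: List[int],
                x: Sequence[float], k: int) -> int:
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    votes = [label for _, label in neighbours[:k]]
    # most_common keeps first-seen order on ties, so the nearer neighbour wins
    return Counter(votes).most_common(1)[0][0]


def wrapper_fitness(mask: Sequence[int], X: List[List[float]], y: List[int],
                    k: int = 3, alpha: float = 0.99) -> float:
    """Lower is better: weighted leave-one-out error plus a subset-size penalty."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0  # treat an empty subset as the worst possible solution
    proj = [[row[j] for j in selected] for row in X]
    errors = 0
    for i in range(len(proj)):  # leave-one-out evaluation
        pred = knn_predict(proj[:i] + proj[i + 1:], y[:i] + y[i + 1:], proj[i], k)
        errors += pred != y[i]
    error_rate = errors / len(proj)
    return alpha * error_rate + (1 - alpha) * len(selected) / len(mask)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [0.9, 2.0]]
y = [0, 0, 0, 1, 1, 1]
print(round(wrapper_fitness([1, 0], X, y), 3))  # 0.005
print(round(wrapper_fitness([0, 1], X, y), 3))  # 0.995
```

The informative feature alone classifies every held-out sample correctly and pays only the size penalty, while the noise feature alone misclassifies all six samples; an optimizer minimizing this fitness would therefore prefer the first mask.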
- Keywords
- Algorithms, Feature reduction, KNN, Metaheuristics, Non-traditional algorithms, Optimization
- Publication type
- Journal Article MeSH
BACKGROUND: Because of its non-destructive nature, label-free imaging is an important strategy for studying biological processes. However, routine microscopic techniques such as phase contrast or DIC suffer from shadow-cast artifacts that make automatic segmentation challenging. The aim of this study was to compare the efficacy of published methods for each step of the segmentation workflow (image reconstruction, foreground segmentation, cell detection (seed-point extraction) and cell (instance) segmentation) on a dataset of the same cells imaged with multiple contrast microscopy modalities. RESULTS: We built a collection of routines for segmenting images of viable adherent cells grown in culture dishes and acquired by phase contrast, differential interference contrast, Hoffman modulation contrast and quantitative phase imaging, and we performed a comprehensive comparison of available segmentation methods applicable to label-free data. We demonstrated that it is crucial to perform the image reconstruction step, which enables the use of segmentation methods originally not applicable to label-free images. Further, we compared foreground segmentation methods (thresholding, feature extraction, level set, graph cut, learning-based), seed-point extraction methods (Laplacian of Gaussians, radial symmetry and distance transform, iterative radial voting, maximally stable extremal regions and learning-based) and single-cell segmentation methods. We validated a suitable set of methods for each microscopy modality and published them online. CONCLUSIONS: We demonstrate that the image reconstruction step allows the use of segmentation methods not originally intended for label-free imaging. In addition to the comprehensive comparison of methods, raw and reconstructed annotated data and MATLAB code are provided.
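As one concrete member of the thresholding family listed among the foreground segmentation methods, Otsu's method picks the global threshold that maximizes the between-class variance of the intensity histogram. The abstract does not say which thresholding variants were compared, so this choice is illustrative only. The sketch assumes an 8-bit image that has already been reconstructed, since per the abstract the reconstruction step is what makes such methods applicable to label-free images; the function name and toy image are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance; foreground is pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # summed intensity of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant 1/total**2 dropped)
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy reconstructed image (hypothetical): dim background, bright cell pixels
image: List[List[int]] = [
    [10, 12, 11, 200],
    [11, 198, 201, 13],
    [12, 10, 199, 202],
]
t = otsu_threshold([v for row in image for v in row])
mask = [[int(v > t) for v in row] for row in image]
print(t)     # 13
print(mask)  # [[0, 0, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1]]
```

A global threshold like this yields only a foreground mask; separating touching cells still requires the seed-point extraction and instance segmentation steps described in the abstract.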
- Keywords
- Cell segmentation, Differential contrast image, Image reconstruction, Laplacian of Gaussians, Methods comparison, Microscopy, Quantitative phase imaging
- MeSH
- Algorithms MeSH
- Cell Fractionation methods MeSH
- Humans MeSH
- Microscopy methods MeSH
- Image Processing, Computer-Assisted MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH