ensemble sampling
Guanine quadruplexes (G4s) are non-canonical nucleic acid structures common in important genomic regions. Parallel-stranded G4 folds are the most abundant, but their folding mechanism is not fully understood. Recent research highlighted that G4 DNA molecules fold via a kinetic partitioning mechanism dominated by competition among diverse long-lived G4 folds. The role of other intermediate species, such as parallel G-triplexes and G-hairpins, in the folding process has been a matter of debate. Here, we use standard and enhanced-sampling molecular dynamics simulations (total length of ∼0.9 ms) to study these potential folding intermediates. We suggest that the parallel G-triplex per se is a rather unstable species that is in local equilibrium with a broad ensemble of triplex-like structures. The equilibrium is shifted towards the well-structured G-triplex by a stacked aromatic ligand and, to a lesser extent, by flanking duplexes or nucleotides. Next, we study propeller loop formation in the GGGAGGGAGGG, GGGAGGG and GGGTTAGGG sequences. We identify multiple folding pathways from different unfolded and misfolded structures leading towards an ensemble of intermediates called cross-like structures (cross-hairpins), thus providing an atomistic description of the single-molecule folding events. In summary, the parallel G-triplex is a possible, but not mandatory, short-lived (transient) intermediate in the folding of parallel-stranded G4s.
- MeSH
- DNA chemistry genetics metabolism MeSH
- G-Quadruplexes * MeSH
- Guanine chemistry metabolism MeSH
- DNA, Single-Stranded chemistry genetics metabolism MeSH
- Kinetics MeSH
- Nucleic Acid Conformation * MeSH
- Humans MeSH
- Base Sequence MeSH
- Molecular Dynamics Simulation * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
This work extends the multi-scale computational scheme for the quantum mechanics (QM) calculations of Nuclear Magnetic Resonance (NMR) chemical shifts (CSs) in proteins that lack a well-defined 3D structure. The scheme couples the sampling of an intrinsically disordered protein (IDP) by classical molecular dynamics (MD) with protein fragmentation using the adjustable density matrix assembler (ADMA) and density functional theory (DFT) calculations. In contrast to our earlier investigation on IDPs (Pavlíková Přecechtělová et al., J. Chem. Theory Comput., 2019, 15, 5642-5658) and the state-of-the-art NMR calculations for structured proteins, a partial re-optimization was implemented on the raw MD geometries in vibrational normal mode coordinates to enhance the accuracy of the MD/ADMA/DFT computational scheme. In addition, machine-learning-based cluster analysis was performed on the scheme to explore its potential in producing protein structure ensembles (CLUSTER ensembles) that yield accurate CSs at a reduced computational cost. The performance of the cluster-based calculations is validated against results obtained with conventional structural ensembles consisting of MD snapshots extracted from the MD trajectory at regular time intervals (REGULAR ensembles). CS calculations performed with the refined MD/ADMA/DFT framework employed the 6-311++G(d,p) basis set, which outperformed IGLO-III calculations with the same density functional approximation (B3LYP) and both explicit and implicit solvation. The partial geometry optimization did not universally improve the agreement of computed CSs with experiment but substantially decreased the errors associated with ensemble averaging. A CLUSTER ensemble with 50 structures yielded ensemble averages close to those obtained with a REGULAR ensemble consisting of 500 MD frames. The cluster-based calculations thus required only a fraction of the computational time.
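The contrast between REGULAR and CLUSTER ensembles described in this abstract can be sketched on toy data. The following is a minimal illustration, not the authors' workflow: the scalar `descriptor`, the function `toy_shift` (a stand-in for an expensive per-frame shift calculation), the plain 1D k-means, and all numbers are assumptions introduced here. It shows only the idea of replacing many regularly spaced frames with a few population-weighted cluster representatives.

```python
import random
from statistics import mean

def regular_ensemble(n_frames, n_pick):
    """Indices of n_pick frames taken at regular intervals (REGULAR-style)."""
    step = max(1, n_frames // n_pick)
    return list(range(0, n_frames, step))[:n_pick]

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns (centers, labels)."""
    srt = sorted(values)
    # Deterministic start: spread the initial centers over quantiles.
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return centers, labels

def cluster_ensemble(values, k):
    """One representative per cluster (member nearest the center),
    paired with the cluster's population fraction as its weight."""
    centers, labels = kmeans_1d(values, k)
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        rep = min(members, key=lambda i: abs(values[i] - centers[c]))
        reps.append((rep, len(members) / len(values)))
    return reps

def toy_shift(x):
    """Hypothetical stand-in for an expensive per-frame shift calculation."""
    return 120.0 + 3.0 * x

random.seed(0)
# Hypothetical per-frame descriptor with two conformational basins.
descriptor = [random.gauss(-2.0, 0.5) for _ in range(250)]
descriptor += [random.gauss(2.0, 0.5) for _ in range(250)]

full_avg = mean(toy_shift(x) for x in descriptor)
regular_idx = regular_ensemble(len(descriptor), 50)
regular_avg = mean(toy_shift(descriptor[i]) for i in regular_idx)
reps = cluster_ensemble(descriptor, 5)
cluster_avg = sum(w * toy_shift(descriptor[i]) for i, w in reps)

print(f"all frames: {full_avg:.3f}")
print(f"REGULAR-style, {len(regular_idx)} frames: {regular_avg:.3f}")
print(f"CLUSTER-style, {len(reps)} frames: {cluster_avg:.3f}")
```

In this sketch the expensive function is evaluated only on the representatives, and each value is weighted by the fraction of frames its cluster contains, so the weights sum to one.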
We have carried out an extended set of standard and enhanced-sampling MD simulations (for a cumulative simulation time of 620 μs) to study the folding landscapes of the rGGGUUAGGG and rGGGAGGG parallel G-hairpins (PH) with a propeller loop. We identify folding and unfolding pathways of the PH, which is bridged with the unfolded state via an ensemble of cross-like structures (CS) possessing mutually tilted or perpendicular G-strands interacting via guanine-guanine H-bonding. The oligonucleotides reach the PH conformation from the unfolded state via conformational diffusion through the folding landscape, i.e., as a series of rearrangements of the H-bond interactions starting from compacted anti-parallel hairpin-like structures. Although isolated PHs do not appear to be thermodynamically stable, we suggest that CS and PH-types of structures are sufficiently populated during RNA guanine quadruplex (GQ) folding within the context of complete GQ-forming sequences. These structures may participate in compact coil-like ensembles that involve all four G-strands and already contain some bound ions. Such ensembles can then rearrange into the fully folded parallel GQs via conformational diffusion. We propose that the basic atomistic folding mechanism of propeller loops suggested in this work may be common to their formation in RNA and DNA GQs.
The integration of complementary molecular methods (including X-ray crystallography, NMR spectroscopy, small-angle X-ray/neutron scattering, and computational techniques) is frequently required to obtain a comprehensive understanding of dynamic macromolecular complexes. In particular, these techniques are critical for studying intrinsically disordered protein regions (IDRs) or intrinsically disordered proteins (IDPs) that are part of large protein:protein complexes. Here, we explain how to prepare IDP samples suitable for study by NMR spectroscopy and describe a novel SAXS modeling method (ensemble refinement of SAXS; EROS) that integrates results from complementary methods, including crystal structures and NMR chemical shift perturbations, to accurately model SAXS data and describe ensemble structures of dynamic macromolecular complexes.
- MeSH
- Endosomal Sorting Complexes Required for Transport chemistry metabolism MeSH
- Protein Conformation MeSH
- Crystallography, X-Ray methods MeSH
- Humans MeSH
- Magnetic Resonance Spectroscopy methods MeSH
- Mitogen-Activated Protein Kinases chemistry metabolism MeSH
- Models, Molecular MeSH
- Scattering, Radiation * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Energy relaxation in light-harvesting complexes has been extensively studied by various ultrafast spectroscopic techniques, the fastest processes being in the sub-100-fs range. At the same time, much slower dynamics have been observed in individual complexes by single-molecule fluorescence spectroscopy (SMS). In this work, we use a pump-probe-type SMS technique to observe the ultrafast energy relaxation in single light-harvesting complexes LH2 of purple bacteria. After excitation at 800 nm, the measured relaxation time distribution of multiple complexes has a peak at 95 fs and is asymmetric, with a tail at slower relaxation times. When the excitation wavelength is tuned, the distribution changes in both shape and position. The observed behavior agrees with what is expected from the LH2 excited-state structure. As we show by a Redfield theory calculation of the relaxation times, the distribution shape corresponds to the expected effect of Gaussian disorder of the pigment transition energies. By repeatedly measuring a few individual complexes for minutes, we find that complexes sample the relaxation time distribution on a timescale of seconds. Furthermore, by comparing the distribution from a single long-lived complex with the whole ensemble, we demonstrate that, regarding the relaxation times, the ensemble can be considered ergodic. Our findings thus agree with the commonly used notion of an ensemble of identical LH2 complexes experiencing slow random fluctuations.
- MeSH
- Bacteriochlorophylls chemistry radiation effects MeSH
- Time MeSH
- Spectrometry, Fluorescence methods MeSH
- Microscopy, Confocal MeSH
- Lasers MeSH
- Statistics, Nonparametric MeSH
- Normal Distribution MeSH
- Energy Transfer * MeSH
- Rhodopseudomonas chemistry MeSH
- Light MeSH
- Light-Harvesting Protein Complexes chemistry radiation effects MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Comparative Study MeSH
Various host-guest peptide series are used by experimentalists as reference conformational states. One such use is as a baseline for random-coil NMR chemical shifts. Comparison to this random-coil baseline, through secondary chemical shifts, is used to infer protein secondary structure. The use of these random-coil data sets rests on the perception that the reference chemical shifts arise from states where there is little or no conformational bias. However, there is growing evidence that the conformational composition of natively and nonnatively unfolded proteins fails to approach anything that can be construed as random coil. Here, we use molecular dynamics simulations of an alanine-based host-guest peptide series (AAXAA) as a model of unfolded and denatured states to examine the intrinsic propensities of the amino acids. We produced ensembles that are in good agreement with the experimental NMR chemical shifts and confirm that the sampling of the 20 natural amino acids in this peptide series is far from random. Preferences toward certain regions of conformational space were both present and dependent upon the environment when compared under conditions typically used to denature proteins, i.e., thermal and chemical denaturation. Moreover, the simulations allowed us to examine the conformational makeup of the underlying ensembles giving rise to the ensemble-averaged chemical shifts. We present these data as an intrinsic backbone propensity library that forms part of our Structural Library of Intrinsic Residue Propensities to inform model building, to aid in the interpretation of experiments, and for structure prediction of natively and nonnatively unfolded states.
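The secondary chemical shift mentioned in this abstract is the difference between an observed shift and the random-coil reference for the same residue type. The sketch below is only an illustration: the shift values are placeholders and not reference data, and the names `secondary_shifts` and `classify` and the 0.7 ppm cutoff are assumptions made here. It relies on the general observation that Cα shifts tend to lie above the random-coil value in helices and below it in strands.

```python
def secondary_shifts(observed, random_coil):
    """Return (residue, delta) pairs with delta = observed - random-coil shift (ppm)."""
    return [(res, obs - random_coil[res]) for res, obs in observed]

def classify(delta, threshold=0.7):
    """Crude C-alpha reading; the threshold (ppm) is an assumed parameter."""
    if delta > threshold:
        return "helix-like"
    if delta < -threshold:
        return "strand-like"
    return "coil-like"

# Placeholder numbers for illustration only, NOT measured or reference values.
random_coil_ca = {"A": 52.5, "L": 55.0, "V": 62.0}
observed_ca = [("A", 54.5), ("L", 55.25), ("V", 60.5)]

deltas = secondary_shifts(observed_ca, random_coil_ca)
calls = [classify(d) for _, d in deltas]
for (res, d), call in zip(deltas, calls):
    print(f"{res}: {d:+.2f} ppm -> {call}")
```

Because every inference passes through the subtraction, any conformational bias hidden in the random-coil reference set carries over directly into the inferred secondary structure, which is the concern the abstract raises.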
PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study had two main objectives: (1) to identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) to classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population; data from 17 patients had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach needs to be verified on a larger sample with a larger number of recorded, clinically verified exacerbations.
- Publication Type
- Journal Article MeSH
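The COPD abstract above does not state how the outputs of its detectors are combined. One common option is majority voting, sketched below under loud assumptions: three simple statistical outlier rules stand in for the SOM, DBSCAN, Isolation Forest and SVM models named in the abstract, and the `readings` series is invented for illustration.

```python
from statistics import mean, median, pstdev, quantiles

def zscore_flags(xs, thr=2.0):
    """Flag points more than thr population standard deviations from the mean."""
    m, s = mean(xs), pstdev(xs)
    return [s > 0 and abs(x - m) / s > thr for x in xs]

def iqr_flags(xs, k=1.5):
    """Flag points outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < lo or x > hi for x in xs]

def mad_flags(xs, thr=3.5):
    """Flag points by the modified z-score based on the median absolute deviation."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    return [mad > 0 and 0.6745 * abs(x - med) / mad > thr for x in xs]

def majority_vote(flag_lists):
    """A point is anomalous when more than half of the detectors flag it."""
    n = len(flag_lists)
    return [2 * sum(votes) > n for votes in zip(*flag_lists)]

# Invented daily readings from a hypothetical portable device.
readings = [96, 97, 95, 96, 97, 96, 95, 96, 84, 96, 97, 95]

votes = [zscore_flags(readings), iqr_flags(readings), mad_flags(readings)]
flagged = [i for i, bad in enumerate(majority_vote(votes)) if bad]
print("flagged indices:", flagged)
```

Voting trades sensitivity for robustness: a reading is reported only when several independent rules agree, which reduces false alarms from any single detector.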
Random Forest is an ensemble of decision trees based on the bagging and random subspace concepts. As suggested by Breiman, the strength of unstable learners and the diversity among them are the core strengths of ensemble models. In this paper, we propose two approaches, known as oblique and rotation double random forests. In the first approach, we propose the rotation-based double random forest, in which a transformation or rotation of the feature space is generated at each node. At each node, a different random feature subspace is chosen for evaluation; hence, the transformation at each node is different. Different transformations result in better diversity among the base learners and hence better generalization performance. With the double random forest as the base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose the oblique double random forest. Decision trees in random forest and double random forest are univariate, which results in axis-parallel splits that fail to capture the geometric structure of the data. Also, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties and to grow decision trees of sufficient depth, we propose the oblique double random forest. The oblique double random forest models are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. Also, different regularization techniques (Tikhonov regularization, axis-parallel split regularization, null space regularization) are employed to tackle small-sample-size problems in the decision trees of the oblique double random forest. The proposed ensembles of decision trees produce larger trees than the standard ensembles of decision trees because bagging is used at each non-leaf node, which results in improved performance. The baseline models and the proposed oblique and rotation double random forest models are evaluated on 121 benchmark UCI datasets and real-world fisheries datasets. Both statistical analysis and the experimental results demonstrate the efficacy of the proposed oblique and rotation double random forest models compared to the baseline models on the benchmark datasets.
- MeSH
- Algorithms * MeSH
- Principal Component Analysis MeSH
- Rotation MeSH
- Support Vector Machine * MeSH
- Publication Type
- Journal Article MeSH
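The random forest abstract above argues that axis-parallel splits miss geometric structure that a multivariate (oblique) split can capture. A minimal sketch of that point follows, on an invented grid dataset whose class boundary is the diagonal x0 + x1 = 0. The oblique hyperplane here is taken from the difference of the class means, which is a simple stand-in chosen for brevity and not the multisurface proximal support vector machine used in the paper.

```python
from itertools import product

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def best_axis_parallel_split(X, y):
    """Best accuracy of any single test of the form x[j] > t (either polarity)."""
    best = 0.0
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            acc = accuracy([int(x[j] > t) for x in X], y)
            best = max(best, acc, 1 - acc)
    return best

def oblique_split(X, y):
    """Hyperplane w.x + b = 0 normal to the line joining the two class means
    and passing through their midpoint (stand-in for an SVM-type split)."""
    dims = range(len(X[0]))
    mu = {}
    for c in (0, 1):
        pts = [x for x, t in zip(X, y) if t == c]
        mu[c] = [sum(p[d] for p in pts) / len(pts) for d in dims]
    w = [mu[1][d] - mu[0][d] for d in dims]
    b = -sum(w[d] * (mu[1][d] + mu[0][d]) / 2 for d in dims)
    return w, b

def predict_oblique(X, w, b):
    return [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in X]

# Invented toy data: integer grid, class 1 above the diagonal x0 + x1 = 0.
X = [(i, j) for i, j in product(range(-3, 4), repeat=2) if i + j != 0]
y = [int(i + j > 0) for i, j in X]

best_axis = best_axis_parallel_split(X, y)
w, b = oblique_split(X, y)
oblique_acc = accuracy(predict_oblique(X, w, b), y)
print(f"best single axis-parallel split: {best_axis:.3f}")
print(f"single oblique split: {oblique_acc:.3f}")
```

An axis-parallel tree would need many stacked splits to approximate the diagonal boundary as a staircase, whereas a single oblique node separates this toy set.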
The heme-based oxygen sensor histidine kinase AfGcHK is part of a two-component signal transduction system in bacteria. O2 binding to the Fe(II) heme complex of its N-terminal globin domain strongly stimulates autophosphorylation at His183 in its C-terminal kinase domain. The 6-coordinate heme Fe(III)-OH- and -CN- complexes of AfGcHK are also active, but the 5-coordinate heme Fe(II) complex and the heme-free apo-form are inactive. Here, we determined the crystal structures of the isolated dimeric globin domains of the active Fe(III)-CN- and inactive 5-coordinate Fe(II) forms, revealing striking structural differences on the heme-proximal side of the globin domain. Using hydrogen/deuterium exchange coupled with mass spectrometry to characterize the conformations of the active and inactive forms of full-length AfGcHK in solution, we investigated the intramolecular signal transduction mechanisms. Major differences between the active and inactive forms were observed on the heme-proximal side (helix H5), at the dimerization interface (helices H6 and H7 and loop L7) of the globin domain and in the ATP-binding site (helices H9 and H11) of the kinase domain. Moreover, separation of the sensor and kinase domains, which deactivates catalysis, increased the solvent exposure of the globin domain-dimerization interface (helix H6) as well as the flexibility and solvent exposure of helix H11. Together, these results suggest that structural changes at the heme-proximal side, the globin domain-dimerization interface, and the ATP-binding site are important in the signal transduction mechanism of AfGcHK. We conclude that AfGcHK functions as an ensemble of molecules sampling at least two conformational states.
- MeSH
- Bacterial Proteins chemistry metabolism MeSH
- Phosphorylation MeSH
- Heme chemistry MeSH
- Histidine Kinase chemistry metabolism MeSH
- Mass Spectrometry MeSH
- Crystallography, X-Ray MeSH
- Protein Structure, Quaternary MeSH
- Oxygen metabolism MeSH
- Models, Molecular MeSH
- Myxococcales metabolism MeSH
- Oxidation-Reduction MeSH
- Protein Domains MeSH
- Signal Transduction MeSH
- Hydrogen-Deuterium Exchange MeSH
- Ferric Compounds chemistry MeSH
- Ferrous Compounds chemistry MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
OBJECTIVE: We present the PaHaW Parkinson's disease handwriting database, consisting of handwriting samples from Parkinson's disease (PD) patients and healthy controls. Our goal is to show that kinematic features and pressure features in handwriting can be used for the differential diagnosis of PD. METHODS AND MATERIAL: The database contains records from 37 PD patients and 38 healthy controls performing eight different handwriting tasks. The tasks include drawing an Archimedean spiral, repetitively writing orthographically simple syllables and words, and writing a sentence. In addition to the conventional kinematic features related to the dynamics of handwriting, we investigated new pressure features based on the pressure exerted on the writing surface. To discriminate between PD patients and healthy subjects, three different classifiers were compared: K-nearest neighbors (K-NN), an ensemble AdaBoost classifier, and support vector machines (SVM). RESULTS: For predicting PD based on kinematic and pressure features of handwriting, the best-performing model was SVM, with a classification accuracy of Pacc=81.3% (sensitivity Psen=87.4% and specificity Pspe=80.9%). When evaluated separately, pressure features proved to be relevant for PD diagnosis, yielding Pacc=82.5% compared to Pacc=75.4% using kinematic features. CONCLUSION: Experimental results showed that an analysis of kinematic and pressure features during handwriting can help assess subtle characteristics of handwriting and discriminate between PD patients and healthy controls.
- MeSH
- Biomechanical Phenomena * MeSH
- Diagnosis, Differential MeSH
- Middle Aged MeSH
- Humans MeSH
- Parkinson Disease diagnosis MeSH
- Handwriting * MeSH
- Aged MeSH
- Case-Control Studies MeSH
- Support Vector Machine MeSH
- Pressure MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Aged MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
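The accuracy, sensitivity and specificity reported in the PaHaW abstract above follow from the counts in a two-class confusion matrix. The sketch below shows the standard definitions. The counts are invented for illustration: they respect only the group sizes given in the abstract (37 patients, 38 controls) and do not reproduce the study's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard two-class metrics from confusion-matrix counts.

    tp/fn: patients classified correctly/incorrectly,
    tn/fp: controls classified correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

# Hypothetical counts, NOT taken from the study.
metrics = classification_metrics(tp=30, fn=7, tn=32, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

When the two groups are of nearly equal size, as here, accuracy lies close to the mean of sensitivity and specificity; with unbalanced groups the three numbers can diverge substantially.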