Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories
Status: PubMed-not-MEDLINE. Language: English. Country: Switzerland. Medium: electronic-ecollection
Document type: journal article
PubMed: 32714941
PubMed Central: PMC7344294
DOI: 10.3389/fmolb.2020.00132
- Keywords
- Time-lagged Independent Component Analysis, dimensionality reduction, molecular dynamics, tSNE, trajectory analysis
- Publication type
- journal article
Molecular simulation trajectories are high-dimensional data that can be visualized by dimensionality reduction. Because atomic motions are non-linear, non-linear dimensionality reduction methods are likely to be more efficient than linear ones. Here we test a popular non-linear method, t-distributed Stochastic Neighbor Embedding (t-SNE), on trajectories of 200 ns of alanine dipeptide dynamics and 208 μs of Trp-cage folding and unfolding. Furthermore, we introduce a time-lagged variant of t-SNE that focuses on rarely occurring transitions in the molecular system. This time-lagged t-SNE efficiently separates states according to their distance in time. With this method, key states of the studied systems (e.g., unfolded and folded protein), as well as possible kinetic traps, can be visualized in a two-dimensional plot. Time-lagged t-SNE is a visualization method; other applications, such as clustering and free energy modeling, must be approached with caution.
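The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of a time-lagged t-SNE, not the authors' implementation. It assumes that frames are first projected onto time-lagged independent components scaled by their eigenvalues, so that Euclidean distances emphasize slow motions, and that a standard t-SNE (here scikit-learn's `TSNE`) is then applied to the scaled coordinates. The function names `time_lagged_transform` and `time_lagged_tsne`, the feature array `X`, and the synthetic demo data are all hypothetical.

```python
import numpy as np


def time_lagged_transform(X, lag, eps=1e-10):
    """Project frames onto time-lagged independent components,
    each scaled by its eigenvalue (an assumed, kinetic-map-like scaling).

    X   : (n_frames, n_features) array, ordered in time
    lag : lag time in frames (0 < lag < n_frames)
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < n_frames")
    X = X - X.mean(axis=0)

    # Whiten the data with the instantaneous covariance matrix.
    c0 = X.T @ X / (n - 1)
    w, v = np.linalg.eigh(c0)
    keep = w > eps * w.max()  # drop numerically null directions
    Y = X @ (v[:, keep] / np.sqrt(w[keep]))

    # Symmetrized time-lagged covariance in the whitened space.
    a, b = Y[:-lag], Y[lag:]
    ct = (a.T @ b + b.T @ a) / (2.0 * (n - lag))
    lam, u = np.linalg.eigh(ct)
    order = np.argsort(lam)[::-1]  # slowest components first
    lam, u = lam[order], u[:, order]

    # Scaling by the eigenvalues shrinks fast directions, so distances
    # between frames are dominated by slowly decorrelating motions.
    return (Y @ u) * lam, lam


def time_lagged_tsne(X, lag, perplexity=30.0, random_state=0):
    """Run scikit-learn t-SNE on the time-lagged coordinates (a sketch)."""
    from sklearn.manifold import TSNE  # imported lazily; needs scikit-learn

    Z, _ = time_lagged_transform(X, lag)
    tsne = TSNE(n_components=2, perplexity=perplexity,
                random_state=random_state)
    return tsne.fit_transform(Z)


# Synthetic demo: a slow oscillation hidden under large fast noise,
# mixed into two observed features.
rng = np.random.default_rng(0)
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 1000)
fast = 5.0 * rng.normal(size=t.size)
X_demo = np.column_stack([slow + fast, fast - slow])

Z_demo, lam_demo = time_lagged_transform(X_demo, lag=10)
```

In this toy case the first time-lagged coordinate should track the slow oscillation even though the fast noise carries most of the variance. Calling `time_lagged_tsne(X_demo, lag=10)` would then embed the frames in two dimensions. The lag time and perplexity are free parameters that would need tuning for a real trajectory.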
Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague, Czechia
Department of Mathematics, University of Chemistry and Technology, Prague, Czechia