Learning Monte Carlo Move Sets and Molecular Distributions in a Variational Autoencoding Framework
Department of Chemical Engineering and Materials Science
Location: Gateway South 024
Speaker: Dr. Jacob Monroe, University of Arkansas
ABSTRACT
Multiscale modeling requires linking models at different levels of detail, with the goal of gaining speed from lower-fidelity models while recovering fine details from higher-resolution models. Ideally, tight communication and seamless switching between models at different resolutions are possible. This is particularly important in molecular simulations of soft matter, where molecular-level details and mesoscale structures are tightly coupled. Typically, coarse-grained (i.e., lower-resolution) models are necessary to capture the time and length scales of self-assembly and extensive conformational change, but atomistic resolution is required for quantitatively accurate predictions of properties of interest. We show that representing the coarse-graining and backmapping problem within a variational autoencoding framework yields a tractable formulation for learning probabilistic models of molecular degrees of freedom. These models can be leveraged as Monte Carlo (MC) move sets that accelerate sampling by moving through a coarse-grained space yet rigorously satisfy detailed balance and preserve the atomistic ensemble. Through a number of toy and model systems, we demonstrate the accelerations possible through these variational autoencoder (VAE)-based MC moves.
In the context of proteins, we present decoding models based on conditional normalizing flows that can recover atomistic details from coarse-grained representations of protein sidechains. Crucially, the models are transferable to any protein sequence, account for the local environment of a sidechain, and provide exact log-probabilities for autoregressively generated atomistic configurations. We also demonstrate, however, that reweighting is extremely challenging despite state-of-the-art performance on recently developed metrics and generation of configurations with low energies in atomistic protein force fields. Through detailed analysis of configurational weights, we demonstrate that machine-learned backmappings must not only generate configurations with reasonable energies but also correctly assign relative probabilities under the generative model. These are broadly important considerations in generative modeling of atomistic molecular configurations.
BIOGRAPHY
Jacob received a B.S. in chemical engineering from the University of Virginia before proceeding to a Ph.D. at the University of California, Santa Barbara, where he used molecular simulations to explore ways in which surface chemical heterogeneity can be exploited to control interfacial water properties and water-mediated interactions. During an NRC postdoctoral fellowship at NIST in Gaithersburg, MD, he identified ways to incorporate modern machine learning tools into molecular simulation workflows. Jacob joined the University of Arkansas as an assistant professor in January 2023. The Monroe Research Group develops machine learning methods firmly grounded in statistical mechanics to enable rapid yet rigorous calculations of thermodynamic properties of biomolecules and biomaterials. Jacob received a DOE Early Career Award in 2023.