31 Jul 2024

Multivariate Wilson Priors

Pre-prints Demonstrate Multivariate Priors for Time-Resolved Crystallography

Kevin Dalton

Comparative Crystallography and Careless

In modern structural biology, we are often more interested in the small differences between related structures than in the structures themselves. There are many applications of this concept in the domain of comparative crystallography. For instance, equilibrium biophysical perturbations such as multitemperature crystallography, non-equilibrium perturbations like electric-field or temperature jumps, drug fragment screening, and anomalous diffraction all fall under the umbrella of comparative crystallography. While comparative data have been successfully analyzed using conventional tools, the fidelity of experiments is often challenged by the magnitude of systematic errors present in diffraction data. These can be hundreds of times larger than the true structural differences.

Systematic errors in rotation data

Example of systematic errors in conventional diffraction data from Dalton et al. (CC-BY license)

To address the problem of systematic errors in diffraction data, we built careless, a flexible tool for merging reflection intensities. Careless uses modern concepts from machine learning, like variational inference and deep learning, to correct the systematic errors. In our original publication, we demonstrated that careless works well for one important application of comparative crystallography: time-resolved diffraction. Specifically, we showed state-of-the-art inference of time-resolved structural changes in photoactive yellow protein. This week, we’re excited to announce a new feature which takes careless to the next level in the comparative crystallography setting. We shared our insights in a series of three closely related pre-prints.

  1. Sensitive Detection of Structural Differences using a Statistical Framework for Comparative Crystallography
  2. Resolving DJ-1 Glyoxalase Catalysis Using Mix-and-Inject Serial Crystallography at a Synchrotron
  3. Scaling and Merging Time-Resolved Laue Data with Variational Inference

Multivariate Wilson Prior

Applications of the multivariate Wilson prior

Applications of the multivariate Wilson prior from Hekstra et al. (CC-BY-NC license)

Hekstra et al. (1) derive a mathematical formalism for comparative crystallography. The key insight in their approach is to add structure to the Bayesian prior distribution used for merging data in Careless. Specifically, the default implementation in careless treats related structures as statistically independent. By allowing users to specify that subsequent time-points should be statistically dependent, this work is able to tease much more signal out of diffraction data. Specific applications demonstrated in the manuscript include:

  • polychromatic anomalous diffraction
  • time-resolved polychromatic diffraction
  • anomalous diffraction at an X-ray free electron laser (XFEL)
  • drug fragment screen

In every case, the structured prior is able to increase the amount of signal observed in the data.
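To give a feel for why a dependent prior helps, here is a minimal toy sketch; it is not the careless implementation. It assumes that the dependence between two related data sets can be summarized by a single correlation parameter `r` between normalized acentric structure factors, each modeled as a complex normal variable. The function names and the parameter `r` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_pair(r, rng):
    """Draw one pair of normalized acentric structure factors with correlation r.

    Toy model (an assumption for illustration): both members of the pair are
    complex normal with E[|E|^2] = 1, and the second depends on the first.
    """
    # Reference structure factor: variance 1/2 in each of the real and imaginary parts.
    s = math.sqrt(0.5)
    e1 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    # Related structure factor: centered on r * e1 with residual variance (1 - r^2).
    t = math.sqrt(0.5 * (1.0 - r * r))
    e2 = r * e1 + complex(rng.gauss(0.0, t), rng.gauss(0.0, t))
    return e1, e2


def mean_sq_difference(r, n=20000, seed=0):
    """Monte Carlo estimate of E[|E1 - E2|^2] under the toy prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        e1, e2 = sample_pair(r, rng)
        total += abs(e1 - e2) ** 2
    return total / n


if __name__ == "__main__":
    for r in (0.0, 0.9, 0.99):
        # Under this model the expected squared difference is 2 * (1 - r).
        print(r, mean_sq_difference(r), 2.0 * (1.0 - r))
```

Under these assumptions, an independent prior (`r = 0`) expects differences as large as the structure factors themselves, while a strongly correlated prior (`r` close to 1) expects small differences. That expectation is what lets a structured prior separate small true differences from noise.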

Time-resolved Diffraction of an Enzyme, DJ-1

Zielinski and Dolamore et al. (2) applied the multivariate Wilson prior to DJ-1, an enzyme involved in Parkinson’s disease. In this study, Lois Pollack’s team from Cornell University developed sample mixers which allowed DJ-1 to be rapidly mixed with a substrate, methylglyoxal. Mark Wilson’s group from the University of Nebraska-Lincoln grew the DJ-1 crystals, whose diffraction was recorded in a time-resolved fashion at BioCARS, a polychromatic beamline at the Advanced Photon Source. This configuration allowed the collaborators to observe the conversion of toxic methylglyoxal into non-toxic lactate. The exceptional clarity of the time-resolved difference electron density maps produced by Careless enabled new insights into the catalytic mechanism of DJ-1, including an explanation of the enantiopurity of the products.

Stereochemistry of the DJ-1 lactoylcysteine intermediate

The mechanism of enantioselective hydrolysis in DJ-1 catalysis as explained by Zielinski and Dolamore et al. Water can only access the lactoylcysteine intermediate from the solvent-exposed face of the active site. (CC-BY-NC-ND license)

Long-range allostery in DJ-1

Zielinski and Dolamore et al. show evidence of long-range allostery between the active site and Cysteine-53. (CC-BY-NC-ND license)

Variational Inference Best Practices

Finally, Zielinski et al. (3) detail the application of Careless to time-resolved data using DJ-1 as an example. This manuscript offers helpful guidance to users seeking to get the most out of their time-resolved diffraction data. Furthermore, it contains a thorough ablation study which demonstrates the impact of many components of the model. As judged from the ablation, the multivariate Wilson prior was extremely important for maximizing time-resolved signal. In our estimation, the prior contributes a nearly 10 sigma boost in signal to noise!

DJ-1 merging ablation study

Zielinski et al. studied the importance of careless model components by conducting ablation studies. Among other important observations, they showed the multivariate Wilson prior accounts for a nearly 10 sigma increase in time-resolved difference signal. (CC-BY-NC license)

The Future of Variational Inference in Crystallography

Together, these pre-prints highlight the flexibility of variational inference as a framework for modeling crystallographic data. New insights like the multivariate prior are easy to incorporate and test. Furthermore, they translate to meaningful biological results! The rs-station devs look forward to continuing to advance the state of the art as we support and develop this important technology.