31 Jul 2024
Kevin Dalton
In modern structural biology, we are often more interested in the small differences between related structures than in the structures themselves. There are many applications of this concept in the domain of comparative crystallography. For instance, equilibrium biophysical perturbations such as multitemperature crystallography, non-equilibrium perturbations like electric-field or temperature jumps, drug fragment screening, and anomalous diffraction all fall under the umbrella of comparative crystallography. While comparative data have been successfully analyzed using conventional tools, the fidelity of these experiments is often challenged by the magnitude of systematic errors present in diffraction data. These errors can be hundreds of times larger than the true structural differences.
To address the problem of systematic errors in diffraction data, we built careless, a flexible tool for merging reflection intensities. Careless uses modern concepts from machine learning, like variational inference and deep learning, to correct the systematic errors. In our original publication, we demonstrated that careless works well for one important application of comparative crystallography: time-resolved diffraction. Specifically, we showed state-of-the-art inference of time-resolved structural changes in photoactive yellow protein. This week, we’re excited to announce a new feature which takes careless to the next level in the comparative crystallography setting. We shared our insights in a series of three closely related pre-prints.
Hekstra et al. (1) derive a mathematical formalism for comparative crystallography. The key insight in their approach is to add structure to the Bayesian prior distribution used for merging data in Careless. Specifically, the default implementation in Careless treats related structures as statistically independent. By allowing users to specify that subsequent time-points should be statistically dependent, this work is able to tease much more signal out of diffraction data. Specific applications demonstrated in the manuscript include
In every case, the structured prior is able to increase the amount of signal observed in the data.
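To give a feel for what a statistically dependent prior means, here is a minimal NumPy sketch. It is not the Careless implementation. It assumes acentric reflections whose structure factors are complex normal, and it couples a related data set to a reference through a single correlation parameter `r`; the function names, `r`, and `sigma` are all hypothetical choices made for this illustration.

```python
import numpy as np


def sample_correlated_acentric(n, r, sigma=1.0, seed=0):
    """Draw n pairs of acentric structure factors with correlation r.

    Illustrative assumption: each structure factor is complex normal with
    mean intensity sigma, and the related one is built as
    r * reference + sqrt(1 - r^2) * independent noise.
    """
    rng = np.random.default_rng(seed)
    scale = np.sqrt(sigma / 2.0)  # per-component standard deviation

    def complex_normal(size):
        return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

    f_ref = complex_normal(n)
    noise = complex_normal(n)
    # The marginal distribution of f_rel is unchanged; only the coupling to f_ref varies.
    f_rel = r * f_ref + np.sqrt(1.0 - r**2) * noise
    return f_ref, f_rel


def rms_amplitude_difference(r, n=100_000):
    """Root-mean-square difference in amplitude between the two data sets."""
    f_ref, f_rel = sample_correlated_acentric(n, r)
    return np.sqrt(np.mean((np.abs(f_rel) - np.abs(f_ref)) ** 2))


# r = 0 mimics an independent prior; r close to 1 mimics closely related structures.
independent = rms_amplitude_difference(0.0)
dependent = rms_amplitude_difference(0.99)
print(f"RMS amplitude difference, r=0.00: {independent:.3f}")
print(f"RMS amplitude difference, r=0.99: {dependent:.3f}")
```

Under these assumptions, a dependent prior expects amplitude differences that are small compared with the amplitudes themselves, whereas an independent prior places no such constraint on them. That is the intuition behind why encoding dependence can help separate small true differences from noise.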
Zielinski and Dolamore et al. (2) applied the multivariate Wilson prior to DJ-1, an enzyme involved in Parkinson’s disease. In this study, Lois Pollack’s team from Cornell University developed sample mixers which allowed DJ-1 to be rapidly mixed with a substrate, methylglyoxal. Mark Wilson’s group from the University of Nebraska-Lincoln grew the DJ-1 crystals, and diffraction from them was recorded in a time-resolved fashion at BioCARS, a polychromatic beamline at the Advanced Photon Source. This configuration allowed the collaborators to observe the conversion of toxic methylglyoxal into non-toxic lactate. The exceptional clarity of the time-resolved difference electron density maps produced by Careless enabled new insights into the catalytic mechanism of DJ-1, including an explanation of the enantiopurity of the products.
Finally, Zielinski et al. (3) detail the application of Careless to time-resolved data using DJ-1 as an example. This manuscript offers helpful guidance to users seeking to get the most out of their time-resolved diffraction data. Furthermore, it contains a thorough ablation study which demonstrates the impact of many components of the model. As judged from the ablation, the multivariate Wilson prior was extremely important for maximizing time-resolved signal. In our estimation, the prior contributes a nearly 10 sigma boost in signal-to-noise!
Together, these pre-prints highlight the flexibility of variational inference as a framework for modeling crystallographic data. New insights like the multivariate prior are easy to incorporate and test. Furthermore, they translate to meaningful biological results! The rs-station devs are looking forward to continuing to advance the state of the art as we support and develop this important technology.