{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`reciprocalspaceship` provides methods for reading and writing MTZ files, and makes it easy to join reflection data by Miller indices. We will demonstrate these uses by loading diffraction data from tetragonal hen egg-white lysozyme (HEWL)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9.9\n" ] } ], "source": [ "import reciprocalspaceship as rs\n", "print(rs.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This diffraction data was collected at the Sector 24-ID-C beamline at [NE-CAT](https://lilith.nec.aps.anl.gov/) at APS. Diffraction images were collected at ambient temperature (295 K) and low energy (6550 eV) in order to collect native sulfur anomalous diffraction for experimental phasing. The diffraction images were processed in [DIALS](https://dials.github.io/) for indexing, geometry refinement, and spot integration, and scaling and merging were done in [AIMLESS](http://www.ccp4.ac.uk/html/aimless.html). This data reduction yielded an MTZ file that is included in the `data/` subdirectory. Here, we will load the MTZ file and inspect its contents." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Loading reflection data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reflection tables can be loaded using the top-level function `rs.read_mtz()`, which returns a `DataSet` object that is analogous to a `pandas.DataFrame`. 
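
To make the `pandas.DataFrame` analogy concrete, here is a minimal sketch that uses plain `pandas` as a stand-in for a `DataSet`. The values are copied from the first rows of the reflection table shown in this notebook. The names `table`, `refl_004`, and `strong`, and the I/sigma cutoff of 25, are illustrative assumptions. The sketch only shows the indexing idioms for a table keyed by Miller indices; it does not call `reciprocalspaceship` itself.

```python
import pandas as pd

# First five reflections from this notebook's table: (H, K, L, IMEAN, SIGIMEAN)
rows = [
    (0, 0, 4, 661.29987, 21.953098),
    (0, 0, 8, 3229.649, 105.980934),
    (0, 0, 12, 1361.8672, 43.06085),
    (0, 0, 16, 4124.393, 196.89108),
    (1, 0, 1, 559.33685, 8.6263),
]

# Plain DataFrame stand-in, indexed by Miller indices like the table above
table = pd.DataFrame(rows, columns=['H', 'K', 'L', 'IMEAN', 'SIGIMEAN'])
table = table.set_index(['H', 'K', 'L'])

# Look up a single reflection by its (H, K, L) index
refl_004 = table.loc[(0, 0, 4)]

# Column arithmetic and boolean filtering: keep reflections with I/sigma > 25
# (the cutoff is arbitrary and chosen only for illustration)
i_over_sigma = table['IMEAN'] / table['SIGIMEAN']
strong = table[i_over_sigma > 25]

print(refl_004)
print(strong)
```

If a `DataSet` behaves like a `DataFrame` here, as the analogy suggests, the same `.loc` lookups and boolean masks should apply to `refltable`.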
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'DataSet'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "refltable = rs.read_mtz(\"data/HEWL_SSAD_24IDC.mtz\")\n", "type(refltable).__name__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This reflection table was produced directly from `AIMLESS`, and contains several different data columns:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th>H</th><th>K</th><th>L</th><th>FreeR_flag</th><th>IMEAN</th><th>SIGIMEAN</th><th>I(+)</th><th>SIGI(+)</th><th>I(-)</th><th>SIGI(-)</th><th>N(+)</th><th>N(-)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><th>0</th><th>4</th><td>14</td><td>661.29987</td><td>21.953098</td><td>661.29987</td><td>21.953098</td><td>661.29987</td><td>21.953098</td><td>16</td><td>16</td></tr>\n",
"    <tr><th>0</th><th>0</th><th>8</th><td>4</td><td>3229.649</td><td>105.980934</td><td>3229.649</td><td>105.980934</td><td>3229.649</td><td>105.980934</td><td>16</td><td>16</td></tr>\n",
"    <tr><th>0</th><th>0</th><th>12</th><td>6</td><td>1361.8672</td><td>43.06085</td><td>1361.8672</td><td>43.06085</td><td>1361.8672</td><td>43.06085</td><td>16</td><td>16</td></tr>\n",
"    <tr><th>0</th><th>0</th><th>16</th><td>19</td><td>4124.393</td><td>196.89108</td><td>4124.393</td><td>196.89108</td><td>4124.393</td><td>196.89108</td><td>8</td><td>8</td></tr>\n",
"    <tr><th>1</th><th>0</th><th>1</th><td>16</td><td>559.33685</td><td>8.6263</td><td>559.33685</td><td>8.6263</td><td>559.33685</td><td>8.6263</td><td>64</td><td>64</td></tr>\n",
"  </tbody>\n",
"</table>
" ], "text/plain": [ " FreeR_flag IMEAN SIGIMEAN I(+) SIGI(+) I(-) \\\n", "H K L \n", "0 0 4 14 661.29987 21.953098 661.29987 21.953098 661.29987 \n", " 8 4 3229.649 105.980934 3229.649 105.980934 3229.649 \n", " 12 6 1361.8672 43.06085 1361.8672 43.06085 1361.8672 \n", " 16 19 4124.393 196.89108 4124.393 196.89108 4124.393 \n", "1 0 1 16 559.33685 8.6263 559.33685 8.6263 559.33685 \n", "\n", " SIGI(-) N(+) N(-) \n", "H K L \n", "0 0 4 21.953098 16 16 \n", " 8 105.980934 16 16 \n", " 12 43.06085 16 16 \n", " 16 196.89108 8 8 \n", "1 0 1 8.6263 64 64 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "refltable.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of reflections: 12542\n" ] } ], "source": [ "print(f\"Number of reflections: {len(refltable)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, each of these data columns is stored using a custom `dtype` that was added to the conventional `pandas` and `numpy` datatypes. This enables `DataSet` reflection tables to be written back to MTZ files. There is a `dtype` for each of the possible datatypes listed in the [MTZ file specification](http://www.ccp4.ac.uk/html/f2mtz.html#CTYPOUT). " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FreeR_flag MTZInt\n", "IMEAN Intensity\n", "SIGIMEAN Stddev\n", "I(+) FriedelIntensity\n", "SIGI(+) StddevFriedelI\n", "I(-) FriedelIntensity\n", "SIGI(-) StddevFriedelI\n", "N(+) MTZInt\n", "N(-) MTZInt\n", "dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "refltable.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Additional crystallographic metadata is read from the MTZ file and can be stored as attributes of the `DataSet`. 
These include the crystallographic space group and unit cell parameters, which are stored as `gemmi.SpaceGroup` and `gemmi.UnitCell` objects. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "refltable.spacegroup" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "refltable.cell" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Plotting reflection data\n", "\n", "For illustrative purposes, let's plot the $I(+)$ data against the $I(-)$ data." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(6, 6))\n", "plt.plot(refltable['I(+)'].to_numpy(), refltable['I(-)'].to_numpy(), 'k.', alpha=0.1)\n", "plt.xlabel(\"I(+)\")\n", "plt.ylabel(\"I(-)\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the [next example](2_mergingstats.ipynb), we will investigate this anomalous signal in more detail. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Writing Reflection Data\n", "\n", "It is also possible to write out MTZ files using `DataSet.write_mtz()`. This functionality depends on the correct setting of each column's `dtype`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "refltable.write_mtz(\"data/HEWL_SSAD_24IDC.mtz\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }