This page was generated from docs/userguide/mtzdtypes.ipynb.

MTZ Data Types

MTZ files use column types to specify what type of crystallographic data is contained within a given column (see MTZ specification). This enables columns to have arbitrary names while ensuring that the column values are interpreted correctly.

To ensure that MTZ data types behave as expected in rs.DataSet objects, we have implemented a set of custom pandas dtypes that represent the crystallographic data found in MTZ files. This facilitates MTZ file I/O and makes it possible to write methods that operate only on the expected types of crystallographic data.

[1]:
import reciprocalspaceship as rs
import numpy as np
from IPython.display import HTML

Supported MTZ data types

The following MTZ dtypes are available for rs.DataSet and rs.DataSeries objects:

[2]:
df = rs.summarize_mtz_dtypes(print_summary=False)
HTML(df.to_html(index=False))
[2]:
MTZ Code  Name                   Class                                    Internal
D         AnomalousDifference    AnomalousDifferenceDtype                 float32
B         Batch                  BatchDtype                               int32
K         FriedelIntensity       FriedelIntensityDtype                    float32
G         FriedelSFAmplitude     FriedelStructureFactorAmplitudeDtype     float32
H         HKL                    HKLIndexDtype                            int32
A         HendricksonLattman     HendricksonLattmanDtype                  float32
J         Intensity              IntensityDtype                           float32
I         MTZInt                 MTZIntDtype                              int32
R         MTZReal                MTZRealDtype                             float32
Y         M/ISYM                 M_IsymDtype                              int32
E         NormalizedSFAmplitude  NormalizedStructureFactorAmplitudeDtype  float32
P         Phase                  PhaseDtype                               float32
Q         Stddev                 StandardDeviationDtype                   float32
M         StddevFriedelI         StandardDeviationFriedelIDtype           float32
L         StddevFriedelSF        StandardDeviationFriedelSFDtype          float32
F         SFAmplitude            StructureFactorAmplitudeDtype            float32
W         Weight                 WeightDtype                              float32

Internally, these are all stored as numpy arrays of 32-bit ints or floats, because MTZ files can only store 32-bit values. Other data types can be stored in an rs.DataSet column or rs.DataSeries; however, only MTZ dtypes can be written out to an MTZ file.
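
One practical consequence of 32-bit storage is reduced precision compared with Python's native 64-bit floats. The sketch below uses only the standard library to round-trip values through IEEE 754 single precision; the helper name as_float32 is a hypothetical illustration and not part of reciprocalspaceship.

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Integers up to 2**24 survive the round trip unchanged...
print(as_float32(16777216.0))  # 16777216.0
# ...but the next integer cannot be represented and is rounded.
print(as_float32(16777217.0))  # 16777216.0
# Most decimal fractions are stored only approximately.
print(as_float32(0.1) == 0.1)  # False
```

Values such as large reflection counts or finely resolved intensities may therefore change slightly when they are converted to a float32-backed MTZ dtype.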


Specifying MTZ data types

It is possible to specify a dtype using the MTZ Code, Name, or Class from the above table:

[3]:
data1 = rs.DataSeries([0, 1, 2], dtype="J")
data1
[3]:
0   0.0
1   1.0
2   2.0
dtype: Intensity
[4]:
data2 = rs.DataSeries([0, 1, 2], dtype="Intensity")
data2
[4]:
0   0.0
1   1.0
2   2.0
dtype: Intensity
[5]:
data3 = rs.DataSeries([0, 1, 2], dtype=rs.IntensityDtype())
data3
[5]:
0   0.0
1   1.0
2   2.0
dtype: Intensity

If you already have an rs.DataSeries, it is possible to change it to an MTZ dtype:

[6]:
data4 = rs.DataSeries([0, 1, 2], dtype=np.int64)
data4.astype("Intensity")
[6]:
0   0.0
1   1.0
2   2.0
dtype: Intensity

In the example above, the np.int64 array was converted into an array of float32 values because that is the internal storage type for rs.IntensityDtype.


Inferring MTZ data types

If data is read directly from an MTZ file, the proper MTZ data types will be set automatically. To make it easier to work with MTZ dtypes in other cases, there is also support for inferring the proper dtype from the underlying data and name of an rs.DataSeries, or from the columns of an rs.DataSet. Inferring the proper dtype is not always possible, but these functions are written to work for the most common column names in MTZ files. If you come across common cases that do not seem to be supported, please feel free to file an issue on GitHub.

Inferring dtype for DataSeries:

[7]:
data = rs.DataSeries([0, 1, 2], name="SigI")
print(data)
0    0
1    1
2    2
Name: SigI, dtype: int64
[8]:
data.infer_mtz_dtype()
[8]:
0   0.0
1   1.0
2   2.0
Name: SigI, dtype: Stddev

It is also possible to infer the dtype for all of the columns in a DataSet. To illustrate this, we will read in an MTZ file, set all of the columns to the object dtype, and infer the correct dtypes:

[9]:
dataset = rs.read_mtz("../examples/data/HEWL_SSAD_24IDC.mtz")
dataset.dtypes
[9]:
FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object
[10]:
dataset = dataset.astype(object)
dataset.dtypes
[10]:
FreeR_flag    object
IMEAN         object
SIGIMEAN      object
I(+)          object
SIGI(+)       object
I(-)          object
SIGI(-)       object
N(+)          object
N(-)          object
dtype: object
[11]:
dataset.infer_mtz_dtypes(inplace=True)
dataset.dtypes
[11]:
FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object

Switching between Friedel and non-Friedel data types

Several MTZ data types are specific to anomalous data pertaining to Friedel pairs. For applicable rs.DataSeries objects, it is possible to switch between the Friedel and non-Friedel data types. For data types without a Friedel equivalent, the rs.DataSeries is returned unchanged:

[12]:
# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="Intensity").to_friedel_dtype()
[12]:
0   0.0
1   1.0
2   2.0
dtype: FriedelIntensity
[13]:
# Has non-Friedel equivalent
rs.DataSeries([0, 1, 2], dtype="FriedelIntensity").from_friedel_dtype()
[13]:
0   0.0
1   1.0
2   2.0
dtype: Intensity
[14]:
# No Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="MTZInt").to_friedel_dtype()
[14]:
0   0
1   1
2   2
dtype: MTZInt

Writing out MTZ files

Any data that will be written out to an MTZ file must have an MTZ data type.

[15]:
dataset.dtypes
[15]:
FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object
[16]:
dataset.write_mtz("temp.mtz")

If there is a non-MTZ dtype in a DataSet, DataSet.write_mtz() will raise a ValueError.

[17]:
dataset["Temp"] = "string"
dataset.dtypes
[17]:
FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
Temp                    object
dtype: object
[18]:
dataset.write_mtz("temp.mtz")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dataset.write_mtz("temp.mtz")

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/dataset.py:611, in DataSet.write_mtz(self, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    585 """
    586 Write DataSet to MTZ file.
    587
   (...)
    607     Dataset name to assign to MTZ file
    608 """
    609 from reciprocalspaceship import io
--> 611 return io.write_mtz(
    612     self,
    613     mtzfile,
    614     skip_problem_mtztypes,
    615     project_name,
    616     crystal_name,
    617     dataset_name,
    618 )

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:225, in write_mtz(dataset, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    191 def write_mtz(
    192     dataset,
    193     mtzfile,
   (...)
    197     dataset_name,
    198 ):
    199     """
    200     Write an MTZ reflection file from the reflection data in a DataSet.
    201
   (...)
    223         Dataset name to assign to MTZ file
    224     """
--> 225     mtz = to_gemmi(
    226         dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name
    227     )
    228     mtz.write_to_file(mtzfile)
    229     return

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:151, in to_gemmi(dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    149         continue
    150     else:
--> 151         raise ValueError(
    152             f"column {c} of type {cseries.dtype} cannot be written to an MTZ file. "
    153             f"To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True"
    154         )
    155 mtz.set_data(temp[columns].to_numpy(dtype="float32"))
    157 # Handle Unmerged data

ValueError: column Temp of type object cannot be written to an MTZ file. To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True

As the error message states, it is still possible to write out the MTZ by setting skip_problem_mtztypes=True. This will skip any columns with non-MTZ data types:

[19]:
dataset.write_mtz("temp.mtz", skip_problem_mtztypes=True)
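
Before skipping columns, it can be useful to see which ones would be left out. The sketch below is one possible pre-flight check: it compares dtype names against the Name column of the dtype summary table. The helper find_non_mtz_columns and the constant MTZ_DTYPE_NAMES are hypothetical and not part of reciprocalspaceship, and the example mapping is typed in by hand from the dtypes shown above.

```python
# Names copied from the "Name" column of the dtype summary table.
MTZ_DTYPE_NAMES = {
    "AnomalousDifference", "Batch", "FriedelIntensity", "FriedelSFAmplitude",
    "HKL", "HendricksonLattman", "Intensity", "MTZInt", "MTZReal", "M/ISYM",
    "NormalizedSFAmplitude", "Phase", "Stddev", "StddevFriedelI",
    "StddevFriedelSF", "SFAmplitude", "Weight",
}


def find_non_mtz_columns(column_dtypes):
    """Return the column names whose dtype name is not an MTZ dtype name.

    `column_dtypes` maps column name -> dtype name (as a string).
    Hypothetical helper for illustration only.
    """
    return [
        column
        for column, dtype_name in column_dtypes.items()
        if dtype_name not in MTZ_DTYPE_NAMES
    ]


# Dtype names as displayed for part of the example DataSet after adding "Temp".
example = {
    "IMEAN": "Intensity",
    "SIGIMEAN": "Stddev",
    "N(+)": "MTZInt",
    "Temp": "object",
}
print(find_non_mtz_columns(example))  # ['Temp']
```

For a real DataSet, the mapping could be built with {column: str(dtype) for column, dtype in dataset.dtypes.items()}, assuming the string form of each dtype matches the names displayed by dataset.dtypes.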