This page was generated from docs/userguide/mtzdtypes.ipynb.

MTZ Data Types

MTZ files use column types to specify what type of crystallographic data is contained within a given column (see MTZ specification). This enables columns to have arbitrary names while ensuring that the column values are interpreted correctly.

In order to ensure that MTZ data types behave as expected in rs.DataSet objects, we have implemented a set of custom pandas dtypes to represent the crystallographic data found in MTZ files. This facilitates MTZ file I/O, and makes it possible to write methods that operate only on expected types of crystallographic data.

[1]:

import reciprocalspaceship as rs
import numpy as np
from IPython.display import HTML

Supported MTZ data types

The following MTZ dtypes are available for rs.DataSet and rs.DataSeries objects:

[2]:

df = rs.summarize_mtz_dtypes(print_summary=False)
HTML(df.to_html(index=False))

[2]:

MTZ Code	Name	Class	Internal
D	AnomalousDifference	AnomalousDifferenceDtype	float32
B	Batch	BatchDtype	int32
K	FriedelIntensity	FriedelIntensityDtype	float32
G	FriedelSFAmplitude	FriedelStructureFactorAmplitudeDtype	float32
H	HKL	HKLIndexDtype	int32
A	HendricksonLattman	HendricksonLattmanDtype	float32
J	Intensity	IntensityDtype	float32
I	MTZInt	MTZIntDtype	int32
R	MTZReal	MTZRealDtype	float32
Y	M/ISYM	M_IsymDtype	int32
E	NormalizedSFAmplitude	NormalizedStructureFactorAmplitudeDtype	float32
P	Phase	PhaseDtype	float32
Q	Stddev	StandardDeviationDtype	float32
M	StddevFriedelI	StandardDeviationFriedelIDtype	float32
L	StddevFriedelSF	StandardDeviationFriedelSFDtype	float32
F	SFAmplitude	StructureFactorAmplitudeDtype	float32
W	Weight	WeightDtype	float32

Internally, each of these dtypes is stored as a numpy array of 32-bit ints or floats, because MTZ files only store 32-bit values. Other data types can still be held in an rs.DataSet column or rs.DataSeries; however, only MTZ dtypes can be written out to an MTZ file.
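To make the 32-bit limit concrete, the sketch below uses only the Python standard library to round-trip a value through a 32-bit float, which approximates what happens when a 64-bit value ends up in a float32-backed column. The helper names to_float32 and fits_int32 are made up for this illustration and are not part of reciprocalspaceship.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def to_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def fits_int32(value):
    """Return True if an integer fits in a 32-bit signed int."""
    return INT32_MIN <= value <= INT32_MAX


# 0.5 is exactly representable in 32 bits; 0.1 is not, so it changes slightly
print(to_float32(0.5) == 0.5)
print(to_float32(0.1) == 0.1)

# Integers outside the 32-bit signed range cannot be stored in an int32 column
print(fits_int32(2**31 - 1), fits_int32(2**31))
```

In practice this means that values computed in 64-bit precision may differ in the last few digits once they are held in an MTZ dtype.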

Specifying MTZ data types

It is possible to specify a dtype using the MTZ Code, Name, or Class from the above table:

[3]:

data1 = rs.DataSeries([0, 1, 2], dtype="J")
data1

[3]:

0   0.0
1   1.0
2   2.0
dtype: Intensity

[4]:

data2 = rs.DataSeries([0, 1, 2], dtype="Intensity")
data2

[4]:

0   0.0
1   1.0
2   2.0
dtype: Intensity

[5]:

data3 = rs.DataSeries([0, 1, 2], dtype=rs.IntensityDtype())
data3

[5]:

0   0.0
1   1.0
2   2.0
dtype: Intensity
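The three spellings above all resolve to the same dtype. As a rough mental model, this can be pictured as a lookup table in which every spelling points at one canonical entry. The sketch below is a standard-library illustration only: the names MTZ_DTYPES and canonical_name are hypothetical, classes are represented by their names as strings, and only a few rows of the table are included. It is not how reciprocalspaceship registers its dtypes.

```python
# (MTZ Code, Name, Class) rows copied from the table of supported dtypes
MTZ_DTYPES = [
    ("J", "Intensity", "IntensityDtype"),
    ("K", "FriedelIntensity", "FriedelIntensityDtype"),
    ("F", "SFAmplitude", "StructureFactorAmplitudeDtype"),
    ("Q", "Stddev", "StandardDeviationDtype"),
    ("P", "Phase", "PhaseDtype"),
]

# Hypothetical lookup: every spelling maps to the same canonical Name
_LOOKUP = {}
for code, name, cls in MTZ_DTYPES:
    for key in (code, name, cls):
        _LOOKUP[key] = name


def canonical_name(spec):
    """Return the dtype Name for an MTZ Code, Name, or Class name."""
    try:
        return _LOOKUP[spec]
    except KeyError:
        raise ValueError(f"{spec!r} is not a known MTZ dtype") from None


# All three spellings resolve to the same entry
print(canonical_name("J"), canonical_name("Intensity"), canonical_name("IntensityDtype"))
```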

If you already have an rs.DataSeries, it is possible to change it to an MTZ dtype:

[6]:

data4 = rs.DataSeries([0, 1, 2], dtype=np.int64)
data4.astype("Intensity")

[6]:

0   0.0
1   1.0
2   2.0
dtype: Intensity

In the example above, the np.int64 array was converted into an array of float32 values because that is the internal storage type for the rs.IntensityDtype.

Inferring MTZ data types

If data is read directly from an MTZ file, the proper MTZ data types will be set automatically. To make MTZ dtypes easier to work with in other cases, there is also support for inferring the proper dtype from the underlying data and the name of an rs.DataSeries or of the columns of an rs.DataSet. Inferring the proper dtype is not always possible, but these functions are written to work for the most common column names in MTZ files. If you come across common cases that do not seem to be supported, please feel free to file an issue on GitHub.

Inferring dtype for DataSeries:

[7]:

data = rs.DataSeries([0, 1, 2], name="SigI")
print(data)

0    0
1    1
2    2
Name: SigI, dtype: int64

[8]:

data.infer_mtz_dtype()

[8]:

0   0.0
1   1.0
2   2.0
Name: SigI, dtype: Stddev

It is also possible to infer the dtype for all of the columns in a DataSet. To illustrate this, we will read in an MTZ file, set all of the columns to the object dtype, and infer the correct dtypes:

[9]:

dataset = rs.read_mtz("../examples/data/HEWL_SSAD_24IDC.mtz")
dataset.dtypes

[9]:

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object

[10]:

dataset = dataset.astype(object)
dataset.dtypes

[10]:

FreeR_flag    object
IMEAN         object
SIGIMEAN      object
I(+)          object
SIGI(+)       object
I(-)          object
SIGI(-)       object
N(+)          object
N(-)          object
dtype: object

[11]:

dataset.infer_mtz_dtypes(inplace=True)
dataset.dtypes

[11]:

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object
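To give a feel for what inference from name and data can look like, here is a deliberately small toy heuristic written with the standard library only. The function guess_mtz_dtype and its rules are assumptions invented for this illustration; they were chosen so that they reproduce the results shown above for these particular column names, and they are not the rules that infer_mtz_dtype() or infer_mtz_dtypes() actually apply.

```python
def guess_mtz_dtype(name, values):
    """Guess an MTZ dtype Name from a column name and its values (toy rules)."""
    upper = name.upper()
    friedel = upper.endswith("(+)") or upper.endswith("(-)")
    # The name is checked first: "SIGI..." suggests an uncertainty on an intensity
    if upper.startswith("SIGI"):
        return "StddevFriedelI" if friedel else "Stddev"
    # Overly broad on purpose: any other name starting with "I" counts as intensity
    if upper.startswith("I"):
        return "FriedelIntensity" if friedel else "Intensity"
    # Fall back on the data: whole-number columns are treated as generic ints
    if all(float(v).is_integer() for v in values):
        return "MTZInt"
    return None  # no guess; real inference would need to cover more cases


columns = {
    "FreeR_flag": [0, 1, 0],
    "IMEAN": [10.5, 3.2, 8.0],
    "SIGIMEAN": [0.4, 0.3, 0.5],
    "I(+)": [11.0, 3.1, 8.2],
    "SIGI(+)": [0.6, 0.4, 0.7],
    "N(+)": [2, 1, 3],
}
for name, values in columns.items():
    print(name, guess_mtz_dtype(name, values))
```

Note that the name takes priority over the values here, which matches the SigI example above, where integer data is inferred as Stddev.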

Switching between Friedel and non-Friedel data types

Several MTZ data types are specific to anomalous data pertaining to Friedel pairs. For applicable rs.DataSeries objects, it is possible to switch between the Friedel and non-Friedel versions of a data type. For data types without a Friedel equivalent, the rs.DataSeries is returned unchanged:

[12]:

# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="Intensity").to_friedel_dtype()

[12]:

0   0.0
1   1.0
2   2.0
dtype: FriedelIntensity

[13]:

# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="FriedelIntensity").from_friedel_dtype()

[13]:

0   0.0
1   1.0
2   2.0
dtype: Intensity

[14]:

# No Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="MTZInt").to_friedel_dtype()

[14]:

0   0
1   1
2   2
dtype: MTZInt
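The behavior above can be summarized as a pair of mappings with an unchanged fallback. The sketch below is a standard-library illustration under stated assumptions: the pairings are read off the dtype table by name, the helper names to_friedel and from_friedel are hypothetical, and Stddev is left out of the forward direction because the table lists two Friedel standard deviation types and the table alone does not say which one applies.

```python
# Pairings read off the dtype table by name (an assumption of this sketch,
# not a listing of what the library implements)
TO_FRIEDEL = {
    "Intensity": "FriedelIntensity",
    "SFAmplitude": "FriedelSFAmplitude",
}
FROM_FRIEDEL = {
    "FriedelIntensity": "Intensity",
    "FriedelSFAmplitude": "SFAmplitude",
    "StddevFriedelI": "Stddev",
    "StddevFriedelSF": "Stddev",
}


def to_friedel(dtype_name):
    """Return the Friedel counterpart, or the input unchanged if none is listed."""
    return TO_FRIEDEL.get(dtype_name, dtype_name)


def from_friedel(dtype_name):
    """Return the non-Friedel counterpart, or the input unchanged if none is listed."""
    return FROM_FRIEDEL.get(dtype_name, dtype_name)


print(to_friedel("Intensity"))
print(from_friedel("FriedelIntensity"))
print(to_friedel("MTZInt"))  # no Friedel equivalent, so it is returned as-is
```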

Writing out MTZ files

Any data that will be written out to an MTZ file must have an MTZ data type.

[15]:

dataset.dtypes

[15]:

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object

[16]:

dataset.write_mtz("temp.mtz")

If there is a non-MTZ dtype in a DataSet, DataSet.write_mtz() will raise a ValueError.

[17]:

dataset["Temp"] = "string"
dataset.dtypes

[17]:

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
Temp                    object
dtype: object

[18]:

dataset.write_mtz("temp.mtz")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dataset.write_mtz("temp.mtz")

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/dataset.py:611, in DataSet.write_mtz(self, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    585 """
    586 Write DataSet to MTZ file.
    587
   (...)
    607     Dataset name to assign to MTZ file
    608 """
    609 from reciprocalspaceship import io
--> 611 return io.write_mtz(
    612     self,
    613     mtzfile,
    614     skip_problem_mtztypes,
    615     project_name,
    616     crystal_name,
    617     dataset_name,
    618 )

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:225, in write_mtz(dataset, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    191 def write_mtz(
    192     dataset,
    193     mtzfile,
   (...)
    197     dataset_name,
    198 ):
    199     """
    200     Write an MTZ reflection file from the reflection data in a DataSet.
    201
   (...)
    223         Dataset name to assign to MTZ file
    224     """
--> 225     mtz = to_gemmi(
    226         dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name
    227     )
    228     mtz.write_to_file(mtzfile)
    229     return

File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:151, in to_gemmi(dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
    149         continue
    150     else:
--> 151         raise ValueError(
    152             f"column {c} of type {cseries.dtype} cannot be written to an MTZ file. "
    153             f"To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True"
    154         )
    155 mtz.set_data(temp[columns].to_numpy(dtype="float32"))
    157 # Handle Unmerged data

ValueError: column Temp of type object cannot be written to an MTZ file. To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True

As the error message states, it is still possible to write out the MTZ by setting skip_problem_mtztypes=True. This will skip any columns with non-MTZ data types:

[19]:

dataset.write_mtz("temp.mtz", skip_problem_mtztypes=True)
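The effect of skip_problem_mtztypes can be pictured as a filter over the columns. The following standard-library sketch mimics the behavior described above for a plain dictionary of column names and dtype names. The function columns_to_write is hypothetical and only illustrates the decision; the real check happens inside reciprocalspaceship when the file is written.

```python
# Names from the table of supported MTZ dtypes
MTZ_DTYPE_NAMES = {
    "AnomalousDifference", "Batch", "FriedelIntensity", "FriedelSFAmplitude",
    "HKL", "HendricksonLattman", "Intensity", "MTZInt", "MTZReal", "M/ISYM",
    "NormalizedSFAmplitude", "Phase", "Stddev", "StddevFriedelI",
    "StddevFriedelSF", "SFAmplitude", "Weight",
}


def columns_to_write(dtypes, skip_problem_mtztypes=False):
    """Return the columns that would be written, given {column: dtype name}."""
    keep = []
    for column, dtype in dtypes.items():
        if dtype in MTZ_DTYPE_NAMES:
            keep.append(column)
        elif not skip_problem_mtztypes:
            # Mirrors the documented behavior: refuse unless told to skip
            raise ValueError(
                f"column {column} of type {dtype} cannot be written to an MTZ file"
            )
    return keep


dtypes = {"IMEAN": "Intensity", "SIGIMEAN": "Stddev", "Temp": "object"}
print(columns_to_write(dtypes, skip_problem_mtztypes=True))
```

Running a check like this before writing makes it clear which columns, such as Temp here, will be missing from the output file.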