MTZ Data Types
MTZ files use column types to specify what type of crystallographic data is contained within a given column (see MTZ specification). This enables columns to have arbitrary names while ensuring that the column values are interpreted correctly.
In order to ensure that MTZ data types behave as expected in rs.DataSet objects, we have implemented a set of custom pandas dtypes to represent the crystallographic data found in MTZ files. This facilitates MTZ file I/O and makes it possible to write methods that operate only on the expected types of crystallographic data.
[1]:
import reciprocalspaceship as rs
import numpy as np
from IPython.display import HTML
Supported MTZ data types
The following MTZ dtypes are available for rs.DataSet and rs.DataSeries objects:
[2]:
df = rs.summarize_mtz_dtypes(print_summary=False)
HTML(df.to_html(index=False))
[2]:
MTZ Code | Name | Class | Internal |
---|---|---|---|
D | AnomalousDifference | AnomalousDifferenceDtype | float32 |
B | Batch | BatchDtype | int32 |
K | FriedelIntensity | FriedelIntensityDtype | float32 |
G | FriedelSFAmplitude | FriedelStructureFactorAmplitudeDtype | float32 |
H | HKL | HKLIndexDtype | int32 |
A | HendricksonLattman | HendricksonLattmanDtype | float32 |
J | Intensity | IntensityDtype | float32 |
I | MTZInt | MTZIntDtype | int32 |
R | MTZReal | MTZRealDtype | float32 |
Y | M/ISYM | M_IsymDtype | int32 |
E | NormalizedSFAmplitude | NormalizedStructureFactorAmplitudeDtype | float32 |
P | Phase | PhaseDtype | float32 |
Q | Stddev | StandardDeviationDtype | float32 |
M | StddevFriedelI | StandardDeviationFriedelIDtype | float32 |
L | StddevFriedelSF | StandardDeviationFriedelSFDtype | float32 |
F | SFAmplitude | StructureFactorAmplitudeDtype | float32 |
W | Weight | WeightDtype | float32 |
Internally, these are all stored as NumPy arrays of 32-bit integers or floats, because MTZ files only store 32-bit values. Other data types can still be stored in an rs.DataSet column or an rs.DataSeries; however, only MTZ dtypes can be written out to an MTZ file.
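Limiting storage to 32 bits has a precision cost that is easy to overlook. The NumPy-only sketch below does not use reciprocalspaceship. It shows that casting to float32 keeps roughly seven significant digits, and that integers above 2**24 can no longer be told apart once they pass through float32. The second point is relevant on the assumption that integer-typed columns are written to disk as 32-bit floats, which is how we understand the MTZ format to store all column values.

```python
import numpy as np

# float64 keeps about 16 significant decimal digits; float32 keeps about 7.
x64 = np.float64(1.2345678901)
x32 = np.float32(x64)
print(x64, x32)

# float32 has a 24-bit significand, so not every integer above 2**24 is
# representable: 2**24 + 1 rounds to 2**24 when cast to float32.
big = np.array([2**24, 2**24 + 1], dtype=np.int64)
roundtrip = big.astype(np.float32).astype(np.int64)
print(roundtrip)  # [16777216 16777216]

# Small integers, such as typical Miller indices, survive the round trip.
small = np.array([-50, 0, 1, 1000], dtype=np.int32)
assert np.array_equal(small.astype(np.float32).astype(np.int32), small)
```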
Specifying MTZ data types
It is possible to specify a dtype using the MTZ Code, Name, or Class from the above table:
[3]:
data1 = rs.DataSeries([0, 1, 2], dtype="J")
data1
[3]:
0 0.0
1 1.0
2 2.0
dtype: Intensity
[4]:
data2 = rs.DataSeries([0, 1, 2], dtype="Intensity")
data2
[4]:
0 0.0
1 1.0
2 2.0
dtype: Intensity
[5]:
data3 = rs.DataSeries([0, 1, 2], dtype=rs.IntensityDtype())
data3
[5]:
0 0.0
1 1.0
2 2.0
dtype: Intensity
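Accepting either a one-letter code or a name means normalizing both spellings to a single dtype. The following plain-Python sketch illustrates that idea using the codes and names from the summary table. `resolve_dtype_name` is a hypothetical helper and is not part of reciprocalspaceship.

```python
# One-letter MTZ codes and dtype names, copied from the summary table above.
MTZ_CODE_TO_NAME = {
    "D": "AnomalousDifference", "B": "Batch", "K": "FriedelIntensity",
    "G": "FriedelSFAmplitude", "H": "HKL", "A": "HendricksonLattman",
    "J": "Intensity", "I": "MTZInt", "R": "MTZReal", "Y": "M/ISYM",
    "E": "NormalizedSFAmplitude", "P": "Phase", "Q": "Stddev",
    "M": "StddevFriedelI", "L": "StddevFriedelSF", "F": "SFAmplitude",
    "W": "Weight",
}
MTZ_NAMES = set(MTZ_CODE_TO_NAME.values())


def resolve_dtype_name(spec):
    """Return the dtype name for an MTZ code or name (hypothetical helper)."""
    if spec in MTZ_CODE_TO_NAME:
        return MTZ_CODE_TO_NAME[spec]
    if spec in MTZ_NAMES:
        return spec
    raise ValueError(f"{spec!r} is not an MTZ code or dtype name")


# "J" and "Intensity" resolve to the same dtype, as in cells [3] and [4].
print(resolve_dtype_name("J"), resolve_dtype_name("Intensity"))
```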
If you already have an rs.DataSeries, it is possible to convert it to an MTZ dtype:
[6]:
data4 = rs.DataSeries([0, 1, 2], dtype=np.int64)
data4.astype("Intensity")
[6]:
0 0.0
1 1.0
2 2.0
dtype: Intensity
In the example above, the np.int64 array was converted into an array of float32 values because float32 is the internal storage type for rs.IntensityDtype.
Inferring MTZ data types
If data is read directly from an MTZ file, the proper MTZ data types are set automatically. However, to facilitate working with MTZ dtypes, there is also support for inferring the proper dtypes based on the underlying data and the name of an rs.DataSeries or the columns of an rs.DataSet. Inferring the proper dtype is not always possible, but these functions are written to work for the most common column names in MTZ files. If you come across common cases that do not seem to be supported, please feel free to file an issue on GitHub.
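To make name-based inference concrete, here is a minimal sketch of how a column name and its values could be mapped onto the one-letter MTZ codes from the table above. The function `guess_mtz_code` is hypothetical and deliberately simple. It is not the logic used by `infer_mtz_dtype()` and will mislabel many real column names.

```python
def guess_mtz_code(name, values):
    """Guess a one-letter MTZ column code from a column name and its values.

    Hypothetical heuristic for illustration only; this is NOT the rule set
    used by reciprocalspaceship's infer_mtz_dtype().
    """
    upper = name.upper()
    friedel = upper.endswith("(+)") or upper.endswith("(-)")

    if upper.startswith("SIG"):
        # Uncertainties: Friedel columns are split by what they describe,
        # intensities (M) or structure factor amplitudes (L).
        if friedel:
            return "M" if upper[3:].startswith("I") else "L"
        return "Q"
    if upper in ("H", "K", "L"):
        return "H"  # Miller index columns
    if upper.startswith("PH"):
        return "P"  # phases, e.g. PHIF
    if upper.startswith("I"):
        return "K" if friedel else "J"
    if upper.startswith("F") and not upper.startswith("FREE"):
        return "G" if friedel else "F"

    # No recognizable name: fall back on the data itself.
    if all(float(v).is_integer() for v in values):
        return "I"  # MTZInt
    return "R"  # MTZReal


for col, vals in [("IMEAN", [10.5, 3.2]), ("SIGI(+)", [0.7]), ("N(-)", [2, 3])]:
    print(col, guess_mtz_code(col, vals))
```

For the column names of the HEWL dataset used below, this sketch happens to agree with the dtypes that `read_mtz()` assigns: FreeR_flag and N(+) map to I, IMEAN to J, SIGIMEAN to Q, I(+) to K, and SIGI(+) to M.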
Inferring dtype for DataSeries:
[7]:
data = rs.DataSeries([0, 1, 2], name="SigI")
print(data)
0 0
1 1
2 2
Name: SigI, dtype: int64
[8]:
data.infer_mtz_dtype()
[8]:
0 0.0
1 1.0
2 2.0
Name: SigI, dtype: Stddev
It is also possible to infer the dtypes of all the columns in a DataSet. To illustrate this, we will read in an MTZ file, set all of the columns to the object dtype, and then infer the correct dtypes:
[9]:
dataset = rs.read_mtz("../examples/data/HEWL_SSAD_24IDC.mtz")
dataset.dtypes
[9]:
FreeR_flag MTZInt
IMEAN Intensity
SIGIMEAN Stddev
I(+) FriedelIntensity
SIGI(+) StddevFriedelI
I(-) FriedelIntensity
SIGI(-) StddevFriedelI
N(+) MTZInt
N(-) MTZInt
dtype: object
[10]:
dataset = dataset.astype(object)
dataset.dtypes
[10]:
FreeR_flag object
IMEAN object
SIGIMEAN object
I(+) object
SIGI(+) object
I(-) object
SIGI(-) object
N(+) object
N(-) object
dtype: object
[11]:
dataset.infer_mtz_dtypes(inplace=True)
dataset.dtypes
[11]:
FreeR_flag MTZInt
IMEAN Intensity
SIGIMEAN Stddev
I(+) FriedelIntensity
SIGI(+) StddevFriedelI
I(-) FriedelIntensity
SIGI(-) StddevFriedelI
N(+) MTZInt
N(-) MTZInt
dtype: object
Switching between Friedel and non-Friedel data types
Several MTZ data types are specific to anomalous data pertaining to Friedel pairs. For applicable rs.DataSeries objects, it is possible to switch between these data types. For data types without a Friedel equivalent, the rs.DataSeries is returned unchanged:
[12]:
# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="Intensity").to_friedel_dtype()
[12]:
0 0.0
1 1.0
2 2.0
dtype: FriedelIntensity
[13]:
# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="FriedelIntensity").from_friedel_dtype()
[13]:
0 0.0
1 1.0
2 2.0
dtype: Intensity
[14]:
# No Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="MTZInt").to_friedel_dtype()
[14]:
0 0
1 1
2 2
dtype: MTZInt
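The Friedel switches above are essentially a small lookup between one-letter MTZ codes. The plain-Python sketch below follows the pairings suggested by the names in the summary table. `to_friedel_code` and `from_friedel_code` are hypothetical helpers, not the rs.DataSeries methods. Leaving Stddev (Q) unchanged is an assumption of this sketch, because Q could correspond to either StddevFriedelI (M) or StddevFriedelSF (L).

```python
# Pairings between non-Friedel and Friedel MTZ codes (from the summary table).
TO_FRIEDEL = {
    "J": "K",  # Intensity -> FriedelIntensity
    "F": "G",  # SFAmplitude -> FriedelSFAmplitude
}
FROM_FRIEDEL = {
    "K": "J",  # FriedelIntensity -> Intensity
    "G": "F",  # FriedelSFAmplitude -> SFAmplitude
    "M": "Q",  # StddevFriedelI -> Stddev
    "L": "Q",  # StddevFriedelSF -> Stddev
}


def to_friedel_code(code):
    # Codes without a (unique) Friedel equivalent, including Q, are returned unchanged.
    return TO_FRIEDEL.get(code, code)


def from_friedel_code(code):
    # Non-Friedel codes are returned unchanged.
    return FROM_FRIEDEL.get(code, code)


print(to_friedel_code("J"), from_friedel_code("K"), to_friedel_code("I"))
```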
Writing out MTZ files
Any data that will be written out to an MTZ file must have an MTZ data type.
[15]:
dataset.dtypes
[15]:
FreeR_flag MTZInt
IMEAN Intensity
SIGIMEAN Stddev
I(+) FriedelIntensity
SIGI(+) StddevFriedelI
I(-) FriedelIntensity
SIGI(-) StddevFriedelI
N(+) MTZInt
N(-) MTZInt
dtype: object
[16]:
dataset.write_mtz("temp.mtz")
If there is a non-MTZ dtype in a DataSet, DataSet.write_mtz() will raise a ValueError.
[17]:
dataset["Temp"] = "string"
dataset.dtypes
[17]:
FreeR_flag MTZInt
IMEAN Intensity
SIGIMEAN Stddev
I(+) FriedelIntensity
SIGI(+) StddevFriedelI
I(-) FriedelIntensity
SIGI(-) StddevFriedelI
N(+) MTZInt
N(-) MTZInt
Temp object
dtype: object
[18]:
dataset.write_mtz("temp.mtz")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dataset.write_mtz("temp.mtz")
File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/dataset.py:611, in DataSet.write_mtz(self, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
585 """
586 Write DataSet to MTZ file.
587
(...)
607 Dataset name to assign to MTZ file
608 """
609 from reciprocalspaceship import io
--> 611 return io.write_mtz(
612 self,
613 mtzfile,
614 skip_problem_mtztypes,
615 project_name,
616 crystal_name,
617 dataset_name,
618 )
File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:225, in write_mtz(dataset, mtzfile, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
191 def write_mtz(
192 dataset,
193 mtzfile,
(...)
197 dataset_name,
198 ):
199 """
200 Write an MTZ reflection file from the reflection data in a DataSet.
201
(...)
223 Dataset name to assign to MTZ file
224 """
--> 225 mtz = to_gemmi(
226 dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name
227 )
228 mtz.write_to_file(mtzfile)
229 return
File ~/Documents/github/reciprocalspaceship/reciprocalspaceship/io/mtz.py:151, in to_gemmi(dataset, skip_problem_mtztypes, project_name, crystal_name, dataset_name)
149 continue
150 else:
--> 151 raise ValueError(
152 f"column {c} of type {cseries.dtype} cannot be written to an MTZ file. "
153 f"To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True"
154 )
155 mtz.set_data(temp[columns].to_numpy(dtype="float32"))
157 # Handle Unmerged data
ValueError: column Temp of type object cannot be written to an MTZ file. To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True
As the error message states, it is still possible to write out the MTZ file by setting skip_problem_mtztypes=True. This will skip any columns with non-MTZ data types:
[19]:
dataset.write_mtz("temp.mtz", skip_problem_mtztypes=True)
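You can also find the problem columns yourself before writing. The plain-Python sketch below uses a hypothetical helper, `split_columns`, which works on dtype names as strings, such as those printed by `dataset.dtypes` in cell [17]. With a real DataSet you might build that mapping with `{c: str(d) for c, d in dataset.dtypes.items()}`, assuming each dtype's string form is the name shown in the printed dtypes.

```python
# Names of the MTZ dtypes listed in the summary table at the top of this page.
MTZ_DTYPE_NAMES = {
    "AnomalousDifference", "Batch", "FriedelIntensity", "FriedelSFAmplitude",
    "HKL", "HendricksonLattman", "Intensity", "MTZInt", "MTZReal", "M/ISYM",
    "NormalizedSFAmplitude", "Phase", "Stddev", "StddevFriedelI",
    "StddevFriedelSF", "SFAmplitude", "Weight",
}


def split_columns(dtypes):
    """Split {column: dtype name} into (writable, problem) column lists."""
    writable = [c for c, d in dtypes.items() if d in MTZ_DTYPE_NAMES]
    problem = [c for c, d in dtypes.items() if d not in MTZ_DTYPE_NAMES]
    return writable, problem


# Column dtypes as printed in cell [17].
dtypes = {
    "FreeR_flag": "MTZInt", "IMEAN": "Intensity", "SIGIMEAN": "Stddev",
    "I(+)": "FriedelIntensity", "SIGI(+)": "StddevFriedelI",
    "I(-)": "FriedelIntensity", "SIGI(-)": "StddevFriedelI",
    "N(+)": "MTZInt", "N(-)": "MTZInt", "Temp": "object",
}
writable, problem = split_columns(dtypes)
print(problem)  # ['Temp']
```

Because rs.DataSet builds on the pandas DataFrame, the `problem` list could then be passed to `dataset.drop(columns=problem)` before calling `write_mtz()`. This makes the omission explicit instead of relying on skip_problem_mtztypes.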