DataSet
- class reciprocalspaceship.DataSet(data=None, index=None, columns=None, dtype=None, copy=False, spacegroup=None, cell=None, merged=None)[source]
Bases: DataFrame
Representation of a crystallographic dataset.
A DataSet object provides a tabular representation of reflection data. Reflections are conventionally indexed by Miller indices (rows), but can also be indexed by additional metadata. Per-reflection data can be stored as columns. For additional information about inherited methods and attributes, please see the Pandas.DataFrame documentation.
Attributes
acentrics – Access acentric reflections in DataSet
cell – Unit cell parameters (a, b, c, alpha, beta, gamma)
centrics – Access centric reflections in DataSet
merged – Whether DataSet contains merged reflection data (boolean)
reindexing_ops – Possible reindexing operations (merohedral twin laws) for DataSet
spacegroup – Crystallographic space group
Methods
__init__([data, index, columns, dtype, ...])
apply_symop(symop[, inplace]) – Apply symmetry operation to all reflections in DataSet object.
assign_resolution_bins([bins, inplace, ...]) – Assign reflections in DataSet to resolution bins.
canonicalize_phases([inplace]) – Canonicalize columns with phase data to fall in the interval between -180 and 180 degrees.
compute_dHKL([inplace]) – Compute the real space lattice plane spacing, d, associated with the HKL indices in the object.
compute_multiplicity([inplace, ...]) – Compute the multiplicity of reflections in DataSet.
expand_anomalous() – Expands data by applying Friedel operator (-x, -y, -z).
expand_to_p1() – Generates all symmetrically equivalent reflections.
find_twin_laws([max_obliq, all_ops]) – Find merohedral and pseudo-merohedral twin laws for cell and spacegroup of DataSet given an obliquity threshold (degrees).
from_gemmi(gemmiMtz) – Creates DataSet object from gemmi.Mtz object.
from_structurefactor(sf_key) – Convert complex structure factors to structure factor amplitudes and phases.
get_complex_keys() – Return column labels for data with complex dtype.
get_hkls() – Get the Miller indices in the DataSet as a ndarray.
get_m_isym_keys() – Return column labels for data with M/ISYM dtype.
get_phase_keys() – Return column labels for data with Phase dtype.
get_reciprocal_grid_size([dmin, sample_rate]) – Determine an appropriate 3D grid size for reflection data.
hkl_to_asu([inplace, anomalous]) – Map HKL indices to the reciprocal space asymmetric unit.
hkl_to_observed([m_isym, inplace]) – Map HKL indices to their observed index using an M/ISYM column.
infer_mtz_dtypes([inplace, index]) – Infers MTZ dtypes from column names and underlying data.
is_isomorphous(other[, cell_threshold]) – Determine whether DataSet is isomorphous to another DataSet.
join(*args[, check_isomorphous]) – Join DataSets or named DataSeries using a database-style join on columns or indices.
label_absences([inplace]) – Label systematically absent reflections in DataSet.
label_centrics([inplace]) – Label centric reflections in DataSet.
merge(*args[, check_isomorphous]) – Merge DataSet or named DataSeries using a database-style join on columns or indices.
remove_absences([inplace]) – Remove systematically absent reflections in DataSet.
reset_index([level, drop, inplace, ...]) – Reset the index or a specific level of a MultiIndex.
select_mtzdtype(dtype) – Return subset of DataSet’s columns that are of the given dtype.
set_index(keys[, drop, append, inplace, ...]) – Set the DataSet index using existing columns.
stack_anomalous([plus_labels, minus_labels, ...]) – Convert data from two-column anomalous format to one-column format.
to_gemmi([skip_problem_mtztypes, ...]) – Creates gemmi.Mtz object from DataSet object.
to_numpy([dtype, copy, na_value]) – Convert the DataSet to a NumPy array.
to_pickle(path, *args, **kwargs) – Pickle object to file.
to_reciprocal_grid(key[, sample_rate, dmin, ...]) – Set up reciprocal grid with values from column, key, indexed by Miller indices.
to_reciprocalgrid(key[, sample_rate, dmin, ...]) – Deprecated: Set up reciprocal grid with values from column, key, indexed by Miller indices.
to_structurefactor(sf_key, phase_key) – Convert structure factor amplitudes and phases to complex structure factors.
unstack_anomalous([columns, suffixes]) – Convert data from one-column format to two-column anomalous format.
write_mtz(mtzfile[, skip_problem_mtztypes, ...]) – Write DataSet to MTZ file.
- property acentrics
Access acentric reflections in DataSet
- apply_symop(symop, inplace=False)[source]
Apply symmetry operation to all reflections in DataSet object.
- Parameters:
symop (str, gemmi.Op) – Gemmi symmetry operation or string representing symmetry op
inplace (bool) – Whether to make the change in place instead of returning a new DataSet
- assign_resolution_bins(bins=20, inplace=False, return_labels=True, format_str='.2f', return_edges=False)[source]
Assign reflections in DataSet to resolution bins.
Notes
If bin edges are provided, any reflections outside of the specified range are dropped.
- Parameters:
bins (int, list, or np.ndarray) – Number of bins or bin edges to use when assigning resolution bins. If bin edges are provided, they must be monotonic (default: 20)
inplace (bool) – Whether to add the column in place or return a copy (default: False)
return_labels (bool) – Whether to return a list of labels corresponding to the edges of each resolution bin (default: True)
format_str (str) – Format string for constructing bin labels
return_edges (bool) – Whether to return bin edges that define the resolution bin boundaries. The bin edges are returned as a 1-dimensional array with bins + 1 entries (default: False)
- Returns:
(DataSet, list), (DataSet, ndarray), (DataSet, list, ndarray) or DataSet
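The note about explicit bin edges can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper name `assign_bins`, the choice to number bins by ascending d-spacing, and the treatment of values that fall exactly on an edge are all assumptions made for illustration.

```python
from bisect import bisect_right


def assign_bins(d_spacings, edges):
    """Assign d-spacings to bins defined by monotonic edges.

    Hypothetical helper: returns (reflection_index, bin_index) pairs and
    drops any reflection that lies outside the range spanned by the edges.
    """
    ascending = sorted(edges)          # accept increasing or decreasing edges
    n_bins = len(ascending) - 1        # bins + 1 edges define `bins` bins
    lo, hi = ascending[0], ascending[-1]
    assigned = []
    for i, d in enumerate(d_spacings):
        if d < lo or d > hi:
            continue                   # outside the specified range: dropped
        # A value on an interior edge goes to the bin above it (assumption);
        # a value equal to the top edge is clamped into the last bin.
        b = min(bisect_right(ascending, d) - 1, n_bins - 1)
        assigned.append((i, b))
    return assigned


edges = [2.0, 3.0, 5.0, 10.0]          # 4 edges define 3 bins
d = [1.5, 2.0, 2.9, 3.0, 7.0, 10.0, 12.0]
binned = assign_bins(d, edges)
```

Here the first and last reflections (1.5 Å and 12.0 Å) fall outside the edges and do not appear in the result.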
- canonicalize_phases(inplace=False)[source]
Canonicalize columns with phase data to fall in the interval between -180 and 180 degrees. This method will modify the values within any column composed of data with the PhaseDtype.
- Parameters:
inplace (bool) – Whether to modify the DataSet in place or return a copy
- Returns:
DataSet
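The canonicalization itself is simple modular arithmetic. The sketch below is a plain-Python illustration rather than the library's code; the function name is hypothetical, and the half-open convention [-180, 180), in which 180 maps to -180, is an assumption.

```python
def canonicalize_phase(phase_deg):
    """Wrap a phase in degrees into the interval [-180, 180).

    Hypothetical helper; the half-open interval is an assumed convention.
    """
    return (phase_deg + 180.0) % 360.0 - 180.0


wrapped = [canonicalize_phase(p) for p in (190.0, -190.0, 360.0, 45.0)]
```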
- property cell
Unit cell parameters (a, b, c, alpha, beta, gamma)
- property centrics
Access centric reflections in DataSet
- compute_dHKL(inplace=False)[source]
Compute the real space lattice plane spacing, d, associated with the HKL indices in the object.
- Parameters:
inplace (bool) – Whether to add the column in place or return a copy
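For reference, the lattice plane spacing can be written directly in terms of the six cell parameters using the standard triclinic formula. The following is a standard-library sketch of that formula, not the method's implementation; the function name `d_spacing` is hypothetical.

```python
import math


def d_spacing(hkl, cell):
    """Return d (in the units of a, b, c) for Miller index hkl.

    cell is (a, b, c, alpha, beta, gamma) with angles in degrees.
    Hypothetical helper; undefined for hkl == (0, 0, 0).
    """
    h, k, l = hkl
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sa, sb, sg = (math.sin(math.radians(x)) for x in (alpha, beta, gamma))
    # Squared cell volume divided by (a*b*c)^2
    v2 = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    inv_d2 = (
        (h * sa / a) ** 2
        + (k * sb / b) ** 2
        + (l * sg / c) ** 2
        + 2.0 * h * k * (ca * cb - cg) / (a * b)
        + 2.0 * k * l * (cb * cg - ca) / (b * c)
        + 2.0 * h * l * (ca * cg - cb) / (a * c)
    ) / v2
    return 1.0 / math.sqrt(inv_d2)


cubic = (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
hexagonal = (10.0, 10.0, 20.0, 90.0, 90.0, 120.0)
d_100_cubic = d_spacing((1, 0, 0), cubic)
d_100_hex = d_spacing((1, 0, 0), hexagonal)
```

For a cubic cell this reduces to d = a / sqrt(h² + k² + l²), which is a convenient sanity check.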
- compute_multiplicity(inplace=False, include_centering=True)[source]
Compute the multiplicity of reflections in DataSet. A new column of floats, “EPSILON”, is added to the object.
- Parameters:
inplace (bool) – Whether to add the column in place or to return a copy
include_centering (bool) – Whether to include centering operations in the multiplicity calculation. The default is to include them.
- expand_anomalous()[source]
Expands data by applying Friedel operator (-x, -y, -z). The necessary phase shifts are made for columns of complex dtypes or PhaseDtypes.
- Returns:
DataSet
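As an illustration of the Friedel operator and its phase shift, the sketch below works on a plain dictionary of complex structure factors. It assumes Friedel's law holds, so the Friedel mate carries the complex conjugate (equivalently, the negated phase); the function name and the dictionary layout are hypothetical and unrelated to how the DataSet stores data.

```python
def expand_friedel(reflections):
    """Add the Friedel mate (-h, -k, -l) of every reflection.

    reflections: dict mapping (h, k, l) -> complex structure factor.
    Hypothetical helper; existing entries are never overwritten.
    """
    expanded = dict(reflections)
    for (h, k, l), sf in reflections.items():
        # Conjugation negates the phase and leaves the amplitude unchanged.
        expanded.setdefault((-h, -k, -l), sf.conjugate())
    return expanded


observed = {(1, 2, 3): complex(3.0, 4.0)}
expanded = expand_friedel(observed)
```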
- expand_to_p1()[source]
Generates all symmetrically equivalent reflections. The spacegroup symmetry is set to P1.
- Returns:
DataSet
- find_twin_laws(max_obliq=1.0, all_ops=False)[source]
Find merohedral and pseudo-merohedral twin laws for cell and spacegroup of DataSet given an obliquity threshold (degrees).
Notes
With max_obliq=1e-6 and all_ops=False, this method returns the same operators as DataSet.reindexing_ops
For additional information, see the GEMMI symmetry page.
- Parameters:
max_obliq (float) – Obliquity threshold (in degrees) as defined in Le Page, J Appl Cryst (1982). (default: 1.0)
all_ops (bool) – Whether to return all twin operators. If False, only non-redundant operators are returned (coset representative).
- Returns:
List of gemmi.Op
- classmethod from_gemmi(gemmiMtz)[source]
Creates DataSet object from gemmi.Mtz object.
If the gemmi.Mtz object contains an M/ISYM column and contains duplicated Miller indices, an unmerged DataSet will be constructed. The Miller indices will be mapped to their observed values, and a partiality flag will be extracted and stored as a boolean column with the label PARTIAL. Otherwise, a merged DataSet will be constructed.
If columns are found with the MTZInt dtype and are labeled PARTIAL or CENTRIC, these will be interpreted as boolean flags used to label partial or centric reflections, respectively.
- Parameters:
gemmiMtz (gemmi.Mtz)
- Returns:
DataSet
- from_structurefactor(sf_key)[source]
Convert complex structure factors to structure factor amplitudes and phases
- Parameters:
sf_key (str) – Column label for complex structure factors
- Returns:
(sf, phase) (tuple of DataSeries) – Tuple of DataSeries for the structure factor amplitudes and phases corresponding to the complex structure factors
See also
DataSet.to_structurefactor
Convert amplitude and phase to complex structure factor
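The two conversions are the polar and rectangular forms of a complex number. The sketch below shows the arithmetic for a single value with the standard library; the function names are hypothetical, and expressing the phase in degrees is an assumption made to match the degree convention used elsewhere on this page.

```python
import cmath
import math


def to_amplitude_phase(sf):
    """Complex structure factor -> (amplitude, phase in degrees)."""
    amplitude, phase_rad = cmath.polar(sf)
    return amplitude, math.degrees(phase_rad)


def to_complex(amplitude, phase_deg):
    """(amplitude, phase in degrees) -> complex structure factor."""
    return cmath.rect(amplitude, math.radians(phase_deg))


amp, phase = to_amplitude_phase(complex(0.0, 2.0))
roundtrip = to_complex(*to_amplitude_phase(complex(3.0, -4.0)))
```

`cmath.polar` returns phases in (-π, π], so the resulting degrees already lie between -180 and 180.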
- get_complex_keys()[source]
Return column labels for data with complex dtype.
- Returns:
keys (list of strings) – list of column labels with complex dtype
- get_hkls()[source]
Get the Miller indices in the DataSet as a ndarray.
- Returns:
hkl (ndarray, shape=(n_reflections, 3)) – Miller indices in DataSet
- get_m_isym_keys()[source]
Return column labels for data with M/ISYM dtype.
- Returns:
key (list of strings) – list of column labels with M/ISYM dtype
- get_phase_keys()[source]
Return column labels for data with Phase dtype.
- Returns:
keys (list of strings) – list of column labels with Phase dtype
- get_reciprocal_grid_size(dmin=None, sample_rate=3.0)[source]
Determine an appropriate 3D grid size for reflection data.
Returns the smallest grid size that yields a real-space grid spacing of at most dmin/sample_rate (in Å). The returned grid size will be ‘FFT-friendly’ (2, 3, or 5 are the largest prime factors), and will obey any symmetry constraints of the spacegroup.
- Parameters:
dmin (float) – Highest-resolution reflection to consider for grid size
sample_rate (float) – Sets the minimal grid spacing relative to dmin. For example, sample_rate=3 corresponds to a real-space sampling of dmin/3. Value must be >= 1.0 (default: 3.0)
- Returns:
list(int, int, int) – Grid size with desired spacing (list of 3 integers)
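The grid-size rule can be sketched for a single cell axis as follows. This is a simplified illustration, not the library's algorithm: it treats each axis independently, ignores the spacegroup symmetry constraints mentioned above, and uses hypothetical function names.

```python
import math


def is_fft_friendly(n):
    """True if n has no prime factor larger than 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def grid_size_1d(cell_length, dmin, sample_rate=3.0):
    """Smallest FFT-friendly n with cell_length / n <= dmin / sample_rate.

    Hypothetical helper; symmetry constraints are deliberately ignored.
    """
    n = max(1, math.ceil(cell_length * sample_rate / dmin))
    while not is_fft_friendly(n):
        n += 1
    return n


n_a = grid_size_1d(34.0, dmin=2.0)   # needs at least 51 points along a
```

With a = 34 Å, dmin = 2 Å and sample_rate = 3, at least 51 grid points are needed; 51, 52 and 53 each contain a prime factor larger than 5, so the search settles on 54.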
- hkl_to_asu(inplace=False, anomalous=False)[source]
Map HKL indices to the reciprocal space asymmetric unit. If phases are included in the DataSet, they will be changed according to the phase shift associated with the necessary symmetry operation.
If DataSet.merged == False, and a partiality flag labeled PARTIAL is included in the DataSet, the partiality flag will be used to construct a proper M/ISYM column. Both merged and unmerged DataSets will have an M/ISYM column added.
- Parameters:
inplace (bool) – Whether to modify the DataSet in place or return a copy
anomalous (bool) – If True, acentric reflections will be mapped to the +/- ASU. If False, all reflections are mapped to the Friedel-plus ASU.
- Returns:
DataSet
See also
DataSet.hkl_to_observed
Opposite of DataSet.hkl_to_asu()
- hkl_to_observed(m_isym=None, inplace=False)[source]
Map HKL indices to their observed index using an M/ISYM column. This method applies the symmetry operation specified by the M/ISYM column to each Miller index in the DataSet. If phases are included in the DataSet, they will be changed by the phase shift associated with the symmetry operation.
If DataSet.merged == False, the M/ISYM column is used to construct a partiality flag labeled PARTIAL. This is added to the DataSet, and the M/ISYM column is dropped. If DataSet.merged == True, the M/ISYM column is dropped, but a partiality flag is not added.
- Parameters:
m_isym (str) – Column label for M/ISYM values in DataSet. If m_isym is None and a single M/ISYM column is present, it will automatically be used.
inplace (bool) – Whether to modify the DataSet in place or return a copy
- Returns:
DataSet
See also
DataSet.hkl_to_asu
Opposite of DataSet.hkl_to_observed()
- infer_mtz_dtypes(inplace=False, index=True)[source]
Infers MTZ dtypes from column names and underlying data. This method iterates over each column in the DataSet and tries to infer its proper MTZ dtype based on common MTZ naming conventions.
If a given column already has an MTZDtype, its type will be unchanged. If index is True, the MTZ dtypes will also be inferred for named columns in the index.
- Parameters:
inplace (bool) – Whether to modify the dtypes in place or to return a copy
index (bool) – Infer MTZ dtypes for named column(s) in the DataSet index
- Returns:
DataSet
See also
DataSeries.infer_mtz_dtype
Infer MTZ dtype for DataSeries
- is_isomorphous(other, cell_threshold=0.5)[source]
Determine whether DataSet is isomorphous to another DataSet. This method confirms isomorphism by ensuring the spacegroups are equivalent, and that the cell parameters are within a specified percentage (see cell_threshold).
- Parameters:
other (rs.DataSet) – DataSet to which it will be compared
cell_threshold (float) – Acceptable percent difference between unit cell parameters
- Returns:
bool
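The check described above can be sketched on plain values. The exact definition of "percent difference" is not given here, so the sketch assumes the difference is taken relative to the mean of the two parameters; the function name and the use of plain strings and tuples in place of spacegroup and cell objects are likewise assumptions.

```python
def cells_are_isomorphous(sg1, cell1, sg2, cell2, cell_threshold=0.5):
    """Compare spacegroups and cell parameters of two datasets.

    Hypothetical helper. Cells are (a, b, c, alpha, beta, gamma); the
    percent difference is computed relative to the mean (an assumption).
    """
    if sg1 != sg2:
        return False
    for p1, p2 in zip(cell1, cell2):
        percent_diff = 100.0 * abs(p1 - p2) / ((p1 + p2) / 2.0)
        if percent_diff > cell_threshold:
            return False
    return True


reference = (34.0, 45.0, 98.0, 90.0, 90.0, 90.0)
close = (34.1, 45.0, 98.0, 90.0, 90.0, 90.0)    # a differs by about 0.3%
distant = (35.0, 45.0, 98.0, 90.0, 90.0, 90.0)  # a differs by about 2.9%
```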
- join(*args, check_isomorphous=True, **kwargs)[source]
Join DataSets or named DataSeries using a database-style join on columns or indices. This method can be used to join lists of rs objects to a given DataSet.
For additional documentation on accepted arguments, see the Pandas DataFrame.join() API.
- Parameters:
check_isomorphous (bool) – If True, the spacegroup and cell attributes of DataSets in other will be compared to those of the calling DataSet to ensure they are isomorphous.
- Returns:
rs.DataSet
See also
DataSet.merge
Similar method with added flexibility for distinct column labels
- label_absences(inplace=False)[source]
Label systematically absent reflections in DataSet. A new column of booleans, “ABSENT”, is added to the object.
- Parameters:
inplace (bool) – Whether to add the column in place or to return a copy
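A reflection is systematically absent when some symmetry operation maps its Miller index onto itself while the translational part of that operation contributes a non-integer phase shift. The sketch below applies that textbook condition to hand-written operators for P 2_1 (b-unique); the operator tables and function names are hypothetical and stand in for whatever symmetry library supplies the real operations.

```python
from fractions import Fraction

# Hand-written (rotation, translation) pairs for P 2_1 with unique axis b.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_B = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))
ZERO = (Fraction(0), Fraction(0), Fraction(0))
P21_OPS = [
    (IDENTITY, ZERO),
    (TWOFOLD_B, (Fraction(0), Fraction(1, 2), Fraction(0))),
]


def rotate(hkl, rot):
    """Row vector times matrix: (h R)_j = sum_i h_i R_ij."""
    return tuple(sum(hkl[i] * rot[i][j] for i in range(3)) for j in range(3))


def is_absent(hkl, ops):
    """True if hkl is systematically absent under the given operations."""
    for rot, trans in ops:
        if rotate(hkl, rot) == tuple(hkl):
            shift = sum(h * t for h, t in zip(hkl, trans))
            if shift % 1 != 0:   # phase shift is not a whole number of turns
                return True
    return False


absent = [hkl for hkl in [(0, 1, 0), (0, 2, 0), (0, 3, 0), (1, 1, 0)]
          if is_absent(hkl, P21_OPS)]
```

This reproduces the familiar 0k0, k odd condition for a 2_1 screw axis along b.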
- label_centrics(inplace=False)[source]
Label centric reflections in DataSet. A new column of booleans, “CENTRIC”, is added to the object.
- Parameters:
inplace (bool) – Whether to add the column in place or to return a copy
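A reflection is centric when some rotation of the spacegroup maps its Miller index onto its Friedel mate. The sketch below tests that textbook condition with hand-written rotation matrices for P 2 (b-unique) and P 1; the matrices and function names are hypothetical and are not taken from the library.

```python
# Rotation parts of the symmetry operations, written out by hand.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_B = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))
P1_ROTATIONS = [IDENTITY]
P2_ROTATIONS = [IDENTITY, TWOFOLD_B]


def rotate(hkl, rot):
    """Row vector times matrix: (h R)_j = sum_i h_i R_ij."""
    return tuple(sum(hkl[i] * rot[i][j] for i in range(3)) for j in range(3))


def is_centric(hkl, rotations):
    """True if some rotation maps hkl onto (-h, -k, -l)."""
    friedel_mate = tuple(-x for x in hkl)
    return any(rotate(hkl, rot) == friedel_mate for rot in rotations)


h0l_is_centric = is_centric((1, 0, 2), P2_ROTATIONS)
general_is_centric = is_centric((1, 1, 2), P2_ROTATIONS)
```

In P 2 the h0l zone is centric, while P 1 has no centric reflections other than the trivial (0, 0, 0).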
- merge(*args, check_isomorphous=True, **kwargs)[source]
Merge DataSet or named DataSeries using a database-style join on columns or indices.
For additional documentation on accepted arguments, see the Pandas DataFrame.merge() API.
- Parameters:
check_isomorphous (bool) – If True, the spacegroup and cell attributes of DataSets in other will be compared to those of the calling DataSet to ensure they are isomorphous.
- Returns:
rs.DataSet
See also
DataSet.join
Similar method with support for lists of rs objects
- property merged
Whether DataSet contains merged reflection data (boolean)
- property reindexing_ops
Possible reindexing operations (merohedral twin laws) for DataSet
- remove_absences(inplace=False)[source]
Remove systematically absent reflections in DataSet.
- Parameters:
inplace (bool) – Whether to remove the reflections in place or to return a copy
- Returns:
DataSet
- reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=<no_default>, names=None)[source]
Reset the index or a specific level of a MultiIndex.
Reset the index to use a numbered RangeIndex. Using the level argument, it is possible to reset one or more levels of a MultiIndex.
- Parameters:
level (int, str, tuple, list) – Only remove given levels from the index. Defaults to all levels
drop (bool) – Do not try to insert index into dataframe columns.
inplace (bool) – Modify the DataSet in place (do not create a new object).
col_level (int or str) – If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill (object) – If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
allow_duplicates (bool) – Allow duplicate column labels to be created.
names (int, str, tuple, list) – Using the given string, rename the DataSet column which contains the index data. If the DataSet has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.
- Returns:
DataSet or None – DataSet with the new index or None if inplace=True
See also
DataSet.set_index
Set index
- select_mtzdtype(dtype)[source]
Return subset of DataSet’s columns that are of the given dtype.
- Parameters:
dtype (str or instance of MTZDtype) – Single-letter MTZ code, name, or MTZDtype instance to return
- Returns:
DataSet – Subset of the DataSet with columns matching the requested dtype. If no columns of the requested dtype are found, an empty DataSet is returned.
- Raises:
ValueError – If dtype is neither a string nor an MTZDtype instance
- set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)[source]
Set the DataSet index using existing columns.
Set the DataSet index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.
- Parameters:
keys (label or array-like or list of labels/arrays) – This parameter can be either a single column key, a single array of the same length as the calling DataSet, or a list containing an arbitrary combination of column keys and arrays.
drop (bool) – Whether to delete columns to be used as the new index.
append (bool) – Whether to append columns to existing index.
inplace (bool) – Modify the DataSet in place (do not create a new object).
verify_integrity (bool) – Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.
- Returns:
DataSet or None – DataSet with the new index or None if inplace=True
See also
DataSet.reset_index
Reset index
- property spacegroup
Crystallographic space group
- stack_anomalous(plus_labels=None, minus_labels=None, suffixes=('(+)', '(-)'))[source]
Convert data from two-column anomalous format to one-column format. Intensities, structure factor amplitudes, or other data are converted from separate columns corresponding to a single Miller index to the same data column at different rows indexed by the Friedel-plus or Friedel-minus Miller index.
This method will return a DataSet with at most twice as many rows as the original: one row for each member of a Friedel pair. In most cases, the resulting DataSet will have fewer than twice as many rows, because centric reflections will not be stacked. For a merged DataSet, this has the effect of mapping reflections from the positive reciprocal space ASU to the positive and negative reciprocal space ASU, for Friedel-plus and Friedel-minus reflections, respectively.
Notes
A ValueError is raised if invoked with an unmerged DataSet
It is assumed that Friedel-plus column labels are suffixed with (+), and that Friedel-minus column labels are suffixed with (-)
A ValueError is raised if stripping suffixes will lead to a duplicate column name
Corresponding column labels are expected to be given in the same order
- Parameters:
plus_labels (str or list-like) – Column label or list of column labels of data associated with Friedel-plus reflections
minus_labels (str or list-like) – Column label or list of column labels of data associated with Friedel-minus reflections
suffixes (list of strings) – Suffixes to identify column labels associated with Friedel-plus and Friedel-minus reflections. Only consulted if plus_labels and minus_labels are None. Defaults to (“(+)”, “(-)”)
- Returns:
DataSet
See also
DataSet.unstack_anomalous
Opposite of stack_anomalous
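The reshaping from two-column to one-column format can be sketched on a plain dictionary. This illustrates only the row bookkeeping, not the library's implementation: the function name, the dictionary layout, the caller-supplied `is_centric` predicate, and the choice to keep the Friedel-plus value for centric reflections are all assumptions.

```python
def stack_two_column(rows, is_centric):
    """Convert {hkl: (plus, minus)} into {hkl: value}.

    Hypothetical helper. Acentric reflections contribute two rows, one at
    (h, k, l) and one at (-h, -k, -l); centric reflections are not stacked
    and keep only the Friedel-plus value (an assumption).
    """
    stacked = {}
    for (h, k, l), (plus, minus) in rows.items():
        stacked[(h, k, l)] = plus
        if not is_centric((h, k, l)):
            stacked[(-h, -k, -l)] = minus
    return stacked


two_column = {
    (1, 1, 2): (10.0, 12.0),   # I(+), I(-) for an acentric reflection
    (1, 0, 2): (5.0, 5.0),     # centric in P 2 (b-unique): k == 0
}
one_column = stack_two_column(two_column, is_centric=lambda hkl: hkl[1] == 0)
```

Two input rows become three output rows here, which is fewer than the upper bound of twice the original row count.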
- to_gemmi(skip_problem_mtztypes=False, project_name='reciprocalspaceship', crystal_name='reciprocalspaceship', dataset_name='reciprocalspaceship')[source]
Creates gemmi.Mtz object from DataSet object.
If DataSet.merged == False, the reflections will be mapped to the reciprocal space ASU, and an M/ISYM column will be constructed.
If boolean flags with the label PARTIAL or CENTRIC are found in the DataSet, these will be cast to the MTZInt dtype, and included in the gemmi.Mtz object.
- Parameters:
skip_problem_mtztypes (bool) – Whether to skip columns in DataSet that do not have specified MTZ datatypes
project_name (str) – Project name to assign to MTZ file
crystal_name (str) – Crystal name to assign to MTZ file
dataset_name (str) – Dataset name to assign to MTZ file
- Returns:
gemmi.Mtz
- to_numpy(dtype=None, copy=False, na_value=<no_default>)[source]
Convert the DataSet to a NumPy array.
This method will attempt to infer a consensus numpy dtype from the dtypes of the DataSet columns. If the DataSet is composed of all int32-backed MTZ dtypes and does not contain NaN values, the returned dtype will be int32. For all other combinations of MTZDtype, the returned dtype will be float32. If the DataSet contains dtypes other than MTZDtype, the default Pandas behavior is used (see Pandas documentation).
- Parameters:
dtype (str or np.dtype) – The dtype to pass to np.asarray()
copy (bool) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary. (default: False)
na_value (Any) – The value to use for missing values. The default value depends on dtype and the dtypes of the DataSet columns.
- Returns:
np.ndarray
- to_pickle(path, *args, **kwargs)[source]
Pickle object to file.
This can be useful for saving non-MTZ compatible data files for future use. For additional documentation on accepted arguments, see the Pandas DataFrame.to_pickle() API.
- Parameters:
path (str) – File path where the pickled object will be stored.
- to_reciprocal_grid(key, sample_rate=3.0, dmin=None, grid_size=None)[source]
Set up reciprocal grid with values from column, key, indexed by Miller indices.
Notes
The data being arranged on a reciprocal grid must be compatible with a numpy datatype.
Any missing Miller indices are initialized to zero.
If explicitly provided, grid_size supersedes dmin and sample_rate for grid size determination.
The grid size determined using sample_rate and dmin will depend on the cell parameters of the dataset. If the grid size must be consistent across different isomorphous cell parameters, grid_size can be explicitly provided.
- Parameters:
key (str) – Column label for value to arrange on reciprocal grid
sample_rate (float) – Sets the minimal grid spacing relative to dmin. For example, sample_rate=3 corresponds to a real-space sampling of dmin/3. (default: 3.0)
dmin (float) – Highest-resolution reflection to consider for grid size. If None, dmin will be set to the highest resolution reflection in the dataset. The reflections used to populate the grid will also be truncated to dHKL >= dmin (default: None)
grid_size (array-like (len==3)) – If given, provides the explicit dimensions for 3D reciprocal grid. If None, grid size will be set based on sample_rate and dmin. If provided, this grid size will be used regardless of the values provided as sample_rate and dmin
- Returns:
numpy.ndarray
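The layout of such a grid can be sketched with nested lists. This shows only the indexing idea and is not the library's implementation: the function name is hypothetical, and placing negative Miller indices by wrapping them modulo the grid dimension (the usual FFT layout) is an assumption.

```python
def to_grid(values, grid_size):
    """Place values on a 3D grid indexed by Miller indices.

    values: dict mapping (h, k, l) -> number. Hypothetical helper.
    Missing Miller indices stay at zero; negative indices wrap around
    modulo the grid dimension (an assumed convention).
    """
    n1, n2, n3 = grid_size
    grid = [[[0.0 for _ in range(n3)] for _ in range(n2)] for _ in range(n1)]
    for (h, k, l), value in values.items():
        grid[h % n1][k % n2][l % n3] = value
    return grid


grid = to_grid({(1, 0, 0): 2.0, (-1, 0, 0): 3.0}, grid_size=(4, 4, 4))
```

With a grid dimension of 4, the index h = -1 lands at position 3 along the first axis.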
- to_reciprocalgrid(key, sample_rate=3.0, dmin=None, gridsize=None)[source]
Deprecated: Set up reciprocal grid with values from column, key, indexed by Miller indices.
Warning
This function is deprecated. Use to_reciprocal_grid() instead.
Notes
The data being arranged on a reciprocal grid must be compatible with a numpy datatype.
Any missing Miller indices are initialized to zero.
If explicitly provided, gridsize supersedes dmin and sample_rate for grid size determination.
The grid size determined using sample_rate and dmin will depend on the cell parameters of the dataset. If the grid size must be consistent across different isomorphous cell parameters, gridsize can be explicitly provided.
- Parameters:
key (str) – Column label for value to arrange on reciprocal grid
sample_rate (float) – Sets the minimal grid spacing relative to dmin. For example, sample_rate=3 corresponds to a real-space sampling of dmin/3. (default: 3.0)
dmin (float) – Highest-resolution reflection to consider for grid size. If None, dmin will be set to the highest resolution reflection in the dataset. The reflections used to populate the grid will also be truncated to dHKL >= dmin (default: None)
gridsize (array-like (len==3)) – If given, provides the explicit dimensions for 3D reciprocal grid. If None, grid size will be set based on sample_rate and dmin. If provided, this grid size will be used regardless of the values provided as sample_rate and dmin
- Returns:
numpy.ndarray
- to_structurefactor(sf_key, phase_key)[source]
Convert structure factor amplitudes and phases to complex structure factors
- Parameters:
sf_key (str) – Column label for structure factor amplitudes
phase_key (str) – Column label for phases
- Returns:
rs.DataSeries – Complex structure factors
See also
DataSet.from_structurefactor
Convert complex structure factor to amplitude and phase
- unstack_anomalous(columns=None, suffixes=('(+)', '(-)'))[source]
Convert data from one-column format to two-column anomalous format. Provided column labels are converted from separate rows indexed by their Friedel-plus or Friedel-minus Miller index to different columns indexed at the Friedel-plus HKL.
This method will return a smaller DataSet than the original – Friedel pairs will both be indexed at the Friedel-plus index. This has the effect of mapping reflections to the positive reciprocal space ASU, including data for both Friedel pairs at the Friedel-plus Miller index.
Notes
A ValueError is raised if invoked with an unmerged DataSet
- Parameters:
columns (str or list-like) – Column label or list of column labels of data that should be associated with Friedel pairs. If None, all columns are converted to the two-column anomalous format.
suffixes (tuple or list of str) – Suffixes to append to Friedel-plus and Friedel-minus data columns
- Returns:
DataSet
See also
DataSet.stack_anomalous
Opposite of unstack_anomalous
- write_mtz(mtzfile, skip_problem_mtztypes=False, project_name='reciprocalspaceship', crystal_name='reciprocalspaceship', dataset_name='reciprocalspaceship')[source]
Write DataSet to MTZ file.
If DataSet.merged == False, the reflections will be mapped to the reciprocal space ASU, and an M/ISYM column will be constructed.
If boolean flags with the label PARTIAL or CENTRIC are found in the DataSet, these will be cast to the MTZInt dtype, and included in the output MTZ file.
- Parameters:
mtzfile (str or file) – name of an mtz file or a file object
skip_problem_mtztypes (bool) – Whether to skip columns in DataSet that do not have specified MTZ datatypes
project_name (str) – Project name to assign to MTZ file
crystal_name (str) – Crystal name to assign to MTZ file
dataset_name (str) – Dataset name to assign to MTZ file