Other miscellaneous utilities

rs.scaleit

Run CCP4’s scaleit on the given data.

usage: rs.scaleit [-h] -r ref data_col sig_col -i mtz data_col sig_col
                  [-o OUTFILE] [--ignore-isomorphism]
-h, --help

show this help message and exit

-r <ref> <data_col> <sig_col>, --refmtz <ref> <data_col> <sig_col>

MTZ to be used as reference for scaling using given data columns. Specified as (filename, F, SigF) or (filename, I, SigI)

-i <mtz> <data_col> <sig_col>, --inputmtz <mtz> <data_col> <sig_col>

MTZ to be scaled to reference using given data columns. Specified as (filename, F, SigF) or (filename, I, SigI)

-o <outfile>, --outfile <outfile>

MTZ file to which scaleit output will be written

--ignore-isomorphism

Allow poorly isomorphous inputs to be scaled. By default (no flag) poorly isomorphous inputs will raise an error.

rs.precog2mtz

Convert Precognition integration results to .mtz files for merging in Careless.

usage: rs.precog2mtz [-h] [--remove-sys-absences]
                     [--spacegroup-for-absences SPACEGROUP_FOR_ABSENCES]
                     --spacegroup SPACEGROUP --cell CELL CELL CELL CELL CELL
                     CELL [-o MTZ_OUT]
                     ii_in [ii_in ...]
ii_in

Precognition .ii file(s)

-h, --help

show this help message and exit

--remove-sys-absences

Optionally remove systematic absences from the data according to --spacegroup or --spacegroup-for-absences if supplied.

--spacegroup-for-absences <spacegroup_for_absences>

Optionally use a different spacegroup to compute systematic absences. This may be useful for some EF-X data.

--spacegroup <spacegroup>

The spacegroup of the data

--cell <cell>

The unit cell supplied as six floats. For example, --cell 34. 45. 98. 90. 90. 90.

-o <mtz_out>, --mtz-out <mtz_out>

Name of the output mtz file.

rs.rfree

Create an mtz containing rfree flags

usage: rs.rfree [-h] [-o OUTFILE] [-f FROM_FILE] [-c a b c alpha beta gamma]
                [-sg SPACEGROUP] [-d DMIN] -r RFRACTION [-s SEED]
-h, --help

show this help message and exit

-o <outfile>, --outfile <outfile>

Output MTZ filename

-f <from_file>, --from-file <from_file>

Use the cell and spacegroup from the specified mtz file. Either this or --cell and --spacegroup must be provided. If no --dmin is provided, dmin will be inferred from this file.

-c <a> <b> <c> <alpha> <beta> <gamma>, --cell <a> <b> <c> <alpha> <beta> <gamma>

Cell for output mtz file containing rfree flags. Specified as (a, b, c, alpha, beta, gamma)

-sg <spacegroup>, --spacegroup <spacegroup>

Spacegroup for output mtz file containing rfree flags

-d <dmin>, --dmin <dmin>

Maximum resolution of reflections to be included

-r <rfraction>, --rfraction <rfraction>

Fraction of reflections to be flagged as Rfree

-s <seed>, --seed <seed>

Seed to random number generator for reproducible Rfree flags
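The interplay of --rfraction and --seed can be illustrated with a small standard-library sketch. It is not the routine rs.rfree uses internally; the function name assign_free_flags and the True/False flag convention are assumptions made for illustration only.

```python
import random


def assign_free_flags(n_reflections, rfraction, seed=None):
    """Flag each reflection as free (True) with probability rfraction."""
    rng = random.Random(seed)  # a fixed seed gives the same flags on every run
    return [rng.random() < rfraction for _ in range(n_reflections)]


flags_a = assign_free_flags(10_000, rfraction=0.05, seed=2023)
flags_b = assign_free_flags(10_000, rfraction=0.05, seed=2023)
print(flags_a == flags_b)            # identical because the seed matches
print(sum(flags_a) / len(flags_a))   # close to 0.05, but not exactly 0.05
```

Because each reflection is drawn independently in this sketch, the realized free fraction only approximates the requested one; a real implementation may choose to enforce it differently.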

rs.extrapolate

Make extrapolated structure factors for refinement.

Equations

with reference:

F_{esf} = f * (F_{on} - F_{off}) + F_{ref}

SigF_{esf} = sqrt( ( (f**2)*(SigF_{on}**2) ) + ( (f**2)*(SigF_{off}**2) ) + (SigF_{ref}**2) )

with calc:

F_{esf} = f * (F_{on} - F_{off}) + F_{calc}

SigF_{esf} = sqrt( ( (f**2)*(SigF_{on}**2) ) + ( (f**2)*(SigF_{off}**2) ) )

where f is the extrapolation factor.

Notes

  • F_{off} and F_{calc} can be the same MTZ file, as done in Hekstra et al., Nature (2016). In that case, the equation for SigF_{esf} is adjusted to use (f-1)**2 for SigF_{off} to avoid double-counting in the error propagation.

  • At most one of F_{ref} and F_{calc} can be specified. If neither is specified, F_{calc} will be set to F_{off}.

  • After computing |F_{esf}|, any negative structure factor amplitudes are converted to positive values. This is to ensure that they are handled correctly downstream in phenix, and because they are technically amplitudes of complex numbers and the phase should just be flipped by 180 degrees.
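The equations and notes above can be sketched for a single reflection in plain Python. This is an illustration under stated assumptions, not the code that rs.extrapolate runs: the function name extrapolate and its argument names are hypothetical, and the branch taken when neither a reference nor a calc value is given applies the (f-1)**2 adjustment described in the notes.

```python
import math


def extrapolate(f_on, sig_on, f_off, sig_off, factor,
                f_ref=None, sig_ref=None, f_calc=None):
    """Return (|F_esf|, SigF_esf) for one reflection (hypothetical helper)."""
    if f_ref is not None and f_calc is not None:
        raise ValueError("At most one of f_ref and f_calc can be specified")
    if f_ref is not None and sig_ref is None:
        raise ValueError("sig_ref is required when f_ref is given")

    diff = factor * (f_on - f_off)
    var_on = (factor ** 2) * (sig_on ** 2)

    if f_ref is not None:
        # "with reference": the reference term carries its own uncertainty
        f_esf = diff + f_ref
        var = var_on + (factor ** 2) * (sig_off ** 2) + sig_ref ** 2
    elif f_calc is not None:
        # "with calc": no sigma is supplied for the calculated amplitudes
        f_esf = diff + f_calc
        var = var_on + (factor ** 2) * (sig_off ** 2)
    else:
        # Neither given: F_calc is set to F_off, so F_off enters with a net
        # weight of (1 - f) and its variance is scaled by (f - 1)**2
        f_esf = diff + f_off
        var = var_on + ((factor - 1) ** 2) * (sig_off ** 2)

    # Negative amplitudes are made positive (a 180 degree phase flip)
    return abs(f_esf), math.sqrt(var)


print(extrapolate(110.0, 3.0, 100.0, 4.0, factor=2.0, f_calc=95.0))
# → (115.0, 10.0)
```

In the printed case, F_esf = 2 * (110 - 100) + 95 = 115 and SigF_esf = sqrt(4*9 + 4*16) = 10.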

usage: rs.extrapolate [-h] -on mtz f_col sig_col -off mtz data_col sig_col
                      [-calc mtz data_col] [-ref mtz data_col sig_col]
                      [-f FACTOR] [-o OUTFILE]
-h, --help

show this help message and exit

-on <mtz> <f_col> <sig_col>, --onmtz <mtz> <f_col> <sig_col>

MTZ to be used as on data. Specified as (filename, F, SigF)

-off <mtz> <data_col> <sig_col>, --offmtz <mtz> <data_col> <sig_col>

MTZ to be used as off data. Specified as (filename, F, SigF)

-calc <mtz> <data_col>, --calcmtz <mtz> <data_col>

MTZ to be used as calc data. Specified as (filename, F). At most one of -calc and -ref can be specified.

-ref <mtz> <data_col> <sig_col>, --refmtz <mtz> <data_col> <sig_col>

MTZ to be used as ref data. Specified as (filename, F, SigF). At most one of -calc and -ref can be specified.

-f <factor>, --factor <factor>

Extrapolation factor

-o <outfile>, --outfile <outfile>

Output MTZ filename

rs.mle_dw_extrapolate

Runs maximum likelihood estimation of model parameters (r,p) for DW-Extrapolator.

Notes

  • Uses scipy.optimize to minimize negative log likelihood

  • For more efficient runs, the optimization can be run on a subset of reflections in the datasets; control this using the --subset flag

usage: rs.mle_dw_extrapolate [-h] --onmtz ONMTZ --offmtz OFFMTZ
                             [--use_structure_factors f_col sigf_col]
                             [--use_intensities i_col sigi_col]
                             [--nsamples NSAMPLES] [--nproc NPROC]
                             [--init_r INIT_R] [--init_p INIT_P]
                             [--bounds_r lower_bound upper_bound]
                             [--bounds_p lower_bound upper_bound]
                             [--maxiter MAXITER] [--seed SEED]
                             [--subset SUBSET] [--disable_progress_bar]
                             [--out OUT]
-h, --help

show this help message and exit

--onmtz <onmtz>

.mtz file for perturbed dataset

--offmtz <offmtz>

.mtz file for ground state dataset

--use_structure_factors <f_col> <sigf_col>, -use_SF <f_col> <sigf_col>

Use structure factors from French-Wilson scaling. Specified as (F, SigF)

--use_intensities <i_col> <sigi_col>, -use_I <i_col> <sigi_col>

Use integrated intensities. Specified as (I, SigI)

--nsamples <nsamples>, -n <nsamples>

Number of Monte Carlo samples (default 1e4)

--nproc <nproc>

Number of processes (default: cpu_count)

--init_r <init_r>

Initial guess for r

--init_p <init_p>

Initial guess for p

--bounds_r <lower_bound> <upper_bound>

Bounds for r

--bounds_p <lower_bound> <upper_bound>

Bounds for p

--maxiter <maxiter>

Max optimizer iterations

--seed <seed>

Random seed for MC samples

--subset <subset>

Optional number of reflections to randomly subsample for faster runs

--disable_progress_bar

Disable the progress bar

--out <out>, -o <out>

Where to write JSON results

rs.dw_extrapolate

Runs DW-Extrapolator, a Bayesian inference procedure to infer excited state structure factors in perturbative crystallography datasets.

Equations

The underlying model assumes that ground state (GS) and excited state (ES) structure factors have correlation r and that the observed “on” state structure factors are given by F^{ON} = (1-p)*F^{GS} + p*F^{ES}.
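To make the mixing relation concrete, the short sketch below forms F^{ON} from assumed F^{GS} and F^{ES} values and then inverts the relation algebraically, which shows where an extrapolation factor of 1/p comes from. It is a noise-free illustration only: DW-Extrapolator itself is a Bayesian inference procedure, and the function names and numbers here are hypothetical.

```python
def mix_on_state(f_gs, f_es, p):
    """Observed 'on' structure factor: F_ON = (1 - p) * F_GS + p * F_ES."""
    return (1.0 - p) * f_gs + p * f_es


def invert_mixing(f_on, f_gs, p):
    """Solve the mixing relation for F_ES, written with f = 1 / p."""
    f = 1.0 / p
    return f_gs + f * (f_on - f_gs)


f_gs, f_es, p = 100.0, 130.0, 0.25
f_on = mix_on_state(f_gs, f_es, p)        # 0.75 * 100 + 0.25 * 130
recovered = invert_mixing(f_on, f_gs, p)  # 100 + 4 * (107.5 - 100)
print(f_on, recovered)
```

With measurement noise, the difference F^{ON} - F^{GS} is multiplied by 1/p, so small excited state fractions amplify errors strongly in this direct inversion.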

Notes

  • At minimum, two .mtz files (one for the off data and one for the on data) must be provided

  • DW-Extrapolator can be run using French-Wilson scaled structure factors or integrated intensities

usage: rs.dw_extrapolate [-h] -on ONMTZ [ONMTZ ...] -off OFFMTZ [OFFMTZ ...]
                         [-use_SF f_col, sigf_col f_col, sigf_col]
                         [-use_I i_col, sigi_col i_col, sigi_col]
                         [-n NSAMPLES] [-r RDW] [-p ES_FRACTION] [-f FACTOR]
                         [-o OUTFILE] [--nproc NPROC] [--default_scan]
                         [--disable-progress-bar] [--seed SEED]
-h, --help

show this help message and exit

-on <onmtz>, --onmtz <onmtz>

.mtz file for perturbed dataset

-off <offmtz>, --offmtz <offmtz>

.mtz file for ground state dataset

-use_SF <f_col, sigf_col>, --use_structure_factors <f_col, sigf_col>

Use structure factors from French-Wilson scaling. Specified as (F, SigF)

-use_I <i_col, sigi_col>, --use_intensities <i_col, sigi_col>

Use integrated intensities. Specified as (I, SigI)

-n <nsamples>, --nsamples <nsamples>

Number of importance samples

-r <rdw>, --rDW <rdw>

Double Wilson r (correlation) parameter

-p <es_fraction>, --es-fraction <es_fraction>

Excited state fraction p

-f <factor>, --factor <factor>

Extrapolation factor f = 1/p

-o <outfile>, --outfile <outfile>

Output file name

--nproc <nproc>

Number of processors for multiprocessing

--default_scan

Run default scan with r=0.9 and p from 0.05 to 0.5 in steps of 0.05

--disable-progress-bar

Disable tqdm progress bar

--seed <seed>

Random seed for generating Monte Carlo samples
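The parameter grid described for --default_scan can be enumerated as follows, pairing each p with the corresponding extrapolation factor f = 1/p. This is a sketch that assumes both endpoints (0.05 and 0.5) are included; how rs.dw_extrapolate builds its grid internally is not stated here, and the variable names are hypothetical.

```python
R_DW = 0.9  # fixed correlation used by the default scan

# p = 0.05, 0.10, ..., 0.50, built from integers to avoid float drift
scan = [(R_DW, round(k * 0.05, 2)) for k in range(1, 11)]

for r, p in scan:
    print(f"r={r:.2f}  p={p:.2f}  f=1/p={1.0 / p:.2f}")
```

Under this assumption the scan covers ten (r, p) pairs, with f ranging from 20 at p = 0.05 down to 2 at p = 0.5.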