NAME
dm - density modification package, release 1.6, 28/7/94
SYNOPSIS
dm HKLIN foo.mtz HKLOUT bar.mtz [ SOLIN foo.msk ] [ SOLOUT
bar.msk ] [ NCSIN1 foo1.msk [ NCSIN2 ... ] ]
REFERENCE
K. Cowtan (1994), Joint CCP4 and ESF-EACBM Newsletter on
Protein Crystallography, 31, p34-38.
DESCRIPTION
Dm is a package which applies real space constraints based
on known features of a protein electron density map in order
to improve the approximate phasing obtained from
experimental sources. Various information can be applied,
including such diverse elements as the following (see the
MODE keyword):
SOLV Solvent flattening [8]
HIST Histogram mapping [9]
AVER NCS averaging [2,6]
SKEL Skeletonisation [1,7]
SAYR Sayre's equation [5,9]
FLIP Solvent flipping and density truncation [10]
The program has three phase extension and combination modes,
which are selected by the appropriate choice of keywords.
Note that an arbitrary choice of keywords which ignores the
recommended schemes can lead to a worse map. The combination
mode is determined by the COMBINE keyword. If this keyword
is omitted the program runs in Free-Sim mode, since it is
very hard to make a map worse in that mode. The modes are:
Reflection Omit mode [11]
This mode gives best results with most combinations of
density modification constraints. It does however
drastically increase the computation time, and is
generally used only with SOLV, HIST methods.
`Solomon' mode [10]
This mode is based on J. P. Abrahams' Solomon package,
and gives a good map very quickly. It can, however, only
be used with the FLIP and AVER methods.
Free-Sim (dm) mode [11]
This mode does not give as good a final map as the
other modes, but can be used reliably with any
combination of density modification constraints. In
this mode a Free-R like quantity is also generated.
Calculation of the scale and B-factor for the data is
automatic. This is performed by comparison with an
empirically derived database of map variance at different
resolutions, and is more reliable than the conventional
Wilson plot.
Non-crystallographic symmetry averaging can be performed for
both proper and improper symmetries, and different NCS
averaging operations can be applied to different parts of
the protein. (Thanks to Dave Schuller for his help with
this). Input masks may be on any grid and axis order.
Skeletonisation is by the core-tracing algorithm of Swanson
[7]. This is faster than Greer's algorithm and allows
adjustment of the skeletonisation parameters without
recalculating the skeleton. As a result the skeletonisation
calculation is rendered largely automatic.
Using `dm' in Reflection Omit Mode
Selected by `COMBINE OMIT'. Implies `SCHEME ALL' and `NCYCLE
AUTO'. In this mode a reflection omit calculation is used
to reduce dependency between initial and modified F's. Phase
combination (between Fobs and omitted Fmod) is by the SigmaA
method. The time taken is proportional to the number of free
sets into which the data is divided (at least 10, default
20). Any density modification may be used, but AVER, SKEL,
SAYR will usually be too slow. The real-space-free-residual
is used to automatically stop the calculation, because after
a few cycles the map will stop improving and start to
deteriorate. A typical command file might contain the
following:
SOLC <solc>
MODE SOLV HIST [AVER]
COMBINE OMIT [SETS <numsets>]
[AVER...]
LABI ...
Using `dm' in Solomon Mode
Selected by `COMBINE SIGMAA'. Implies `SCHEME ALL' and
`NCYCLE AUTO'. This mode combines solvent flipping and
density truncation, with optional averaging. Phase
combination is by the SigmaA method. The real-space-free-
residual is used to automatically stop the calculation,
because after a few cycles the map will stop improving and
start to deteriorate. If averaging is used, however, the
calculation is much more stable and `NCYC' can be used to
increase the number of calculation cycles. A typical command
file might contain the following:
SOLC <solc>
MODE FLIP [AVER]
COMBINE SIGMAA
[AVER...]
Using `dm' in Free-Sim Mode
This mode can be used with any density modification except
solvent flipping. Phase combination is by the Free-Sim
method. A Free-R-like quantity is generated. NCYC <ncycle>
and COMBINE FREE <ncross> should be given. Any other
keywords except REAL or FLIP may be used. A typical command
file might contain the following:
SOLC <solc>
MODE [SOLV] [HIST] [AVER] [SKEL] [SAYR]
NCYC 10
SCHEME RES FROM 3.5
COMBINE FREE 1 [SETS <numsets>]
[AVER...]
LABI ...
...
Free Indicators
There are two Free indicators that `dm' can use. The first
is the density modification Free-R (defined in the same way
as the refinement Free-R). This is calculated in the Free-
Sim and Omit modes. Unfortunately, while effective for
refinement, it is a poor indicator of the progress of
density modification. A better indicator (due to J. P.
Abrahams) is the real-space-free-residual. This is
calculated by omitting two small spheres of protein and
solvent from the density modification. The flatness of the
solvent sphere and the histogram fit in the protein sphere
provide a better indication of progress.
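The density modification Free-R is described above only as being
defined in the same way as the refinement Free-R. The following is a
minimal sketch of that kind of statistic, not code taken from `dm'.
It assumes the usual form R = sum| Fobs - k Fmod | / sum Fobs over the
omitted reflections only, with the simple scale k = sum Fobs / sum
Fmod. The function and argument names are hypothetical.

```python
def free_r(f_obs, f_mod, is_free):
    """R-factor restricted to the free (omitted) reflections.

    f_obs, f_mod: observed and modified-map structure factor amplitudes.
    is_free: booleans marking the reflections left out of the calculation.
    The scale k = sum(Fobs) / sum(Fmod) is an assumption of this sketch.
    """
    pairs = [(fo, fm) for fo, fm, free in zip(f_obs, f_mod, is_free) if free]
    if not pairs:
        raise ValueError("no free reflections selected")
    sum_obs = sum(fo for fo, _ in pairs)
    sum_mod = sum(fm for _, fm in pairs)
    k = sum_obs / sum_mod
    # Sum the absolute amplitude differences over the free set only.
    return sum(abs(fo - k * fm) for fo, fm in pairs) / sum_obs
```

Reflections that are not flagged as free do not contribute, so the
value is not biased by the reflections used in phase combination.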
INPUT/OUTPUT FILES
HKLIN
Input mtz file - This should contain the conventional (CCP4)
asymmetric unit of data (see CAD).
HKLOUT
Output mtz file.
SOLIN
Input solvent mask - This overrides the automatic Wang mask
determination. The input mask can have any grid and axis
ordering, and may have any extent from the protein region of
a single asymmetric unit to the whole cell.
NCSIN<i>
Input NCS averaging masks - These are used with the AVER
option. The input masks can have any grid or axis ordering,
and may cover a single monomer or the whole multimer.
SOLOUT
Output solvent mask - This will be on the program grid with
default axis order, and will cover the whole unit cell.
MAJOR KEYWORDS
(SOLC and MODE are compulsory)
MODE [SOLV] [HIST] [AVER] [SKEL] [SAYR] [FLIP]
Select the calculation to be performed:
SOLV = Solvent flattening
HIST = Histogram mapping
AVER = Non-crystallographic symmetry averaging
SKEL = Skeletonisation
SAYR = Sayre's equation
FLIP = Solvent flipping and protein truncation (Solomon
mode)
SOLC <solc> [MASK <solvfrac> <protfrac>] [MEAN <solvval>
<protval>]
<solc>
= solvent content. ALWAYS INPUT THE CORRECT SOLVENT
CONTENT HERE TO ENSURE CORRECT SCALING. 0.0=all
protein, 1.0=all solvent.
MASK <solvfrac> <protfrac>
- used to set different mask volumes to the above for
histogram matching and solvent flattening.
<solvfrac> = fraction of cell to be masked as solvent.
<protfrac> = fraction of cell to be masked as protein.
If <solvfrac>+<protfrac> < 1.0 then there will be a
buffer region between solvent and protein which is
neither histogram matched nor solvent flattened. This
feature is provided by popular demand, but makes things
worse in most of my test cases.
MEAN <solvval> <protval>
- used to set mean density for solvent and protein
regions. This affects scaling and density modification.
<solvval> = mean density in solvent region.
<protval> = mean density in protein region.
(defaults 0.32, 0.43 electrons per cubic angstrom)
RESOLUTION <rmin> <rmax>
Resolution range of reflections to include in the
calculation. By the end of the calculation all the
reflections in this range will be included; at the start,
however, only a subset is used, chosen according to the
SCHEME card.
(default is the whole range of the input mtz file)
NCYCLE <ncycle> | AUTO
Number of cycles of phase extension to perform.
<ncycle>
= Number of cycles over which to perform phase
extension (Free-Sim mode). Use 10 cycles for a quick
result; try more (20-100), but check the free-R factor.
AUTO = Run until the real-space-free residual stops
decreasing, then stop. This is used in the Reflection
Omit/Solomon modes, where running the calculation for
too many cycles can cause the map to get worse.
(default <ncycle>=10)
SCHEME ALL | AUTO | RES | MAG | FOM [[ FROM <res> ] [ FRAC
<frac> ]]
ALL - Use all reflections for the whole calculation. This
is used in the Reflection Omit/Solomon modes.
RES - perform phase extension in resolution steps, starting
with the low resolution data.
MAG - perform phase extension in magnitude steps, starting
with the largest reflections.
FOM - perform phase extension in FOM steps, starting with
the best phased data.
AUTO - perform phase extension using a combination of the
above chosen on the basis of what the data set looks
like. This option will also pick a reasonable value for
<frac>.
FRAC <frac>
- fraction of the input data to use as a starting set.
FROM <res>
- sets <frac> to the fraction of the data within a
resolution sphere radius <res>.
(default: AUTO)
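As a rough guide to what FROM <res> implies for <frac>: the number of
reflections inside a resolution sphere grows approximately with the
volume of that sphere in reciprocal space, i.e. as 1/d^3. The sketch
below uses that approximation only. It assumes complete data and
ignores any low resolution cutoff; `dm' works from the actual data,
and the function name is hypothetical.

```python
def frac_from_resolution(res_start, res_limit):
    """Approximate fraction of reflections with d >= res_start
    in a data set extending to res_limit (both in Angstrom)."""
    if res_start <= 0 or res_limit <= 0:
        raise ValueError("resolutions must be positive")
    if res_start <= res_limit:
        return 1.0  # the starting sphere already contains all the data
    # Reflection count scales with reciprocal-space volume, ~ (1/d)**3.
    return (res_limit / res_start) ** 3
```

For example, starting from 3.5 A with data extending to 2.1 A gives a
starting fraction of about 0.22 under these assumptions.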
COMBINE OMIT | SIGMAA | FREE <ncross> [ SETS <numsets> ]
OMIT Use reflection omit combination scheme, as part of the
reflection-omit mode.
SIGMAA
Use SigmaA combination, as part of the Solomon mode.
FREE <ncross>
Use Free-Sim phase combination. <ncross> = number of
times each step is performed to provide statistics for
the free-R and phase weighting.
For <ncross>=1 a changing random set of reflections is
omitted each cycle for the free-R factor.
For <ncross>=2 a fixed set is chosen (using the free-R
flag if available) and omitted for the free-R factor,
then the cycle is run a second time using all the
reflections.
For <ncross> > 2, (<ncross>-1) free-R sets are
generated, then on the <ncross>-th cycle all
reflections are included.
The total time taken is proportional to the product of
<ncycle> and <ncross>. Use <ncross> = 1 for large
structures where the time becomes a significant factor,
otherwise use <ncross> = 2. Only use <ncross> > 2 for
small structures where the statistics are particularly
poor (< 5000 reflections).
SETS <numsets>
<numsets> is the number of free sets into which the
data will be divided. These are used both in Free-Sim
and Reflection Omit modes. In reflection omit mode the
calculation time increases in proportion to the number
of free sets.
(defaults: FREE, <ncross>=1, <numsets>=20)
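The timing statements for COMBINE can be turned into a back-of-envelope
estimate. The sketch below is only arithmetic on the keyword values
and assumes that Free-Sim time scales as <ncycle> times <ncross> and
that Reflection Omit time scales as the number of cycles times
<numsets>. The unit is one plain density modification cycle, and the
function name is hypothetical.

```python
def relative_cost(mode, ncycle=10, ncross=1, numsets=20):
    """Rough run time in units of one plain density modification cycle."""
    if ncycle < 1 or ncross < 1 or numsets < 1:
        raise ValueError("ncycle, ncross and numsets must be at least 1")
    if mode == "FREE":
        # Each cycle is repeated ncross times for the free statistics.
        return ncycle * ncross
    if mode == "OMIT":
        # Each cycle is repeated once per free set.
        return ncycle * numsets
    raise ValueError("mode must be 'FREE' or 'OMIT'")
```

With the defaults, relative_cost("OMIT") is 20 times
relative_cost("FREE") for the same number of cycles. With NCYCLE AUTO
the cycle count is not known in advance, so only the ratio per cycle
is meaningful.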
LABIN FP=.. SIGFP=.. [PHIO=.. FOMO=..] [HLA=.. HLB=.. HLC=..
HLD=..] [FDM=..] [PHIDM=..] [FOMDM=..] [FREE=..]
Normally just the first four columns (FP,SIGFP,PHIO,FOMO)
are input. However if you have Hendrickson-Lattman
coefficients you may want to input these to the program as
well (the difference is marginal except for SIR data). If
you want to start from the end of a previous density
modification calculation then the PHIDM, FOMDM columns are
used.
FP = F magnitude
SIGFP
= standard deviation, 0 for unmeasured
PHIO = best initial phase estimate
FOMO = weight attached to PHIO
If PHIO and FOMO are omitted, no phase recombination is
performed.
HLA-HLD
= Hendrickson Lattman coefficients
FDM,PHIDM,FOMDM
= map coefficients of the starting map to which density
modification is to be applied. e.g. from a previous
density modification calculation (phase and weight) or
difference map coefficients from SIGMAA (magnitude and
phase). FDM must be on the same scale as FP.
FREE = free-R flag (only used if <ncross> > 1)
LABOUT PHIDM=.. FOMDM=.. [FCDM=.. PHICDM=..]
Normally just the first two columns are output. Don't use
the other two unless you are a very clever person.
PHIDM
= modified phase
FOMDM
= weight attached to PHIDM
FCDM = F from final modified map before phase recombination
PHICDM
= Phase from final modified map before recombination
OTHER KEYWORDS
SKEL [ LENGTH <joinlen> <endlen> ] [ BFAC <bfac> ] [ EVERY
<nskl> ]
Perform iterative skeletonisation on the map. Cycles of
skeletonisation are interspersed with cycles of conventional
density modification.
<joinlen>
= length of skeleton in Angstrom/residue to generate
between density peaks.
<endlen>
= length of skeleton in Angstrom/residue to generate in
`trailing ends'.
<bfac>
= temperature factor to apply to the sharpened map
before skeletonisation.
<nskl>
= apply skeletonisation instead of every <nskl>-th
density modification cycle.
(defaults <joinlen>=6.0 <endlen>=6.0 <bfac>=45
<nskl>=3)
See also the document `dm_skeletonisation'.
AVERAGE <nncs> [REF [STEP <dr> <dphi>] [EVERY <nref>]]
[OVERLAP]
Set an NCS averaging operator. This card is followed
by <nncs> rotation/translation matrices on subsequent lines
in either CCP4 or O/RAVE format.
CCP4 Formats (see also the program `lsqkab')
ROTA EULER <alpha> <beta> <gamma> (Euler angles)
TRAN <t1> <t2> <t3>
or ROTA POLAR <omega> <phi> <kappa> (Polar angles)
TRAN <t1> <t2> <t3>
or ROTA MATRIX <r11> <r12> <r13> <r21> <r22> <r23> <r31>
<r32> <r33>
TRAN <t1> <t2> <t3>
O/RAVE Format
OMAT
r11 r21 r31
r12 r22 r32
r13 r23 r33
t1 t2 t3
(note that the rotation matrix is transposed with
respect to CCP4 matrix format)
where
x' = r11 x + r12 y + r13 z + t1
y' = r21 x + r22 y + r23 z + t2
z' = r31 x + r32 y + r33 z + t3
These are the operations which map the density in the region
covered by the input mask onto the other equivalent regions.
The first operator must be the identity matrix. The mask is
input in CCP4 mask (map mode 0) format on the input file
label NCSIN1, and should cover just one monomer or averaging
domain, NOT the whole unit cell. The mask grid need not
agree with the program grid.
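Because the O/RAVE block stores the rotation transposed relative to
the CCP4 matrix order, it is easy to enter an operator the wrong way
round. The sketch below shows the conversion and the application of
x' = R x + t as defined above. It is an illustration only; the
function names are hypothetical and are not part of `dm'.

```python
def omat_to_ccp4(omat_rows):
    """Convert the four rows of an O/RAVE OMAT block to (rotation, translation).

    omat_rows: [[r11, r21, r31], [r12, r22, r32], [r13, r23, r33], [t1, t2, t3]]
    The rotation is returned as three rows in CCP4 order (r11 r12 r13, ...).
    """
    if len(omat_rows) != 4 or any(len(row) != 3 for row in omat_rows):
        raise ValueError("expected four rows of three numbers")
    stored = omat_rows[:3]
    # The O/RAVE block holds the transpose, so swap rows and columns.
    rotation = [[stored[col][row] for col in range(3)] for row in range(3)]
    translation = list(omat_rows[3])
    return rotation, translation


def apply_operator(rotation, translation, xyz):
    """Apply x' = R x + t to a coordinate triple."""
    return [
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
```

Applying the first operator of an AVERAGE card to any point should
return the point unchanged, since that operator must be the identity.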
If you want to apply different NCS operations to different
domains of the protein, use multiple AVER cards, and
multiple input masks. The first AVER card corresponds to the
mask on NCSIN1, the second to NCSIN2, etc. The masks should
be defined in the same multimer in the unit cell, or at
least in close proximity to one another.
The REF, STEP and EVERY cards will enable refinement of the
NCS rotation matrices between averaging cycles. The REF card
enables the refinement of a particular set of NCS
parameters. Note that the STEP card allows different
refinement step sizes to be used for different domains;
however, all but one EVERY card will be ignored. The refined
matrices will be written out at the end of the log file.
<dr> = step size for refinement of positional parameters in
Angstrom.
<dphi>
= step size for refinement of rotational parameters in
degrees.
<nref>
= the number of phase extension cycles between each
parameter refinement.
The OVERLAP card forces overlap removal for all NCS-masks.
This was the default mode of operation for old versions of
`dm' which did not support multimer masks; it must not be
used if the NCS-mask covers more than one monomer. Note
that the ncs-correlation statistics may be less reliable
when using a multimer mask.
(defaults <dr>=0.5 A, <dphi>=2.5 degrees, <nref>=3)
See also the document `dm_ncs_averaging'
GRID <nx> <ny> <nz>
Set the grid for the calculation. You may want to do this if
you want to include your own mask or dump a map or mask.
(defaults: minimum efficient factors above Nyquist spacing)
WANG <radius> <mode> [ LIMITS <rhomin> <rhomax> ]
Set the averaging radius and mode for calculating the
solvent mask.
<radius>
= radius of averaging sphere (Angstroms)
<mode>
= 0: Use weighting scheme w=constant (Spherical top
hat)
<mode>
= 1: Use weighting scheme w=1-(r/R) (Wang's method)
<mode>
= 2: Use weighting scheme w=1-(r/R)**2
Heavy atoms can bias the mask calculation procedure,
resulting in a mask of spheres around the heavy atom sites.
The LIMITS card can be used to set the values at which the
electron density is truncated before smoothing. To truncate
heavy atoms set <rhomax> to the maximum electron density due
to non-heavy atoms at the appropriate resolution.
(defaults <radius>=8.0 <mode>=1 <rhomin>=0.32 <rhomax>=2.0
e/A^3)
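The three weighting schemes and the LIMITS truncation can be written
out as follows. This is a sketch of the formulae listed above, with r
the distance from the centre of the averaging sphere and R the
<radius>. Setting the weight to zero outside the sphere is an
assumption of the sketch, and the function names are hypothetical.

```python
def wang_weight(r, radius=8.0, mode=1):
    """Weight given to a grid point at distance r from the sphere centre."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if r < 0 or r > radius:
        return 0.0  # outside the averaging sphere (assumed)
    if mode == 0:
        return 1.0  # spherical top hat
    if mode == 1:
        return 1.0 - r / radius  # Wang's method
    if mode == 2:
        return 1.0 - (r / radius) ** 2
    raise ValueError("mode must be 0, 1 or 2")


def truncate_density(rho, rhomin=0.32, rhomax=2.0):
    """Clamp a density value to the LIMITS range before smoothing."""
    return max(rhomin, min(rho, rhomax))
```

Lowering <rhomax> reduces the contribution of a heavy atom peak to the
smoothed map, which is how the LIMITS card counters the bias described
above.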
FLIP <flipfac> [TRUNC <fraction>]
<flipfac>
= amount by which to multiply density shifts with
respect to solvent flattening. 1.0=flattening.
2.0=flipping.
TRUNC <fraction>
= fraction of the protein region to truncate. The
truncation level will be set so that this fraction is
below it.
(defaults: <flipfac>=2.0 <fraction>=0.3)
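One way to read <flipfac> is as a multiplier on the shift that solvent
flattening would apply, so that 1.0 moves a solvent density value onto
the solvent mean and 2.0 reflects it to the other side of the mean.
The sketch below follows that reading, which is an interpretation of
the description above rather than code from `dm'. The truncation level
is taken as a simple order statistic. All names are hypothetical.

```python
def modify_solvent(rho, solvent_mean, flipfac=2.0):
    """Shift a solvent density value by flipfac times the flattening shift."""
    flattening_shift = solvent_mean - rho
    return rho + flipfac * flattening_shift


def truncation_level(protein_rho, fraction=0.3):
    """Density level with roughly `fraction` of the protein values below it."""
    if not protein_rho:
        raise ValueError("no protein density values given")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    ordered = sorted(protein_rho)
    # Index of the first value that has `fraction` of the data below it.
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]
```

With flipfac=1.0 every solvent value becomes the solvent mean; with
flipfac=2.0 a value 0.1 above the mean ends up 0.1 below it.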
REAL [SOLV <sx> <sy> <sz> <sr>] [PROT <px> <py> <pz> <pr>]
Set the coordinates and radii (in Angstrom) of the spherical
patches of density where the density modification
constraints will be omitted in order to provide a real-free
indicator of progress. If <sr> or <pr> is negative the
Solvent or Protein free indicator will be omitted.
(defaults: <sr>=4.0 <pr>=4.0, coordinates chosen from
solvent mask)
SCALE <scale> <bfac>
Override internal scaling and scale input data by F^2 =
<scale> * exp (<bfac> * s / 2.0) * F^2. Scaling is critical
to histogram mapping and Sayre's equation. In some cases you
may want to override the B-factor, but run without this card
first, and consider long and hard before changing scale.
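The scaling expression can be evaluated directly to see how strongly a
given <bfac> changes the data at a particular resolution. The sketch
below implements the formula exactly as written above. The definition
s = 1/d^2 (d in Angstrom) is an assumption, since s is not defined in
this document, and the function names are hypothetical.

```python
import math


def s_from_resolution(d_spacing):
    """Assumed definition of s: 1/d^2, with d in Angstrom."""
    if d_spacing <= 0:
        raise ValueError("resolution must be positive")
    return 1.0 / d_spacing ** 2


def scale_f_squared(f_squared, scale, bfac, s):
    """Return <scale> * exp(<bfac> * s / 2.0) * F^2 for one reflection."""
    return scale * math.exp(bfac * s / 2.0) * f_squared
```

For instance, scale_f_squared(1.0, 1.0, 20.0, s_from_resolution(2.0))
gives the factor applied to F^2 at 2 A for <bfac>=20 under this
assumed definition of s.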
LOOKING AT YOUR OUTPUT
Look at the free-R factor: This is listed both in the course
of the output, and also at the end in an Xloggraph table.
Expect some noise from cycle to cycle in the Free-R if you
are not using NCYC <n> with FREE 2 or greater.
The Xloggraph output, as well as showing the free-R factor,
gives some information on the quality and completeness of
the input data, and also a plot of the data fit against a
standard protein data set.
For NCS-averaging calculations, correlations are calculated
between related areas of density. These are summarised at
the end of the log file, and error or warning messages will
be generated if the initial values are too low: this is a
good indication of errors in the input matrices or mask.
COMMON PROBLEMS
An NCS-averaging mask file may not cover a volume larger than
the unit cell, otherwise the following error is generated:
`ccpmskin - Mask file bigger than unit cell?!'
This is unlikely to happen except in spacegroups with very
low symmetry (P1, P2, P21). If it does, then it is likely
that the mask is padded with large borders of zeros, or
that it covers more than one monomer, or that there is
significant overlap between symmetry equivalents of the
mask. Check the volume of the `set' area of the mask: if it
is much bigger than the volume of a single molecule then the
mask is certainly at fault.
AUTHOR
Kevin D. Cowtan, Department of Chemistry, University of York
email: cowtan@yorvic.york.ac.uk
REFERENCES
1. Baker D., Bystroff C., Fletterick R., Agard D. (1993)
Acta Cryst D49 429-439
2. Bricogne, G. (1974) Acta Cryst A30 395-405
3. Brunger, A. T. (1992) Nature 355, 472-474.
4. Cowtan K. D., Main, P. (1993) Acta Cryst D49 148-157
5. Sayre, D. (1974) Acta Cryst A30 180-184
6. Schuller D. (1995) in preparation
7. Swanson, S. (1994) Acta Cryst D50 695-708
8. Wang, B. C. (1985) Methods in Enzymology 115, 90-112
9. Zhang, K. Y. J., Main P. (1990) Acta Cryst A46 377-381
10. Abrahams, J. P. (1995) Acta Cryst D51 (in press)
11. Cowtan, K. D., Main, P. (1995) Acta Cryst D51 (in
press)
SEE ALSO
cad(1), lsqkab(1), xloggraph(1), dm_skeletonisation.doc,
dm_ncs_averaging.doc.
EXAMPLES
#
#[ a simple solvent/histogram calculation ]
#
dm hklin gmto.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE SOLV HIST
NCYCLE 10
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
LABOUT PHIDM=PHI1 FOMDM=W1
my-data
#
#[ a better solvent/histogram calculation, ]
#[ takes 20x as long, but gives a great map ]
#[ using reflection omit ]
#
dm hklin gmto.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE SOLV HIST
NCYCLE AUTO
SCHEME ALL
COMBINE OMIT
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM FREE=FreeR_flag
LABOUT PHIDM=PHI1 FOMDM=W1
my-data
#
#[ a quick solvent flipping calculation, ]
#[ very fast and gives a good map using ]
#[ Solomon mode ]
#
dm hklin gmto.mtz hklout gmtodm.mtz << my-data
SOLC 0.35
MODE FLIP
NCYCLE AUTO
SCHEME ALL
COMBINE SIGMAA
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM FREE=FreeR_flag
LABOUT PHIDM=PHI1 FOMDM=W1
my-data
#
# NON-CRYSTALLOGRAPHIC SYMMETRY AVERAGING
#[ a three fold averaging calculation ]
#[ This could also be done in Solomon mode,]
#[ or Omit mode if you have enough time ]
#
dm hklin chmimir.mtz hklout dmchm.mtz \
ncsin1 chmi.msk \
<< MY-DATA
SOLC 0.52
RESO 1000.0 2.1
NCYC 10
MODE SOLV HIST AVER
SCHEME AUTO
AVER 3 REF
ROTA POLAR 0.0 0.0 0.0
TRANS 0.0 0.0 0.0
ROTA POLAR 113.28130 103.41944 120.33858
TRANS 43.635 38.059 62.726
ROTA POLAR 66.58067 -76.78019 119.69176
TRANS 82.989 15.401 -8.928
LABI FP=F SIGFP=SIGF PHIO=PHIB FOMO=FOM
LABO PHIDM=PHIDM FOMDM=FOMDM
END
MY-DATA
#
# MULTI-DOMAIN AVERAGING
#[ a two fold averaging calculation with ]
#[ two domains and refinement of the 2nd ]
#[ set of averaging matrices. ]
#[ WARNING: IF YOU DONT KNOW WHAT MULTI- ]
#[ DOMAIN AVERAGING IS, YOU DONT NEED IT ]
#
dm hklin hpattj.mtz hklout dm1.mtz \
ncsin1 cwnads.mask ncsin2 cwglobs.mask \
<< EOF-dm
SOLC 0.57
MODE SOLV HIST AVER
NCYCLE 40
AVERAGE 2
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
0.0 0.0 0.0
-0.71389002 -0.69492584 0.08611962
-0.69635397 0.69129372 -0.19136506
0.07357326 -0.19652288 -0.97735721
115.37364197 54.98566055 67.00005341
AVERAGE 2 REF
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
0.0 0.0 0.0
0.75830859 0.65183645 0.00883542
0.65189570 -0.75824565 -0.00975925
0.00033828 0.01316060 -0.99991322
17.30371666 -47.10081482 68.99727631
LABIN FP=FP SIGFP=SIGFP PHIO=PHIml FOMO=FOMml -
HLA=HLA HLB=HLB HLC=HLC HLD=HLD
LABOUT PHIDM=PHIDM FOMDM=FOMDM
EOF-dm
#
# NOTE: If you don't know what multi-domain averaging is,
# you don't need it. Use the ncs averaging example, not
# the multi-domain example.
#