HEAVY is a general-purpose macromolecular x-ray diffraction phasing
package. It can be used to scale and manipulate data to make maps,
get isomorphous and anomalous differences, import and export,
convert from binary to formatted files, convert MAD data to
a form similar to SIR+anomalous differences, and prepare data for
difference refinement. It contains the HEAVY and HASSP routines
that allow searches for solutions to a Patterson function and
refinement of heavy atom parameters in the MIR method.
To run HEAVY on the DEC-ALPHAs add the following line to your .login file:
alias heavy /joule2/programs/heavy/heavy
Also you can check some example files in the directory /joule2/programs/heavy.
************************ COPYRIGHT NOTICE *********************************
Los Alamos National Laboratory
Copyright, 1994. The Regents of the University of California.
This software was produced under a U.S. Government contract
(W-7405-ENG-36) by Los Alamos National Laboratory, which is
operated by the University of California for the U.S. Department
of Energy. The U.S. Government is licensed to use, reproduce,
and distribute this software. Permission is granted to the
public to copy and use this software without charge, provided
that this Notice and any statement of authorship are reproduced
on all copies. Neither the Government nor the University makes
any warranty, express or implied, or assumes any liability or
responsibility for the use of this software.
******************************************************************************
Tom Terwilliger Los Alamos National Laboratory
please send all correspondence to the author at: "terwilliger@lanl.gov"
version 3.0 of August, 1994
******************************************************************************
INTRODUCTION
This is a program that is designed to analyze MIR data and
calculate and analyze Patterson and Fourier maps. It can also be used
to convert MAD data into a form that is similar to SIR+anomalous
data, and this data can then be analyzed in just the way that standard
SIR+anomalous is analyzed. It can also be used to prepare data for
difference refinement.
GETTING STARTED:
This package uses the binary "DORGBN"-style data files. You will
probably need to use the routine IMPORT (part of UTIL) to convert your
data to this format. Note that "unobserved" data can be entered as either
"-1.000" or "0.000" in this package, but cannot be blank.
The space-group information in this package is read from a symmetry file
(SYMFILE) that you write. It contains the symmetry equivalents from
the International Tables.
It is a good idea to start out by setting your preferences for
symmetry file and fft grids at the very beginning using "PREFERENCES" to
set them and save them. This will result in a small file called
"preferences.dat" that the program will consult every time you run it.
Please see the section on PREFERENCES in the documentation below.
You can run HEAVY interactively or using keywords. It may be easiest to
start out using it interactively so you get an idea of how the program
works. The keyword mode has more options, however, and is a better way
to use the program in general. See *** KEYWORD in the documentation
below.
You may find it convenient to use two screens when running HEAVY,
one for the program and the other for displaying the relevant portion
of this documentation file with the listing of keywords you may wish to set.
ABOUT THIS DOCUMENTATION:
This file describes each of the options in the HEAVY package. If you are
looking for information on a particular command in HEAVY, say "LOCALSCALE",
then search this file for the string: "*** LOCALSCALE" and you will find
yourself at the writeup for LOCALSCALE.
At the end of the documentation on the programs is a section on the formats
of files used in this package, including symmetry files and binary DORGBN files.
The documentation ends with a description of how to use
HASSP and HEAVY to solve heavy atom derivatives.
-----------------------------------------------------------------------------
If you have any difficulty using these programs, please contact me by e-mail at
the address listed above. I will be happy to attempt to assist you. Further,
if you find any bugs or make any substantial modifications to these programs, I
would like to hear about them.
You are free to modify these programs as you wish, but they may not be used in
any commercial package without permission.
-----------------------------------------------------------------------------
**** DOCUMENTATION FOR HEAVY
This is a program to manipulate data files, to scale data using a local
scaling routine, to calculate maps, to analyze maps, to convert MAD data
to pseudo-SIR+anomalous data, to convert native and mutant
structure factor data to a form useful for "difference refinement", to
search for heavy atom sites in a Patterson (HASSP) and to refine heavy atom
parameters (routine HEAVY within this program).
This program runs interactively or using KEYWORDS. It should be
mostly self-explanatory.
------------------------------------------------------------------------------
COMPILING AND RUNNING THIS PROGRAM USING HEAVYV3.SCRIPT:
The program is supplied with a set of test data and a script file that
will run the data through and demonstrate some capabilities. Put all the
files in one directory. Then...
On a VAX system, just use:
$FOR HEAVY
$LINK HEAVY
Then run it with:
$RU HEAVY
SCRIPT ! YOU CAN THEN RUN THE TEST DATA THROUGH WITH THIS FILE:
HEAVYV3.SCRIPT
END
On an SGI, you probably need extra scratch space (8 MB) for compiling it.
It is easiest to do this as follows, using a directory on a disk with lots of
extra space:
mkdir scratchspace
setenv TMPDIR scratchspace
f77 heavy.f -v -static -o heavy.out
chmod +x heavy.out
Then run it with:
heavy.out
overwrite ! Note on the sgi you need to overwrite files or it will dump.
script
heavyv3.script
end
------------------------------------------------------------------------------
The options in HEAVY are:
MAPS: calculate Fourier and Patterson maps,
GETISO: calculate differences between two data columns,
GETANOM: convert from F+ and F- to Fbar and DelAnom,
GETPHASES: convert from A,B coefficients to F and phase,
IMPORT: read in formatted file with h,k,l,data, stripping off any text,
EXPORT: write out h,k,l, data without any titles,
BTOF: convert from binary dorgbn file to formatted one,
FTOB: convert from formatted dorgbn file to binary one,
FILEMERGE: combine or extract data columns from dorgbn files,
MERGE: merge equivalent reflections and write out asymmetric unit,
MADMRG: create pseudo-SIR+anom data from MAD data,
PEAKSEARCH: find peaks in Fourier map,
FFTTOBOSS: convert asymmetric unit of FFT to any portion of cell in BOSS format
FFTTODSN6: convert asymmetric unit of FFT to any portion of cell in DSN6 format
FFTTOFSFOUR: convert asymmetric unit of FFT to FSFOUR format
LOCALSCALE: scale one dataset to another with local scaling,
COMPLETE: determine completeness of a dataset,
WEIGHTS: generate weighting for atomic refinement using Fo-Fc and sigmas,
FDIFF: Create pseudo-mutant dataset for difference refinement,
MAPTOASYM: Map a PDB file onto the asymmetric unit of the crystal,
MAPTOOBJECT: Map atoms to equivalent position close to specified object,
HEAVY: refine heavy atom parameters,
HASSP: search for solutions to a Patterson function,
PREFERENCES: change symmetry file, cell dimensions, and grids for FFT,
VIEW: view a binary dorgbn file,
SCRIPT: read in commands from a file,
LOG: copy output to a log file,
OVERWRITE: overwrite output files if version numbers are not present,
KEYWORD: use keywords to set parameters instead of prompting,
HELP: get information on this program,
END: end the program
These options are described in detail below.
------------------------------------------------------------------------------
**** MAPS -- calculate Patterson and Fourier maps
This routine reads in data from a dorgbn-style file and calculates any of
a variety of maps. It requires that you have already specified a symmetry file
of matrices of equivalent positions, the cell dimensions of your space
group and a grid for your map using the routine "PREFERENCES" at some
earlier time or that the file "preferences.dat" exists.
The command structure for this routine is:
1. IMAPTYPE (0 for Patterson, 1 for Fourier)
2. File containing data
--If Patterson--
3. ICOL (column number of column containing |F|**2
or column number of column containing F or del F)
4. KTYPE 1 if icol contains F or del F; 0 if icol contains F**2
--if Fourier--
3. JFOURTYPE
0 = read in A, B directly
1 = read in F or Del F and a phase in degrees
2 = read in delF ano and a phase in degrees
3 = read in Fo,Fc, and phase for (Fo-Fc)exp(i PHIc)
4 = read in Fo,Fc, and phase for (2Fo-Fc)exp(i PHIc)
3a. Column numbers for A,B or column number for F or del F or del F Ano,
or column numbers for Fo,Fc
3b. Column number for phase if JFOURTYPE > 0
4. Resolution range to consider
5. The name of the output FFT file that contains just an asymmetric unit of
the map (or whatever you have specified as such in your preferences).
The routine will calculate the FFT. If this is a Fourier, the program
will then prompt to ask if you would like to find the peaks or minima
in the map (see PEAKSEARCH). For both Fouriers and Pattersons, the program
will prompt you to ask if you would like to convert the map to
BOSS format, expanding the map to any range of grid points you would
like (see FFTTOBOSS). It will also ask if you want to convert it to
DSN6 format (for reading into TOM or O). This will only work on an SGI.
It will also ask if you want to convert the map to
FSFOUR format for display with MAPVIEW (see FFTTOFSFOUR).
KEYWORDING inputs:
KEYWORDS VALUES
PATTERSON Patterson map
NCOLPATT**2 Column # in input map for squared Patterson coefficients
-or-
NCOLPATT column # containing Patterson coefficients
FOURIER Fourier map, reading in A, B coefficients directly
NCOLFA column for A
NCOLFB column for B
NATFOURIER same as isofourier, below
ISOFOURIER F exp(iPHIc) map (F = del iso or F, phase = PHI)
ANOFOURIER F exp(i[PHIc-90]) map (F = del Ano, phase = PHI - pi/2)
NCOLF column for F or Del F
NCOLPHI column for PHI (in degrees)
FOFC Fo-Fc exp(i PHIc) map
2FOFC 2Fo-Fc exp(i PHIc) map
NCOLFOBS column for Fo
NCOLFC column for Fc
NCOLPHI column for PHI (in degrees)
INFILE name of input DORGBN file with data
FOURCOFILE name of optional output file with A,B coefficients
FFTFILE name of output file with FFT in UCLA format
PEAKFILE name of optional file with nlist high and low peaks
(program finds peaks if this file is defined)
BOSSFILE name of optional file with FFT in BOSS format
(program writes bossfile if it is defined)
DSN6FILE name of optional file with FFT in DSN6 format
(program writes DSN6 file if it is defined)
(NOTE: this only works on an SGI!)
FSFOURFILE name of optional file with FFT in FSFOUR format
(program writes FSFOUR file if it is defined)
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
(DMIN and DMAX are required)
NLIST number of high and low peaks to list to peakfile
ISYMMETRY number of symmetry equivalents for each peak to list
(put a big number to get all within all adjacent
unit cells)
PDB list peaks in PDB format
FRACT list peaks in fractional format
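As an illustration only (this is not taken from the HEAVY source), the Fourier
coefficient types listed above can be written as A,B pairs roughly as follows.
The function name and argument layout are assumptions made for this sketch;
only the formulas in the keyword descriptions come from this documentation.

```python
import math

def fourier_coefficients(kind, phase_deg, f=0.0, fo=0.0, fc=0.0):
    """Return (A, B) for one reflection. `kind` mirrors the map keywords above.

    Hypothetical helper for illustration; phases are in degrees.
    """
    if kind in ("ISOFOURIER", "NATFOURIER"):
        amp, phi = f, phase_deg                 # F exp(i PHI)
    elif kind == "ANOFOURIER":
        amp, phi = f, phase_deg - 90.0          # F exp(i [PHI - 90 degrees])
    elif kind == "FOFC":
        amp, phi = fo - fc, phase_deg           # (Fo - Fc) exp(i PHIc)
    elif kind == "2FOFC":
        amp, phi = 2.0 * fo - fc, phase_deg     # (2Fo - Fc) exp(i PHIc)
    else:
        raise ValueError("unknown map type: " + kind)
    phi_rad = math.radians(phi)
    return amp * math.cos(phi_rad), amp * math.sin(phi_rad)
```

For example, fourier_coefficients("2FOFC", 0.0, fo=3.0, fc=1.0) gives A = 5.0
and B = 0.0.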
-------------------------------------------------------------------------------
**** GETISO
**** GETANOM
**** GETPHASES
These are routines to take information from several columns of a dorgbn
file and to generate a new file containing either (1) isomorphous differences
between datasets, (2) data in the form F+,F- converted to Fbar and del Ano, or
(3) A,B coefficients converted to structure factor amplitude and
phase in degrees.
The command structures and operation of each of these routines is
shown below:
GETISO:
1. Input dorgbn file
2. Output dorgbn file (1 column of "del F")
3. Overall title for output file
4. Columns for Fnat and sigma of Fnat in input file (0 if absent)
5. Columns for Fder and sigma of Fder in input file (0 if absent)
6. Minimum ratio of Fnat/sig or Fder/sig to consider (all others tossed)
7. Resolution range to consider (e.g., 2.5 100.)
The program then calculates Del F = Fder-Fnat for the selected reflections
and writes it to the output dorgbn file. If either Fder or Fnat is missing
(less than the minimum ratio of F/sig) the reflection is ignored.
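The GETISO calculation can be sketched for a single reflection as below. This
is a minimal illustration, not the program's code: the function name is
hypothetical, and it assumes that F <= 0 marks an unobserved value and that
the F/sig test is skipped when no sigma is supplied.

```python
def get_iso(fnat, fder, sig_nat=None, sig_der=None, ratmin=0.0):
    """Return Del F = Fder - Fnat, or None if the reflection is ignored."""
    for f, sig in ((fnat, sig_nat), (fder, sig_der)):
        if f <= 0.0:
            return None          # unobserved (assumed to be stored as 0 or -1)
        if sig is not None and sig > 0.0 and f / sig < ratmin:
            return None          # fails the minimum F/sig ratio
    return fder - fnat
```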
GETANOM:
1. Input dorgbn file
2. Output dorgbn file (4 columns: Fbar,sig of Fbar, Del Ano, sig of Del Ano)
3. Overall title for output file
4. Columns for: F+, sigma of F+ in input file
5. Columns for: F-, sigma of F- in input file
6. dmin,dmax (resolution range to consider, unless previously specified)
The program then calculates Fbar = (F+ + F-)/2
and del Ano = (F+ - F-), for the selected reflections and writes it to
the output dorgbn file. If F+ or F- are missing (F less than or equal to
0), the one present is written out as Fbar and del Ano and sig of
del Ano are set to 0.0.
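A minimal sketch of the GETANOM rules for one reflection follows. The function
name is hypothetical, and the way the sigmas are combined (independent errors
added in quadrature) is an assumption of this sketch; the documentation above
only specifies Fbar, del Ano, and the handling of a missing F+ or F-.

```python
import math

def get_anom(fplus, sig_plus, fminus, sig_minus):
    """Return (Fbar, sig_Fbar, DelAno, sig_DelAno), or None if both are missing."""
    have_plus, have_minus = fplus > 0.0, fminus > 0.0
    if have_plus and have_minus:
        sig_del = math.hypot(sig_plus, sig_minus)   # assumed error propagation
        return 0.5 * (fplus + fminus), 0.5 * sig_del, fplus - fminus, sig_del
    if have_plus:                                   # only F+ present
        return fplus, sig_plus, 0.0, 0.0
    if have_minus:                                  # only F- present
        return fminus, sig_minus, 0.0, 0.0
    return None
```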
GETPHASES
1. Input dorgbn-style file
2. Output file (2 data columns. F and phase)
3. Title for output file
4. Column numbers for A, B = F cos(PHI), F sin(PHI)
5. dmin,dmax (resolution range to consider, unless previously specified)
Routine calculates F and PHI in degrees from A and B and writes them
to the output file. Data where A or B are both equal to -1.0 are rejected as
are data where A and B are both equal to 0.
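The conversion done by GETPHASES can be sketched as follows. The function name
is hypothetical, the 0 to 360 degree phase range is an assumption, and the
rejection test is read here as "A and B both equal to -1.0, or both equal to 0".

```python
import math

def get_phase(a, b):
    """Convert A = F cos(PHI), B = F sin(PHI) to (F, PHI in degrees)."""
    if (a == -1.0 and b == -1.0) or (a == 0.0 and b == 0.0):
        return None                                  # treated as unobserved
    f = math.hypot(a, b)
    phi = math.degrees(math.atan2(b, a)) % 360.0     # assumed range 0..360
    return f, phi
```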
KEYWORDING:
For each of these routines, you need to specify:
INFILE name of dorgbn file with data
OUTFILE name of output dorgbn file
FILETITLE optional title for output file
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
For GETISO
NNATF column number for Fnat
NNATS column number for sigma of Fnat
NDERF column number for Fder
NDERS column number for sigma of Fder
RATMIN minimum ratio of F/sig to include at all
For GETANOM:
NCOLFP column number for Fplus
NCOLSFP column number for sigma of Fplus
NCOLFM column number for Fminus
NCOLSFM column number for sigma of Fminus
For GETPHASES:
NCOLFA column number for F cos(phi) = A
NCOLFB column number for F sin(phi) = B
-------------------------------------------------------------------------------
**** IMPORT
**** EXPORT
These are utilities to bring formatted data into dorgbn format and to
write out formatted data without titles (BTOF writes out titles).
The routine IMPORT has several options. You can simply read the data
from a formatted file in, assuming it is h,k,l, and columns of data. You can
also swap indices (as H->K, K->L, L->H) as you read it in. You
can also sort the data and map it to the asymmetric unit of the space
group. Ordinarily you will want to sort and map the data, as some of
the other routines in the package (notably FILEMERGE) assume that the
data has been sorted in a particular order of hkl.
When you IMPORT data, it is essential that the input file has the same number
of data columns in every line of the file. The input file can have text
in the middle of a data column; this text will be ignored.
EXPORT writes out a formatted version of a dorgbn-style file, with no
titles. It can be read back in with IMPORT.
The commands for IMPORTing data are:
1. Input formatted data file name
(the program will then type the first 3 lines of the file as read in, and
then again after stripping off any text)
2. Output dorgbn-style file name
3. Overall title for output file
4. Number of columns of data (not counting h,k,l) in input file
5a...Title for each of these columns of data
6. Overall scale factor to apply to all data
7. lsort: 'y' to sort and map data, 'n' to leave it as is
8. lswap: 'y' to swap indices hkl
(only read if lswap='y'): 8a. HNEW: index H will be mapped to HNEW
That is, if you want to map old H->new K, old K->new L, old L->new H, then
you specify HNEW = "K"
Commands for EXPORT
1. Input dorgbn-style file name
2. Output formatted file name
KEYWORDING:
IMPORT ignores keywords. Use it interactively.
EXPORT keywords:
INFILE name of file to be exported
OUTFILE name of output formatted file
-------------------------------------------------------------------------------
**** BTOF
**** FTOB
Binary TO Format conversion (BTOF)
Format TO Binary conversion (FTOB)
These routines convert data in DORGBN-structured binary files to and
from formatted files. See documentation on DORGBN files at the end of this
file for a description of the DORGBN file structure.
These programs are interactive:
BTOF will prompt for "INPUT FILE>" which is the binary file to be
converted to formatted structure, and, "OUTPUT FILE>" which is the
new formatted file.
FTOB will prompt for "INPUT FILE>" which is the formatted file to be
converted to DORGBN structure, and "OUTPUT FILE>" which is the
new DORGBN file.
You may wish to alter these programs to read your standard data files and
convert them to DORGBN files.
KEYWORDING:
INFILE name of input file
OUTFILE name of output file
-------------------------------------------------------------------------------
**** FILEMERGE: Binary reflection data manipulation
This routine allows manipulation of binary data files that are in
the "dorgbn" format. You can extract one or more "columns" of data from
a file, duplicate columns from a file, or combine parts of different files.
This routine is based on the UCLA program DORGBN. Data are stored in
binary files in the form h,k,l,resolution,and columns of "data".
At the beginning of the file are an overall title for the dataset and
individual titles for each "data" column.
The program is intended to be run interactively. The command format
is:
1. # of Files to be opened for input (up to 4)
2a... Input file names 1...n.
3. Output file name
4. Title for output file. This title should describe the
contents of the entire file.
5. Command lines. Each command can specify a range of columns of
data from some particular file to be incorporated into the output file.
These data columns are incorporated in the order in which the commands are
specified. Command input parameters: (these are 1 or 2 digit integers and an
optional title, all separated by commas and no blanks):
1. NFILE: The number of the file treated.
2. ICOL,JCOL,TITLE: the RANGE of columns to be copied. If you want to
copy just column #3, specify: 3,3 here.
The title is an optional title to be substituted for
that already associated with the first column in the range.
If the specified range includes more than one column and this
title field is used, the titles for all the other columns in
the range specified by this command must be input in the very next records.
If this field is blank, the old title will be used.
6. More command lines.
7. Blank line to signify the end of input.
An additional useful feature of the program is that if a command
is entered with a valid file number but with the column numbers missing or
incorrect, the titles of the columns on that file are printed for the user.
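To make the command-line layout concrete, here is a hedged sketch of how a
line such as "1,1,1,PATT COEFFS COLUMN 1 FROM KI.DRG" breaks down into NFILE,
ICOL, JCOL and the optional title. The parser is hypothetical and is not how
the Fortran program reads its input; it only illustrates the field order.

```python
def parse_command(line):
    """Split a FILEMERGE-style command into (nfile, icol, jcol, title)."""
    parts = line.split(",", 3)            # the title may itself contain commas
    if len(parts) < 3:
        raise ValueError("expected NFILE,ICOL,JCOL[,TITLE]")
    nfile, icol, jcol = (int(p) for p in parts[:3])
    title = parts[3].strip() if len(parts) > 3 else ""   # blank: keep old title
    return nfile, icol, jcol, title
```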
EXAMPLE OF USE OF DORGBN. (ALL CHARACTERS BEFORE THE ">" ARE TYPED BY
PROGRAM)
DORGBN
OUTPUT FILE>KI.PAT (THE NEW DORGBN-STYLE FILE)
INPUT FILE 1>KI.DRG (THE STARTING DORGBN FILE)
INPUT FILE 2> (CARRIAGE RETURN TO INDICATE END
OF LIST OF INPUT FILES)
OVERALL TITLE>PATT COEFFICIENTS FOR KI.DRG (TITLE)
COMMAND>1,1,1,PATT COEFFS COLUMN 1 FROM KI.DRG (WRITE COLUMN 1 FROM
STARTING DORGBN FILE INTO NEW FILE.
NOTE THAT H,K,L AND RES ARE ALSO
WRITTEN AUTOMATICALLY)
COMMAND> (CARRIAGE RETURN TO INDICATE END OF
COMMANDS)
FORTRAN STOP
-------------------------------------------------------------------------------
**** MERGE
This is a routine that merges measurements of structure factor amplitudes and
rejects outliers. It summarizes the quality of the dataset in a listing of
R-factors on I and on F.
The method followed by the program is:
1. group equivalent reflections together, analyze 1 group at a time.
2. get mean, sd for this group
3. reject observations differing from mean by >4 sigma
4. reject reflection outright if Chi-squared is greater than 20 and ikeepflag=0
5. calculate stats based on what's left
6. figure out the relationship between sigmas in the input files and
reasonable estimates of the true sigmas by assuming that the reduced
chi-square would equal 1.0 if the correct sigmas were present. The data
are fit to the equation,
Sig**2(I)=Sig**2(Poisson)+( A*I)**2
and all sigmas are corrected with this factor.
7. write out mean, SEM for the reflection
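The per-reflection part of the method above can be sketched as follows. This
is an illustration under stated assumptions, not the MERGE source: the
function name is hypothetical, the mean is unweighted, the 4-sigma test uses
each observation's own input sigma, and the sigma-correction fit of step 6 is
left out.

```python
import math

def merge_group(values, sigmas, nsig=4.0, chi_max=20.0, keep_all=False):
    """Merge the equivalent measurements of one reflection.

    Returns (mean, sem, n_rejected), or None if the reflection is tossed.
    """
    mean = sum(values) / len(values)
    # Reject observations that differ from the mean by more than nsig sigma.
    kept = [(v, s) for v, s in zip(values, sigmas) if abs(v - mean) <= nsig * s]
    n_rejected = len(values) - len(kept)
    if not kept:
        return None
    vals = [v for v, _ in kept]
    mean = sum(vals) / len(vals)
    if len(kept) == 1:
        return mean, kept[0][1], n_rejected      # nothing to average
    # Reduced chi-squared of the surviving observations.
    chi2 = sum(((v - mean) / s) ** 2 for v, s in kept) / (len(kept) - 1)
    if chi2 > chi_max and not keep_all:
        return None                              # ikeepflag=0 behaviour
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))
    return mean, sd / math.sqrt(len(vals)), n_rejected
```

Because the first mean includes any outlier, a single very large deviation can
pull the mean far enough that good observations fail the test as well; a
production routine would need to guard against that.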
Control parameters for MERGE are:
1. Resolution range to consider (e.g., 2.5 100)
2. ikeepflag =0 to toss reflections with high chisqr, 1 to keep them
3. Output dorgbn-style file
4. Number of input files
5a. First input dorgbn-style file... It is ASSUMED that columns 1,2 are your
values of F and sigma. (If this is not true, you need to run FILEMERGE
first to create such a file)
5b. Another input data file name (if more than one) .
Notes: The input data files do not need to have
data in any particular order or to have complete datasets.
The data are written out starting with minimum H,K,L and incrementing
L fastest, then K, then H.
The program reports the number of rejects as NNN + MMM where
NNN = the number rejected as being too far from the mean for that
reflection and MMM is the number of reflections rejected completely with
chisqr > 20.
KEYWORDS:
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
NFILES # of input files (1 to 4)
INFILE(1) input file 1
INFILE(2)... input file 2 (up to 4 files)
KEEPALL keep all reflections, regardless of merging chisqr (default)
TOSSBAD toss reflections with merging chisqr> 20
Note: KEEPALL and TOSSBAD also apply to LOCALSCALE
OUTFILE output file
-------------------------------------------------------------------------------
**** MADMRG: Convert MAD data into a form similar to that used
in the analysis of SIR data + anomalous differences.
Reference: Terwilliger, T. C. (1994). MAD phasing: treatment of
dispersive differences as isomorphous replacement information.
Acta Cryst. D50, 17-23.
Function: MADMRG reads in measurements of Fbar and the anomalous differences
DelAno at several wavelengths for each reflection. From this data and the
known values of f' and f" for the anomalously scattering atoms at
these wavelengths, the program estimates (1) the magnitude of the
structure factor corresponding to all atoms except the anomalous
scatterer (Fo),(2) the "isomorphous" difference that would be
measured +/- the anomalous scatterer at a standard wavelength,
and (3) the anomalous difference that would be measured
at this standard wavelength. In this way, the MAD data is converted to a form
identical to that used in the analysis of SIR+anomalous differences data.
Method: Assume that structure factor due to anomalous scatterer is not
large compared to that due to all other atoms. Then iso differences among
various wavelengths are proportional to differences in (f+f') for
the anomalous scatterer, and ano diffs at each wavelength
are proportional to f". Scale all the ano diffs to a common
wavelength, then average them. Take all the iso diffs (e.g., L3-L1, L3-L2,
L1-L2), scale each iso diff by:
(f+f' at std wavelength)/(difference in f+f' at the 2 wavelengths)
to obtain estimate of what would be measured for the structure factor amplitude
due to the entire structure at the standard wavelength minus the structure
factor amplitude of the entire structure without the anomalously scattering
atoms. That is, estimate, delta Fiso (+/- ano scatterer at standard
wavelength). Finally, obtain estimates of what delta Fiso would be at
each wavelength by scaling std Fiso by f+f' at that lambda. This
allows the program to obtain estimates of Fo, the structure factor
amplitude due to all non-anomalously scattering atoms from each
value of (Fbar - Fiso at that lambda). These estimates of Fo are averaged.
The result of these manipulations is a pseudo-SIR+anomalous differences dataset.
The "native" structure factor amplitude is Fo, the estimate of the structure
factor amplitude due to all non-anomalously scattering atoms at the standard
wavelength. The "derivative" structure factor amplitude is Fo plus the
isomorphous difference, delta Fiso, corresponding to the contribution of the
anomalously scattering atoms at the standard wavelength. The anomalous
difference is the averaged anomalous difference, scaled to the value
at the standard wavelength.
Generally, the standard wavelength is chosen to be one well away from the
absorption edge of the anomalously scattering atoms, so that f' is small or
negligible. This is not essential, however.
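The scaling steps of the method can be sketched for one reflection as below.
This is a simplified illustration, not the MADMRG algorithm as implemented:
the function name is hypothetical, all averages are unweighted (the program
reports chi-based rejection, which is omitted here), and it assumes at least
two wavelengths with different f+f' and nonzero f" at every wavelength.

```python
def madmrg_sketch(fbar, delano, f_total, f_dprime, std):
    """Estimate (Fo, del iso, del ano, mock Fder) at the standard wavelength.

    fbar, delano : measured Fbar and anomalous difference at each wavelength
    f_total      : f+f' of the anomalous scatterer at each wavelength
    f_dprime     : f" of the anomalous scatterer at each wavelength
    std          : index of the standard wavelength
    """
    n = len(fbar)
    # Scale each anomalous difference to the standard wavelength, then average.
    ano_std = sum(delano[i] * f_dprime[std] / f_dprime[i] for i in range(n)) / n
    # Scale each dispersive ("iso") difference by
    # (f+f' at std wavelength) / (difference in f+f' at the 2 wavelengths).
    iso_estimates = []
    for i in range(n):
        for j in range(i + 1, n):
            df = f_total[i] - f_total[j]
            if df != 0.0:
                iso_estimates.append((fbar[i] - fbar[j]) * f_total[std] / df)
    iso_std = sum(iso_estimates) / len(iso_estimates)
    # Fo from each wavelength: Fbar minus del iso rescaled to that wavelength.
    fo = sum(fbar[i] - iso_std * f_total[i] / f_total[std] for i in range(n)) / n
    return fo, iso_std, ano_std, fo + iso_std
```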
USING MADMRG
Required scattering factor table:
Table of scattering factors for heavy atom at various
wavelengths. Read from madmrg.STD or any other file name you specify
if you use keywording. (modify existing file if
necessary. All data is in form used in Int Tables vol. IV.)
Note: madmrg.std is currently set up for selenium. It is trivial to
change it for any other atom
The order of scattering factors is a1,a2,a3,a4..., not a1,b1,a2,b2...
as listed in the new International Tables, volume C, p.500-502.
f' and f" spectra are listed in "Macromolecular Crystallography with
Synchrotron Radiation" by John Helliwell, Cambridge University Press 1992.
Appendix table A3.2
Control parameters:
1. Resolution range to consider (e.g., 2.5 100)
2. File containing scattering factors (MADMRG.STD)
3. Input dorgbn-style data file.
4. Output dorgbn-style data file.
5. Number of protein residues in asymmetric unit
6. Number of anomalously scattering atoms in asymmetric unit.
7. Number of wavelengths represented in data file
--for each wavelength:--
8a. Title for this wavelength
9a. wavelength number
10a. Columns in input file for Fbar,sigma of Fbar, DelAno, sig of Delano at
this wavelength
--------
11. Wavelength # to refer all data to for output. This is usually the
wavelength far from the absorption edge for the anomalous scatterer.
KEYWORDS:
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
INFILE input file
OUTFILE output file
STDFILE file containing scattering factor info (e.g., MADMRG.STD)
NRES # of protein residues in asymmetric unit
NANOMALOUS # of anomalously scattering atoms in the a.u.
NSET # of wavelengths in dataset
For each wavelength, specify:
LABEL(1) label for wavelength 1 (LABEL(2) for wavelength 2...etc)
JLAMBDA(1) wavelength ID for this wavelength (usually 1 for lambda 1)
NCOLFBAR(1) Fbar for this wavelength
NCOLSFBAR(1) sigma of Fbar
NCOLDELF(1) Del F ano (Fplus - Fminus)
NCOLSDELF(1) sigma of del F ano
JSTD wavelength ID for wavelength to be considered the STANDARD
Notes on MADMRG:
Detailed description of input parameters:
OUTPUT FILE: This is a DORGBN-style output file containing 7 columns of
data. They are:
1 madmrg est of Fp-zero ("Fnative")
2 madmrg sig of fp-zero ("sig of Fnative")
3 madmrg est of del iso
4 madmrg sig of del iso ("Sig of Fderiv")
5 madmrg est of del ano ("Delano")
6 madmrg sig of del ano ("Sig of Delano")
7 madmrg: MOCK FDER ("Fderiv";
equal to Fp-zero + del iso)
To use this data as MIR + anomalous differences, simply use columns 1 and
2 as Fp (native F) and sigma; columns 7 and 4 as Fder (derivative F) and
sigma; and columns 5 and 6 as delano and sigma.
Please note: The output of MADMRG is set up to be used as "Mock" native and
derivative data. When you refine heavy atom parameters using this mock
dataset, you must define a heavy atom type that has scattering factors
identical to those you use in MADMRG at the "standard" wavelength. That is,
if you define lambda 3 as "standard" in madmrg and f" at lambda 3 is 8.9933,
then when you get to heavy atom refinement with routine HEAVY
you will need to define an atom type "L3" (or something) that has all the
right scattering factor information including an f" of 8.9933. Use the
keyword NEWATOMTYPE in HEAVY to do this easily.
PLEASE NOTE: when you use this data in your MIR program, DO NOT refine
an overall scale factor and B for the "derivative." The overall scale
factor and B of the derivative relative to the (pseudo) native are
absolutely perfect to start with (because of the way the derivative has
been set up). In this package, use the flag "NOREFINESCALE" for the
derivative.
The reason to use column 4 as sigma of Fder is that heavy atom refinement
programs such as HEAVY assume that errors in Fp and Fder are independent.
In this case they are not. Suppose you estimated the error in Fder
by combining errors in Fo and deliso. Then your heavy atom refinement
program would estimate the error in Fder-Fnat by combining the
errors you give it for Fder (based on errors in deliso + Fo) and the
errors you give it for Fnat (the error in Fo). The estimates of errors in
Fder-Fnat would therefore contain the errors in Fo twice. If you use
column 4 as sigma of Fder, the errors in Fder-Fnat will be correctly
calculated based on deliso and Fo.
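The double counting described above can be checked with a small numerical
sketch. The sigma values are made up for illustration, and the quadrature sum
stands in for how a refinement program that assumes independent errors would
combine the two sigmas it is given.

```python
import math

def sigma_of_difference(sig_fder, sig_fnat):
    """Error in Fder - Fnat when the two input errors are taken as independent."""
    return math.hypot(sig_fder, sig_fnat)

sig_fo, sig_deliso = 3.0, 4.0          # illustrative values only

# Sigma of Fder built from both Fo and del iso: Fo's error enters twice.
naive = sigma_of_difference(math.hypot(sig_fo, sig_deliso), sig_fo)

# Column 4 (sigma of del iso) used as sigma of Fder: Fo's error enters once.
recommended = sigma_of_difference(sig_deliso, sig_fo)
```

With these numbers the recommended choice gives 5.0, while the naive one gives
sqrt(34), about 5.83, because the 3.0 from Fo has been added in twice.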
Input data file: This input data must be scaled carefully. MADMRG does not
scale your data for you.
# OF PROTEIN RESIDUES. The program assumes that the B-factor for the
anomalously scattering atoms is similar to that for all other atoms.
Using the number of protein residues and the number of anomalously
scattering atoms on the next line, the program estimates the rms
value of structure factor amplitudes due to anomalously scattering
atoms as a function of resolution. Each of these are for the asymmetric unit.
Annotated and condensed example using MADMRG, with comments in []
Scattering form factors read for 3 wavelengths. [read from MADMRG.STD]
1: lambda = 0.9797. a(4),b(4),c,fp,fpp for S,Se,N:
6.905 5.203 1.438 1.586 1.468 22.215 0.254 56.172 0.867 0.319 0.557
17.001 5.820 3.973 4.354 2.410 0.273 15.237 43.816 2.841 -9.851 2.858
12.213 3.132 2.013 1.166 0.006 9.893 28.997 0.583-11.529 0.000 0.000
Set # 1 Label: l1 [lambda 1 data]
Set # 2 Label: l2 [lambda 2 data]
Set # 3 Label: l3 [lambda 3 data]
Set: 1 2 3
Lambda #: 1 2 3
Lambda : 0.9797 0.9794 0.9000
N protein : 644 644 644 [# of protein atoms in a.u.]
# of S or Se: 2 2 2 [# of anom scattering atoms]
Fxn Se : 1.0000 1.0000 1.0000 [fractional substitution of S with Se;
always 1.0 for all other atoms]
Col for Fbar: 1 5 9 [columns in input data file]
Col for Sig : 2 6 10
Col DelfAno : 3 7 11
Col for sig : 4 8 12
Which of the 3 wavelengths is to be defined as the "standard"
at which all values of Fa will be calculated? >
3, std wavelength #
Wavelength # 3 with lambda=0.9000, corresponding to
data set 3 will be used as the standard wavelength.
Output file: DATA.MAD
col 0 : madmrg output: *
col 1 : madmrg est of Fp-zero *
col 2 : madmrg sig of fp-zero *
col 3 : madmrg est of del iso *
col 4 : madmrg sig of del iso *
col 5 : madmrg est of del ano *
col 6 : madmrg sig of del ano *
col 7 : madmrg: MOCK FDER *
Based on space groupC2 the first acentric reflection in this
dataset is: ( -25 1 2).
[space group-specific test for centric reflns]
Based on space group C2 the first centric reflection in this
dataset is: ( -24 0 1).
1763 REFLECTIONS READ FROM THIS FILE
Form factors at lambda = 0.9797 [lambda 1 form factors as fn of resolution]
[If your anomalously scattering atom is
100% substituted, ignore the "S" data.
If your atom is not Se, you have
altered MADMRG.STD and "Se" means
whatever atom you put in there.]
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
f (S): 14.52 12.96 12.20 11.88 11.58 11.32 11.09 10.86 10.62 10.37
f" (S): 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56
f (Se): 21.64 19.40 18.28 17.80 17.33 16.92 16.57 16.19 15.79 15.36
f" (Se): 2.86 2.86 2.86 2.86 2.86 2.86 2.86 2.86 2.86 2.86
f (N): 6.20 5.42 5.01 4.82 4.65 4.49 4.36 4.21 4.06 3.90
f" (N): 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Form factors at lambda = 0.9794
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
f (S): 14.52 12.96 12.20 11.88 11.58 11.32 11.09 10.86 10.62 10.37
f" (S): 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56
f (Se): 22.86 20.62 19.50 19.01 18.55 18.14 17.78 17.40 17.00 16.58
f" (Se): 4.88 4.88 4.88 4.88 4.88 4.88 4.88 4.88 4.88 4.88
f (N): 6.20 5.42 5.01 4.82 4.65 4.49 4.36 4.21 4.06 3.90
f" (N): 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Form factors at lambda = 0.9000
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
f (S): 14.52 12.96 12.20 11.88 11.58 11.32 11.09 10.86 10.62 10.37
f" (S): 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56 0.56
f (Se): 29.87 27.63 26.51 26.03 25.56 25.15 24.80 24.42 24.02 23.59
f" (Se): 3.28 3.28 3.28 3.28 3.28 3.28 3.28 3.28 3.28 3.28
f (N): 6.20 5.42 5.01 4.82 4.65 4.49 4.36 4.21 4.06 3.90
f" (N): 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Scaling factors to renormalize data:
"all" = relative expected total rms F at this lambda
"prot" = relative expected protein rms F at this lambda
"SE or S to Se" = relative expected Se or S rms F
at this lambda (relative to Se at standard wavelength)
For dataset # 1 Label: l1 (*
lambda = 0.9797
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all): 0.98 0.98 0.98 0.98 0.98 0.98 0.97 0.97 0.97 0.97
scale (prot): 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se): 0.72 0.70 0.69 0.68 0.68 0.67 0.67 0.66 0.66 0.65
anom ratio: 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9
For dataset # 2 Label: l2 (*
lambda = 0.9794
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all): 0.99 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.97
scale (prot): 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se): 0.77 0.75 0.74 0.73 0.73 0.72 0.72 0.71 0.71 0.70
anom ratio: 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
For dataset # 3 Label: l3 *
lambda = 0.9000
dmin: 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all): 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (prot): 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se): 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
anom ratio: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Enter name of .drg file containing reflection data>
Summary of averaging results.
Total of 1763 reflections written out
Type    Mean Chi   Naveraged   Chi cutoff   n rejected
Iso      0.7203       1763       3.0000          0
Ano      0.9044       1763       3.0000          0
Fbar     0.5980       1763       3.0000          0
[ These are estimates of sqrt(chi**2) for estimation of isomorphous and
anomalous differences, and Fo. They should be near 1.0. Reflections for which
chi**2 is > 3.0 are tossed.]
Summary of merging statistics by shell:
shell             Nobs  Mult  RMS value  rms error  Norm RMS value *
 1 4.000 Iso :    732.   3.0    125.544     48.835    0.376   0.146
         Ano:     732.   3.0     28.096     14.242    0.084   0.043
         Fbest:   732.   3.0    333.477     40.914    1.000   0.123
 2 3.000 Iso :   1031.   3.0     86.414     33.233    0.347   0.133
         Ano:    1031.   3.0     21.607     10.912    0.087   0.044
         Fbest:  1031.   3.0    248.971     27.400    1.000   0.110
 All     Iso :   1763.   3.0    104.456     40.448    0.364   0.141
         Ano:    1763.   3.0     24.510     12.403    0.085   0.043
         Fbest:  1763.   3.0    287.094     33.676    1.000   0.117
[ This table says that: there were 732 isomorphous differences estimated in
the shell from 4 A to infinity. There were an average of 3 estimates of each
isomorphous difference (one from each lambda). The rms isomorphous difference
was 125.544. The rms error estimate was 48.835. The rms isomorphous difference
normalized to the rms estimate of Fo was 0.376, and the normalized sigma was
0.146.
Note that the errors in the isomorphous and anomalous differences are quite
large, even in this test case with model data.]
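The normalized columns in the table can be checked by hand. The following is a minimal Python sketch of that arithmetic for shell 1; the helper name `normalize` is hypothetical and is not part of HEAVY. It assumes that "normalized" means division by the rms Fbest of the same shell, which is what the Fbest rows (normalized value 1.000) suggest.

```python
def normalize(value, rms_fbest):
    """Divide an rms value by the rms Fbest of the same resolution shell."""
    return value / rms_fbest


# Values copied from shell 1 (4 A to infinity) of the table above.
rms_fbest = 333.477
iso_norm = normalize(125.544, rms_fbest)   # rms isomorphous difference
sig_norm = normalize(48.835, rms_fbest)    # rms error estimate

print(f"{iso_norm:.3f} {sig_norm:.3f}")
```

The two printed numbers correspond to the last two columns of the Iso row for shell 1.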
------------------------------------------------------------------------------
**** PEAKSEARCH
This is a routine that finds high and low points in a Fourier map. It is
not applicable to Patterson maps (use HASSP for that purpose). It assumes
that the FFT has been calculated over the entire asymmetric unit and uses
the symmetry file to map neighboring grid points on to the asymmetric unit.
It reports the highest peaks in the map, with the height being the highest
value of the FFT on a grid point and the coordinates being the centroid of
the peak.
The routine can read in an FFT written by UTIL or it can be called at the
end of routine MAPS. Note that it cannot read BOSS format data.
Control parameters for PEAKSEARCH
1. Name of FFT file containing the asymmetric unit of the map
2. NPEAK, the number of positive and negative peaks to list (maximum = 100)
3. Name of file to write the peaks out to
4. ISYMMETRY= # of symmetry equivalents to write out for each peak.
If you want all peaks in the region used for FFT calculations, use
ISYMMETRY=0. If you want all in the region used for FFTTOBOSS, then use
ISYMMETRY=-1.
5. Format to write peak list (PDB format with waters or fractional)
The routine will write out NPEAK highest and NPEAK lowest peaks to the
output file. The "B-factor" in the PDB format file is the peak height/1000.
The final column in the fractional-format file is the peak height/1000.
KEYWORDS:
FFTFILE name of fft-containing file
NLIST number of peaks (high, same # of low) to list
ISYMMETRY # of symmetry equivalents to list for each peak
-1 for all within FFTTOBOSS region, 0 for all within
FFT region. 1= default= just 1 for each peak
PEAKFILE name of output file
PDB write peaks in PDB format
FRACT write peaks in fractional format
POSITIVEONLY only list positive peaks
NEGATIVEONLY only list negative peaks
------------------------------------------------------------------------------
**** FFTTOBOSS
This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to any region of the map in the BOSS format. This routine is
applicable to Fourier maps, but can be used with Patterson maps as long
as the output region is contained within the input FFT.
Control parameters:
1. Name of file containing FFT map
2. 0 for Patterson, 1 for Fourier map
3. Output file (BOSS format)
4. Title for output file
The output map is calculated using the same grid as the input FFT, but
the endpoints in x,y, and z can be different. The program generates the
entire unit cell from the input FFT if the endpoints of the output BOSS map
are not contained within the input FFT.
The grids for the input FFT and the output BOSS map are set in
PREFERENCES.
The output map is scaled so that the rms value of the map is 5.000.
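The rescaling step can be illustrated with a short Python sketch. This is not the HEAVY code: the function name `scale_to_rms` is hypothetical, and it assumes the rms is taken about zero over all grid points (the text does not say whether the mean is subtracted first).

```python
import math


def scale_to_rms(values, target_rms=5.0):
    """Return the map values multiplied by a single factor so that their
    rms (about zero, an assumption) equals target_rms."""
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    if rms == 0.0:
        raise ValueError("map is identically zero; cannot rescale")
    factor = target_rms / rms
    return [v * factor for v in values]


# A toy "map" of four grid points.
scaled = scale_to_rms([3.0, -4.0, 0.0, 5.0])
new_rms = math.sqrt(sum(v * v for v in scaled) / len(scaled))
```

Because a single multiplicative factor is applied, relative peak heights are unchanged; only the contour levels need to be interpreted on the new scale.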
KEYWORDS
FFTFILE name of FFT-containing file
BOSSFILE name of output BOSS format file
FILETITLE optional title for file
------------------------------------------------------------------------------
**** FFTTODSN6
This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to any region of the map in the DSN6 (brick) format for TOM/FRODO
or O. This routine is applicable to Fourier maps, but can be
used with Patterson maps as long as the output region is contained
within the input FFT.
Control parameters:
1. Name of file containing FFT map
2. 0 for Patterson, 1 for Fourier map
3. Output file (DSN6 format)
4. Title for output file
The output map is calculated using the same grid as the input FFT, but
the endpoints in x,y, and z can be different. The program generates the
entire unit cell from the input FFT if the endpoints of the output DSN6 map
are not contained within the input FFT.
The grids for the input FFT and the output map (same as for the BOSS-format
map) are set in PREFERENCES.
The output map is scaled so that the rms value of the map is 5.000.
KEYWORDS
FFTFILE name of FFT-containing file
DSN6FILE name of output DSN6 format file
FILETITLE optional title for file
------------------------------------------------------------------------------
**** FFTTOMAPVIEW
This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to a format compatible with the PHASES package. This is
useful for displaying the map using MAPVIEW in the PHASES package.
Control parameters:
1. Name of file containing FFT map
2. Output file (PHASES-compatible format)
The output map is calculated using the same grid as BOSS-style output
maps. This grid may be set in PREFERENCES and is called IBOSSGRID.
For Pattersons, this grid must be contained within the FFT grid, IFFTGRID.
The grid for the input FFT is set in PREFERENCES.
KEYWORDS
FFTFILE name of FFT-containing file
MAPVIEWFILE name of output PHASES-compatible format file
------------------------------------------------------------------------------
**** LOCALSCALE
This is a package to scale a "derivative" dataset to a "native" dataset using
local scaling. In this method the scale factor for a particular reflection
is based on the ratio of derivative:native for reflections surrounding this
reflection. This method is useful because the scale factor is not restricted
to any particular function of position in reciprocal space.
In this implementation, at least 30 reflections surrounding the reflection
to be scaled are used to obtain a scale factor. Additionally, the reflections
used in obtaining a scale factor are always chosen so that they form a
complete sphere around the reflection of interest (inasmuch as possible).
Initial Wilson scaling is carried out before local scaling.
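The idea of local scaling can be sketched in Python as follows. This is an illustration under stated assumptions, not the LOCALSCALE algorithm itself: the function name `local_scale` is hypothetical, distances are measured in (h,k,l) index space rather than in reciprocal-space units, the reflection being scaled is excluded from its own neighbourhood, and the scale factor is taken as sum(Fnat)/sum(Fder). The limit of 30 neighbours and the F/sig > 3.0 test for reflections used in scaling are taken from this documentation.

```python
def local_scale(target, reflections, min_neighbors=30, min_f_over_sig=3.0):
    """Estimate a local scale factor for the reflection `target` (an hkl tuple).

    reflections: iterable of (hkl, f_nat, sig_nat, f_der, sig_der).
    Returns (scale, number_of_reflections_used).
    """
    def dist2(hkl):
        # Squared distance in index space (an assumption, see lead-in).
        return sum((a - b) ** 2 for a, b in zip(hkl, target))

    usable = [
        (dist2(hkl), f_nat, f_der)
        for hkl, f_nat, sig_nat, f_der, sig_der in reflections
        if hkl != target
        and f_nat > min_f_over_sig * sig_nat
        and f_der > min_f_over_sig * sig_der
    ]
    if len(usable) < min_neighbors:
        raise ValueError("not enough usable neighbouring reflections")

    usable.sort(key=lambda item: item[0])
    # Take the radius that first reaches min_neighbors, then keep every
    # reflection inside it so the neighbourhood is a complete sphere.
    radius2 = usable[min_neighbors - 1][0]
    sphere = [item for item in usable if item[0] <= radius2]

    scale = sum(f_nat for _, f_nat, _ in sphere) / sum(f_der for _, _, f_der in sphere)
    return scale, len(sphere)


# Synthetic data: the derivative is exactly twice the native everywhere.
refl = [
    ((h, k, l), 100.0 + h + 2 * k + 3 * l, 1.0, 2.0 * (100.0 + h + 2 * k + 3 * l), 1.0)
    for h in range(-3, 4) for k in range(-3, 4) for l in range(-3, 4)
]
scale, n_used = local_scale((0, 0, 0), refl)
```

Keeping every reflection within the final radius, rather than exactly 30, is one way to satisfy the "complete sphere" requirement described above.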
Data files: The program expects to read in two data files: one for the
native dataset and one for the derivative. The native dataset is expected
to have h,k,l, F and sigma (at least). The derivative dataset is expected
to have h,k,l, F, and sigma, and, if desired, del F ano and sigma of del
F ano. The scale factor obtained for the derivative F is applied to all of
the derivative data.
A dorgbn-style file is written out containing the scaled derivative data.
If you wish to have the derivative and native data in the same file, then
follow this with the routine "FILEMERGE" and merge the two files.
Command parameters:
1. Resolution range to consider (all data outside the range is ignored)
2. Data file containing native data
3. column numbers for native F and sigma (use "0" if no native sigma)
4. Data file containing derivative data
5. column numbers for derivative F, sigma, del Ano, sigma of del ano
(use "0" if missing; you need to input 4 numbers here)
6. Name of output data file for scaled derivative data
7. Title for output file
8. inotoss = 1 if you want to keep reflections with del F much greater
than expected from the sigmas and the rms deviations for other
reflections; 0 if you want to toss these reflections.
Suggestion: use "1" for most cases, but if you are about to
use GETISO to calculate a Patterson or difference Fourier, you
might want to exclude these outliers and use "0".
9. minimum number of reflections to use in scaling. Suggested value: 30
10. minimum ratio of F/sig for native or derivative to be read in at all.
Suggested value: "0.0". The program only uses data with F/sig>3.0
for actually calculating scale factors, so this value does not
affect your scale factors. It only affects what data is scaled
and your R-factors at the end and what data is written out.
KEYWORDS:
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
INFILE(1) name of file with Native data
INFILE(2) name of file with Derivative data to scale to Native
OUTFILE name of output file with scaled derivative data
NNATF column # for F of native data
NNATS column # of sigma of F of native data
NDERF column # for F of deriv data
NDERS column # for sigma of F of deriv data
NANOF column # of anomalous difference (Fplus-Fminus) of deriv data
NANOS column # of sigma of anomalous difference
(note: be sure to set those you don't want to 0)
FILETITLE optional title for output file
KEEPALL keep reflections even with high differences (default)
TOSSBAD Toss reflections if differences between native and derivative
are more than 3 * the rms found for other reflections.
Note: KEEPALL and TOSSBAD also apply to MERGE
ANCUT minimum # of reflections to use to scale a reflection (30.)
RATMIN minimum ratio of F/sigma to include
Notes:
1. A value of 0 or less for fnat or fder is assumed to mean
data are not measured. A value of 0.0 or -1.0 for del f ano is assumed to
mean the data are not measured also.
2. If sigmas are not supplied at all, then a value of 1.0000 will be assumed.
This can affect what data are read in if you specify a minimum F/sig >0.0
------------------------------------------------------------------------------
**** COMPLETE
This is a routine to determine the completeness of a dataset. It maps
input data to the asymmetric unit of the space group and calculates the
percentage of data that is present.
Control parameters:
1. Resolution range to consider (e.g., 2.5 100)
2. Name of file containing data to be examined.
3. Column numbers for F and sigma of F (enter 0 if sigma not present)
4. Minimum ratio of F/sigma to include
KEYWORDS:
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
INFILE name of file to be examined
NNATF column # for F of data
NNATS column # of sigma
RATMIN minimum ratio of F/sigma to include
------------------------------------------------------------------------------
**** WEIGHTS
This is a routine to generate weighting factors for atomic refinement.
The weighting factors are based on both experimental sigmas and on rms
values of (Fobs-Fcalc)**2 in ranges of resolution. The premise for this
type of weighting is that the atomic model used to generate Fcalc is
incomplete. This leads to an expected difference between Fobs and Fcalc
that is larger for centric reflections than for acentrics by a factor of
sqrt(2). The errors in the fit of the model to the data are divided into
two parts, one due to errors in measurement and one due to errors in the
model. It is assumed that errors in measurement are reasonably well known.
If this is not the case, then just do not include them (see below).
The errors in the model are estimated in a shell of resolution as
E**2 = [ < (Fobs-Fcalc)**2 > - < Sigma**2 > ] / Q
where Q=1 for centric reflections and 0.5 for acentric reflections.
The weighting factor applied to a particular reflection is then:
WEIGHT = 1/( Q * E**2 + Sigma**2 )
where Q is again 0.5 for acentric reflections and 1 for centric ones.
Reflections where Sigma is not >0 or Fobs is not > ratmin*sigma or Fcalc is
not >0 are ignored and not written out.
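The weighting scheme can be sketched in Python as below. This is a minimal illustration, not the WEIGHTS source: the function names are hypothetical, the term subtracted inside the bracket is assumed to be the mean of Sigma**2 over the shell, the averages are assumed to be taken over reflections of one class (centric or acentric) in one resolution shell, and clamping a negative E**2 to zero is an added safeguard that the documentation does not mention.

```python
def model_error_sq(fobs, fcalc, sigma, centric):
    """Estimate E**2 for one shell from lists of Fobs, Fcalc and Sigma."""
    q = 1.0 if centric else 0.5
    n = len(fobs)
    mean_resid_sq = sum((fo - fc) ** 2 for fo, fc in zip(fobs, fcalc)) / n
    mean_sigma_sq = sum(s * s for s in sigma) / n   # assumed missing term
    # Clamp at zero (assumption) so that weights stay finite and positive.
    return max(0.0, (mean_resid_sq - mean_sigma_sq) / q)


def weight(e_sq, sigma, centric):
    """WEIGHT = 1 / (Q * E**2 + Sigma**2)."""
    q = 1.0 if centric else 0.5
    return 1.0 / (q * e_sq + sigma ** 2)


# Toy shell of two reflections: mean residual**2 = 12, mean Sigma**2 = 4.
fobs = [110.0, 90.0]
fcalc = [110.0 - 10.0 ** 0.5, 90.0 + 14.0 ** 0.5]
sigma = [2.0, 2.0]

e_sq_centric = model_error_sq(fobs, fcalc, sigma, centric=True)
e_sq_acentric = model_error_sq(fobs, fcalc, sigma, centric=False)
w_centric = weight(e_sq_centric, 2.0, centric=True)
w_acentric = weight(e_sq_acentric, 2.0, centric=False)
```

With these definitions Q * E**2 + Sigma**2 reproduces the observed mean squared residual of the class the estimate came from, which is why the two weights in the toy example coincide.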
Control parameters:
1. Resolution range to consider (e.g., 2.5 100)
2. Name of file containing Fobs,sigma and Fcalc data.
3. Column numbers for F and sigma of F and Fcalc (enter 0 if sigma not present)
4. Output file name
5. Minimum ratio of F/sigma to include= ratmin
KEYWORDS:
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
INFILE name of file with Fobs,sigma, and Fcalc
OUTFILE name of output file in X-PLOR format with h,k,l,fobs,sig,weight
NNATF column # for F of data
NNATS column # of sigma
NCOLFC column # of Fcalc
RATMIN minimum ratio of F/sigma to read in at all
------------------------------------------------------------------------------
**** FDIFF
This routine creates a pseudo-mutant dataset for difference refinement.
It is used in cases where a "WT" structure has been refined and a
"mutant" dataset is available, and where it is the differences between
the WT and mutant structures that is of interest. This routine
takes Fcalc for the WT dataset and Fobs for the WT and mutant
datasets to create a pseudo-mutant dataset: Fdiff and sigma. These are
then used just as if they were Fobs,mutant and sigma in refinement of the
mutant structure.
The value of Fdiff is given by:
Fdiff = Fc (WT) + (Fobs, mutant - Fobs, WT)
Control parameters for FDIFF:
1. Input file with Fc (WT), Fo (WT), sig(WT), Fo(MUT), sig(MUT)
2. Output file name for Fdiff, sigma and Delta=(Fo,MUT-Fo,WT)
3. Column number in input file for Fc(WT)
4. Column numbers for Fo(WT) and sigma of Fo(WT)
5. Column numbers for Fo(MUT) and sigma of Fo(MUT)
6. Overall title for output file
7. Resolution range to consider (dmin, dmax)
The program ignores reflections where any of these F's are missing (not
positive)
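The Fdiff calculation for a single reflection can be written as a few lines of Python. This is a sketch only; the function name `fdiff` is hypothetical. The sigma written by FDIFF is not computed here because this documentation does not state how it is derived.

```python
def fdiff(fc_wt, fo_wt, fo_mut):
    """Return (Fdiff, Delta) for one reflection, or None if any F is missing.

    Fdiff = Fc(WT) + (Fobs,mutant - Fobs,WT)
    Delta = Fobs,mutant - Fobs,WT
    """
    # A missing measurement is signalled by a value that is not positive.
    if min(fc_wt, fo_wt, fo_mut) <= 0.0:
        return None
    delta = fo_mut - fo_wt
    return fc_wt + delta, delta


measured = fdiff(100.0, 110.0, 125.0)   # all three values present
missing = fdiff(100.0, 0.0, 125.0)      # Fo(WT) not measured
```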
KEYWORDS:
INFILE input file with WT Fc, Fo and MUT Fo
OUTFILE output file with FDIFF,sig,Del
FILETITLE optional title for output file
NCOLFC column # for WT Fc (WT Fcalc) in input file
NCOLFOWT column # for WT Fo (WT Fobs) in input file
NCOLSWT column # for sigma of WT Fo
NCOLFOMUT column # for MUT Fo (MUT Fobs) in input file
NCOLSMUT column # for sigma of MUT Fo
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
------------------------------------------------------------------------------
**** MAPTOASYM
**** MAPTOOBJECT
These routines map the atoms in a PDB file, one by one, using crystallographic
symmetry. MAPTOASYM maps atoms into the asymmetric unit of the space group,
as defined by the symmetry file and the (FFTGRID) grid specified for FFT
calculations, assumed to contain the asymmetric unit. MAPTOOBJECT maps
atoms to their symmetry equivalents closest to an atom in a second PDB
("object") file.
Inputs:
MAPTOASYM
1. Input PDB file name
2. Output PDB file name.
MAPTOOBJECT
1. Input PDB file name
2. PDB file name containing object to map atoms close to
3. Output pdb file name
KEYWORDS
INFILE(1) Input pdb file name
INFILE(2) Object pdb file name for MAPTOOBJECT
OUTFILE output pdb file name.
--for MAPTOOBJECT--
DISMIN Only atoms with closest distance to atom in object between
DISMAX dismin and dismax will be written out. Default: 0. to 1000000.
------------------------------------------------------------------------------
**** PREFERENCES
This is a routine to set up the user's preferences for
matrices of equivalent positions, cell parameters, and grids for FFT
calculations. If desired, the preferences will be saved in a file
called "preferences.dat". If a file with this name exists in the
default directory at startup of the program, the preferences will be
read in and it is not necessary to enter them unless they need to
be changed.
This routine should usually be run interactively. Prompts are given
for a file containing the symmetry information, for cell
parameters and for the grids for FFT calculations. If called when
keywording is in use, the routine simply saves all current values.
The grids for FFT calculations may be somewhat confusing. There are 3 grids.
One (IFFTGRID) is for calculating an asymmetric unit of a Fourier. For this
grid, the values of starting and ending grid points must be in the range from
0 to the number of grid points for the cell translations.
The second grid (IPATTGRID) is for the asymmetric unit of a Patterson. The same
restrictions apply as for the Fourier.
The third grid (IBOSSGRID) is for creating output BOSS-format maps
from the asymmetric unit of an FFT. The range of grid points in
this map may be anywhere in the range of -256 to 256 in any direction. It can
therefore be adjusted to draw the output map around the contours of
a protein molecule, for example. For this BOSS grid, the number of
grid points in the lattice along x,y, and z is identical to that for the
FFT, only the starting and ending grid points are different.
The cell parameters, symmetry file and the grids for the Fourier,
Patterson, and Boss maps can be set using keywords as well.
Note on grids:
All grids for pattersons, Fourier, or other output must have unit
cell translations that are multiples of 2 and 3 and no other numbers.
Furthermore, any translations in the equivalent positions must correspond to
an integral number of grid units. This means that along an axis with a
6-fold screw axis, the cell translation must be a multiple of 6, and in a
centered cell, the cell translation along those directions must be a
multiple of 2.
You may always use any of the following grids for the cell translation, as
they are multiples of 6,2,and 3: 6, 12, 18, 24, 36, 48, 54, 72, 96,
108, 144, 162, 192, and 216.
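The list of always-safe cell translations can be regenerated with a short Python sketch. The function name `only_factors_2_and_3` is hypothetical; the code simply enumerates multiples of 6 up to 216 whose only prime factors are 2 and 3.

```python
def only_factors_2_and_3(n):
    """True if n is a positive integer with no prime factors other than 2 and 3."""
    if n < 1:
        return False
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1


# Multiples of 6 (so also multiples of 2 and 3) up to 216.
safe_grids = [n for n in range(6, 217, 6) if only_factors_2_and_3(n)]
print(safe_grids)
```

A value such as 60 is rejected because it contains a factor of 5, even though it is a multiple of 6.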
The usual way to set up your grids is as follows. Suppose you have cell
dimensions of 30 x 40 x 90 A, and the resolution of your data is to 2 A.
You will want a grid that is about 1/3 of the resolution, or a 0.7 A
spacing. This would be 40 along x, 53 along y, and 120 along z. Since it
is best to use one of the grids listed above, just choose the closest one
to each, usually going to the higher of the two closest ones: 48 x 54 x 128.
Now suppose the asymmetric unit of your unit cell is 0-1, 0-1/2, 0-1/2. Then
your grid statement for your fouriers will be:
FFTGRID 0 48 48 0 27 54 0 64 128
That is, your FFT will go from 0 to 48 in x where the cell translation is 48,
0 to 27 in y where the cell translation is 54, and 0 to 64 in z where the cell
translation is 128.
The grid for your Patterson maps will usually be the same as for your fouriers,
except that the asymmetric unit will usually be half the asymmetric unit of
the fourier. That is, your Patterson grid statement will look like:
PATTGRID 0 24 48 0 27 54 0 64 128
Finally, you may want to output Fourier maps that are larger than the
asymmetric unit of your crystal. For example, you may want the map to
surround your molecule in the crystal. You can specify any range from -256
to 256 in each direction for this grid, but you cannot change the numbers
corresponding to the cell translation. Suppose your molecule goes from
-0.25 to 0.5 in x, 0 to .5 in y, and .25 to .75 in z. Your BOSSGRID is then:
BOSSGRID -12 24 0 27 32 96
Note that you only need 6 numbers for BOSSGRID as you are not respecifying
the cell translations.
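The three grid statements in this example can be reproduced from the fractional ranges with a small Python sketch. The helper name `grid_statement` is hypothetical and exists only for illustration; HEAVY itself expects the integers to be typed in directly.

```python
def grid_statement(keyword, ranges, cell_grid, include_cell=True):
    """Build a grid statement from fractional (start, end) ranges.

    ranges:    three (start, end) pairs in fractional coordinates
    cell_grid: grid points along the full cell edge in x, y, z
    include_cell: FFTGRID and PATTGRID repeat the cell translation,
                  BOSSGRID does not.
    """
    numbers = []
    for (start, end), n in zip(ranges, cell_grid):
        numbers += [round(start * n), round(end * n)]
        if include_cell:
            numbers.append(n)
    return keyword + " " + " ".join(str(v) for v in numbers)


cell_grid = (48, 54, 128)
fft = grid_statement("FFTGRID", [(0, 1), (0, 0.5), (0, 0.5)], cell_grid)
patt = grid_statement("PATTGRID", [(0, 0.5), (0, 0.5), (0, 0.5)], cell_grid)
boss = grid_statement("BOSSGRID", [(-0.25, 0.5), (0, 0.5), (0.25, 0.75)],
                      cell_grid, include_cell=False)
print(fft)
print(patt)
print(boss)
```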
The keywords you may wish to set for PREFERENCES are:
KEYWORD
CELL a,b,c,alpha,beta,gamma (angstroms, degrees)
SYMFILE name of space group symmetry file
MEPFILE name of space group symmetry file (same as SYMFILE)
FFTGRID 9 integers specifying grid for fourier
These are 3 for each direction x,y,z: nxs,nxe,nx, etc...
nxs is starting grid point in x, nxe is ending grid point,
nx is # of grid points corresponding to the entire cell edge.
PATTGRID 9 integers specifying grid for patterson, same form as fftgrid
BOSSGRID 6 integers specifying range of grid for Boss. Here only the
starting and ending grid points in each direction are
input, as the cell edge is defined by the fftgrid.
------------------------------------------------------------------------------
**** VIEW
This is a routine that allows you to view binary dorgbn-style files.
Control parameters:
1. Input dorgbn-style file
2. number of records to print out
KEYWORDS:
INFILE input file name
NLIST # of records to write out
------------------------------------------------------------------------------
**** SCRIPT
This is a routine that allows you to shift the input controls from the
current input to a specified file. The input starts out with unit 5
(the terminal). When you specify a script file to use as input, the
program expects to find a valid command as the first line in the file,
i.e., "MAPS", or "VIEW". All commands will be interpreted just as if they
were typed from the terminal. Script files may call script files as well.
Any time there is an error in reading a script file at the command level,
the script file is closed and control is returned to the file that called
it, or the terminal if the script file was called from there.
Control parameters:
1. File name to read input from
KEYWORDS:
INFILE file name to read input from
------------------------------------------------------------------------------
**** LOG
LOG allows you to copy output to a log file.
Input is:
1. Name of new log file
Note: if you already have a log file open, then it will prompt you to
see if you want to close it and open a new one.
KEYWORD operation: If keywording is in use, then if the keyword
"LOGFILE" has been associated with a file name, that file will be the
log file. If a log file is already open, the request will be ignored.
------------------------------------------------------------------------------
**** KEYWORD
Allows parameters to be set using keywords instead of prompts. After
specifying "KEYWORD", you can enter as many keywords with their values
as you want. End the setting of keywords with "DONE". Now when you
enter commands, the program will use the values you set with keywords.
If you have not specified a needed keyword, the program will quit the
routine it is in.
To return to prompts, specify "KEYWORD" and then "PROMPT".
Example:
$ru heavy
! go into keyword mode:
KEYWORD
! set a few parameters:
NLIST 10
INFILE tmp.drg
! go back into command mode:
DONE
! execute a command
VIEW
This sets up the keyword NLIST to 10 and INFILE to tmp.drg, then
runs the command VIEW.
The values of keywords that you set apply to any routine in the program. They
are remembered and need not be reentered if they have not changed. This
means you can set keywords, return to prompt mode, and when you go back
into keyword mode the values will still be there, as modified by any inputs
you made in prompt mode. The only exception to this is that names of input files in
MADMRG are not retained.
Keywords are read sequentially and if you retype one, it supersedes the
previous value.
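The "last value wins" behaviour can be modelled with a small Python sketch. This is not the HEAVY parser: the function name `read_keyword_block` is hypothetical, and treating lines that begin with "!" as comments is inferred from the example above rather than documented.

```python
def read_keyword_block(lines):
    """Collect keyword values and commands from a list of input lines.

    Returns (values, commands): values maps each keyword to the last
    value given for it; commands lists the commands seen outside the
    KEYWORD ... DONE block.
    """
    values = {}
    commands = []
    in_block = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):   # comment convention assumed
            continue
        name, _, rest = line.partition(" ")
        name = name.upper()
        if not in_block:
            if name == "KEYWORD":
                in_block = True
            else:
                commands.append(name)
        elif name == "DONE":
            in_block = False
        else:
            values[name] = rest.strip()        # a repeated keyword overwrites
    return values, commands


script = [
    "KEYWORD",
    "! set a few parameters:",
    "NLIST 10",
    "INFILE tmp.drg",
    "NLIST 25",
    "DONE",
    "VIEW",
]
values, commands = read_keyword_block(script)
```

Here NLIST ends up as 25 because the second setting replaces the first, and VIEW is the only command recorded.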
Keywords for each routine are listed in the documentation for that routine.
Some general keywords applicable to most routines are:
DMIN minimum resolution to consider
DMAX maximum resolution to consider
SYMFILE matrices of equivalent positions file
CELL 6 Cell dimensions and angles
INFILE name of input data file. If there are more than one,
specify INFILE(1), INFILE(2) etc...
OUTFILE name of output data file
------------------------------------------------------------------------------
**** HELP
This command issues a list of commands that may be entered at the command
level of the program.
------------------------------------------------------------------------------
**** END
This ends the program. Also QUIT, STOP, EXIT do the same.
------------------------------------------------------------------------------
**** HASSP - Heavy atom search program. Version 3.0 of August, 1994
HASSP is a routine for searching for solutions to a difference Patterson
function. The only essential inputs to the routine are (1) a map file
containing the patterson function, and (2) the region to be searched for
solutions (usually the asymmetric unit of the real cell).
SUMMARY OF INPUT PARAMETERS (SEE SECTIONS VI AND VII)
READ FROM DEFAULT INPUT.
LINE PARAMETERS
__ ___________________
1 NAME OF FILE CONTAINING OUTPUT FROM FFT
2 XS,XE; YS,YE; ZS,ZE (6F10.0) (...SEARCH REGION, FRACTIONAL)
3 NAME OF LOG FILE, IF NOT YET DEFINED.
----------------*** INPUT FILE MAY END HERE OR ANYWHERE AFTER HERE ----*
IF DEFAULTS ARE TO BE USED
5 BLANK LINE OR NAME OF FILE CONTAINING LOCAL SYMMETRY
6 [DISCRM,ICRMAX] (2I6) defaults = 1.0, 0
(DISCRM=MIN. RATIO OF PEAK:SURROUNDINGS TO BE "ISOLATED"
ICRMAX=MAXIMUM NUMBER OF PEAKS TO TRY AS CROSS VECTORS)
7 [IHASSPTYPE] (DEFAULT=-5) (I5)
|IHASSPTYPE| = 2..SEARCH FOR SINGLE-SITE SOLUTIONS USING HARKER VECTORS
3..SYMMETRY SEARCH FOR 2-SITE SOLUTION GIVEN CROSS-VECTOR
5 OR 0 ..SINGLE SITE SEARCH, THEN TWO-SITE SEARCH , THEN 6
6..SEARCH FOR ADDITIONAL SITES GIVEN 1 OR MORE STARTING
SOLUTIONS
NOTE: IF IHASSPTYPE < 0 , CROSS-VECTORS OR STARTING SOLUTIONS ARE READ FROM
LINES 9A... (REQUIRED FOR ITYPE = -6). OTHERWISE PATTERSON
PEAKS WILL BE USED AS TRIAL CROSS-VECTORS
8 [NSIGNF,SPAT,SSIN,SDUB,STRP,SSFT] (I10,5F10.0)
...NSIGNF=0 IF SIGNIFICANCE IS TO BE TESTED
...SPAT...SSFT..MINIMUM PROBABILITIES FOR NON-
RANDOMNESS REQUIRED TO KEEP PEAKS IN ROUTINES
"PATPK", "SINGLE","DOUBLE","TRIPLE","SIFT"
[9A X1,Y1,Z1, (3F10.0)
[9B X2,Y2,Z2, (....LIST OF TRIAL CROSS-
. VECTORS OR SITES, ONLY
. NEEDED IF ITYPE < 0 )
KEYWORD INPUTS:
LOGFILE NAME OF FILE FOR OUTPUT SUMMARY (REQUIRED UNLESS
YOU ALREADY HAVE A LOGFILE OPEN)
SYMFILE SYMFILE NAME IF NOT PREVIOUSLY DEFINED
FFTFILE NAME OF FILE CONTAINING THE PATTERSON FFT
PATTGRID GRID FOR PATTERSON (IF NOT PREVIOUSLY DEFINED)
SEARCHREGION REGION TO SEARCH (XS,XE,YS,YE,ZS,ZE) (DEFAULTS=0.0)
IHASSPTYPE CONTROL OF WHAT IS TO BE DONE (DEFAULT=-5) dha: it's 5
DISCRM RATIO OF PEAK HEIGHT OVER SURROUNDINGS TO USE
(DEFAULT=1.0)
ICRMAX MAXIMUM # OF PEAKS TO TRY IN 2-SITE SEARCH (DEFAULT=0)
NOSPEC CONTROL OVER IGNORING SYMMETRY #'S OF SPECIAL POSITIONS
(DEFAULT=0, DO NOT IGNORE)
NSIGNF 0 IF SIGNIFICANCE OF PEAKS IS TO BE TESTED (DEFAULT=0)
SPAT MINIMUM PROBABILITY FOR NON-RANDOMNESS TO KEEP A PEAK
IN ROUTINE "PATPK". (DEFAULT=0.0)
SSIN AS SPAT, BUT FOR SINGLE-SITE SEARCHES (DEFAULT=0.0)
SDUB AS SPAT, FOR TWO-SITE SEARCHES (DEFAULT=0.95)
STRP AS SPAT, FOR 3-SITE SEARCHES (DEFAULT=0.0)
SSFT AS SPAT, FOR SIFTING THROUGH 3-SITE SOLUTIONS
(DEFAULT=0.95)
TRIALSITE FRACTIONAL COORDINATES OF A TRIAL SITE OR CROSS VECTOR
(USED IF IHASSPTYPE < 0)
I. INTRODUCTION
This program uses a space-group symmetry minimum method to obtain sets of
atomic sites consistent with a patterson function. For description of input
parameters see section VII. The usual procedure followed in using this program
is:
(1) (ITYPE=2) Search for single-atom solutions to patterson function.
Also adjust parameter (DISCRM) to obtain a reasonable number of isolated
patterson peaks (see section II).
(2) (ITYPE=5; parts 5a, 5b, and 5c are automatically carried out.)
Generate a list of isolated peaks in patterson one at a time. For each
peak assume it is a true cross-vector between two atomic sites in the struc-
ture. Given the position of site 1, site 2 is immediately given by site 1 plus
the cross vector.
(a) A search for the position of site 1 relative to the origin is then
carried out (unless space group has no symmetry). At each search grid point
(corresponding to the position of site 1), all positions in unit cell equiva-
lent by space group symmetry to site 1 and site 2 are identified. All self and
cross vectors between these (NEQUIV * 2) sites are calculated, and the minimum
of the patterson function over this set of sites is noted. The best position
of site 1 relative to the origin is taken to be that which yields the maximum
of this minimum function. (subroutine DOUBLE)
(b) At this point we have a two-site solution to the patterson function
(plus space-group symmetry). Next we find additional sites which are consis-
tent with these sites (and all other additional sites). This search is carried
out over entire supplied grid region. Each grid point is taken as a trial ad-
ditional site, space-group-related sites are calculated, and the minimum value
of the patterson function at each of the self and cross vectors due to this new
site and the starting sites is noted. At the end of this search, a list of
maxima in the search are obtained. Each site corresponding to a maximum is
"consistent" with the starting sites (but not necessarily to each other).
(subroutine TRIPLE).
(c) Go through list of potential solutions and extract a group which has
no negative self or cross vectors. (subroutine SIFT). The best solution obta-
ined is printed along with a list of minimum values of the patterson function
at the cross vectors between sites and an analysis.
II. Searching for isolated peaks in patterson function (always done).
(Subroutine PATPK).
A search is carried out over the input patterson function for isolated
peaks. The positions of these peaks are stored (Subroutine SAVEP), sorted on
the basis of symmetry (Subroutine SORT), and listed along with peak heights
after elimination of origins and redundancies.
Grid points corresponding to isolated peaks in the patterson function are
defined as those which have the following properties:
(1) The value of the patterson function at this grid point is not less
than at any adjacent point in the map. (At edges of the map, use patterson
symmetry to generate value of patterson at neighboring grid point).
(2) A box less than 9 grid points on an edge, centered at the grid point
in question, may be constructed so that all values of the patterson function on
the surface of this box are less than (the peak value/DISCRM). This ensures
that this peak is isolated (and is therefore likely NOT to be several peaks
close together).
The parameter DISCRM (Default=1.0) is user-determined.
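The two conditions for an isolated peak can be sketched in Python as follows. This is an illustration, not subroutine PATPK: the function name `is_isolated_peak` is hypothetical, the map is assumed to cover a full unit cell so that neighbours across an edge can be found by periodic wrap-around (HASSP uses patterson symmetry for this), and "less than 9 grid points on an edge" is read as boxes with edges of 3, 5 or 7 points.

```python
from itertools import product


def is_isolated_peak(grid, point, discrm=1.0, max_edge=9):
    """Test the two isolation conditions at `point` of a 3-D nested-list map."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def value(i, j, k):
        # Periodic wrap-around at the map edges (assumption, see lead-in).
        return grid[i % nx][j % ny][k % nz]

    i0, j0, k0 = point
    peak = value(i0, j0, k0)

    # Condition 1: not less than any adjacent grid point.
    for di, dj, dk in product((-1, 0, 1), repeat=3):
        if value(i0 + di, j0 + dj, k0 + dk) > peak:
            return False

    # Condition 2: some centred box with edge < max_edge has every
    # surface value below peak/DISCRM.
    half = 1
    while 2 * half + 1 < max_edge:
        offsets = range(-half, half + 1)
        surface = [
            value(i0 + di, j0 + dj, k0 + dk)
            for di, dj, dk in product(offsets, repeat=3)
            if max(abs(di), abs(dj), abs(dk)) == half
        ]
        if all(v < peak / discrm for v in surface):
            return True
        half += 1
    return False


# An 8 x 8 x 8 map with one peak of height 10 and a ridge of height 6 beside it.
grid = [[[0.0] * 8 for _ in range(8)] for _ in range(8)]
grid[4][4][4] = 10.0
for k in (5, 6, 7):
    grid[4][4][k] = 6.0

loose = is_isolated_peak(grid, (4, 4, 4), discrm=1.0)
strict = is_isolated_peak(grid, (4, 4, 4), discrm=2.0)
```

Raising DISCRM makes the test stricter: with DISCRM=2.0 the ridge of height 6 lies on the surface of every allowed box and exceeds half the peak height, so the peak is no longer counted as isolated.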
III. Searching for single-site solutions to patterson function (ITYPE =
2) (Subroutine SINGLE).
A map is calculated over the range supplied (XS-XE; YS-YE; ZS-ZE), except
that the search is not carried out along axes which are not fixed (all three in
space group P1). The value of the map at each grid point is the minimum of the
values of the patterson function at the (NEQUIV-1) Harker vectors corresponding
to this grid point. Peaks in this map are stored (Subroutine SAVEP), sorted
according to symmetry (Subroutine SORT), and listed after elimination of
redundancies.
For points in general positions, the peak height listed is simply the
minimum value of the patterson function at the (NEQUIV-1) Harker vectors associated
with this point. For points in special positions, the listed height is the
minimum of the values of the patterson function at each of the (NEQUIV-1) Hark-
er vectors divided by the number of times a Harker vector associated with this
point falls on that position. For example, in space group P222, an atom at
(x,y,z) yields Harker vectors (0,0,0), (2x,2y,0), (2x,0,2z), and (0,2y,2z). If
x=0 and y=0, though, (0,0,0)=(2x,2y,0) and (2x,0,2z)=(0,2y,2z) and there is
only one unique Harker vector (excluding the origin), which is repeated twice.
The value of the peak height listed would be 1/2 the height at (0,0,2z).
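The P222 example can be checked with a short sketch. The names P222,
harker_vectors and listed_height are hypothetical, the patterson function is
passed in as an ordinary callable, and exact fractions are used so that
coinciding vectors compare equal. Vectors that fall on the origin are dropped,
following the example above.

```python
from collections import Counter
from fractions import Fraction

# Diagonal rotation parts of the four P222 operators (translations are zero):
# (x,y,z), (-x,-y,z), (-x,y,-z), (x,-y,-z).
P222 = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]


def harker_vectors(site, ops=P222):
    """Map each non-origin Harker vector (mod 1) to the number of times it occurs."""
    counts = Counter()
    for signs in ops[1:]:  # the identity only gives the origin
        vec = tuple((c - s * c) % 1 for c, s in zip(site, signs))
        if any(vec):  # skip vectors that coincide with the origin
            counts[vec] += 1
    return counts


def listed_height(site, patterson, ops=P222):
    """Minimum over Harker vectors of (patterson value / multiplicity)."""
    counts = harker_vectors(site, ops)
    return min(patterson(vec) / n for vec, n in counts.items())


# Atom on the special position x=0, y=0: one unique vector (0,0,2z), twice.
special = (Fraction(0), Fraction(0), Fraction(1, 8))
print(dict(harker_vectors(special)))
```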
The probability that a given peak of height A in this function is due to a
random combination of peaks is roughly given by:
P=(1.- (1.- p(A)**M )**N ) , where,
A= minimum value of (value of patterson function at Harker vectors
divided by expected noise at that position).
p(A) is probability of observing a value of A or higher on a given try.
M=number of independent Harker vectors examined for this peak
N=number of independent grid points used in search for peaks.
The noise in the map is taken to be the RMS value of the patterson function
if this is a general position. If it is a position of higher symmetry,
the noise = SIGMA * sqrt(the symmetry number of this position) (see VI B, VI C).
The number of independent grid points used in the search for peaks would
roughly be equal to the number of reflections used to make up the map if
reflections at all resolutions contributed equally. A better estimate of this
number is probably the number of peaks+valleys in the patterson map. In this
routine, we actually use 2* the number of peaks.
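As a rough numerical illustration of the probability expression above, the
sketch below evaluates P for given p(A), M and N. The function name
chance_probability is hypothetical, and p(A) is supplied by the caller
because the text does not specify the distribution used to obtain it.

```python
def chance_probability(p_single, m, n):
    """P = 1 - (1 - p(A)**M)**N from section III.

    p_single: probability of a value of A or higher on a single try
    m:        number of independent Harker vectors examined for the peak
    n:        number of independent grid points searched
              (the text uses 2 * the number of peaks for this)
    """
    return 1.0 - (1.0 - p_single ** m) ** n


# Example: p(A) = 0.1, three Harker vectors, 500 peaks in the map.
n_peaks = 500
print(chance_probability(0.1, 3, 2 * n_peaks))
```

Note how strongly M matters: each additional independent Harker vector
multiplies the single-try probability by another factor of p(A).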
IV. Searching for two-site solutions to patterson function given a
cross-vector (x,y,z) between the sites (ITYPE=3 or 5, Subroutine DOUBLE).
A map is calculated over the supplied region (XS-XE; YS-YE; ZS-ZE). At
each grid point (u,v,w) , the value of the map is the minimum of:
(1) Values of patterson function at Harker vectors due to atom at (u,v,w)
(2) Values of patterson function at Harker vectors due to atom at
(u+x,v+y,w+z)
(3) Values of patterson function at each cross vector between (u,v,w) or a
position equivalent by space group symmetry to (u,v,w) and (u+x,v+y,w+z)
Positions of peaks in this map (corresponding to [u,v,w] and
[u+x,v+y,w+z]) are stored (Subroutine SAVEP), sorted according to symmetry
(Subroutine SORT), and listed. If ITYPE=5, subroutine SIFT is called with the
top peak in this map.
The measures of peak heights and probabilities of random occurrence for
this routine are similar to those for the single-atom search (see section III).
V. Searching for additional solutions to patterson function given a set
of N starting solutions. (ITYPE = 5) (Subroutines SIFT and TRIPLE).
This search is carried out in two parts. First, positions in the supplied
region which have non-negative self vectors and non-negative cross vectors with
the starting solutions are identified. These new solutions are not necessarily
consistent with each other, however. Next, a set of sites which are completely
self-consistent are extracted from these possible solutions. NOTE: a maximum
of 4 new sites are added to the list of solutions in each pass through subroutine
SIFT (if local symmetry is used, 4 sites unique by local symmetry and
space-group symmetry). If more than 4 additional sites are desired, you must
run the program again with ITYPE = -6, and list the current solution in lines
9a....
(A) (Subroutine TRIPLE) A map is calculated over the range supplied
(XS-XE; YS-YE; ZS-ZE). At each grid point (u,v,w), the value of the map is
the minimum of:
(1) Values of the patterson function at Harker vectors due to atom at
(u,v,w)
(2) Values of the patterson function at each cross vector between
(u,v,w) (or a position equivalent by space-group symmetry) and each one of
the "known" solutions.
Peaks in this map are stored (Subroutine SAVEP), sorted according to special
or general positions, and saved.
(B) (Subroutine SIFT) Each peak from subroutine TRIPLE is considered as a
potential additional site, beginning with the highest peak:
(1) If this new site is equivalent by space group symmetry to a site already
in the list of solutions, forget it.
(2) If this new site has positive cross-vectors with all sites currently
in list of solutions, add this site to the list of solutions.
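The selection in part (B) is a greedy pass over the peaks from TRIPLE. A
minimal sketch follows, assuming the caller supplies two hypothetical
helpers: equivalent(a, b), which says whether two sites are related by space
group symmetry, and cross_vector_min(a, b), which returns the minimum
patterson value over the cross vectors between two sites. The limit of 4 new
sites per pass is the one quoted above for the case without local symmetry.

```python
def sift(candidates, solutions, cross_vector_min, equivalent, max_new=4):
    """Greedy selection of mutually consistent sites.

    candidates: list of (height, site) pairs from the TRIPLE map
    solutions:  list of sites already accepted
    """
    accepted = list(solutions)
    added = 0
    # Consider the highest peak first.
    for _height, site in sorted(candidates, key=lambda c: c[0], reverse=True):
        if added >= max_new:
            break
        # (1) Skip sites equivalent to one already in the list.
        if any(equivalent(site, s) for s in accepted):
            continue
        # (2) Keep the site only if all cross vectors are positive.
        if all(cross_vector_min(site, s) > 0 for s in accepted):
            accepted.append(site)
            added += 1
    return accepted
```

Because each accepted site is tested against every site accepted before it,
the final list is self-consistent, although the result depends on the order
in which the peaks are considered.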
VI. Some technicalities.
(A) GRID -- The grid used for all searches is exactly the same as the grid
for the input patterson map, but each time a peak is found, all neighboring
grid points are tested on a grid twice as fine and the highest of these test
values is used. Values of the patterson function between grid points are
interpolated. Do not use a grid coarser than 1/3 the resolution for the input
patterson map. Also don't bother to use a grid finer than 1/6 the resolution.
NOTE: the input patterson map must be on a grid such that the symmetry elements
lie on grid points. That is, if there is a two-fold axis at 1/12 in z, then
the z-axis must be divided into a number of grid points that is a multiple of
12. The easiest way to be safe is to make sure all unit cell translations
are multiples of 12.
(B) SIGNIFICANCE TESTS -- Difference patterson functions have a considerable
amount of noise if acentric reflections are present. (For each acentric
reflection, the expected error [|Fph-Fp| - |fh|] is roughly equal to |Fph-Fp|).
It can be shown that SIGMA, the RMS noise in the map is roughly equal to the
RMS value of the patterson function.
Peaks in the patterson map which have a height much less than SIGMA are
therefore likely to be unrelated to atomic sites. On special positions, the
RMS noise in the map will be sqrt(NSYM)*SIGMA, where NSYM is the symmetry
number of this position (see VI C).
In order not to include too many peaks due to this noise in any of the
searches carried out, a significance test is made for each peak if ISIGNF=0
(default). A peak is rejected if there is a probability less than SIGNIF that
no peak of this height or higher would occur by chance in this search (see
section III). SIGNIF is selected by the user for each routine (SPAT, SSIN, etc.).
(C) SYMMETRY NUMBERS OF POSITIONS IN REAL AND PATTERSON CELLS --
(Subroutines SPECR AND SPECP). For this program, the symmetry number of a
position in a real or patterson cell is the number of ways that a symmetry
operator in the group (patterson or real cell) can map the point onto itself
(within a tolerance of 2 grid units). The symmetry number of a general position
is 1; for a point on a dyad, it is 2, etc.
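The definition above can be illustrated with a short sketch. It is not the
code of SPECR or SPECP: the function name symmetry_number is hypothetical,
operators are given as (rotation matrix, translation) pairs in fractional
coordinates, and the tolerance is applied to each axis separately, which is
an assumption about how "2 grid units" is measured.

```python
def symmetry_number(point, ops, ncell, tol=2.0):
    """Count the operators that map a point onto itself.

    point: fractional coordinates (x, y, z)
    ops:   list of (rot, trans); rot is a 3x3 nested list, trans a 3-tuple
    ncell: number of grid points along each cell edge
    tol:   tolerance in grid units (applied per axis; an assumption)
    """
    count = 0
    for rot, trans in ops:
        image = [
            sum(rot[r][c] * point[c] for c in range(3)) + trans[r]
            for r in range(3)
        ]
        close = True
        for r in range(3):
            d = (image[r] - point[r]) % 1.0
            d = min(d, 1.0 - d)  # shortest separation under cell periodicity
            if d * ncell[r] > tol:
                close = False
        if close:
            count += 1
    return count


# Identity plus a dyad along y: (x,y,z) and (-x,y,-z).
identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
dyad_y = ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.0, 0.0))
print(symmetry_number((0.0, 0.2, 0.0), [identity, dyad_y], (48, 48, 48)))
```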
(D) The matrices of equivalent positions for this space group are read
from a SYMMETRY file (see SYMMETRY in the documentation).
(E) INPUT PATTERSON MAP -- This map is assumed to have been calculated
with X-across, Y-down, Z-sections, where X, Y, and Z are those used in the
matrices of equivalent positions. The map must have 1 record for each line of X
across, with a total of NY*NZ records. The map is unformatted REAL*4 with no
header. The total number of elements in the map must not exceed 200,000.
(F) SEARCH REGION -- Except for patterson peak searches, all searches are
carried out one section at a time. The grid searched is NX by NY where,
NX = (XE - XS)* IPCELL(1) + 3
NY = (YE - YS)* IPCELL(2) + 3
where XS, XE, YS, YE are input by the user. For single atom searches and
origin searches, however, if the region covered by the input patterson function
is smaller than this, the latter region is used. NX*NY must not exceed 20000.
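Before running a search it can be useful to check the section size against
the 20000-point limit. The sketch below applies the two relations for NX and
NY; the function name search_grid is hypothetical, and rounding to the
nearest integer is an assumption, since the text does not say how the program
converts the product to an integer.

```python
def search_grid(xs, xe, ys, ye, ipcell, max_points=20000):
    """Return (NX, NY) for one section of the search region.

    xs, xe, ys, ye: search limits in fractional coordinates
    ipcell:         cell dimensions in grid units, (IPCELL(1), IPCELL(2), ...)
    """
    # Assumption: round to the nearest grid point before adding 3.
    nx = int(round((xe - xs) * ipcell[0])) + 3
    ny = int(round((ye - ys) * ipcell[1])) + 3
    if nx * ny > max_points:
        raise ValueError(f"NX*NY = {nx * ny} exceeds {max_points}")
    return nx, ny


print(search_grid(0.0, 0.5, 0.0, 0.5, (64, 64, 32)))
```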
G. NON-CRYSTALLOGRAPHIC (LOCAL) SYMMETRY. If local symmetry elements exist in
the crystal structure and are known, it is very useful to include them, at
least part of the time, in using this program. If local symmetry is to be
included, you must specify the file to read it from on line 5 of the input file.
The local symmetry file has the format:
RECORD 1: NNCR (number of sets of local symmetry element and regions to
follow)
NEXT 2*NNCR RECORDS (in sets of 2, all REAL numbers, NOT INTEGERS)
R11 R21 R31 R12 R22 R32 R13 R23 R33 T1 T2 T3
XS,XE; YS,YE; ZS,ZE
where, R11, R21, are rotation matrix elements just as for
crystallographic symmetry, T1, T2 are translations, in fractional
coordinates, NOT divided by 12.
XS, XE, etc. define the region in which this symmetry
operation is valid.
NOTE 1: Only specify symmetry elements for ONE asymmetric unit of crystal.
It is not necessary to include the identity.
The maximum value allowed for NNCR is 24.
The maximum number of local symmetry operations which may be
applied to a PARTICULAR point in the unit cell is 12,
not including the identity.
NOTE 2: If local symmetry is specified, during single-atom searches and
two-atom searches, only the UNIQUE positions will be listed; you must apply
local symmetry yourself in order to generate the complete list of sites. In
searches for additional sites, however, all sites are listed (except those
related by space-group symmetry, of course). When sites are input (ITYPE = -6)
in searches for additional sites, it is not necessary to specify all the local
symmetry-related sites; these will be generated automatically. If you wish to
generate local symmetry-related sites from a set of unique sites, you may input
them with ITYPE = -6 and a search region of (0-0,0-0,0-0); the entire set of
sites will then be listed for you.
H. Obtaining an analysis of a solution that you input yourself. In order
to generate a list of local symmetry-related points and minimum self- and
cross-vectors corresponding to a set of unique sites you specify, use ITYPE =
-6 with a zero search region (blank line).
I. Maximum number of sites in a solution set. In the absence of local
symmetry, this program will yield up to a 6-site solution in one pass. If more
sites are to be located, run program again using ITYPE = -6 ; specify the
known sites on lines 9a-9f. Up to 4 additional sites may be located on each
pass. If local symmetry is present, NSYM * 6 sites may be found on the first
pass and NSYM * 4 on each additional pass. A maximum of 40 sites may be found
in any one solution.
J. Use of ITYPE=-6 to search for causes of questionable results. If
ITYPE=-6 is specified along with a small (or zero) search region, an analysis
of the sites you input on lines 9a-... will be printed. This analysis
includes the minimum values of the self- and cross-vectors for this set
of sites. If you know that the Patterson function is large at all these
vectors for a given group of sites, then this procedure will help you
determine if there is anything unusual about your map.
VII. DESCRIPTION OF INPUT PARAMETERS
A. IPCELL(1),IPCELL(2),IPCELL(3) are cell dimensions in grid units for
input patterson map. The map runs from 0 - NPATXE across, 0 - NPATYE down, and
0 - NPATZE in sections. The map must be at least the asymmetric unit of the
patterson function (1/2 the asymmetric unit of the real cell), but may be more.
If it is more than the asymmetric unit, the estimates of number of peaks+valleys
will be too high by the ratio of volume included to the actual asymmetric unit.
This will throw probability calculations off slightly (a factor of two makes
little difference, however).
NPATXE, NPATYE, and NPATZE must be positive or zero.
B. DISCRM is described above (section II)
C. XS, XE; YS, YE; ZS, ZE are boundaries of the search region in
fractional coordinates in the X, Y, and Z directions (defined by input
patterson map, see VI E). Note that in many space groups it is not necessary to
search over the entire asymmetric unit for single-site solutions. Similarly,
for two-site origin searches, one-half of the asymmetric unit is always
sufficient. For searches for additional sites, the entire asymmetric unit must
be specified. This program does not do all this automatically at this time.
D. ICRMAX (default=1) is the maximum number of tries to make if ITYPE = 3
or 5. The top ICRMAX isolated patterson peaks in general positions are used as
trial cross vectors, one at a time. If there are fewer than ICRMAX isolated
peaks in general positions, those in special positions are used also for a
total of ICRMAX.
E. NOSPEC (default=0) is 1 if the difference between general and special
positions is to be ignored in sorting peaks.
F. ITYPE controls the method used.
1. If ITYPE is greater than or equal to 0, isolated patterson peaks
will be identified. The top ICRMAX of them will be used
in 2-site searches if ITYPE = 3 or 5.
2. If ITYPE is less than 0, lines 8a,8b,... will be read and
used as:(a) trial cross-vectors, each considered separately
(ITYPE = -3 or ITYPE = -5),or
(b) a starting solution, all sites read are part of
the same solution (ITYPE = -6).
3. If |ITYPE| is:
2 ...Search for single-site solutions using Harker vectors
(section III).
3 ...Search for two-site solutions related by a cross-vector
(an isolated patterson peak if ITYPE = 3, an input
cross-vector (line 8a) if ITYPE = -3 ).
0 or 5 ...First do 3, then for each two-site solution, do 6.
6 ...Search for additional sites given 1 or more starting
solutions. ITYPE = 6 is not allowed. If ITYPE = -6,
a list of starting sites is read on lines 8a,8b,...
G. NSIGNF (default =0) is 1 if significance tests are not to be carried
out. Use NSIGNF=0 for difference Patterson functions, NSIGNF=1 for Patterson
functions with no noise.
H. SPAT, SSIN, SDUB, STRP, SSFT (default= 0., 0.0, 0.95, 0.0, 0.95) are
minimum values of the probability that (no random peaks will be found greater
than this peak) in order to keep peaks.
SPAT is not used at present.
SSIN is for single-atom searches.
SDUB is for two-atom origin searches.
STRP is for calculation of map in searches for new sites
given starting sites (see section V A).
SSFT is for selecting sites which are consistent with each other
in searches for new sites given starting sites (section V B).
J. X1, Y1, Z1, etc. (lines 8a, 8b, etc) are fractional coordinates
for cross-vectors or starting solutions. These are read if ITYPE < 0.
EXAMPLE OF USE OF PROGRAM HASSP:
SPACE GROUP C2.
! set keywords:
KEYWORD
SYMFILE C2.SYM
LOGFILE c2.log
FFTFILE c2.patt
SEARCHREGION 0. 0.5 0 0.5 0 0.5
IHASSPTYPE 5
ICRMAX 10
DONE
! go:
HASSP
RESULTS OF HASSP (ANNOTATED)
"HASSP -- PATTERSON SEARCH AND SUPERPOSITION PROGRAM. 13:40:48 12-MAY-94
INPUT PARAMETERS:
...[ LIST OF INPUT PARAMETERS AND DISCUSSION OF SPACE GROUP SYMMETRY
IS PRODUCED BY PROGRAM HERE]...
INPUT PATTERSON MAP HAS 36465 ELEMENTS AND AN RMS VALUE OF 0.21687E+06,
SCALED TO 1000.0
NUMBER OF DEGREES OF FREEDOM IN PATTERSON MAP WHICH IS ABOUT
EQUAL TO THE NUMBER OF PEAKS+VALLEYS IN MAP: 462
...[THIS NUMBER OF DEGREES OF FREEDOM IS USED IN ESTIMATING PROBABILITIES OF
FINDING SOLUTIONS BY CHANCE]....
LIST OF ISOLATED PATTERSON PEAKS: GENERAL POSITIONS.
PEAK X Y Z HEIGHT
1 0.156 0.344 0.125 10135.
2 0.125 0.344 0.313 10082.
3 0.070 0.219 0.172 2869.9
4 0.414 0.406 0.031 2356.5
....ETC...[ THESE PEAKS WILL BE USED AS POSSIBLE CROSS-VECTORS.
THOSE ON SPECIAL POSITIONS (BELOW) WILL ONLY BE TRIED AFTER THOSE ON THIS
LIST ARE EXHAUSTED.]
LIST OF ISOLATED PATTERSON PEAKS: SPECIAL POSITIONS.
PEAK X Y Z HEIGHT SYMM #
1 0.469 0.500 0.188 9822.5 2
2 0.281 0.000 0.438 9478.0 2
FOR EACH TRIAL SOLUTION IN SEARCHES, THE "HEIGHT" IS THE MINIMUM VALUE, OVER ALL
PREDICTED PATTERSON VECTORS, OF: (VALUE OF THE PATT FN)/(THE # OF PREDICTED
VECTORS WHICH FALL ON THIS POINT)
THE SYMMETRY NUMBER IS THE SYMMETRY OF THIS POSITION IN THE PATT FN.
LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:GENERAL POSITIONS
PEAK X Y Z HEIGHT PROB THAT THIS IS CHANCE
1 0.484 0.000 0.094 9822.5 0.000
2 0.141 0.000 0.219 9478.0 0.000
3 0.055 0.000 0.094 3322.7 0.669
LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:SPECIAL POSITIONS
PEAK X Y Z HEIGHT SYMM # PROB THAT THIS IS BY CHANCE
14 0.000 0.000 0.000 25739. 2 1.000
TWO-ATOM ORIGIN SEARCH. TRY # 1 . CROSS-VECTOR BETWEEN SITES = (0.156,0.344,0.125)
MAXIMUM AGREEMENT FOUND IN SEARCH WITH THE TWO SITES:
(0.484,0.000,0.094) AND (0.641,0.344,0.219).
PROBABILITY THAT THIS PAIR OF SITES CORRESPONDS TO CHANCE PEAKS =0.000
--------*ANALYSIS OF THIS SOLUTION:--------**
SITE X Y Z PROB HAS SAME CORRESPONDS
HARKER VECTORS
AS SITE:
1 0.484 0.000 0.094 0.000 0 1 (X+.000 ,Y+.000, Z+.000)
2 0.641 0.344 0.219 0.000 0 2 (X+.500 ,Y+.344, Z+.
SOLUTIONS LISTED BELOW SHARE HARKER VECTORS WITH
AT LEAST ONE SITE ABOVE YET ARE COMPLETELY CONSISTENT
WITH ABOVE SITES
3 0.984 0.125 0.078 0.000 1 1 (X+.500 ,Y+.125, Z+.
LIST OF CROSS-VECTORS IN PATTERSON FUNCTION ; FIRST TEN ONLY :
1 2 3
1 9822.5 10082.2 2097.4
2 10082.2 9478.0 2151.0
3 2097.4 2151.0 5852.6
...[ ETC FOR OTHER TEST CROSS-PEAKS]...
NOTE: SEE "USING HASSP AND HEAVY" FOR MORE DISCUSSION OF THIS OUTPUT.
-------------------------------------------------------------------------------
**** HEAVY -- Heavy atom refinement Version 3.0
HEAVY is a general-purpose heavy atom refinement routine. It can be used to
carry out either phase refinement or origin-removed Patterson refinement, as
well as to calculate coefficients for native Fourier and difference Fourier
maps.
Major changes from versions 1.0 - 1.5:
1. Sigmas of anomalous differences read in explicitly
2. Program is now compatible with MAD data after conversion to
MIR format with MADMRG.
3. All inputs now can be entered by KEYWORDing
4. Refinement and phasing can be carried out using isomorphous
differences, anomalous differences, or both.
The input parameters are read from standard input as KEYWORD inputs.
These are most conveniently entered using a SCRIPT file.
NOTE: any values previously defined do not need to be specified. If you
run HEAVY a second time without quitting the main program and do not specify
any new parameters, the routine will start where it left off and carry out
another set of refinements of the same type that you specified the last time
you ran it.
NOTE also: average residuals are maintained throughout. This means that if
you want to refine a completely new set of data, you must start the program
over.
KEYWORD Values
LOGFILE REQUIRED name of output file for summary of results.
If a logfile has already been defined, this is ignored.
CELL cell parameters
DMIN minimum d-spacing to consider
DMAX maximum d-spacing to consider
NEWFILE file with updated inputs to be created
INFILE input data file
KOUT type of output, if any. DEFAULT = 0 (no binary output)
KOUT   TYPE OF OUTPUT                                 # of columns of data
____   ____________________________________________   ____________________
0....  NONE
2....  DIFFERENCE FOURIER FOR KDER                    2
         A=m(Fder-Fnat)cos(PhiBest)
         B=m(Fder-Fnat)sin(PhiBest)
3....  ANOM DIFF FOURIER FOR KDER                     2
         A=m(DelAno)cos(PhiBest-90)
         B=m(DelAno)sin(PhiBest-90)
4....  RESIDUAL MAP FOR KDER                          2
         A=m(Fder-|Fnat+FH|)cos(PhiBest)
         B=m(Fder-|Fnat+FH|)sin(PhiBest)
         (where Fnat+FH is the vector sum of Fnat
         and the heavy atom FH)
6....  NATIVE FOURIER                                 2
         A=m(Fnat)cos(PhiBest)
         B=m(Fnat)sin(PhiBest)
7....  PHASES AND FIGURE OF MERIT                     3
         PhiBest (in degrees), PhiMostProbable,
         and figure of merit
8....  HENDRICKSON-LATTMAN COEFFS                     4
9....  HEAVY ATOM S. FACTORS FOR KDER                 4
         A, B= real and imaginary parts of normal
         scattering from heavy atom.
         C, D= real and imaginary parts of anomalous
         scattering from heavy atom.
Here, m=the figure of merit, PhiBest is the "Best" phase,
PhiMostProbable is the most probable phase.
KDER derivative to include in output
OUTFILE output file name, required if KOUT is not 0
FILETITLE optional title of this run
IANGLE phasing angle, minimum=5, default=5
INANAL PHASE ANALYSIS. DEFAULT=0
1 for printing of extensive heavy atom statistics
INRESD RESIDUAL AND STATISTICS. DEFAULT = 0
-1 No residuals or statistics calculated.
0 zeroth cycle added before first refinement
cycle. During zeroth cycle residuals and
statistics are calculated and printed.
No statistics are calculated on other cycles.
1 Residuals and statistics calculated every
cycle and printed according to INPRNT.
Note: residuals are only calculated for
derivatives with INPHAS = 1.
INOSIG USE OF SIGMAS. DEFAULT = 0 (use sigmas).
1 if sigmas from input data file are not to be used.
INHEND USE OF HENDRICKSON-LATTMAN COEFFICIENTS. DEFAULT=0 (don't use)
1 if Hendrickson-Lattman coefficients are to be calculated,
useful for outputting phase probability distribution in
this form (KOUT=8). HEAVY does not do phase combination.
INPRNT PRINTING OF SHIFTS. DEFAULT =0 (don't print)
1 if shifts (and statistics, if any) are
to be printed on every cycle. Default
is to print statistics on first cycle,
shifts and statistics on last.
JALT USE OF PHASES IN REFINEMENT. DEFAULT = 0 (Patterson refinement)
0 is to use origin-removed Patterson refinement.
1 is to use phase refinement at most probable phase
JALT and KALT are set automatically if you use a procedure
(IHEAVYPROC > 0)
KALT USE OF DERIVATIVE BEING REFINED IN PHASING. DEFAULT=0(don't use)
0 is not to use derivative being refined in phases
1 is to use all available derivatives in phasing
NCYCLE Number of cycles of refinement to be carried out if a
PROCEDURE is NOT used (see IHEAVYPROC). Maximum = 30
Default = 0
IREFCY List of derivative numbers to be refined during the NCYCLE
cycles of refinement if a procedure is NOT used. Default = 0
e.g., 1,1,1,1,0 means refine deriv #1 on cycles 1-4 and
calculate phases, get residuals, figure of merit, etc
on cycle #5. Note that you don't get these statistics on
cycles in which you refine with Patterson refinement.
IHEAVYPROC RUN a procedure with HEAVY. Default =0 (no procedure)
Available procedures:
1 = NREP cycles of refinement of each deriv that has INPHASE
specified, refining only occupancy.
2 = as 1, but refining only xyz. Fixes coordinates of best
atom in each deriv in polar space group in polar directions
unless another atom is already fixed by user.
3 = as 2, but refining xyz and occ
4 = as 3, but refining xyz, occ, B
5 = 1, then 2, then 3, then 4
6 = phased refinement to obtain relative coordinates among
derivatives for polar directions in polar space groups.
Fix and phase with best derivative. Refine just coordinates
in polar directions for all other derivatives with INPHASE
specified. This should be followed by #5 again.
NREP # of refinements of each deriv in procedures with
IHEAVYPROC > 0
SMALL minimum ratio of derivative structure factor amplitude
(F Deriv) to RMS lack-of-closure for use in
refinement or residuals. DEFAULT=0.
FMIN minimum native F for any action. Default=0.
FOMMIN minimum figure of merit for use in phased
portion of refinement. Default =0.
BMIN minimum isotropic atomic B allowed. Default =0.
THR Keywords to set threshold and damping factors for shifts:
ACL if SHIFT > THR *sigma of SHIFT, SHIFT=SHIFT*ACL
Defaults are 0. and 0.5
FSIGMIN MINIMUM ratio of F/sig to include. DEFAULT =1.0
NNATF column number in input file for native F
NNATS column number for sigma of native F
NBST optional column number for "best" phase in input file
NMP optional column number for most probable phase in input file
NFIGM optional column number for figure of merit in input file
INOLD Flag for using phases from input file in phasing when they
are not available from current data. default = 0. To use
input phases, inold=1
ANATSCALE Overall scale factor applied to ALL data before any
other scaling. DEFAULT = 1.0
SIGNATSCALE Scale factor applied to native sigmas after overall scaling
DEFAULT = 1.0
NEWATOMTYPE XXXX, where XXXX is the name of the new atom type. The name
of an atom can have up to 4 characters. This
keyword allows you to enter the scattering factor information
for an atom type that is not supplied with the program. This
should be followed by the next 5 keywords (AVAL, BVAL,
CVAL,FPRIMV,FPRPRV)
AVAL 4 real numbers from the International Tables corresponding to
the "A1", "A2", "A3", and "A4"
values for scattering factors for a new atom type
BVAL 4 real numbers for "B" values
CVAL 1 real number for "C" value
FPRIMV 1 real number for f' value for the new atom type
FPRPRV 1 real number for f" value for the new atom type
DERIVATIVE 1 (..2,3,4...derivative #)
Keyword indicating start of a new derivative. When you
enter the command "KEYWORD", then as soon as the keyword
"DERIVATIVE" appears the first time, all heavy atom
information is zeroed, and the program assumes you are typing
in information on derivative #1. The next time "DERIVATIVE"
appears, it assumes you are typing in information on
derivative #2, and so forth. As long as you do not type
"DERIVATIVE" again, all heavy atom information will be
maintained and updated as the refinements progress, even if
you use other routines in the package, such as HASSP or MAPS.
GOTODERIV Keyword that allows you to alter parameters for a particular
derivative already entered. See notes below.
GOTOATOM Keyword that allows you to alter parameters for a particular
atom already entered. See notes below
LABEL Title for this derivative
NCOLFBAR Column number for derivative F
NCOLSFBAR Column number for sigma of derivative F
NCOLDELF Column number (optional) for anomalous difference
NCOLSDELF Column number for sigma of anomalous difference
INPHASE Keyword indicating this derivative is to be used in phasing
DEFAULT = not to include in phasing
INANO Keyword indicating the anomalous differences are to be used
for this derivative. DEFAULT = not to include anom diffs
ISOONLY Use only isomorphous differences for phasing and refinement
(overrules INANO)
ANOONLY Use only anomalous differences for phasing and refinement,if
available (you need to specify INANO also).
DERSCALE Dividing scale factor applied to all this derivative data
after overall scale factor has been applied. DEFAULT=1.0
NOREFINESCALE Do not refine overall scale factor. Default = refined
DERTEMP Dividing B-factor to apply to deriv data. DEFAULT =0.
REFINETEMP Refine B-factor applied to deriv data. DEFAULT= not refined
SIGDERSCALE Scale factor to apply to derivative sigmas after all above
scaling is applied. DEFAULT = 1.0
ATOMNAME XXXX, where XXXX is the atom type of an atom to be refined.
This name must appear in the
DATA statements at the start of routine "HEAVY" or you can
enter them using the NEWATOMTYPE keyword. Atoms
supplied with the package are:
"I- ", "IR+3", "PT+2", "AU+1", "HG ","HG+2", and "U "
When you type "ATOMNAME" it assumes you are typing in a new
atom and it zeroes out all the parameters for this new atom.
If you want to go back to this atom later (i.e., in another
cycle) use the keywords GOTODERIV and GOTOATOM to identify
this atom.
When you have multiple sites for a particular derivative,
use ATOMNAME XXXX for the first, then input all the data on
that site, then start the next site with ATOMNAME YYYY, and
so forth.
OCCUPANCY Fractional occupancy of this atom
BVALUE Temperature factor for this atom. Anisotropic temperature
factors are no longer supported.
XYZ Fractional coordinates of this atom
Control of refinement of this atom. These are cumulative,
so you can refine x and y using REFINEX and REFINEY.
REFINENONE Don't refine anything...This is used to reset all the
refinement flags to zero if you have previously set them
up and want to change them.
REFINEALL Refine x,y,z,occupancy, and B
REFINEOCCB Refine occupancy and B
REFINEXYZ Refine x,y,z
REFINEX Refine x
REFINEY Refine y
REFINEZ Refine z
REFINEOCC Refine occupancy
REFINEB Refine B
EIS Optional list of estimated rms isomorphous lack-of-closure
residuals in 8 resolution ranges
EAD Optional list of estimated rms anomalous lack-of-closure
residuals in 8 resolution ranges
FPHBAR Optional list of estimated rms derivative F in 8
resolution ranges
FHBAR Optional list of estimated rms heavy atom F in 8
resolution ranges
SIGBAR Optional list of estimated rms derivative sigma in 8
resolution ranges
Notes:
0. If you want to change which parameters for which atoms are refined after
you have already set up the atoms and refinement parameters, then you have
to use a special way to reset them. The reason you have to do something
special is that if you say "DERIVATIVE" then the routine assumes you are
inputting data for a new derivative, so you can't go back to a previous one
with that command. Instead, you type:
GOTODERIV 2 ( to go to derivative #2)
GOTOATOM 3 (to atom #3 in deriv #2)
REFINENONE (set all refinement flags back to zero)
REFINEXYZB (or whatever you want to refine for this atom)
GOTOATOM 1 ( now do atom 1 in deriv 2)
GOTODERIV 1 (now do derivative 1)
Please note: you must reset all the refinement flags back to zero with
REFINENONE if you want to turn any refinement flags off. Then turn the
ones you want back on.
1. The refinement against an origin-removed Patterson map is a way of
refining heavy atom parameters of each derivative independently, and is
particularly useful because the occupancies of heavy atom sites are
quite accurately estimated. See the tutorial on USING HASSP and HEAVY at
the end of this document for hints on using this refinement method.
When using this package, the recommended refinement method is this one,
with JALT=0 and KALT=0.
This refinement minimizes the sum over all reflections of,
R = WGT * DEL**2
with respect to heavy atom parameters. WGT is a weighting factor, and
DEL is defined as:
DEL = (Fph-Fnat)**2 - K*FH**2 - < (Fph-Fnat)**2 - K*FH**2 >
where the average <> is taken in a shell of resolution and FH is the magnitude
of the calculated heavy atom structure factor.
K is 1 for centric reflections, 1/2 for acentric reflections.
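The quantity being minimized can be written out for a single resolution
shell as follows. This is an illustration of the definitions above rather
than the refinement code itself: the function name origin_removed_residual
and the tuple layout are hypothetical, and the weights WGT are simply taken
as given because their form is not described here.

```python
def origin_removed_residual(reflections):
    """Sum of WGT * DEL**2 over the reflections of one resolution shell.

    reflections: list of (fph, fnat, fh, centric, wgt) tuples, where fh is
    the magnitude of the calculated heavy atom structure factor and centric
    is True for centric reflections.
    """
    # (Fph-Fnat)**2 - K*FH**2 with K = 1 (centric) or 1/2 (acentric).
    raw = [
        (fph - fnat) ** 2 - (1.0 if centric else 0.5) * fh ** 2
        for fph, fnat, fh, centric, _wgt in reflections
    ]
    shell_mean = sum(raw) / len(raw)  # the average < > within the shell
    return sum(
        wgt * (value - shell_mean) ** 2
        for value, (_fph, _fnat, _fh, _centric, wgt) in zip(raw, reflections)
    )


shell = [(110.0, 100.0, 10.0, True, 1.0), (105.0, 100.0, 10.0, False, 1.0)]
print(origin_removed_residual(shell))
```

Subtracting the shell average is what removes the origin peak: a constant
offset in (Fph-Fnat)**2 within a shell does not change the residual.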
2. HINTS FOR INTERPRETING STATISTICAL OUTPUT FROM HEAVY:
A. Many of the values listed at the end of a set of refinements are
more-or-less self explanatory. This should include the number of reflections
read, within resolution limits, and greater than the minimum figure of merit.
As these statistics are usually printed for a cycle in which refinement is
not carried out, the number of reflections used to refine is usually zero
in this listing.
Other values listed at the end of a set of refinements include:
RMS HEAVY ATOM F: The rms value of the calculated heavy atom F
in the resolution range
RMS PHASE AVG'D RESIDUAL: This is the rms value of the difference between
calculated and observed derivative F, where it is averaged not only
over all reflections, but over all phases for each reflection, weighted
by the phase probability
RMS(FH)/RMS(E): This is the ratio of the rms heavy atom F to the rms
phase averaged residual
CENTRIC R FACTOR: This is <| |Fder-Fnat| - |FH| | >/< |Fder-Fnat| >
RMS DERIVATIVE F: This is the rms value of Fder
RMS SIGMA OF FPH: This is the rms sigma of Fder
RMS SIGMA OF FP: This is the rms sigma of Fnat
RMS OBSERVED DIFFERENCE: For anomalous differences, this is the rms value
of DelAno= (F+ - F-)
RMS CALCULATED DIFFERENCE: This is the rms calculated anomalous difference
MEAN RATIO OF ISO TO ANO: This is the ratio of calculated |FH| due to
normal scattering relative to that due to anomalous scattering. If
all anomalous scatterers are identical, this is equal to (f+f')/f"
for that anomalous scatterer.
RMS(RES HA SF+LACK OF ISO SF): This is an estimate of the total errors in
the heavy atom model plus lack of isomorphism that remain. It is
obtained from the rms phase averaged residual and the rms native and
derivative sigmas.
RMS LACK OF ISOMORPHISM SF: This is an estimate of the remaining lack of
isomorphism. It is based on a comparison of the anomalous and
isomorphous differences that remain
RMS RESIDUAL HEAVY ATOM SF: This is an estimate of the remaining heavy atom
structure factor, based on the anomalous differences and the errors
in measurement.
CENTRIC LOC: This is an estimate of the "centric" lack-of-closure residual,
obtained using both centric and acentric reflections and correcting
acentric lack-of-closure residuals by a factor of 2. These residuals
are all corrected for errors in measurement, so that if the derivative
is "solved" and there is little lack of isomorphism, these values
should all be near zero.
ANOMALOUS LOC: This is the lack-of-closure error for anomalous differences,
corrected for errors in measurement.
The lack-of-closure residuals, miscellaneous statistics, and structure
factor tables are all calculated in shells of resolution defined by
DMIN and DMAX. These shells are designed to contain equal numbers of
centric reflections and are thus equally spaced in 1/D**2.
The shells are determined by the relations:
THMIN = 1./(2.*DMAX)**2
THMAX = 1./(2.*DMIN)**2
For a reflection with SSQOLA = (sin theta/lambda)**2, the shell is:
NORDER = 1 + INT (8 * (SSQOLA - THMIN)/(THMAX-THMIN) )
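The shell relations above can be checked with a short Python sketch. It assumes only what is stated: SSQOLA = 1/(2*D)**2 for a reflection of d-spacing D, and INT truncating toward zero. The function name and the nshell argument are illustrative, not HEAVY keywords.

```python
def resolution_shell(d, dmin, dmax, nshell=8):
    """Shell number for a reflection with d-spacing d (Angstroms)."""
    thmin = 1.0 / (2.0 * dmax) ** 2
    thmax = 1.0 / (2.0 * dmin) ** 2
    ssqola = 1.0 / (2.0 * d) ** 2          # (sin theta / lambda)**2
    # int() truncates toward zero, like FORTRAN INT for these values.
    return 1 + int(nshell * (ssqola - thmin) / (thmax - thmin))

# Data from 10 to 3 A, as in the tutorial later in this document.
for d in (10.0, 5.0, 3.5, 3.0):
    print(d, resolution_shell(d, dmin=3.0, dmax=10.0))
```

Note that with the relation exactly as written, a reflection lying precisely at DMIN evaluates to 9 rather than 8. How the program treats that edge case is not stated here.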
3. Description of normal refinement/phasing cycles.
A. Refinement vs. origin-removed Patterson map.
Input parameters: all defaults used
NCYCLE = 1 to 30
IREFCY(I) = 1,1,1,2,2,2.....6,6,6,0
Results:
Zeroth cycle: phases calculated for all derivatives identified with
INPHASE using input lack-of-closure residuals. New lack-of-closure residuals
are calculated for these derivatives. Statistics are printed.
Cycles 1 through NCYCLE-1: in this example, IREFCY(I) is zero on the
last cycle, but non-zero for all other cycles. For each cycle when
IREFCY(I) is non-zero:
   no phases are calculated
   no new residuals are calculated
   derivative IREFCY(I) is refined as described above
Note that only 1 derivative is refined at a time and all are independent.
Therefore in polar space groups, the coordinate(s) of at least one atom
in each derivative must be fixed. In space group P1, the parameters of a
single heavy atom cannot be refined at all. If two atoms are present,
the occupancy, xyz, and B of only one of them may be refined.
Cycle NCYCLE: IREFCY(NCYCLE)=0 in this example, so this cycle is
like the zeroth cycle: phases are calculated, new residuals calculated.
If KOUT is non-zero, output data are calculated as well.
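The schedule in example A can be spelled out with a small Python sketch. It is only a restatement of the description above, not HEAVY's own logic. It assumes the dots in the IREFCY list stand for three cycles each on derivatives 1 through 6, which makes NCYCLE = 19. The function name is hypothetical.

```python
def cycle_plan(irefcy):
    """List (cycle number, action) pairs for an IREFCY schedule.

    Follows example A only: a zeroth cycle of phasing, then refinement
    of a single derivative when IREFCY is non-zero and phasing when it
    is zero.
    """
    plan = [(0, "phase, new residuals")]
    for cycle, derivative in enumerate(irefcy, start=1):
        if derivative == 0:
            plan.append((cycle, "phase, new residuals"))
        else:
            plan.append((cycle, "refine derivative %d" % derivative))
    return plan

# IREFCY = 1,1,1,2,2,2,...,6,6,6,0 (assumed expansion of the example).
irefcy = [d for d in range(1, 7) for _ in range(3)] + [0]
plan = cycle_plan(irefcy)
for cycle, action in plan:
    print(cycle, action)
```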
B. Refinement by minimization of lack-of-closure at most probable phase.
Input parameters: all default except JALT=1, KALT=0.
Results: identical to the above example except:
(1) phases will be calculated every cycle
(2) derivatives will be refined by minimization of (Fph-Fc)**2
This is not the recommended manner of using HEAVY in this package. In
most circumstances origin-removed Patterson refinement is much more
accurate. There are some instances in which phase refinement may be
useful, however. One is when it is necessary to correlate the origins
in different derivatives. In space group C2, for example, the y-coordinate
is indeterminate. That means that if you have two derivatives and refine
them independently, you will not have refined the relative y-coordinates
of the atoms in the two derivatives (though you will have refined the
relative y-coordinates of atoms within each derivative). You might wish
to use phase refinement to carry this out, using one derivative to phase
and refining y-coordinates in the other derivative. In practice, however,
these relative y-coordinates can be obtained even more accurately by
simply calculating a difference Fourier for one derivative, phasing with
the other derivative. The centroid of the peak corresponding to the
heavy atom site (which can be found, for example, by PEAKSEARCH in this
package) will give you the relative y-coordinate you need with very
good accuracy, and refinement of this coordinate is unnecessary.
Note that still only 1 derivative may be refined at one time.
(If you really want to phase only once per refinement of all derivatives,
calculate phases during one run and write them out with KOUT=7. Then
merge the file containing the phases with the input DORGBN file (3 extra
columns). Then run HEAVY with INPHAS=0 for each derivative, specifying
INOLD=1. Also set INRESD=-1. The program will then use the input phases
during phase refinement if JALT=1. It's probably faster to just phase
each time.)
C. Just calculating phases and a map or other output.
Input: all default, except NCYCLE >0
If the input lack-of-closure residuals are OK, set INRESD = -1 so
that new residuals will not be calculated and a zeroth cycle will not
be included. Otherwise leave INRESD = 0.
Specify the type of map with KOUT, the derivative (if applicable) with KDER.
D. Carrying out a procedure with IHEAVYPROC. HEAVY has the capability
of carrying out an ordered sequence of refinements. These are useful if
you want to carry out refinement in a semi-automatic fashion. When
you specify a procedure with IHEAVYPROC, you need to specify every
parameter that you want refined at any stage. Then the procedure you choose
decides which parameters to refine on which cycles. Usually you will
specify REFINEALL for all atoms, then let the procedures decide which
to refine. If you use a procedure, the program will automatically fix
all coordinates that cannot possibly be refined. For example, in
space group C2 one atom in each derivative must have y fixed if
origin-removed Patterson refinement is used, because the y-direction
is polar. The program will fix the coordinate(s) of the atom that
is the strongest in each derivative. If you have already fixed the
coordinate(s) of an atom in a derivative (by not specifying that they
be refined) then the program will just fix the atom you chose and not
fix any others.
Note that you can carry out any series of refinements that you want
by setting up all your keywords for the first type of refinement,
initiating refinement with the command HEAVY, then going back to
KEYWORD mode, specifying the next type of refinement without changing
or setting any other parameters unless you want to, then initiating
the next refinement cycles with HEAVY, and so on. For example, you
might type in all your heavy atom parameters, finishing with
...
NREP 5
IHEAVYPROC 2
DONE
! now refine 5 cycles with iheavyproc=2
HEAVY
KEYWORD
NREP 7
IHEAVYPROC 4
DONE
! now refine 7 cycles with iheavyproc=4
HEAVY
This sequence of commands results in 5 cycles of refinement of xyz for
all atoms whose xyz you marked for refinement, then 7 cycles of
refinement of xyz, occupancy, and B for all atoms for which you marked
these parameters for refinement. You can chain refinements like this
in any order and as many times as you wish.
Note that there is no procedure to refine just thermal factors. With this
package there is no need to alternately refine occupancies and thermal
factors. If there is insufficient data (i.e., very low resolution) to
refine both occupancies and thermal factors, then set the thermal factors
to any reasonable value and just refine the occupancies.
------------------------------------------------------------------------------------
**** SYMMETRY Symmetry files
Space group symmetry is read in from a formatted file. The first
line is the number of symmetry equivalents (NSYM). The next NSYM lines
are the symmetry equivalents in this space group, as illustrated below:
Space group P6122:
12 ! number of symmetry equivalents to follow
x,y,z
-y,x-y,z+1/3
-x+y,-x,z+2/3
-x,-y,z+1/2
y,-x+y,z+5/6
x-y,x,z+1/6
y,x,-z+1/3
-x,-x+y,-z+2/3
x-y,-y,-z
-y,-x,-z+5/6
x,x-y,-z+1/6
-x+y,y,-z+1/2
Note: You can also use a MEP (matrices of equivalent positions) file in the
way that was used in previous versions of this program if you wish.
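To make the file layout concrete, here is a minimal Python sketch that reads a symmetry file of the form shown above and turns each equivalent position into a rotation matrix plus a translation. It is not part of HEAVY. The function names are hypothetical, and treating everything after "!" as a comment on every line is an assumption based on the first line of the example.

```python
from fractions import Fraction
import re

def parse_operator(text):
    """Turn 'x-y,x,z+1/6' into (3x3 integer rotation rows, translation)."""
    rotation, translation = [], []
    for part in text.replace(" ", "").lower().split(","):
        row, shift = [0, 0, 0], Fraction(0)
        # Split e.g. 'z+1/3' into signed terms: ['z', '+1/3'].
        for term in re.findall(r"[+-]?[^+-]+", part):
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("+-")
            if body in ("x", "y", "z"):
                row["xyz".index(body)] += sign
            else:
                shift += sign * Fraction(body)
        rotation.append(row)
        translation.append(shift)
    return rotation, translation

def parse_symmetry(lines):
    """Read NSYM from the first line, then NSYM equivalent positions."""
    # Assumption: anything after '!' is a comment.
    cleaned = [ln.split("!")[0].strip() for ln in lines]
    cleaned = [ln for ln in cleaned if ln]
    nsym = int(cleaned[0])
    return [parse_operator(ln) for ln in cleaned[1:1 + nsym]]

def apply_operator(operator, xyz):
    """Apply one (rotation, translation) pair to fractional coordinates."""
    rotation, translation = operator
    return tuple(sum(r * c for r, c in zip(row, xyz)) + t
                 for row, t in zip(rotation, translation))

p6122 = """12 ! number of symmetry equivalents to follow
x,y,z
-y,x-y,z+1/3
-x+y,-x,z+2/3
-x,-y,z+1/2
y,-x+y,z+5/6
x-y,x,z+1/6
y,x,-z+1/3
-x,-x+y,-z+2/3
x-y,-y,-z
-y,-x,-z+5/6
x,x-y,-z+1/6
-x+y,y,-z+1/2
""".splitlines()

operators = parse_symmetry(p6122)
print(len(operators), operators[1])
```

Exact fractions are used for the translations so that values such as 1/3 and 5/6 are not rounded.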
-------------------------------------------------------------------------------
**** DORGBN files
The files used in this package are binary files with data sorted by hkl.
Format of the binary (FORTRAN unformatted) data file:
1. INTEGER*4 NCOL - the number of columns of data in the file.
2. (LOGICAL*1) TITLE(80) - a title to describe the contents of the file.
3. NCOL more titles, one for each column of data.
4. Data records - IH,IK,IL,RES,(F(I),I=1,NCOL)
   1. IH,IK,IL - INTEGER*4. The indices of the reflection.
   2. RES - The d-spacing in Angstroms.
   3. F(I) - Data. These can be structure factors, sigmas, phase
      information stored as phase, figure-of-merit, etc. When data are
      missing for one or more columns, the value -1.0 is stored.
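The record order above can be illustrated with a rough Python sketch. Everything about the byte-level layout is an assumption, because FORTRAN unformatted files are compiler- and machine-dependent: the sketch assumes sequential records framed by 4-byte length markers, little-endian byte order, REAL*4 values for RES and F(I), and one record per title. Files written by HEAVY on a particular machine may differ. The function names are hypothetical, and the small in-memory file is built only so the example can run.

```python
import io
import struct

def read_record(stream, endian="<"):
    """Read one sequential unformatted record (4-byte markers assumed)."""
    head = stream.read(4)
    if not head:
        return None                       # end of file
    (nbytes,) = struct.unpack(endian + "i", head)
    payload = stream.read(nbytes)
    (tail,) = struct.unpack(endian + "i", stream.read(4))
    if tail != nbytes:
        raise ValueError("record markers disagree; wrong layout assumed?")
    return payload

def write_record(stream, payload, endian="<"):
    """Write one record with leading and trailing length markers."""
    marker = struct.pack(endian + "i", len(payload))
    stream.write(marker + payload + marker)

def read_dorgbn(stream, endian="<"):
    """Return (title, column titles, rows) in the order described above."""
    (ncol,) = struct.unpack(endian + "i", read_record(stream, endian))
    title = read_record(stream, endian).decode("ascii").rstrip()
    col_titles = [read_record(stream, endian).decode("ascii").rstrip()
                  for _ in range(ncol)]
    fmt = "%s3i%df" % (endian, 1 + ncol)  # IH,IK,IL then RES and F(1..NCOL)
    rows = []
    while True:
        record = read_record(stream, endian)
        if record is None:
            break
        ih, ik, il, res, *f = struct.unpack(fmt, record)
        # -1.0 marks a missing value; report it as None.
        rows.append(((ih, ik, il), res, [None if v == -1.0 else v for v in f]))
    return title, col_titles, rows

# Build a tiny in-memory file with two data columns and one reflection.
demo = io.BytesIO()
write_record(demo, struct.pack("<i", 2))
write_record(demo, b"model data".ljust(80))
write_record(demo, b"FP".ljust(80))
write_record(demo, b"SIGFP".ljust(80))
write_record(demo, struct.pack("<3i3f", 1, 2, 3, 5.0, 100.0, -1.0))
demo.seek(0)

title, col_titles, rows = read_dorgbn(demo)
print(title, col_titles, rows)
```

If the markers do not agree when reading a real file, the byte order or the record framing assumed here is probably wrong for that file.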
-------------------------------------------------------------------------------
**** USING HASSP and HEAVY in heavy atom searches and refinement
What is the goal in heavy atom searches and heavy atom refinement?
In the MIR method, one generally has available a single dataset corresponding
to a "native" macromolecule, and one or more datasets corresponding to (more-
or-less) isomorphous "derivatives" containing a limited number of "heavy atoms"
bound at specific locations.
The goal of the heavy atom searches and refinement in the MIR method is to
obtain a list of sites of heavy atoms in each derivative of interest, along
with measures of occupancy and disorder of these sites. Of particular
importance, of course, is that this list of sites contain only correct sites,
as incorrect sites will bias both the phasing and the measures of quality of
the phasing.
There are several steps in obtaining and refining heavy atom positions. These
include (with an example of a program or package that carries out each step):
1. calculation of a difference Patterson function for each derivative
(FFT)
2. search for single or multiple site solutions to Pattersons (HASSP)
3. refining heavy atom parameters (HEAVY)
4. cross-Fourier or self-Fourier analysis to evaluate and identify
sites (FFT)
How will we know if a set of heavy atom sites is correct?
There are several tests that can help show whether a heavy atom site is
correct. Here are some that should usually be tried:
1. The difference Patterson shows the self (Harker) peaks predicted for this
site.
2. If there is more than one site in the derivative, the difference Patterson
shows the predicted cross-vectors.
3. Refinement of occupancy and thermal factors yields an occupancy that is in
the range of 0.2 to 1.0 (if the data is on an absolute scale), and a B in the
range of about 10 to 100 (if the data has been measured to a resolution better
than about 4 A).
4a. A difference Fourier for the derivative of interest, phased only with
other derivatives and removing any sites in common between the derivative of
interest and the other derivatives, shows this site as one of the higher peaks
in the Fourier.
-or-
4b. If only SIR data are available, a difference Fourier phased only with the
other sites in the derivative shows this site as one of the higher peaks in the
Fourier.
What this tutorial will do.
In this tutorial, a simple example using model data for one derivative with
anomalous information will be used to demonstrate the use of HASSP and HEAVY in
heavy atom determination and refinement. The data used here will actually be
based on model MAD data that has been converted to MIR format using MADMRG (1),
but the treatment is identical to that for any other SIR+anomalous data.
In the model data to be considered here, the space group is C2, with cell
parameters of a=76.1, b=28.0, c=42.4, and beta=103.1 degrees.
There are two "true" heavy atom sites. They are at the fractional
coordinates and have the occupancies and thermal factors listed below:
    x        y        z      occupancy     B
  0.141    0.344    0.219       1.0       20.0
  0.484    0.500    0.093       1.0       20.0
These model data contain 1763 reflections from 10 to 3 A and include "random"
errors added to the model structure factor amplitudes. In the model,
the "derivative" is perfectly isomorphous with the native.
**** I. Searching for heavy atom positions in a derivative: HASSP
In the MIR method, it is usually only necessary to "solve" one derivative by
Patterson methods. This is because once one derivative is solved, or even one
site in one derivative is found, the much more powerful difference Fourier
method can be used to solve all the others. For these other derivatives, the
Patterson function is only needed as a check on the solution. Nevertheless,
finding even one derivative that can be solved by Patterson methods is often
the hardest step in the MIR procedure.
What does HASSP do?
HASSP (Heavy Atom Search and Superposition Program) is simply an automated
procedure for identifying potential solutions to a difference Patterson
function (4). It has an additional and very useful feature of roughly
estimating the probability that a particular solution could have occurred by
chance alone, so you have an idea of the significance of a solution as well.
The basic idea used in HASSP is quite straightforward: place a "test" heavy
atom or atoms at every possible position in the asymmetric unit of the unit
cell, and predict all the self and cross vectors from this arrangement of
sites. Then look at the difference Patterson function at all these locations,
and note the lowest value of the Patterson at all of these self and cross
vectors. A high value of this "minimum function" means all the peaks are
present, a low value means at least some are missing. This is just what you
would do by hand, but here it's all done for you.
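The minimum function itself is simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not HASSP's implementation. The toy "map" is a hypothetical dictionary of two peaks with everything else set to zero.

```python
def minimum_function(patterson, vectors):
    """Lowest Patterson value found at any of the predicted vectors."""
    return min(patterson(u, v, w) for (u, v, w) in vectors)

# Toy map: two peaks, everything else at a noise level of 0.
toy_peaks = {(0.2, 0.0, 0.4): 9.0, (0.3, 0.5, 0.1): 8.0}

def toy_patterson(u, v, w):
    """Look up the map value at (u,v,w), rounded to the toy grid."""
    return toy_peaks.get((round(u, 3), round(v, 3), round(w, 3)), 0.0)

# All predicted vectors fall on peaks: the minimum function is high.
all_present = minimum_function(toy_patterson,
                               [(0.2, 0.0, 0.4), (0.3, 0.5, 0.1)])
# One predicted vector misses: the minimum function drops to noise.
one_missing = minimum_function(toy_patterson,
                               [(0.2, 0.0, 0.4), (0.7, 0.7, 0.7)])
print(all_present, one_missing)
```

Taking the minimum rather than the sum is what makes a single missing vector enough to reject a trial solution.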
Significance of a solution.
The significance of a solution found by HASSP depends not just on how
high the minimum function value is for that solution. It also depends on how
many peaks were predicted as well as on the noise level in the map and the
number of different solutions that were tried. For example, suppose we look
for single-site solutions to a map at a resolution of 3 A in a space group with
a single Harker vector, and where the cell dimensions are about 30 x 30 x 30A.
In this case we have a very good chance of coming up with a solution that is 2
times the rms value in the map--and being wrong. That is because we have
effectively looked at about 100 possible solutions (even if we choose a finely
spaced grid, the independent solutions we examine will be spaced by about the
resolution of the map in each of two directions), and about 1 in 50 peaks in a
(random) map will be greater than twice the rms value. On the other hand, if
we are in a space group with the same cell dimensions and the map was
calculated to the same resolution, but the map has 3 Harker sections, the
chance that we come up with a single-site solution that matches all 3 Harker
sections with heights of twice the rms in the map is only about 0.008. This
is because we have looked at about 1000 possible solutions (10 along each of 3
directions), and the chance that any one solution would give all 3 peaks
greater than 2 times the rms would be about (1/50)**3. Therefore the chance
that at least one of the 1000 test solutions would, by chance, give this height
at all 3 peaks is about 1-(1-(1/50)**3)**1000, or about 0.008.
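The arithmetic in this example is easy to reproduce. The Python sketch below uses only the numbers quoted above (a 1 in 50 chance per peak, and about 100 or 1000 independent trial positions). The function name is hypothetical, and treating the trials as independent is the same simplification the text makes.

```python
def chance_of_false_solution(p_peak, n_vectors, n_trials):
    """Chance that at least one trial matches all vectors by accident."""
    p_one_trial = p_peak ** n_vectors     # all vectors high for one trial
    return 1.0 - (1.0 - p_one_trial) ** n_trials

# One Harker vector, about 100 independent trial positions.
one_harker = chance_of_false_solution(1.0 / 50.0, 1, 100)
# Three Harker sections, about 1000 independent trial positions.
three_harker = chance_of_false_solution(1.0 / 50.0, 3, 1000)
print(round(one_harker, 3), round(three_harker, 4))
```

The first case comes out near 0.87 and the second near 0.008, which is why a 2-sigma match on a single Harker vector means little while the same match on three sections is significant.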
Single-site searches.
The simplest Patterson search is simply to look at the Harker sections and
identify heavy atom positions consistent with the major peaks on these
sections. In HASSP, all of the positions in the unit cell that give different
sets of Harker vectors are tested, and the positions that yield the
highest values of the minimum function are listed.
Example in space group C2
4 equivalent positions: (x,y,z); (-x,y,-z); (1/2+x,1/2+y,z); (1/2-x,1/2+y,-z)
Unique portion of real cell: 0 to 1 in x; 0 to 1/2 in y; 0 to 1/2 in z.
Symmetry of the Patterson cell: this is the symmetry of the real cell plus
inversion:
8 equivalent positions: (u,v,w); (-u,-v,-w); (-u,v,-w); (u,-v,w) and each of
these + (1/2,1/2,0)
Unique portion of Patterson cell: 0 to 1/2 in x; 0 to 1/2 in y; 0 to 1/2 in z.
An atom at (x,y,z) has 1 unique Harker (self) vector in the Patterson (all
other self vectors are related to this one by inversion or symmetry of the
space group): (u,v,w)=(2x,0,2z)
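As a check on the Harker vector quoted above, this Python sketch subtracts the C2 equivalent position (-x,y,-z) from (x,y,z) for the two "true" sites of the tutorial. It is illustrative only. The helper names are hypothetical, and the rounding to three decimals is just for display.

```python
def wrap(vector):
    """Bring a fractional vector into [0, 1), rounded to 3 decimals."""
    return tuple(round(round(c, 3) % 1.0, 3) for c in vector)

def harker_vector_c2(site):
    """Self vector between (x,y,z) and its C2 mate (-x,y,-z)."""
    x, y, z = site
    mate = (-x, y, -z)
    return wrap(tuple(p - q for p, q in zip(site, mate)))

site_1 = (0.141, 0.344, 0.219)
site_2 = (0.484, 0.500, 0.093)

h1 = harker_vector_c2(site_1)   # expected form (2x, 0, 2z)
h2 = harker_vector_c2(site_2)
print(h1, h2)
```

The y-coordinate cancels in both cases, which is why the Harker section carries no information about y in this space group.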
Sample output from HASSP on space group C2:
THIS SPACE GROUP HAS 2 SETS OF 2 IDENTICAL GROUPS
OF EQUIVALENT POSITIONS RELATED BY CENTERING TRANSLATIONS:
GROUP 2 IS RELATED TO GROUP 1 BY THE TRANSLATION: (0.500,0.500,0.000)
THE FUNDAMENTAL SET OF 2 ROTATION MATRICES AND TRANSLATION VECTORS FOR THIS
SPACE GROUP IS:
  1  0  0   0.000        -1  0  0   0.000
  0  1  0   0.000         0  1  0   0.000
  0  0  1   0.000         0  0 -1   0.000
(ASIDE FROM CENTERING, IF ANY) THERE ARE 4 EQUIVALENT POSITIONS IN THE
PATTERSON:
  1  0  0     -1  0  0     -1  0  0      1  0  0
  0  1  0      0 -1  0      0  1  0      0 -1  0
  0  0  1      0  0 -1      0  0 -1      0  0  1
THERE ARE 1 UNIQUE HARKER VECTORS:
2 0 0 0.000
0 0 0 0.000
0 0 2 0.000
COORDINATES ALONG Y-AXIS ARE DETERMINED ONLY TO WITHIN A CONSTANT.
Single-site search in C2: just look at all values of x from 0 to 1/2, all z
from 0 to 1/2. For each, look at the value of the Patterson at (2x,0,2z) and
note the highest peaks:
LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:GENERAL POSITIONS
 PEAK      X       Y       Z      HEIGHT     PROB THAT THIS
                                             IS BY CHANCE
   1     0.484   0.000   0.094    9822.5        0.000
   2     0.141   0.000   0.219    9478.0        0.000
   3     0.055   0.000   0.094    3322.7        0.669
   4     0.430   0.000   0.000    3132.8        0.794
Note that the top two solutions in this single-atom search correspond (in x and
z) to the two "true" sites in the derivative. The value of "y" cannot be
determined from the Patterson. The heights of the Harker vectors for these two
solutions are 9.5 and 9.8 times the rms of the map, and obtaining peaks this
high by chance is of course very unlikely. The next two possible solutions are
about 3 times the rms of the map, and each of these has a good chance of
appearing by chance (which they did).
Two-site solutions to a Patterson.
In a two-site search, two sites are considered together as a possible solution
to the Patterson. Their self-vectors and all cross-vectors are considered and
the minimum height of all these peaks is noted. As it is a bit time-consuming
to test all possible pairs of atoms in the asymmetric unit, a shortcut is used
here and only pairs of atoms that are related by a cross-vector found in the
map are considered. In this way it is certain that at least one of the cross-
vectors predicted from the pair of atoms will be a high peak in the map. In
practice, then, a high peak in the map (X,Y,Z) is noted, then atom A is placed
at all possible positions in the asymmetric unit (x,y,z), and atom B is always
placed at the coordinates (x+X,y+Y,z+Z). Then all self and cross-vectors are
calculated.
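To see which cross-vectors a pair of sites predicts, the following Python sketch generates all vectors between the C2 equivalents of the two "true" sites of this tutorial and reduces them with the Patterson symmetry listed earlier (inversion, the two-fold, and the centering translation). It is not HASSP's code. The helper names are hypothetical, and picking the lexicographically smallest equivalent as the representative is just a convenient convention for this example.

```python
def wrap(vector):
    """Bring a fractional vector into [0, 1), rounded to 3 decimals."""
    return tuple(round(round(c, 3) % 1.0, 3) for c in vector)

def c2_equivalents(site):
    """The 4 equivalent positions of space group C2."""
    x, y, z = site
    return [(x, y, z), (-x, y, -z),
            (0.5 + x, 0.5 + y, z), (0.5 - x, 0.5 + y, -z)]

def patterson_equivalents(vector):
    """The 8 equivalent positions of the C2 Patterson cell."""
    u, v, w = vector
    base = [(u, v, w), (-u, -v, -w), (-u, v, -w), (u, -v, w)]
    shifted = [(a + 0.5, b + 0.5, c) for a, b, c in base]
    return [wrap(p) for p in base + shifted]

def canonical(vector):
    """One representative per set of symmetry-related Patterson vectors."""
    return min(patterson_equivalents(vector))

def unique_cross_vectors(site_a, site_b):
    """Symmetry-unique vectors from equivalents of A to equivalents of B."""
    found = set()
    for a in c2_equivalents(site_a):
        for b in c2_equivalents(site_b):
            found.add(canonical(tuple(q - p for p, q in zip(a, b))))
    return sorted(found)

site_a = (0.141, 0.344, 0.219)
site_b = (0.484, 0.500, 0.093)
cross = unique_cross_vectors(site_a, site_b)
print(cross)
```

For these two sites the 16 possible differences reduce to just two unique cross-vectors, near (0.125,0.344,0.312) and (0.157,0.344,0.126).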
Searches for additional sites.
Once a two-site solution has been found, additional sites can be tested for
consistency with the existing solution. Atoms are placed at all possible
positions in the asymmetric unit, and once again all self and cross vectors and
the minimum functions are calculated, and the highest minimum function is
chosen. If the likelihood of obtaining this value of the minimum function is
very low, the new site is included in the solution.
Example of a two-site search in space group C2:
LIST OF ISOLATED PATTERSON PEAKS: GENERAL POSITIONS.
 PEAK      X       Y       Z      HEIGHT
   1     0.156   0.344   0.125    10135.
   2     0.125   0.344   0.313    10082.
   3     0.070   0.219   0.172    2869.9
These peaks were found in a search of the difference Patterson function for our
test case, considering (first) just locations in general positions in the map.
Peaks in special positions are considered later, only after all peaks in
general positions have been tried. Each of these is considered, in turn, as a
possible cross-vector in a two atom search. The first two are "real" cross-
vectors. The first yields,
TWO-ATOM ORIGIN SEARCH. TRY # 1 .
CROSS-VECTOR BETWEEN SITES = (0.156,0.344,0.125)
MAXIMUM AGREEMENT FOUND IN SEARCH WITH THE TWO SITES:
(0.484,0.000,0.094) AND