HEAVY

HEAVY is a general-purpose macromolecular x-ray diffraction phasing package. It can be used to scale and manipulate data to make maps, get isomorphous and anomalous differences, import and export, convert from binary to formatted files, convert MAD data to a form similar to SIR+anomalous differences, and prepare data for difference refinement. It contains the HEAVY and HASSP routines that allow searches for solutions to a Patterson function and refinement of heavy atom parameters in the MIR method.

To run HEAVY on the DEC-ALPHAs, add the following line to your .login file:

alias heavy /joule2/programs/heavy/heavy

You can also find some example files in the directory /joule2/programs/heavy.




************************  COPYRIGHT NOTICE  *********************************
                       Los Alamos National Laboratory

        Copyright, 1994.  The Regents of the University of California.
       This software was produced under a U.S. Government contract
       (W-7405-ENG-36) by Los Alamos National Laboratory, which is
       operated by the University of California for the U.S. Department
       of Energy.  The U.S. Government is licensed to use, reproduce,
       and distribute this software.  Permission is granted to the
       public to copy and use this software without charge, provided
       that this Notice and any statement of authorship are reproduced
       on all copies.  Neither the Government nor the University makes
       any warranty, express or implied, or assumes any liability or
       responsibility for the use of this software.

******************************************************************************

        Tom Terwilliger  Los Alamos National Laboratory
        please send all correspondence to the author at: "terwilliger@lanl.gov"

        version 3.0 of August, 1994

******************************************************************************

                                INTRODUCTION

This is a program that is designed to analyze MIR data and 
calculate and analyze Patterson and Fourier maps.  It can also be used 
to convert MAD data into a form that is similar to SIR+anomalous 
data, and this data can then be analyzed in just the way that standard 
SIR+anomalous is analyzed. It can also be used to prepare data for
difference refinement.


GETTING STARTED:

This package uses the binary "DORGBN"-style data files.  You will 
probably need to use the routine IMPORT (part of UTIL) to convert your 
data to this format. Note that "unobserved" data can be entered as either
"-1.000" or "0.000" in this package, but cannot be blank.

The space-group information in this package is read from a symmetry file 
(SYMFILE) that you write.  It contains the symmetry equivalents from 
the International Tables.

It is a good  idea to start out by setting your preferences for 
symmetry file and fft grids at the very beginning using "PREFERENCES" to 
set them and save them.   This will result in a small file called 
"preferences.dat" that the program will consult every time you run it.
Please see the section on PREFERENCES in the documentation below.

You can run HEAVY interactively or using keywords.  It may be easiest to
start out using it interactively so you get an idea of how the program 
works.  The keyword mode has more options, however, and is a better way
to use the program in general. See *** KEYWORD in the documentation 
below.  

You may find it convenient to use two screens when running HEAVY, 
one for the program and the other for displaying the relevant portion 
of this documentation file with the listing of keywords you may wish to set.

ABOUT THIS DOCUMENTATION:

This file describes each of the options in the HEAVY package.  If you are
looking for information on a particular command in HEAVY, say "LOCALSCALE",
then search this file for the string:  "*** LOCALSCALE" and you will find 
yourself at the writeup for LOCALSCALE.

At the end of the documentation on the programs is a section on the formats
of files used in this package, including symmetry files and binary DORGBN files.

The documentation ends with a description of how to use
HASSP and HEAVY to solve heavy atom derivatives.

-----------------------------------------------------------------------------

If you have any difficulty using these programs, please contact me by e-mail at 
the address listed above.  I will be happy to attempt to assist you.  Further, 
if you find any bugs or make any substantial modifications to these programs, I 
would like to hear about them.

You are free to modify these programs as you wish, but they may not be used in 
any commercial package without permission.  

-----------------------------------------------------------------------------
**** DOCUMENTATION FOR HEAVY 

This is a program to manipulate data files, to scale data using a local
scaling routine, to calculate maps, to analyze maps, to convert MAD data
to pseudo-SIR+anomalous data, to convert native and mutant
structure factor data to a form useful for "difference refinement", to
search for heavy atom sites in a Patterson (HASSP) and to refine heavy atom
parameters (routine HEAVY within this program).

This program runs interactively or using KEYWORDS. It should be 
mostly self-explanatory. 

------------------------------------------------------------------------------
COMPILING AND RUNNING THIS PROGRAM USING HEAVYV3.SCRIPT:

The program is supplied with a set of test data and a script file that
will run the data through and demonstrate some capabilities.  Put all the
files in one directory.  Then...
 
On a VAX system, just use:

        $FOR HEAVY
        $LINK HEAVY

Then run it with:

$RU HEAVY
SCRIPT           !  YOU CAN THEN RUN THE TEST DATA THROUGH WITH THIS FILE:
HEAVYV3.SCRIPT
END

On an SGI, you probably need extra scratch space (8 MB) for compiling it.
It's easiest to do this as follows, using a directory on a disk with lots of
extra space:

        mkdir scratchspace
        setenv TMPDIR scratchspace
        f77 heavy.f -v -static -o heavy.out
        chmod +x heavy.out

Then run it with:

heavy.out
overwrite       ! Note on the sgi you need to overwrite files or it will dump.
script
heavyv3.script
end

------------------------------------------------------------------------------
The options in HEAVY are:

 MAPS:  calculate Fourier and Patterson maps,
 GETISO: calculate differences between two data columns,
 GETANOM: convert from F+ and F- to Fbar and DelAnom,
 GETPHASES: convert from A,B coefficients to F and phase,
 IMPORT:  read in formatted file with h,k,l,data, stripping off any text,
 EXPORT: write out h,k,l, data without any titles,
 BTOF: convert from binary dorgbn file to formatted one,
 FTOB: convert from formatted dorgbn file to binary one,
 FILEMERGE: combine or extract data columns from dorgbn files,
 MERGE: merge equivalent reflections and write out asymmetric unit,
 MADMRG: create pseudo-SIR+anom data from MAD data,
 PEAKSEARCH: find peaks in Fourier map,
 FFTTOBOSS: convert asymmetric unit of FFT to any portion of cell in BOSS format
 FFTTODSN6: convert asymmetric unit of FFT to any portion of cell in DSN6 format
 FFTTOFSFOUR: convert asymmetric unit of FFT to FSFOUR format
 LOCALSCALE: scale one dataset to another with local scaling,
 COMPLETE: determine completeness of a dataset,
 WEIGHTS: generate weighting for atomic refinement using Fo-Fc and sigmas,
 FDIFF: Create pseudo-mutant dataset for difference refinement,
 MAPTOASYM:  Map a PDB file onto the asymmetric unit of the crystal,
 MAPTOOBJECT: Map atoms to equivalent position close to specified object, 
 HEAVY:  refine heavy atom parameters,
 HASSP:  search for solutions to a Patterson function,
 PREFERENCES: change symmetry file, cell dimensions, and grids for FFT,
 VIEW: view a binary dorgbn file,
 SCRIPT: read in commands from a file,
 LOG: copy output to a log file,
 OVERWRITE: overwrite output files if version numbers are not present,
 KEYWORD: use keywords to set parameters instead of prompting,
 HELP: get information on this program,
 END: end the program

These options are described in detail below.

------------------------------------------------------------------------------

**** MAPS -- calculate Patterson and Fourier maps

  This routine reads in data from a dorgbn-style file and calculates any of
a variety of maps.  It requires that you have already specified a symmetry file
of matrices of equivalent positions, the cell dimensions of your space 
group and a grid for your map using the routine "PREFERENCES" at some
earlier time or that the file "preferences.dat" exists.

The command structure for this routine is:

1.  IMAPTYPE            (0 for Patterson, 1 for Fourier)
2.  File containing data

--If Patterson--

3.  ICOL                (column number of column containing |F|**2,
                        or column number of column containing F or del F)
4.  KTYPE               1 if icol contains F or del F; 0 if icol contains F**2

--if Fourier--

3.  JFOURTYPE

0 = read in A, B directly 
1 = read in F or Del F and a phase in degrees
2 = read in delF ano and a phase in degrees
3 = read in Fo,Fc, and phase for (Fo-Fc)exp(i PHIc)
4 = read in Fo,Fc, and phase for (2Fo-Fc)exp(i PHIc)

3a.  Column numbers for A,B or column number for F or del F or del  F Ano,
or column numbers for Fo,Fc

3b.  Column number for phase if JFOURTYPE > 0


4.  Resolution range to consider

5. The name of the output FFT file that contains just an asymmetric unit of
the map (or whatever you have specified as such in your preferences).

The routine will calculate the FFT. If this is a Fourier, the program
will then prompt to ask if you would like to find the peaks or minima 
in the map (see PEAKSEARCH). For both Fouriers and Pattersons, the program
will prompt you to ask if you would like to convert the map to 
BOSS format, expanding the map to any range of grid points you would 
like (see FFTTOBOSS).  It will also ask if you want to convert it to
DSN6 format (for reading into TOM or O).  This will only work on an SGI.
It will also ask if you want to convert the map to
FSFOUR format for display with MAPVIEW (see FFTTOFSFOUR).
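
As a rough illustration of how the Fourier map types listed above differ, the
sketch below builds the A,B coefficients for one reflection from an amplitude
and a phase in degrees.  The function name and its arguments are hypothetical
and are not part of HEAVY; only the formulas (F exp(i PHI), PHI-90 for the
anomalous map, Fo-Fc and 2Fo-Fc) are taken from this section.

```python
import math


def map_coefficients(map_type, phase_deg, f=0.0, fo=0.0, fc=0.0):
    """Return (A, B) = (F cos(phi), F sin(phi)) for one reflection.

    map_type uses the keyword names from this section; everything else
    about this helper is an illustrative assumption.
    """
    if map_type == "ISOFOURIER":
        amplitude, phase = f, phase_deg
    elif map_type == "ANOFOURIER":
        # Anomalous difference Fourier: phase shifted by -90 degrees.
        amplitude, phase = f, phase_deg - 90.0
    elif map_type == "FOFC":
        amplitude, phase = fo - fc, phase_deg
    elif map_type == "2FOFC":
        amplitude, phase = 2.0 * fo - fc, phase_deg
    else:
        raise ValueError("unknown map type: " + map_type)
    phi = math.radians(phase)
    return amplitude * math.cos(phi), amplitude * math.sin(phi)
```

For example, map_coefficients("2FOFC", 0.0, fo=12.0, fc=10.0) gives an
amplitude of 14 along the real axis.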

KEYWORDING inputs:

KEYWORDS                VALUES

PATTERSON       Patterson map
NCOLPATT**2     Column # in input map for squared Patterson coefficients
            -or-
NCOLPATT                column # containing Patterson coefficients


FOURIER         Fourier map, reading in A, B coefficients directly
NCOLFA                  column for A 
NCOLFB                  column for B 

NATFOURIER      same as isofourier, below
ISOFOURIER      F exp(iPHIc) map (F = del iso or F, phase = PHI)
ANOFOURIER      F exp(i[PHIc-90]) map (F = del Ano, phase = PHI - pi/2)
NCOLF           column for F or Del F 
NCOLPHI         column for PHI (in degrees)

FOFC            Fo-Fc exp(i PHIc) map
2FOFC           2Fo-Fc exp(i PHIc) map
NCOLFOBS        column for Fo 
NCOLFC          column for Fc  
NCOLPHI         column for PHI (in degrees)


INFILE          name of input DORGBN file with data
FOURCOFILE      name of optional output file with A,B coefficients

FFTFILE         name of output file with FFT in UCLA format
PEAKFILE        name of optional file with nlist high and low peaks
                  (program finds peaks if this file is defined)
BOSSFILE        name of optional file with FFT in BOSS format
                  (program writes bossfile if it is defined)
DSN6FILE        name of optional file with FFT in DSN6 format
                  (program writes DSN6 file if it is defined)
                  (NOTE: this only works on an SGI!)
FSFOURFILE      name of optional file with FFT in FSFOUR format
                  (program writes FSFOUR file if it is defined)

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider
                  (DMIN and DMAX are required)

NLIST           number of high and low peaks to list to peakfile
ISYMMETRY       number of symmetry equivalents for each peak to list
                        (put a big number to get all within all adjacent 
                        unit cells)
PDB             list peaks in PDB format
FRACT           list peaks in fractional format

-------------------------------------------------------------------------------

**** GETISO
**** GETANOM
**** GETPHASES

These are routines to take information from several columns of a dorgbn
file and to generate a new file containing either (1) isomorphous differences
between datasets, (2) data in the form F+,F- converted to Fbar and del Ano, or
(3) A,B coefficients converted to structure factor amplitude and 
phase in degrees.

The command structure and operation of each of these routines are 
shown below:

GETISO:

1. Input dorgbn file
2. Output dorgbn file (1 column of "del F")
3. Overall title for output file
4. Columns for Fnat and sigma of Fnat in input file (0 if absent)
5. Columns for Fder and sigma of Fder in input file (0 if absent)
6. Minimum ratio of Fnat/sig or Fder/sig to consider (all others tossed)
7. Resolution range to consider (e.g., 2.5 100.)

The program then calculates Del F = Fder-Fnat  for the selected reflections
and writes it to the output dorgbn file.  If either Fder or Fnat is missing
(less than the minimum ratio of F/sig) the reflection is ignored.
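
The selection rule above can be sketched as follows.  This is a minimal
illustration, not HEAVY source code: the function name is hypothetical, and
treating a non-positive F as unobserved and a missing sigma (None) as "no
ratio test" are assumptions made for the example.

```python
def get_iso(fnat, sig_nat, fder, sig_der, ratmin):
    """Return Del F = Fder - Fnat, or None if the reflection is ignored.

    Assumptions: F <= 0 marks an unobserved value, and a sigma of None
    means the sigma column is absent, so no F/sig test is applied.
    """
    for f, sig in ((fnat, sig_nat), (fder, sig_der)):
        if f <= 0.0:
            return None  # unobserved measurement
        if sig is not None and sig > 0.0 and f / sig < ratmin:
            return None  # too weak relative to its sigma
    return fder - fnat
```

With ratmin = 2, a pair such as Fnat = 100 (sigma 5) and Fder = 110 (sigma 5)
passes the test and yields Del F = 10.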


GETANOM:

1. Input dorgbn file
2. Output dorgbn file (4 columns: Fbar,sig of Fbar, Del Ano, sig of Del Ano)
3. Overall title for output file
4. Columns for: F+, sigma of F+ in input file
5. Columns for: F-, sigma of F- in input file
6. dmin,dmax (resolution range to consider, unless previously specified)

The program then calculates Fbar = (F+ + F-)/2
and del Ano = (F+ - F-) for the selected reflections and writes them to 
the output dorgbn file.  If F+ or F- is missing (F less than or equal to
0), the one present is written out as Fbar, and del Ano and sig of 
del Ano are set to 0.0.
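
A minimal sketch of this calculation is shown here.  The function name is
hypothetical, and the way the sigmas are propagated (in quadrature, assuming
independent errors in F+ and F-) is an assumption; this document does not
state how HEAVY computes sig of Fbar or sig of del Ano.

```python
import math


def get_anom(fplus, sig_plus, fminus, sig_minus):
    """Return (Fbar, sig_Fbar, DelAno, sig_DelAno), or None if both are missing."""
    have_plus = fplus > 0.0
    have_minus = fminus > 0.0
    if have_plus and have_minus:
        combined = math.hypot(sig_plus, sig_minus)  # assumed error propagation
        fbar = 0.5 * (fplus + fminus)
        delano = fplus - fminus
        return fbar, 0.5 * combined, delano, combined
    if have_plus:
        return fplus, sig_plus, 0.0, 0.0
    if have_minus:
        return fminus, sig_minus, 0.0, 0.0
    return None
```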

GETPHASES:

1. Input dorgbn-style file
2. Output file (2 data columns. F and phase)
3. Title for output file
4. Column numbers for A, B = F cos(PHI), F sin(PHI) 
5. dmin,dmax (resolution range to consider, unless previously specified)

The routine calculates F and PHI in degrees from A and B and writes them
to the output file.  Data where A and B are both equal to -1.0 are rejected, as
are data where A and B are both equal to 0.
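
The conversion from A,B to amplitude and phase can be sketched as below.  The
function name is hypothetical, the rejection test assumes that both A and B
must carry the flag value, and reporting the phase in the range 0 to 360
degrees is an assumption (HEAVY may use a different range).

```python
import math


def get_phase(a, b):
    """Return (F, PHI in degrees) from A = F cos(PHI), B = F sin(PHI)."""
    if (a == -1.0 and b == -1.0) or (a == 0.0 and b == 0.0):
        return None  # treated as unobserved
    f = math.hypot(a, b)
    phi = math.degrees(math.atan2(b, a)) % 360.0
    return f, phi
```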


KEYWORDING:

For each of these routines, you need to specify:

INFILE                  name of dorgbn file with data
OUTFILE                 name of output dorgbn file

FILETITLE               optional title for output file
DMIN                    minimum d-spacing to consider
DMAX                    maximum d-spacing to consider

                For GETISO

NNATF                   column number for Fnat 
NNATS                   column number for sigma of Fnat 
NDERF                   column number for Fder 
NDERS                   column number for sigma of Fder 

RATMIN                  minimum ratio of F/sig to include at all


                For GETANOM:

NCOLFP                  column number for Fplus
NCOLSFP                 column number for sigma of Fplus
NCOLFM                  column number for Fminus
NCOLSFM                 column number for sigma of Fminus


                For GETPHASES:

NCOLFA                  column number for F cos(phi) = A
NCOLFB                  column number for F sin(phi) = B
-------------------------------------------------------------------------------
**** IMPORT
**** EXPORT

These are utilities to bring formatted data into dorgbn format and to
write out formatted data without titles (BTOF writes out titles).

The routine IMPORT has several options.  You can simply read the data
from a formatted file in, assuming it is h,k,l, and columns of data.  You can
also swap indices (as H->K, K->L, L->H) as you read it in.  You  
can also sort the data and map it to the asymmetric unit of the space
group.  Ordinarily you will want to sort and map the data, as some of
the other routines in the package (notably FILEMERGE) assume that the
data has been sorted in a particular order of hkl.

When you IMPORT data, it is essential that the input file has the same number
of data columns in every line of the file.  The input file can have text
in the middle of a data column; this text will be ignored.


EXPORT writes out a formatted version of a dorgbn-style file, with no
titles.  It can be read back in with IMPORT.


The commands for IMPORTing data are:

1. Input formatted data file name 

(the program will then type the first 3 lines of the file as read in, and
then again after stripping off any text)

2.  Output dorgbn-style file name
3.  Overall title for output file
4.  Number of columns of data (not counting h,k,l) in input file
5a...Title for each of these columns of data
6.  Overall scale factor to apply to all data
7.  lsort: 'y' to sort and map data, 'n' to leave it as is
8.  lswap: 'y' to swap indices hkl
  (only read if lswap='y'): 8a. HNEW:  index H will be mapped to HNEW
  That is, if you want to map old H->new K, old K->new L, old L->new H, then
   you specify HNEW = "K"
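
To make the index swap concrete, the sketch below applies a permutation of
the indices to one reflection.  The function and the dictionary form of the
mapping are hypothetical (IMPORT itself prompts for HNEW and the other new
indices interactively); the only requirement illustrated is that the mapping
must send H, K and L to three different new indices.

```python
def swap_indices(hkl, mapping):
    """Permute (h, k, l); mapping gives old axis -> new axis.

    Example mapping: {"H": "K", "K": "L", "L": "H"}.
    """
    axes = ["H", "K", "L"]
    if sorted(mapping) != axes or sorted(mapping.values()) != axes:
        raise ValueError("mapping must be a permutation of H, K, L")
    old = dict(zip(axes, hkl))
    new = {mapping[axis]: value for axis, value in old.items()}
    return new["H"], new["K"], new["L"]
```

With the cyclic mapping old H->new K, old K->new L, old L->new H, the
reflection (1, 2, 3) becomes (3, 1, 2).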


Commands for EXPORT
1.  Input dorgbn-style file name
2.  Output formatted file name

KEYWORDING:

IMPORT ignores keywords.  Use it interactively.
EXPORT keywords:

INFILE                  name of file to be exported
OUTFILE                 name of output formatted file


-------------------------------------------------------------------------------
**** BTOF 
**** FTOB

          Binary TO Format conversion (BTOF)
          Format TO Binary conversion (FTOB)

These routines convert data in DORGBN-structured binary files to and
from formatted files.  See documentation on DORGBN files at the end of this
file for a description of the DORGBN file structure.

These programs are interactive:

BTOF will prompt for "INPUT FILE>" which is the binary file to be
converted to formatted structure, and, "OUTPUT FILE>" which is the
new formatted file.

FTOB will prompt for "INPUT FILE>" which is the formatted file to be
converted to DORGBN structure, and "OUTPUT FILE>" which is the
new DORGBN file.

You may wish to alter these programs to read your standard data files and
convert them to DORGBN files.


KEYWORDING:

INFILE                  name of input file
OUTFILE                 name of output file

-------------------------------------------------------------------------------

**** FILEMERGE:  Binary reflection data manipulation

    This routine allows manipulation of binary data files that are in
the "dorgbn" format.  You can extract one or more "columns" of data from
a file, duplicate columns from a file, or combine parts of different files.

     This routine is based on the UCLA program DORGBN.  Data are stored in 
binary files in the form h,k,l,resolution,and columns of "data".  
At the beginning of the file are an overall title for the dataset and 
individual titles for each "data" column.

     The program is intended to be run interactively.  The command format
is:

1.  # of Files to be opened for input (up to 4)
2a... Input file names 1...n.
3. Output file name
4.  Title for output file.  This title should  describe the  
    contents  of the entire file. 

5.  Command lines.  Each command can specify a range of columns of 
data from some particular file to be incorporated into the output file.   
These  data columns  are  incorporated  in the order in which the commands are  
specified.  Command input parameters: (these are 1 or 2 digit integers and an 
optional title, all separated by commas and no blanks):

  1.  NFILE:  The number of the file treated.
  2.  ICOL,JCOL,TITLE: the RANGE of columns to be copied.  If you want to
copy just column #3, specify: 3,3 here.

   The title is an optional title to be substituted  for  
that  already associated  with the first column in the range.   
If  the  specified range  includes  more  than one column and this 
title field is used, the titles for  all  the other  columns  in  
the range specified by this command must be input in the very next records. 
If  this  field is blank, the old title will be used.


6.  More command lines.
7.  Blank line to signify the end of input.


    An additional useful feature of the program is that  if  a  command  
is entered with a valid file number but with the column numbers missing or 
incorrect, the titles of  the  columns on that file are printed for the user.

EXAMPLE OF USE OF DORGBN. (ALL CHARACTERS BEFORE THE ">" ARE  TYPED BY 
PROGRAM)

DORGBN
  OUTPUT FILE>KI.PAT               (THE NEW DORGBN-STYLE FILE)
  INPUT FILE 1>KI.DRG              (THE STARTING DORGBN FILE)
  INPUT FILE 2>                (CARRIAGE RETURN TO INDICATE END
                                     OF LIST OF INPUT FILES)
  OVERALL TITLE>PATT COEFFICIENTS FOR KI.DRG               (TITLE)
  COMMAND>1,1,1,PATT COEFFS COLUMN 1 FROM KI.DRG
                                    (WRITE COLUMN 1 FROM
                                     STARTING DORGBN FILE INTO NEW FILE.
                                     NOTE THAT H,K,L AND RES ARE ALSO
                                     WRITTEN AUTOMATICALLY)
  COMMAND>                      (CARRIAGE RETURN TO INDICATE END OF 
                                                          COMMANDS)
  FORTRAN STOP
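
The command-line layout used above (file number, first column, last column,
optional title, separated by commas) can be illustrated with a small parser.
This is a hypothetical helper written for this documentation, not part of
FILEMERGE; the range checks reflect the limits stated in this section (up to
4 input files, a column range ICOL to JCOL).

```python
def parse_command(line):
    """Split 'NFILE,ICOL,JCOL,TITLE' into (nfile, icol, jcol, title)."""
    parts = line.split(",", 3)  # the title itself may contain commas
    if len(parts) < 3:
        raise ValueError("expected NFILE,ICOL,JCOL[,TITLE]")
    nfile, icol, jcol = (int(p) for p in parts[:3])
    title = parts[3].strip() if len(parts) == 4 else ""
    if not 1 <= nfile <= 4:
        raise ValueError("file number must be between 1 and 4")
    if icol < 1 or jcol < icol:
        raise ValueError("column range must satisfy 1 <= ICOL <= JCOL")
    return nfile, icol, jcol, title
```

An empty title means that the title already stored with the column is kept.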

-------------------------------------------------------------------------------
**** MERGE

This is a routine that merges measurements of structure factor amplitudes and
rejects outliers.  It summarizes the quality of the dataset in a listing of
R-factors on I and on F.

The method followed by the program is:

1. group equivalent reflections together, analyze 1 group at a time.
2. get mean, sd for this group
3. reject observations differing from mean by >4 sigma
4. reject reflection outright if Chi-squared is greater than 20 and ikeepflag=0
5. calculate stats based on what's left
6. figure out the relationship between sigmas in the input files and
  reasonable estimates of the true sigmas by assuming that the reduced 
  chi-square would equal 1.0 if the correct sigmas were present.  The data
  are fit to the equation,

    Sig**2(I) = Sig**2(Poisson) + (A*I)**2

  and all sigmas are corrected with this factor.

7. write out mean, SEM for the reflection
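
The per-group part of this procedure can be sketched roughly as follows.
This is an illustration under stated assumptions, not the MERGE source: the
function name is hypothetical, the mean is taken as a 1/sigma**2 weighted
mean, "4 sigma" is interpreted as four times each observation's input sigma,
rejection is done in a single pass, and the sigma correction of step 6 is
omitted.

```python
import math


def merge_group(observations, n_sigma=4.0, chisq_max=20.0, keep_all=False):
    """Merge one group of equivalent observations given as (value, sigma).

    Returns (mean, sem, n_rejected), or None if the whole reflection is
    rejected.  keep_all=True corresponds to ikeepflag=1.
    """
    def weighted_mean(obs):
        weights = [1.0 / (s * s) for _, s in obs]
        total = sum(weights)
        mean = sum(w * v for w, (v, _) in zip(weights, obs)) / total
        return mean, math.sqrt(1.0 / total)

    mean, _ = weighted_mean(observations)
    kept = [(v, s) for v, s in observations if abs(v - mean) <= n_sigma * s]
    if not kept:
        return None
    mean, sem = weighted_mean(kept)
    dof = len(kept) - 1
    chisq = sum(((v - mean) / s) ** 2 for v, s in kept) / dof if dof > 0 else 0.0
    if chisq > chisq_max and not keep_all:
        return None
    return mean, sem, len(observations) - len(kept)
```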


Control parameters for MERGE are:

1.  Resolution range to consider (e.g., 2.5 100)
2.  ikeepflag     =0 to toss reflections with high chisqr, 1 to keep them
3.  Output  dorgbn-style file
4.  Number of input files
5a.  First input dorgbn-style file...  It is ASSUMED that columns 1,2 are your
    values of F and sigma. (If this is not true, you need to run FILEMERGE 
    first to create such a file)
5b.  Another input data file name (if more than one).


Notes:  The input data files do not need to have
data in any particular order or to have complete datasets.

The data are written out starting with minimum H,K,L and incrementing
L fastest, then K, then H.
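
As a small illustration of this output order (L changing fastest, then K,
then H), sorting reflections as (H, K, L) tuples gives the same sequence.
The helper name is hypothetical.

```python
def output_order(reflections):
    """Sort (h, k, l) tuples so that L varies fastest, then K, then H."""
    # Tuples compare element by element, so H is the slowest-varying key.
    return sorted(reflections)


ordered = output_order([(1, 0, 2), (0, 1, 1), (0, 0, 3), (0, 1, 0), (1, 0, 0)])
```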

The program reports the number of rejects as NNN + MMM where
NNN = the number rejected as being too far from the mean for that
reflection and MMM is the number of reflections rejected completely with
chisqr > 20.


KEYWORDS:

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider

NFILES          # of input files (1 to 4)
INFILE(1)       input file 1
INFILE(2)...    input file 2 (up to 4 files)

KEEPALL         keep all reflections, regardless of merging chisqr (default)
TOSSBAD         toss reflections with merging chisqr> 20
                Note: KEEPALL and TOSSBAD also apply to LOCALSCALE

OUTFILE         output file

-------------------------------------------------------------------------------
**** MADMRG:  Convert MAD data into a form similar to that used
in the analysis of SIR data + anomalous differences.

Reference: Terwilliger, T. C. (1994).  MAD phasing: treatment of
dispersive differences as isomorphous replacement information.
Acta Cryst. D50, 17-23.


Function:  MADMRG reads in measurements of Fbar and the anomalous differences
DelAno at several wavelengths for each reflection.  From this data and the
known values of f' and f" for the anomalously scattering atoms at 
these wavelengths, the program estimates (1) the magnitude of the 
structure factor corresponding to all atoms except the anomalous 
scatterer (Fo),(2) the "isomorphous" difference that would be 
measured +/- the anomalous scatterer at a standard wavelength, 
and (3) the anomalous difference that would be measured
at this standard wavelength.  In this way, the MAD data is converted to a form
identical to that used in the analysis of SIR+anomalous differences data.


Method:  Assume that structure factor due to anomalous scatterer is not
large compared to that due to all other atoms.  Then iso differences among
various wavelengths are proportional to differences in (f+f') for 
the anomalous scatterer, and ano diffs at each wavelength
are proportional to f".  Scale all the  ano diffs to a common
wavelength, then average them.  Take all the iso diffs (e.g., L3-L1, L3-L2,
L1-L2), scale each iso diff by:
 (f+f' at std wavelength)/(difference in f+f' at the 2 wavelengths)
to obtain estimate of what would be measured for the structure factor amplitude
due to the entire structure at the standard wavelength minus the structure
factor amplitude of the entire structure without the anomalously scattering
atoms.  That is, estimate, delta Fiso (+/- ano scatterer at standard 
wavelength).  Finally, obtain estimates of what delta Fiso would be at 
each wavelength by scaling std Fiso by f+f' at that lambda.  This 
allows the program to obtain estimates of Fo, the structure factor 
amplitude due to all non-anomalously scattering atoms from each 
value of (Fbar - Fiso at that lambda).  These estimates of Fo are averaged.

The result of these manipulations is a pseudo-SIR+anomalous differences dataset.
The "native" structure factor amplitude is Fo, the estimate of the structure
factor amplitude due to all non-anomalously scattering atoms at the standard
wavelength.  The "derivative" structure factor amplitude is Fo plus the 
isomorphous difference, delta Fiso, corresponding to the contribution of the 
anomalously scattering atoms at the standard wavelength.  The anomalous 
difference is the averaged anomalous difference, scaled to the value 
at the standard wavelength.

Generally, the standard wavelength is chosen to be one well away from the
absorption edge of the anomalously scattering atoms, so that f' is small or
negligible.  This is not essential, however.
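
The scaling steps described in the Method paragraph can be sketched for a
single reflection as follows.  This is a simplified illustration, not the
MADMRG source: the function names are hypothetical, f0 stands for the
wavelength-independent part of the scattering factor of the anomalous
scatterer (its resolution dependence is ignored here), and the averaging and
error estimates that MADMRG performs are left out.

```python
def scale_ano_to_standard(delano, fpp, fpp_std):
    """Scale an anomalous difference measured with f" = fpp to the
    standard wavelength, where f" = fpp_std."""
    return delano * fpp_std / fpp


def iso_at_standard(fbar_i, fbar_j, fp_i, fp_j, f0, fp_std):
    """Estimate delta Fiso at the standard wavelength from the dispersive
    difference between wavelengths i and j.

    The difference in (f + f') between the two wavelengths is fp_i - fp_j,
    because the f0 part cancels.
    """
    return (fbar_i - fbar_j) * (f0 + fp_std) / (fp_i - fp_j)


def native_estimate(fbar, delta_iso_std, f0, fp, fp_std):
    """Estimate Fo from Fbar at one wavelength by removing the isomorphous
    contribution rescaled to that wavelength."""
    iso_here = delta_iso_std * (f0 + fp) / (f0 + fp_std)
    return fbar - iso_here
```

As a check with made-up numbers (f0 = 34, f' = -10 and -6 at two wavelengths,
f' = -2 at the standard wavelength), Fbar values of 102.4 and 102.8 give
delta Fiso = 3.2 at the standard wavelength and Fo = 100.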

USING MADMRG


Required scattering factor table:

Table of scattering factors for heavy atom at various
wavelengths.  Read from madmrg.std or any other file name you specify 
if you use keywording.  (Modify the existing file if
necessary.  All data is in the form used in Int Tables vol. IV.)
Note: madmrg.std is currently set up for selenium.  It is trivial to
change it for any other atom.
   
   The order of scattering factors is a1,a2,a3,a4..., not a1,b1,a2,b2...
   as listed in the new International Tables, volume C, p.500-502.

   f' and f" spectra are listed in "Macromolecular Crystallography with
   Synchrotron Radiation" by John Helliwell, Cambridge University Press 1992.
   Appendix table A3.2


Control parameters:

1. Resolution range to consider (e.g., 2.5 100)
2.  File containing scattering factors (MADMRG.STD)
3.  Input dorgbn-style data file.
4.  Output dorgbn-style data file.
5.  Number of protein residues in asymmetric unit
6.  Number of anomalously scattering atoms in asymmetric unit.
7.  Number of wavelengths represented in data file

--for each wavelength:--
8a.  Title for this wavelength
9a.  wavelength number
10a. Columns in input file for Fbar,sigma of Fbar, DelAno, sig of Delano at
   this wavelength

--------
11.  Wavelength # to refer all data to for output. This is usually the
wavelength far from the absorption edge for the anomalous scatterer.


KEYWORDS:

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider

INFILE          input file
OUTFILE         output file

STDFILE         file containing scattering factor info (e.g., MADMRG.STD)
NRES            # of protein residues in asymmetric unit 
NANOMALOUS      # of anomalously scattering atoms in the a.u.
NSET            # of wavelengths in dataset

        For each wavelength, specify:

LABEL(1)        label for wavelength 1  (LABEL(2) for wavelength 2...etc)
JLAMBDA(1)      wavelength ID for this wavelength (usually 1 for lambda 1)
NCOLFBAR(1)     Fbar for this wavelength
NCOLSFBAR(1)    sigma of Fbar
NCOLDELF(1)     Del F ano (Fplus - Fminus)
NCOLSDELF(1)    sigma of del F ano

JSTD            wavelength ID for wavelength to be considered the STANDARD


Notes on MADMRG:

Detailed description of input parameters:

OUTPUT FILE:  This is a DORGBN-style output file containing 7 columns of
                data.  They are:
                   1   madmrg est of Fp-zero   ("Fnative")
                   2   madmrg sig of fp-zero   ("sig of Fnative")
                   3   madmrg est of del iso
                   4   madmrg sig of del iso   ("Sig of Fderiv")
                   5   madmrg est of del ano   ("Delano")
                   6   madmrg sig of del ano   ("Sig of Delano")
                   7   madmrg: MOCK FDER ("Fderiv"; 
                               equal to Fp-zero + del iso)

To use this data as MIR + anomalous differences, simply use columns 1 and 
2 as Fp (native F) and sigma;  columns 7 and 4 as Fder (derivative F) and 
sigma; and  columns 5  and 6 as delano and sigma.  


Please note:  The output of MADMRG is set up to be used as "Mock" native and
derivative data.  When you refine heavy atom parameters using this mock
dataset, you must define a heavy atom type that has scattering factors
identical to those you use in MADMRG at the "standard" wavelength.  That is,
if you define lambda 3 as "standard" in madmrg and f" at lambda 3 is 8.9933,
then when you get to heavy atom refinement with routine HEAVY
you will need to define an atom type  "L3" (or something) that has all the
right scattering factor information including an f" of 8.9933.  Use the
keyword NEWATOMTYPE in HEAVY to do this easily.


PLEASE NOTE: when you use this data in your MIR program, DO NOT refine
an overall scale factor and B for the "derivative."  The overall scale
factor and B of the derivative relative to the (pseudo) native are 
absolutely perfect to start with (because of the way the derivative has 
been set up).  In this package, use the flag "NOREFINESCALE" for the
derivative.

The reason to use column 4 as sigma of Fder is that heavy atom refinement 
programs such as HEAVY assume that errors in Fp and Fder are independent.  
In this case they are not.  Suppose you estimated the error in Fder 
by combining errors in Fo and deliso.  Then your heavy atom refinement 
program would estimate the error in Fder-Fnat by combining the   
errors you give it for Fder (based on errors in deliso + Fo) and the 
errors you give it for Fnat (the error in Fo).  The estimates of errors in 
Fder-Fnat would therefore contain the errors in Fo twice.  If you use 
column 4 as sigma of Fder, the errors in Fder-Fnat will be correctly 
calculated based on deliso and Fo.
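
The double counting described above can be shown with a short numerical
sketch.  It assumes that a refinement program combines the two sigmas it is
given in quadrature (the usual treatment of independent errors); the function
name and the example values are hypothetical.

```python
import math


def sigma_of_difference(sig_fder, sig_fnat):
    """Sigma of Fder - Fnat as a program assuming independent errors
    would compute it (quadrature sum)."""
    return math.hypot(sig_fder, sig_fnat)


sig_fo, sig_deliso = 3.0, 4.0  # made-up sigmas for Fo and del iso

# Column 4 (sig of del iso) used as sigma of Fder: Fo's error enters once.
correct = sigma_of_difference(sig_deliso, sig_fo)

# Sigma of Fder built from del iso + Fo: Fo's error enters twice.
naive_sig_fder = math.hypot(sig_deliso, sig_fo)
double_counted = sigma_of_difference(naive_sig_fder, sig_fo)
```

Here the first estimate is 5.0, while the second is sqrt(34), about 5.83,
because the variance of Fo has been added a second time.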


Input data file:   This input data must be scaled carefully.  MADMRG does not
scale your data for you.

# OF PROTEIN RESIDUES.  The program assumes that the B-factor for the 
anomalously scattering atoms is similar to that for all other atoms.  
Using the number of protein residues and the number of anomalously 
scattering atoms on the next line, the program estimates the rms 
value of structure factor amplitudes due to anomalously scattering 
atoms as a function of resolution.  Each of these numbers is for the asymmetric unit.


Annotated and condensed example using MADMRG, with comments in []

Scattering form factors read for  3 wavelengths.  [read from MADMRG.STD]

 1:   lambda =  0.9797. a(4),b(4),c,fp,fpp for S,Se,N:
 6.905  5.203  1.438  1.586  1.468 22.215  0.254 56.172  0.867  0.319  0.557
17.001  5.820  3.973  4.354  2.410  0.273 15.237 43.816  2.841 -9.851  2.858
12.213  3.132  2.013  1.166  0.006  9.893 28.997  0.583-11.529  0.000  0.000

Set #  1 Label: l1     [lambda 1 data]
Set #  2 Label: l2     [lambda 2 data]
Set #  3 Label: l3     [lambda 3 data]

Set:                  1         2         3
Lambda #:             1         2         3
Lambda :         0.9797    0.9794    0.9000
N protein :         644       644       644        [# of protein atoms in a.u.]
# of S or Se:         2         2         2        [# of anom scattering atoms]
Fxn Se      :    1.0000    1.0000    1.0000        [fractional substitution of
                                                      S with Se, always 1.0 for
                                                      all other atoms]
Col for Fbar:         1         5         9         [columns in input data file]
Col for Sig :         2         6        10
Col DelfAno :         3         7        11
Col for sig :         4         8        12

Which of the  3 wavelengths is to be defined as the "standard"
at which all values of Fa will be calculated? >
3,                    std wavelength # 

Wavelength # 3 with lambda=0.9000, corresponding to
data set  3 will be used as the standard wavelength.

Output file: DATA.MAD
col  0 : madmrg output:                                                        *
col  1 : madmrg est of Fp-zero                                                 *
col  2 : madmrg sig of fp-zero                                                 *
col  3 : madmrg est of del iso                                                 *
col  4 : madmrg sig of del iso                                                 *
col  5 : madmrg est of del ano                                                 *
col  6 : madmrg sig of del ano                                                 *
col  7 : madmrg: MOCK FDER                                                     *

Based on space groupC2   the first acentric reflection in this
dataset is: (  -25    1    2).

                                [space group-specific test for centric reflns]

Based on space group C2   the first centric reflection in this
dataset is: (  -24    0    1).

 1763 REFLECTIONS READ FROM THIS FILE


Form factors at lambda =   0.9797        [lambda 1 form factors as fn of reso]
                                        [If your anomalously scattering atom is
                                         100% substituted, ignore the "S" data.
                                         If your atom is not Se, you have
                                         altered MADMRG.STD and "Se" means
                                         whatever atom you put in there.]

dmin:       4.00   3.00   2.80   2.65   2.50   2.40   2.30   2.20   2.10   2.00
f (S):     14.52  12.96  12.20  11.88  11.58  11.32  11.09  10.86  10.62  10.37
f" (S):     0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56
f (Se):    21.64  19.40  18.28  17.80  17.33  16.92  16.57  16.19  15.79  15.36
f" (Se):    2.86   2.86   2.86   2.86   2.86   2.86   2.86   2.86   2.86   2.86
f (N):      6.20   5.42   5.01   4.82   4.65   4.49   4.36   4.21   4.06   3.90
f" (N):     0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00


Form factors at lambda =   0.9794

dmin:       4.00   3.00   2.80   2.65   2.50   2.40   2.30   2.20   2.10   2.00
f (S):     14.52  12.96  12.20  11.88  11.58  11.32  11.09  10.86  10.62  10.37
f" (S):     0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56
f (Se):    22.86  20.62  19.50  19.01  18.55  18.14  17.78  17.40  17.00  16.58
f" (Se):    4.88   4.88   4.88   4.88   4.88   4.88   4.88   4.88   4.88   4.88
f (N):      6.20   5.42   5.01   4.82   4.65   4.49   4.36   4.21   4.06   3.90
f" (N):     0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00


Form factors at lambda =   0.9000

dmin:       4.00   3.00   2.80   2.65   2.50   2.40   2.30   2.20   2.10   2.00
f (S):     14.52  12.96  12.20  11.88  11.58  11.32  11.09  10.86  10.62  10.37
f" (S):     0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56   0.56
f (Se):    29.87  27.63  26.51  26.03  25.56  25.15  24.80  24.42  24.02  23.59
f" (Se):    3.28   3.28   3.28   3.28   3.28   3.28   3.28   3.28   3.28   3.28
f (N):      6.20   5.42   5.01   4.82   4.65   4.49   4.36   4.21   4.06   3.90
f" (N):     0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00


Scaling factors to renormalize data:
"all" = relative expected total rms F at this lambda
"prot" = relative expected protein rms F at this lambda
"SE or S to Se" = relative expected Se or S rms F
at this lambda (relative to Se at standard wavelength)




For dataset # 1    Label: l1                                                  (*

lambda =   0.9797

dmin:                 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all):          0.98 0.98 0.98 0.98 0.98 0.98 0.97 0.97 0.97 0.97
scale (prot):         1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se):  0.72 0.70 0.69 0.68 0.68 0.67 0.67 0.66 0.66 0.65
anom ratio:            0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9


For dataset # 2    Label: l2                                                  (*

lambda =   0.9794

dmin:                 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all):          0.99 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.97
scale (prot):         1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se):  0.77 0.75 0.74 0.73 0.73 0.72 0.72 0.71 0.71 0.70
anom ratio:            1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5


For dataset # 3    Label: l3                                                   *

lambda =   0.9000

dmin:                 4.00 3.00 2.80 2.65 2.50 2.40 2.30 2.20 2.10 2.00
scale (all):          1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (prot):         1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
scale (SEorS to Se):  1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
anom ratio:            1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 Enter name of .drg file containing reflection data>


Summary of averaging results.
Total of  1763 reflections written out
Type  Mean Chi     Naveraged     Chi cutoff    n rejected
Iso    0.7203      1763           3.0000             0
Ano    0.9044      1763           3.0000             0
Fbar   0.5980      1763           3.0000             0

       [  These are estimates of sqrt(chi**2) for estimation of isomorphous and
anomalous differences, and Fo.  They should be near 1.0.  Reflections for which
chi**2 is > 3.0 are tossed.]


Summary of merging statistics by shell:

     shell              Nobs     Mult   RMS value rms error  Norm RMS value    *

   1   4.000 Iso :     732.       3.0   125.544    48.835     0.376     0.146
             Ano:      732.       3.0    28.096    14.242     0.084     0.043
             Fbest:    732.       3.0   333.477    40.914     1.000     0.123

   2   3.000 Iso :    1031.       3.0    86.414    33.233     0.347     0.133
             Ano:     1031.       3.0    21.607    10.912     0.087     0.044
             Fbest:   1031.       3.0   248.971    27.400     1.000     0.110


All          Iso :    1763.       3.0   104.456    40.448     0.364     0.141
             Ano:     1763.       3.0    24.510    12.403     0.085     0.043
             Fbest:   1763.       3.0   287.094    33.676     1.000     0.117



[ This table says that:  there were 732 isomorphous differences estimated in
the shell from 4 A to infinity.  There were an average of 3 estimates of each 
isomorphous difference (one from each lambda).  The rms isomorphous difference
was 125.544.  The rms error estimate was 48.835.  The rms isomorphous difference
normalized to the rms estimate of Fo was 0.376, and the normalized sigma was 
0.146.

Note that the errors in the isomorphous and anomalous differences are quite 
large, even in this test case with model data.]
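
The normalized columns in the table above can be reproduced by dividing each
rms value by the rms Fbest of the same shell.  The short Python sketch below
does this for the 4.000 A shell; the function name is an illustrative
assumption, and the numbers are copied from the table.

```python
def normalize(rms_value, rms_error, rms_fbest):
    """Return (normalized rms value, normalized rms error) relative to rms Fbest."""
    return rms_value / rms_fbest, rms_error / rms_fbest

# Numbers copied from the 4.000 A shell of the table above.
iso = normalize(125.544, 48.835, 333.477)
ano = normalize(28.096, 14.242, 333.477)

print("Iso: %.3f %.3f" % iso)
print("Ano: %.3f %.3f" % ano)
```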

------------------------------------------------------------------------------

**** PEAKSEARCH

This is a routine that finds high and low points in a Fourier map.  It is
not applicable to Patterson maps (use HASSP for that purpose).  It assumes
that the FFT has been calculated over the entire asymmetric unit and uses
the symmetry file to map neighboring grid points on to the asymmetric unit.
It reports the highest peaks in the map, with the height being the highest
value of the FFT on a grid point and the coordinates being the centroid of
the peak.
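
The idea of picking peaks on grid points can be sketched as follows.  This is
a simplified Python illustration and not the PEAKSEARCH algorithm itself: it
assumes the map covers one full unit cell (so neighbours are found by wrapping
indices rather than through the symmetry file), it reports the grid point of
each peak rather than its centroid, and it lists positive peaks only.  The
function name is hypothetical.

```python
from itertools import product

def find_peaks(grid, npeak):
    """Return up to npeak (height, (ix, iy, iz)) local maxima of a 3-D grid.

    grid is a nested list grid[ix][iy][iz] covering one full unit cell,
    so neighbours are found by wrapping indices (an assumption of this
    sketch).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    peaks = []
    for ix, iy, iz in product(range(nx), range(ny), range(nz)):
        value = grid[ix][iy][iz]
        neighbours = (
            grid[(ix + dx) % nx][(iy + dy) % ny][(iz + dz) % nz]
            for dx, dy, dz in offsets
        )
        # A peak must be strictly higher than all 26 neighbouring points.
        if all(value > other for other in neighbours):
            peaks.append((value, (ix, iy, iz)))
    peaks.sort(reverse=True)
    return peaks[:npeak]

# Tiny 4 x 4 x 4 test map: flat background with two isolated peaks.
demo = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
demo[1][1][1] = 9.0
demo[3][3][3] = 5.0
print(find_peaks(demo, 2))
```

The lowest points of a map could be found in the same way after changing the
sign of every grid value.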

The routine can read in an FFT written by UTIL or it can be called at the
end of routine MAPS.  Note that it cannot read BOSS format data.

Control parameters for PEAKSEARCH

1.  Name of FFT file containing the asymmetric unit of the map
2.  NPEAK, the number of positive and negative peaks to list (maximum = 100)
3.  Name of file to write the peaks out to
4.  ISYMMETRY= # of symmetry equivalents to write out for each peak.
    If you want all peaks in the region used for FFT calculations, use
     ISYMMETRY=0.  If you want all in the region used for FFTTOBOSS, then use
     ISYMMETRY=-1.
5. Format to write peak list (PDB format with waters or fractional)

The routine will write out NPEAK highest and NPEAK lowest peaks to the
output file.  The "B-factor" in the PDB format file is the peak height/1000.
The final column in the fractional-format file is the peak height/1000.


KEYWORDS:

FFTFILE         name of fft-containing file
NLIST           number of peaks (high, same # of low) to list
ISYMMETRY       # of symmetry equivalents to list for each peak 
                   -1 for all within FFTTOBOSS region, 0 for all within
                    FFT region. 1= default= just 1 for each peak
PEAKFILE        name of output file
PDB             write peaks in PDB format
FRACT           write peaks in fractional format
POSITIVEONLY    only list positive peaks        
NEGATIVEONLY    only list negative peaks

------------------------------------------------------------------------------
**** FFTTOBOSS

This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to any region of the map in the BOSS format.  This routine is
applicable to Fourier maps, but can be used with Patterson maps as long
as the output region is contained within the input FFT.

Control parameters:

1.  Name of file containing FFT map
2.  0 for Patterson, 1 for Fourier map
3.  Output file (BOSS format)
4.  Title for output file

The output map is calculated using the same grid as the input FFT, but
the endpoints in x,y, and z can be different.  The program generates the
entire unit cell from the input FFT if the endpoints of the output BOSS map
are not contained within the input FFT.

The grids for the input FFT and the output BOSS map are set in 
PREFERENCES.

The output map is scaled so that the rms value of the map is 5.000.
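
A minimal Python sketch of this kind of rescaling is given below.  It assumes
that "rms value" means the root-mean-square of the grid values themselves
(not the deviation from the mean); the function name is hypothetical.

```python
import math

def scale_to_rms(values, target_rms=5.0):
    """Scale map values so that their root-mean-square equals target_rms."""
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    if rms == 0.0:
        raise ValueError("map is identically zero; cannot scale")
    factor = target_rms / rms
    return [v * factor for v in values]

scaled = scale_to_rms([1.0, -2.0, 3.0, -4.0])
new_rms = math.sqrt(sum(v * v for v in scaled) / len(scaled))
print(f"rms after scaling: {new_rms:.3f}")
```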

KEYWORDS

FFTFILE         name of FFT-containing file
BOSSFILE        name of output BOSS format file
FILETITLE       optional title for file 

------------------------------------------------------------------------------
**** FFTTODSN6

This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to any region of the map in the DSN6 (brick) format for TOM/FRODO
or O.  This routine is applicable to Fourier maps, but can be 
used with Patterson maps as long as the output region is contained 
within the input FFT.

Control parameters:

1.  Name of file containing FFT map
2.  0 for Patterson, 1 for Fourier map
3.  Output file (DSN6 format)
4.  Title for output file

The output map is calculated using the same grid as the input FFT, but
the endpoints in x,y, and z can be different.  The program generates the
entire unit cell from the input FFT if the endpoints of the output DSN6 map
are not contained within the input FFT.

The grids for the input FFT and the output map (same as for the BOSS-format
map) are set in PREFERENCES.

The output map is scaled so that the rms value of the map is 5.000.

KEYWORDS

FFTFILE         name of FFT-containing file
DSN6FILE        name of output DSN6 format file
FILETITLE       optional title for file 

------------------------------------------------------------------------------
**** FFTTOMAPVIEW

This is a routine that converts an asymmetric unit of an FFT in the UCLA
FFT format to a format compatible with the PHASES package.  This is
useful for displaying the map using MAPVIEW in the PHASES package.

Control parameters:

1.  Name of file containing FFT map
2.  Output file (PHASES-compatible format)

The output map is calculated using the same grid as BOSS-style output
maps.  This grid may be set in PREFERENCES and is called IBOSSGRID.
For Pattersons, this grid must be contained within the FFT grid, IFFTGRID.

The grid for the input FFT is set in PREFERENCES.


KEYWORDS

FFTFILE         name of FFT-containing file
MAPVIEWFILE     name of output PHASES-compatible format file

------------------------------------------------------------------------------
**** LOCALSCALE

This is a package to scale a "derivative" dataset to a "native" dataset using
local scaling.  In this method the scale factor for a particular reflection
is based on the ratio of derivative:native for reflections surrounding this 
reflection.  This method is useful because the scale factor is not restricted
to any particular function of position in reciprocal space.

In this implementation, at least 30 reflections surrounding the reflection
to be scaled are used to obtain a scale factor.  Additionally, the reflections
used in obtaining a scale factor are always chosen so that they form a
complete sphere around the reflection of interest (inasmuch as possible).

Initial Wilson scaling is carried out before local scaling.
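
The local scaling idea can be sketched in Python as follows.  This is a
simplified illustration under stated assumptions, not the LOCALSCALE code: the
scale factor is taken as sum(Fnat)/sum(Fder) over the nearest reflections,
"nearest" is measured in h,k,l index space, the initial Wilson scaling and the
F/sigma cutoff are omitted, and the function name is hypothetical.

```python
def local_scale(reflections, min_neighbours=30):
    """Return a local scale factor for each reflection.

    reflections: list of (h, k, l, f_nat, f_der) tuples with f > 0.
    The scale for a reflection is sum(f_nat) / sum(f_der) over the
    min_neighbours reflections nearest to it in index space, with the
    reflection itself excluded.  Both choices are assumptions.
    """
    scales = []
    for i, (h, k, l, _, _) in enumerate(reflections):
        others = [r for j, r in enumerate(reflections) if j != i]
        others.sort(key=lambda r: (r[0] - h) ** 2 + (r[1] - k) ** 2 + (r[2] - l) ** 2)
        nearest = others[:min_neighbours]
        sum_nat = sum(r[3] for r in nearest)
        sum_der = sum(r[4] for r in nearest)
        scales.append(sum_nat / sum_der)
    return scales

# Model data: the derivative gets weaker relative to the native as h increases.
demo = [(h, k, l, 100.0, 100.0 / (1.0 + 0.1 * h))
        for h in range(5) for k in range(5) for l in range(5)]
scales = local_scale(demo)
print(f"scale near h=0: {scales[12]:.2f}   scale near h=4: {scales[112]:.2f}")
```

In this model the scale factor follows the variation along h without any
assumed functional form, which is the point of the method.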

Data files:  The program expects to read in two data files: one for the
native dataset and one for the derivative.  The native dataset is expected
to have h,k,l, F and sigma (at least).  The derivative dataset is expected
to have h,k,l, F, and sigma, and, if desired, del F ano and sigma of del
F ano.  The scale factor obtained for the derivative F is applied to all of 
the derivative data.

A dorgbn-style file is written out containing the scaled derivative data.
If you wish to have the derivative and native data in the same file, then
follow this with the routine "FILEMERGE" and merge the two files.



Command parameters:

1.  Resolution range to consider (all data outside the range is ignored)
2.  Data file containing native data
3.  column numbers for native F and sigma (use "0" if no native sigma)
4.  Data file containing derivative data
5.  column numbers for derivative F, sigma, del Ano, sigma of del ano
    (use "0" if missing; you need to input 4 numbers here)
6.  Name of output data file for scaled derivative data
7.  Title for output file
8.  inotoss = 1 if you want to keep reflections with del F much greater
        than expected from the sigmas and the rms deviations for other
        reflections; 0 if you want to toss these reflections.
        Suggestion:  use "1" for most cases, but if you are about to
        use GETISO to calculate a Patterson or difference Fourier, you
        might want to exclude these outliers and use "0".
9.  minimum number of reflections to use in scaling.  Suggested value: 30
10.  minimum ratio of F/sig for native or derivative to be read in at all.
        Suggested value: "0.0".  The program only uses data with F/sig>3.0
        for actually calculating scale factors, so this value does not
        affect your scale factors.  It only affects what data is scaled
        and your R-factors at the end and what data is written out.



KEYWORDS:

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider

INFILE(1)       name of file with Native data
INFILE(2)       name of file with Derivative data to scale to Native
OUTFILE         name of output file with scaled derivative data

NNATF           column # for F of native data
NNATS           column # of sigma of F of native data

NDERF           column # for F of deriv data
NDERS           column # for sigma of F of deriv data
NANOF           column # of anomalous difference (Fplus-Fminu) of deriv data
NANOS           column # of sigma of anomalous difference

                (note: be sure to set those you don't want to 0)

FILETITLE       optional title for output file

KEEPALL         keep reflections even with high differences (default)
TOSSBAD         Toss reflections if differences between native and derivative
                  are more than 3 * the rms found for other reflections.
                Note: KEEPALL and TOSSBAD also apply to MERGE

ANCUT           minimum # of reflections to use to scale a reflection (30.)
RATMIN          minimum ratio of F/sigma to include


Notes:

1. A value of 0 or less for fnat or fder is assumed to mean that the
data were not measured.  A value of 0.0 or -1.0 for del f ano is likewise
assumed to mean that the data were not measured.

2.  If sigmas are not supplied at all, then a value of 1.0000 will be assumed.
This can affect what data are read in if you specify a minimum F/sig >0.0

------------------------------------------------------------------------------
**** COMPLETE

This is a routine to determine the completeness of a dataset.  It maps
input data to the asymmetric unit of the space group and calculates the
percentage of data that is present.

Control parameters:

1. Resolution range to consider (e.g., 2.5 100)
2. Name of file containing data to be examined.
3. Column numbers for F and sigma of F (enter 0 if sigma not present)
4. Minimum ratio of F/sigma to include


KEYWORDS:

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider

INFILE          name of file to be examined
NNATF           column # for F of data
NNATS           column # of sigma
RATMIN          minimum ratio of F/sigma to include

------------------------------------------------------------------------------
**** WEIGHTS

This is a routine to generate weighting factors for atomic refinement.
The weighting factors are based on both experimental sigmas and on rms
values of (Fobs-Fcalc)**2 in ranges of resolution.  The premise for this
type of weighting is that the atomic model used to generate Fcalc is
incomplete.  This leads to an expected difference between Fobs and Fcalc
that is larger for centric reflections than for acentrics by a factor of
sqrt(2).  The errors in the fit of the model to the data are divided into
two parts, one due to errors in measurement and one due to errors in the
model.  It is assumed that errors in measurement are reasonably well known.
If this is not the case, then just do not include them (see below).

     The errors in the model are estimated in a shell of resolution as

          E**2 = [ < (Fobs-Fcalc)**2 > - < Sigma**2 > ] / Q

where  Q=1 for centric reflections and 0.5 for acentric reflections.
The weighting factor applied to a particular reflection is then:

         WEIGHT = 1/( Q * E**2  + Sigma**2 )

where Q is again 0.5 for acentric reflections and 1 for centric ones.

Reflections where Sigma is not >0 or Fobs is not > ratmin*sigma or Fcalc is
not >0 are ignored and not written out.
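
The weighting scheme can be sketched in Python as follows.  This is an
illustration only: it assumes that the term subtracted in the expression for
E**2 is the mean of Sigma**2 over the shell, that centric and acentric
reflections in a shell are treated separately, and that a negative estimate of
E**2 is set to zero.  The function names are hypothetical.

```python
def model_error_sq(shell, centric):
    """Estimate E**2 for one resolution shell.

    shell: list of (fobs, fcalc, sigma) for reflections that are all
    centric or all acentric (an assumption of this sketch).
    """
    q = 1.0 if centric else 0.5
    n = len(shell)
    mean_diff_sq = sum((fo - fc) ** 2 for fo, fc, _ in shell) / n
    mean_sigma_sq = sum(s ** 2 for _, _, s in shell) / n
    # Clamp at zero in case the sigmas overestimate the total error.
    return max(mean_diff_sq - mean_sigma_sq, 0.0) / q

def weight(e_sq, sigma, centric):
    """WEIGHT = 1 / (Q * E**2 + Sigma**2)."""
    q = 1.0 if centric else 0.5
    return 1.0 / (q * e_sq + sigma ** 2)

# Two acentric reflections in one shell (illustrative numbers).
shell = [(100.0, 90.0, 5.0), (80.0, 92.0, 5.0)]
e_sq = model_error_sq(shell, centric=False)
print(f"E**2 = {e_sq:.1f}, weight = {weight(e_sq, 5.0, centric=False):.5f}")
```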

Control parameters:

1. Resolution range to consider (e.g., 2.5 100)
2. Name of file containing Fobs,sigma and Fcalc data.
3. Column numbers for F and sigma of F and Fcalc (enter 0 if sigma not present)
4. Output file name
5. Minimum ratio of F/sigma to include= ratmin


KEYWORDS:

DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider

INFILE          name of file with Fobs,sigma, and Fcalc
OUTFILE         name of output file in X-PLOR format with h,k,l,fobs,sig,weight

NNATF           column # for F of data
NNATS           column # of sigma
NCOLFC          column # of Fcalc

RATMIN          minimum ratio of F/sigma to read in at all 
------------------------------------------------------------------------------
**** FDIFF

This routine creates a pseudo-mutant dataset for difference refinement.
It is used in cases where a "WT" structure has been refined and a 
"mutant" dataset is available, and where it is the differences between 
the WT and mutant structures that are of interest.  This routine
takes Fcalc for the WT dataset and Fobs for the WT and mutant
datasets to create a pseudo-mutant dataset: Fdiff and sigma.  These are
then used just as if they were Fobs,mutant and sigma in refinement of the
mutant structure.

The value of Fdiff is given by:

        Fdiff = Fc (WT) + (Fobs, mutant - Fobs, WT)

Control parameters for FDIFF:

1. Input file with Fc (WT), Fo (WT), sig(WT), Fo(MUT), sig(MUT)
2.  Output file name for Fdiff, sigma and Delta=(Fo,MUT-Fo,WT)
3. Column number in input file for Fc(WT)
4.  Column numbers for Fo(WT) and sigma of Fo(WT)
5.  Column numbers for Fo(MUT) and sigma of Fo(MUT)
6.  Overall title for output file
7.  Resolution range to consider (dmin, dmax)

The program ignores reflections where any of these F's are missing (not
positive)
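
The Fdiff calculation for a single reflection can be sketched in Python as
shown below.  The function name is hypothetical, and the sigma written by
FDIFF is not computed here because its formula is not given in this section.

```python
def fdiff(fc_wt, fo_wt, fo_mut):
    """Return (Fdiff, Delta), or None if any F is missing (not positive)."""
    if fc_wt <= 0 or fo_wt <= 0 or fo_mut <= 0:
        return None
    delta = fo_mut - fo_wt          # Delta = Fo,MUT - Fo,WT
    return fc_wt + delta, delta     # Fdiff = Fc(WT) + Delta

print(fdiff(120.0, 115.0, 130.0))
print(fdiff(120.0, 0.0, 130.0))
```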


KEYWORDS:

INFILE                  input file with WT Fc, Fo and MUT Fo
OUTFILE                 output file with FDIFF,sig,Del
FILETITLE               optional title for output file

NCOLFC                  column # for WT Fc (WT Fcalc) in input file
NCOLFOWT                column # for WT Fo (WT Fobs) in input file
NCOLSWT                 column # for sigma of WT Fo

NCOLFOMUT               column # for MUT Fo (MUT Fobs) in input file
NCOLSMUT                column # for sigma of MUT Fo

DMIN                    minimum d-spacing to consider
DMAX                    maximum d-spacing to consider
------------------------------------------------------------------------------
**** MAPTOASYM
**** MAPTOOBJECT

These routines map the atoms in a PDB file, one by one, using crystallographic
symmetry.  MAPTOASYM maps atoms into the asymmetric unit of the space group,
as defined by the symmetry file and the (FFTGRID) grid specified for FFT
calculations, assumed to contain the asymmetric unit.  MAPTOOBJECT maps 
atoms to their symmetry equivalents closest to an atom in a second PDB
("object") file.

Inputs:

MAPTOASYM

1.  Input PDB file name
2.  Output PDB file name.

MAPTOOBJECT

1. Input PDB file name
2. PDB file name containing object to map atoms close to
3. Output pdb file name


KEYWORDS

INFILE(1)       Input pdb file name
INFILE(2)       Object pdb file name for MAPTOOBJECT
OUTFILE         output pdb file name.

        --for MAPTOOBJECT--
DISMIN          Only atoms with closest distance to atom in object between
DISMAX          dismin and dismax will be written out. Default: 0. to 1000000.

------------------------------------------------------------------------------
**** PREFERENCES

This is a routine to set up the user's preferences for
matrices of equivalent positions, cell parameters, and grids for FFT
calculations.  If desired, the preferences will be saved in a file
called "preferences.dat".  If a file with this name exists in the 
default directory at startup of the program, the preferences will be
read in and it is not necessary to enter them unless they need to
be changed.

This routine should usually be run interactively.  Prompts are given
for a file containing the symmetry information, for cell
parameters and for the grids for FFT calculations.  If called when
keywording is in use, the routine simply saves all current values.

The grids for FFT calculations may be somewhat confusing.  There are 3 grids.
One (IFFTGRID) is for calculating an asymmetric unit of a Fourier.  For this 
grid, the values of starting and ending grid points must be in the range from
0 to the number of grid points for the cell translations.

The second grid (IPATTGRID) is for the asymmetric unit of a Patterson.  The same
restrictions apply as for the Fourier.

The third grid (IBOSSGRID) is for creating output BOSS-format maps 
from the asymmetric unit of an FFT.  The range of grid points in 
this map may be anywhere in the range of -256 to 256 in any direction.  It can 
therefore be adjusted to draw the output map around the contours of 
a protein molecule, for example.  For this BOSS grid, the number of 
grid points in the lattice along x,y, and z is identical to that for the 
FFT, only the starting and ending grid points are different.

The cell parameters, symmetry file and the grids for the Fourier, 
Patterson, and Boss maps can be set using keywords as well.

Note on grids:

   All grids for pattersons, Fourier, or other output must have unit
cell translations that are multiples of 2 and 3 and no other numbers.  
Furthermore, any translations in the equivalent positions must correspond to
an integral number of grid units.  This means that along an axis with a
6-fold screw axis, the cell translation must be a multiple of 6, and in a
centered cell, the cell translation along those directions must be a 
multiple of 2.

You may always use any of the following grids for the cell translation, as
they are multiples of 6, 2, and 3:  6, 12, 18, 24, 36, 48, 54, 72, 96, 
108, 144, 162, 192, and 216.

The usual way to set up your grids is as follows.  Suppose you have cell
dimensions of 30 x 40 x 90 A, and the resolution of your data is to 2 A.
You will want a grid that is about 1/3 of the resolution, or a 0.7 A
spacing.  This would be 40 along x, 53 along y, and 120 along z.  Since it
is best to use one of the grids listed above, just choose the closest one
to each, usually going to the higher of the two closest ones: 48 x 54 x 128.
Now suppose the asymmetric unit of your unit cell is 0-1, 0-1/2, 0-1/2.  Then
your grid statement for your fouriers will be:

FFTGRID 0 48 48 0 27 54 0 64 128

That is, your FFT will go from 0 to 48 in x where the cell translation is 48,
0 to 27 in y where the cell translation is 54, and 0 to 64 in z where the cell
translation is 128.

The grid for your Patterson maps will usually be the same as for your fouriers,
except that the asymmetric unit will usually be half the asymmetric unit of
the fourier.  That is, your Patterson grid statement will look like:

PATTGRID 0 24 48 0 27 54 0 64 128

Finally, you may want to output Fourier maps that are larger than the
asymmetric unit of your crystal.  For example, you may want the map to
surround your molecule in the crystal.  You can specify any range from -256
to 256 in each direction for this grid, but you cannot change the numbers
corresponding to the cell translation.  Suppose your molecule goes from
-0.25 to 0.5 in x, 0 to .5 in y, and .25 to .75 in z.  Your BOSSGRID is then:

BOSSGRID -12 24 0 27 32 96

Note that you only need 6 numbers for BOSSGRID as you are not respecifying
the cell translations.
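
The grid arithmetic in this example can be checked with a short Python sketch.
It reads "multiples of 2 and 3 and no other numbers" as "no prime factors
other than 2 and 3", which is an interpretation, and the function names are
hypothetical.

```python
def only_factors_2_and_3(n):
    """True if the positive integer n has no prime factors other than 2 and 3."""
    if n < 1:
        return False
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1

def fractional_to_grid(frac_start, frac_end, n_cell):
    """Convert a fractional range to grid points for a cell translation n_cell."""
    return round(frac_start * n_cell), round(frac_end * n_cell)

# Cell translations chosen in the example above.
nx, ny, nz = 48, 54, 128
print([only_factors_2_and_3(n) for n in (nx, ny, nz)])

# Molecule from -0.25 to 0.5 in x, 0 to 0.5 in y, 0.25 to 0.75 in z.
limits = (
    fractional_to_grid(-0.25, 0.5, nx)
    + fractional_to_grid(0.0, 0.5, ny)
    + fractional_to_grid(0.25, 0.75, nz)
)
boss_line = "BOSSGRID " + " ".join(str(v) for v in limits)
print(boss_line)

# Multiples of 6 up to 216 that pass the factor test.
safe_grids = [n for n in range(6, 217, 6) if only_factors_2_and_3(n)]
print(safe_grids)
```

The last list reproduces the grids given above as always usable.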

The keywords you may wish to set for PREFERENCES are:

KEYWORD

CELL            a,b,c,alpha,beta,gamma (angstroms, degrees)
SYMFILE         name of space group symmetry file
MEPFILE         name of space group symmetry file (same as SYMFILE)
FFTGRID         9 integers specifying grid for fourier
                  These are 3 for each direction x,y,z: nxs,nxe,nx, etc...
                  nxs is starting grid point in x, nxe is ending grid point,
                  nx is # of grid points corresponding to the entire cell edge. 
PATTGRID        9 integers specifying grid for patterson, same form as fftgrid
BOSSGRID        6 integers specifying range of grid for Boss.  Here only the
                  starting and ending grid points in each direction are
                  input, as the cell edge is defined by the fftgrid.


------------------------------------------------------------------------------
**** VIEW

This is a routine that allows you to view binary dorgbn-style files. 

Control parameters:

1. Input dorgbn-style file
2. number of records to print out


KEYWORDS:

INFILE                  input file name
NLIST                   # of records to write out 
 
------------------------------------------------------------------------------
**** SCRIPT

This is a routine that allows you to shift the input controls from the
current input to a specified file.  The input starts out with unit 5 
(the terminal).  When you specify a script file to use as input, the
program expects to find a valid command as the first line in the file,
i.e., "MAPS", or "VIEW".  All commands will be interpreted just as if they
were typed from the terminal.  Script files may call script files as well.

Any time there is an error in reading a script file at the command level,
the script file is closed and control is returned to the file that called
it, or the terminal if the script file was called from there.

Control parameters:

1.  File name to read input from

KEYWORDS:

INFILE            file name to read input from

------------------------------------------------------------------------------
**** LOG
LOG allows you to copy output to a log file.

Input is:

1.  Name of new log file


Note: if you already have a log file open, then it will prompt you to
see if you want to close it and open a new one.

KEYWORD operation:  If keywording is in use, then if the keyword
"LOGFILE" has been associated with a file name, that file will be the
log file.  If a log file is already open, the request will be ignored.

------------------------------------------------------------------------------
**** KEYWORD

Allows parameters to be set using keywords instead of prompts.  After 
specifying "KEYWORD", you can enter as many keywords with their values
as you want.  End the setting of keywords with "DONE".  Now when you
enter commands, the program will use the values you set with keywords.
If you have not specified a needed keyword, the program will quit the
routine it is in. 

To return to prompts, specify "KEYWORD" and then "PROMPT".

Example:

$ru heavy

! go into keyword mode:
KEYWORD

! set a few parameters:
NLIST 10
INFILE tmp.drg

! go back into command mode:
DONE

! execute a command
VIEW

This sets up the keyword NLIST to 10 and INFILE to tmp.drg, then
runs the command VIEW.
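
If the same sequence is needed repeatedly, the input lines can be generated
and saved to a file.  The Python sketch below only builds the text of the
example above; the function name is hypothetical, and it is an assumption
that such a file would be accepted unchanged by the SCRIPT routine.

```python
def keyword_script(settings, command):
    """Return input text that sets keywords and then runs one command."""
    lines = ["KEYWORD"]
    lines += [f"{name} {value}" for name, value in settings.items()]
    lines += ["DONE", command]
    return "\n".join(lines) + "\n"

text = keyword_script({"NLIST": 10, "INFILE": "tmp.drg"}, "VIEW")
print(text, end="")
```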


The values of keywords that you set apply to any routine in the program.  They
are remembered and need not be reentered if they have not changed.  This
means you can set keywords, return to prompt mode, and when you go back 
into keyword mode the values will still be there, as modified by any inputs
you made in prompt mode. The only exception is that the names of input files in
MADMRG are not retained.

Keywords are read sequentially and if you retype one, it supersedes the
previous value.

Keywords for each routine are listed in the documentation for that routine.
Some general keywords applicable to most routines are:

DMIN            minimum resolution to consider
DMAX            maximum resolution to consider
SYMFILE         matrices of equivalent positions file
CELL            6 Cell dimensions and angles
INFILE          name of input data file.  If there are more than one,
                  specify INFILE(1), INFILE(2) etc...
OUTFILE         name of output data file

------------------------------------------------------------------------------
**** HELP

This command issues a list of commands that may be entered at the command
level of the program.

------------------------------------------------------------------------------

**** END

This ends the program.  QUIT, STOP, and EXIT do the same.
------------------------------------------------------------------------------

**** HASSP - Heavy atom search program.  Version 3.0 of August, 1994

   HASSP is a routine for searching for solutions to a difference Patterson
function.  The only essential inputs to the routine are (1) a map file 
containing the patterson function, and (2) the region to be searched for
solutions (usually the asymmetric unit of the real cell).


             SUMMARY OF INPUT PARAMETERS  (SEE SECTIONS VI AND VII)
             READ FROM DEFAULT INPUT.

   LINE            PARAMETERS
   __         ___________________
   1            NAME OF FILE CONTAINING OUTPUT FROM FFT

   2            XS,XE; YS,YE; ZS,ZE  (6F10.0)  (...SEARCH REGION, FRACTIONAL)
   3            NAME OF LOG FILE, IF NOT YET DEFINED. 

   ----------------***    INPUT FILE MAY END HERE OR ANYWHERE AFTER HERE ----*
                          IF DEFAULTS ARE TO BE USED

   5            BLANK LINE OR NAME OF FILE CONTAINING LOCAL SYMMETRY
   6            [DISCRM,ICRMAX] (2I6) defaults = 1.0, 0
                        (DISCRM=MIN. RATIO OF PEAK:SURROUNDINGS TO BE "ISOLATED"
                         ICRMAX=MAXIMUM NUMBER OF PEAKS TO TRY AS CROSS VECTORS)

   7            [IHASSPTYPE]  (DEFAULT=-5) (I5)
       |IHASSPTYPE| =  2..SEARCH FOR SINGLE-SITE SOLUTIONS USING HARKER VECTORS
                  3..SYMMETRY SEARCH FOR 2-SITE SOLUTION GIVEN CROSS-VECTOR
                  5 OR 0 ..SINGLE SITE SEARCH, THEN TWO-SITE SEARCH , THEN  6
                  6..SEARCH FOR ADDITIONAL SITES GIVEN 1 OR MORE STARTING
                      SOLUTIONS

          NOTE:   IF IHASSPTYPE < 0 , CROSS-VECTORS OR STARTING SOLUTIONS ARE
                  LINES 8A... (REQUIRED FOR ITYPE = -6).  OTHERWISE PATTERSON
                  PEAKS WILL BE USED AS TRIAL CROSS-VECTORS

   8            [NSIGNF,SPAT,SSIN,SDUB,STRP,SSFT] (I10,5F10.0)
                            ...NSIGNF=0 IF SIGNIFICANCE IS TO BE TESTED
                            ...SPAT...SSFT..MINIMUM PROBABILITIES FOR NON-
                               RANDOMNESS REQUIRED TO KEEP PEAKS IN  ROUTINES
                               "PATPK", "SINGLE","DOUBLE","TRIPLE","SIFT"

   [9A          X1,Y1,Z1,     (3F10.0)
   [9B          X2,Y2,Z2,                            (....LIST OF TRIAL CROSS-
                   .                                    VECTORS OR SITES, ONLY
                   .                                    NEEDED IF ITYPE < 0 )

KEYWORD INPUTS:

LOGFILE                 NAME OF FILE FOR OUTPUT SUMMARY (REQUIRED UNLESS
                          YOU ALREADY HAVE A LOGFILE OPEN)
SYMFILE                 SYMFILE NAME IF NOT PREVIOUSLY DEFINED
FFTFILE                 NAME OF FILE CONTAINING THE PATTERSON FFT
PATTGRID                GRID FOR PATTERSON (IF NOT PREVIOUSLY DEFINED)
SEARCHREGION            REGION TO SEARCH (XS,XE,YS,YE,ZS,ZE) (DEFAULTS=0.0)
IHASSPTYPE              CONTROL OF WHAT IS TO BE DONE (DEFAULT=-5; A LOCAL NOTE
                          BY "dha" SAYS THE DEFAULT IS ACTUALLY 5)
DISCRM                  RATIO OF PEAK HEIGHT OVER SURROUNDINGS TO USE 
                          (DEFAULT=1.0)
ICRMAX                  MAXIMUM # OF PEAKS TO TRY IN 2-SITE SEARCH (DEFAULT=0)
NOSPEC                  CONTROL OVER IGNORING SYMMETRY #'S OF SPECIAL POSITIONS
                          (DEFAULT=0, DO NOT IGNORE)
NSIGNF                  0 IF SIGNIFICANCE OF PEAKS IS TO BE TESTED (DEFAULT=0)
SPAT                    MINIMUM PROBABILITY FOR NON-RANDOMNESS TO KEEP A PEAK
                          IN ROUTINE "PATPK". (DEFAULT=0.0)
SSIN                    AS SPAT, BUT FOR SINGLE-SITE SEARCHES (DEFAULT=0.0)
SDUB                    AS SPAT, FOR TWO-SITE SEARCHES (DEFAULT=0.95)
STRP                    AS SPAT, FOR 3-SITE SEARCHES (DEFAULT=0.0)
SSFT                    AS SPAT, FOR SIFTING THROUGH 3-SITE SOLUTIONS
                         (DEFAULT=0.95) 

TRIALSITE               FRACTIONAL COORDINATES OF A TRIAL SITE OR CROSS VECTOR
                          (USED IF IHASSPTYPE < 0)


                              I.    INTRODUCTION
      This program uses a space-group symmetry minimum method to obtain sets  of
atomic  sites  consistent  with a patterson function.  For description of input
parameters see section VII.  The usual procedure followed in using this program
is:
     (1) (ITYPE=2) Search for  single-atom  solutions  to  patterson  function.
Also  adjust  parameter  (DISCRM)  to  obtain  a  reasonable number of isolated
patterson peaks (see section II).
     (2) (ITYPE=5;  parts 5a, 5b, and 5c are automatically carried out.)
     Generate a list of isolated peaks in patterson one at a  time.   For  each
peak  assume  it  is a true cross-vector between two atomic sites in the struc-
ture.  Given the position of site 1, site 2 is immediately given by site 1 plus
the cross vector.
     (a) A search for the position of site 1 relative to  the  origin  is  then
carried  out  (unless  space group has no symmetry).  At each search grid point
(corresponding to the position of site 1), all positions in unit  cell  equiva-
lent by space group symmetry to site 1 and site 2 are identified.  All self and
cross vectors between these (NEQUIV * 2) sites are calculated, and the  minimum
of  the  patterson function over this set of sites is noted.  The best position
of site 1 relative to the origin is taken to be that which yields  the  maximum
of this minimum function.  (subroutine DOUBLE)
     (b) At this point we have a two-site solution to  the  patterson  function
(plus  space-group  symmetry).  Next we find additional sites which are consis-
tent with these sites (and all other additional sites).  This search is carried
out  over entire supplied grid region.  Each grid point is taken as a trial ad-
ditional site, space-group-related sites are calculated, and the minimum  value
of the patterson function at each of the self and cross vectors due to this new
site and the starting sites is noted.  At the end of this  search,  a  list  of
maxima  in  the  search  are obtained.  Each site corresponding to a maximum is
"consistent" with the starting sites  (but  not  necessarily  to  each  other).
(subroutine TRIPLE).
     (c) Go through list of potential solutions and extract a group  which  has
no negative self or cross vectors.  (subroutine SIFT).  The best solution obta-
ined is printed along with a list of minimum values of the  patterson  function
at the cross vectors between sites and an analysis.

     II.  Searching for isolated peaks in  patterson  function  (always  done).
(Subroutine PATPK).
     A search is carried out over the input  patterson  function  for  isolated
peaks.   The  positions of these peaks are stored (Subroutine SAVEP), sorted on
the basis of symmetry (Subroutine SORT), and listed  along  with  peak  heights
after elimination of origins and redundancies.
     Grid points corresponding to isolated peaks in the patterson function  are
defined as those which have the following properties:
     (1) The value of the patterson function at this grid  point  is  not  less
than  at  any  adjacent  point in the map.  (At edges of the map, use patterson
symmetry to generate value of patterson at neighboring grid point).
     (2) A box less than 9 grid points on an edge, centered at the  grid  point
in question, may be constructed so that all values of the patterson function on
the surface of this box are less than (the peak  value/DISCRM).   This  ensures
that  this  peak  is  isolated (and is therefore likely NOT to be several peaks
close together).
        The parameter DISCRM (Default=1.0) is user-determined.
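
The two criteria can be sketched in Python as follows.  This is only an
illustration: the name is_isolated_peak, the nested-list map layout, the use of
all 26 surrounding grid points as "adjacent", and simple periodic wrapping at
the map edges (in place of the patterson symmetry that PATPK applies) are all
assumptions.

```python
from itertools import product


def is_isolated_peak(pmap, i, j, k, discrm=1.0):
    """Test criteria (1) and (2) for grid point (i, j, k) of pmap[x][y][z]."""
    nx, ny, nz = len(pmap), len(pmap[0]), len(pmap[0][0])

    def val(a, b, c):
        # Assumption: wrap around the cell edges instead of applying symmetry.
        return pmap[a % nx][b % ny][c % nz]

    peak = val(i, j, k)

    # (1) the value is not less than at any adjacent grid point
    for d in product((-1, 0, 1), repeat=3):
        if val(i + d[0], j + d[1], k + d[2]) > peak:
            return False

    # (2) some centered box with an edge of 3, 5 or 7 grid points (less than 9)
    #     has every surface value below peak/DISCRM
    for half in (1, 2, 3):
        surface = [d for d in product(range(-half, half + 1), repeat=3)
                   if max(abs(c) for c in d) == half]
        if all(val(i + d[0], j + d[1], k + d[2]) < peak / discrm for d in surface):
            return True
    return False
```

Raising DISCRM makes criterion (2) harder to satisfy, so fewer peaks are
reported as isolated.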

III.  Searching for single-site solutions to patterson function  (ITYPE  =
   2) (Subroutine SINGLE).
     A map is calculated over the range supplied (XS-XE;  YS-YE;  ZS-ZE), except
that the search is not carried out over axes which are not fixed (all three in
space group P1).  The value of the map at each grid point is the minimum of the
values of the patterson function at the (NEQUIV-1) Harker vectors corresponding
to this grid point.  Peaks in this map are stored (Subroutine SAVEP), sorted
according to symmetry (Subroutine SORT), and listed after elimination of
redundancies.
     For points in general positions, the peak height listed is simply the min-
imum value of the patterson function at the (NEQUIV-1) Harker vectors associated
with this point.  For points in special positions, the  listed  height  is  the
minimum of the values of the patterson function at each of the (NEQUIV-1) Hark-
er vectors divided by the number of times a Harker vector associated with  this
point  falls  on  that  position.  For example, in space group P222, an atom at
(x,y,z) yields Harker vectors (0,0,0), (2x,2y,0), (2x,0,2z), and (0,2y,2z).  If
x=0  and  y=0,  though,  (0,0,0)=(2x,2y,0) and (2x,0,2z)=(0,2y,2z) and there is
only one unique Harker vector (excluding the origin), which is repeated  twice.
The value of the peak height listed would be 1/2 the height at (0,0,2z).
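
The P222 example can be worked through with a short Python sketch.  The
function names, the (rotation, translation) representation of the operators
and the constant test "patterson" are hypothetical; the code only mirrors the
description above and is not taken from subroutine SINGLE.

```python
# Operators of space group P222 as (rotation matrix, translation) pairs;
# the identity is listed first and is skipped when forming Harker vectors.
P222 = [
    ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0)),
    ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (0.0, 0.0, 0.0)),
    ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.0, 0.0)),
    ([[1, 0, 0], [0, -1, 0], [0, 0, -1]], (0.0, 0.0, 0.0)),
]


def harker_vectors(site, ops):
    """Return the (NEQUIV-1) vectors r - S(r), reduced to the range 0..1."""
    vectors = []
    for rot, trans in ops[1:]:
        image = [sum(rot[i][j] * site[j] for j in range(3)) + trans[i]
                 for i in range(3)]
        vectors.append(tuple(round((site[i] - image[i]) % 1.0, 6) % 1.0
                             for i in range(3)))
    return vectors


def single_site_height(site, ops, patterson):
    """Minimum of patterson(v) / (number of Harker vectors falling on v)."""
    counts = {}
    for v in harker_vectors(site, ops):
        if v != (0.0, 0.0, 0.0):          # vectors on the origin are excluded
            counts[v] = counts.get(v, 0) + 1
    if not counts:                        # every Harker vector is the origin
        return None
    return min(patterson(v) / n for v, n in counts.items())


flat = lambda v: 1000.0                   # hypothetical constant patterson
print(single_site_height((0.1, 0.2, 0.3), P222, flat))   # general position: 1000.0
print(single_site_height((0.0, 0.0, 0.1), P222, flat))   # x=0, y=0: 500.0
```

For the site with x=0 and y=0 the vector (0,0,2z) occurs twice, so the listed
height is half the patterson value there, as stated in the text.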
     The probability that a given peak of height A in this function is due to a
random combination of peaks is roughly given by:


           P=(1.- (1.-  p(A)**M )**N ) ,  where,


        A= minimum value of (value of patterson function at Harker vectors
                          divided by expected noise at that position).

        p(A) is probability of observing a value of A or higher on a given try.
        M=number of independent Harker vectors examined for this peak
        N=number of independent grid points used in search for peaks.
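
A minimal Python sketch of this estimate is given below.  The formula for P is
the one above; the form of p(A) is NOT stated in this documentation, so a
Gaussian tail probability is assumed here purely for illustration, and the
name chance_probability is hypothetical.

```python
import math


def chance_probability(a, m, n):
    """P = 1 - (1 - p(A)**M)**N for a peak of height A (in units of the noise)."""
    # Assumption: noise is Gaussian, so p(A) is the upper-tail probability.
    p_a = 0.5 * math.erfc(a / math.sqrt(2.0))
    return 1.0 - (1.0 - p_a ** m) ** n


# Three independent Harker vectors, 924 independent grid points (2 * 462 peaks).
for height in (1.0, 2.0, 3.0, 5.0):
    print(height, chance_probability(height, 3, 924))
```

The probability falls quickly as A grows, and rises with N because a larger
search gives chance more opportunities to produce a high value.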


     The noise in the map is taken to be SIGMA, the RMS value of the patterson
function, if this is a general position.  If it is a position of higher symmetry,
the noise = sqrt(NSYM) * SIGMA, where NSYM is the symmetry number of this
position (see VI B and VI C).
     The number of independent grid points used in the search for peaks would
roughly be equal to the number of reflections used to make up the map if
reflections at all resolutions contributed equally.  A better estimate of this
number is probably the number of peaks+valleys in the patterson map.  In this
routine, we actually use 2* the number of peaks.

     IV.  Searching for  two-site  solutions  to  patterson  function  given  a
cross-vector (x,y,z) between the sites (ITYPE=3 or 5, Subroutine DOUBLE).
     A map is calculated over the supplied region (XS-XE;  YS-YE;  ZS-ZE).   At
each grid point (u,v,w) , the value of the map is the minimum of:
     (1) Values of patterson function at Harker vectors due to atom at (u,v,w)
     (2) Values of patterson function at Harker vectors due to atom at
     (u+x,v+y,w+z)
     (3) Values of patterson function at each cross vector between (u,v,w) or a
position equivalent by space group symmetry to (u,v,w) and (u+x,v+y,w+z)
     Positions  of  peaks  in  this   map   (corresponding   to   [u,v,w]   and
[u+x,v+y,w+z]) are stored (Subroutine SAVEP), sorted according to symmetry (Su-
broutine SORT), and listed.  If ITYPE=5, subroutine SIFT is called with the top
peak in this map.
     The measures of peak heights and probabilities of random occurrence for
this routine are similar to those for the single-atom search (see section III).

     V.  Searching for additional solutions to patterson function given  a  set
of N starting solutions.  (ITYPE = 5) (Subroutines SIFT and TRIPLE).
     This search is carried out in two parts.  First, positions in the supplied
region which have non-negative self vectors and non-negative cross vectors with
the starting solutions are identified.  These new solutions are not necessarily
consistent with each other, however.  Next, a set of sites which are completely
self-consistent are extracted from these possible solutions.  NOTE:  a  maximum
of  4  new  sites are added to the list of solutions in each pass through subroutine
SIFT (if local  symmetry  is  used,  4  sites  unique  by  local  symmetry  and
space-group  symmetry).   If more than 4 additional sites are desired, you must
run the program again with ITYPE = -6, and list the current solution  in  lines
9a....

   (A) (Subroutine TRIPLE) A  map  is  calculated  over  the  range  supplied
(XS-XE;   YS-YE;   ZS-ZE).  At each grid point (u,v,w), the value of the map is
the minimum of:
     (1) Values of the patterson function at Harker  vectors  due  to  atom  at
(u,v,w)
     (2) Values of the patterson function at each cross vector between
(u,v,w) (or a position equivalent by space-group symmetry) and each one of
the "known" solutions.
     Peaks in this map are stored (Subroutine SAVEP), sorted according to special
or general positions, and saved.

     (B) (Subroutine SIFT) Each peak from subroutine TRIPLE is considered as  a
potential additional site, beginning with the highest peak:
     (1) If this new site is equivalent by space group symmetry to a site alre-
ady in list of solutions, forget it.
     (2) If this new site has positive cross-vectors with all  sites  currently
in list of solutions, add this site to the list of solutions.
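
The selection in part (B) amounts to a greedy pass over the peak list, which
can be sketched in Python as below.  The function sift and its two callback
arguments (equivalent and cross_min) are hypothetical stand-ins for the
symmetry test and the minimum cross-vector value; the limit of 4 new sites per
pass is the one quoted earlier in this section.

```python
def sift(candidates, known, equivalent, cross_min, max_new=4):
    """Greedy selection of mutually consistent sites.

    candidates : peaks from TRIPLE, assumed sorted with the highest peak first
    known      : starting solutions
    equivalent : equivalent(a, b) -> True if a and b are symmetry-related
    cross_min  : cross_min(a, b)  -> minimum patterson value over cross vectors
    """
    solution = list(known)
    added = 0
    for site in candidates:
        if added == max_new:
            break
        # (1) skip a site equivalent to one already in the list
        if any(equivalent(site, s) for s in solution):
            continue
        # (2) keep it only if all cross vectors with current sites are positive
        if all(cross_min(site, s) > 0 for s in solution):
            solution.append(site)
            added += 1
    return solution
```

Because each accepted site is tested against every site accepted before it,
the final list is self-consistent even though the TRIPLE peaks were not.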



        VI.  Some technicalities.

     (A) GRID -- The grid used for all searches is exactly the same as the grid
for  the  input  patterson  map, but each time a peak is found, all neighboring
grid points are tested on a grid twice as fine and the highest  of  these  test
values  is  used.  Values of the patterson function between grid points are in-
terpolated.  Do not use a grid coarser than 1/3 the resolution  for  the  input
patterson  map.  Also don't bother to use a grid finer than 1/6 the resolution.
NOTE: the input patterson map must be on a grid such that the symmetry elements
lie on grid points.  That is, if there is a two-fold axis at 1/12 in z, then
the z-axis must be divided into a number of grid points that is a multiple of
12.  The easiest way to be safe is to make sure all unit cell translations
are multiples of 12.
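
One way to pick a grid that satisfies these rules is sketched below in Python.
It reads "multiples of 12" as applying to the number of grid divisions along
each cell edge, which is an interpretation; the function name grid_divisions
is hypothetical.

```python
import math


def grid_divisions(cell_edge, dmin, multiple=12):
    """Smallest multiple of 12 giving a spacing no coarser than dmin/3."""
    needed = cell_edge / (dmin / 3.0)      # divisions for a spacing of dmin/3
    return multiple * max(1, math.ceil(needed / multiple))


for edge in (50.0, 60.0, 85.0):
    n = grid_divisions(edge, 3.0)
    print(edge, n, edge / n)               # cell edge, divisions, grid spacing
```

Rounding up to the next multiple of 12 makes the grid slightly finer than
dmin/3; if the spacing drops below dmin/6 the extra points gain nothing.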

     (B) SIGNIFICANCE TESTS -- Difference patterson functions have a  consider-
able  amount  of noise if acentric reflections are present.  (For each acentric
reflection, the expected error [|Fph-Fp| - |fh|] is roughly equal to |Fph-Fp|).
It  can  be  shown that SIGMA, the RMS noise in the map is roughly equal to the
RMS value of the patterson function.
     Peaks in the patterson map which have a height much less  than  SIGMA  are
therefore  likely  to  be unrelated to atomic sites.  On special positions, the
RMS noise in the map will be  sqrt(NSYM)*SIGMA,  where  NSYM  is  the  symmetry
number of this position (see VI C).
     In order not to include too many peaks due to this noise in any of the
searches carried out, a significance test is made for each peak if NSIGNF=0
(default).  A peak is rejected if there is a probability less than SIGNIF that
no peak of this height or higher would occur by chance in this search (see
section III).  SIGNIF is selected by the user for each routine (SPAT, SSIN, etc.).

     (C) SYMMETRY NUMBERS OF POSITIONS IN REAL AND PATTERSON CELLS --  (Subrou-
tines SPECR AND SPECP).  For this program, the symmetry number of a position in
a real or patterson cell is the number of ways that a symmetry operator in  the
group  (patterson  or real cell) can map the point onto itself (within a toler-
ance of 2 grid units).  The symmetry number of a general position is 1,  for  a
point on a dyad, it is 2, etc..
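
The definition can be illustrated with a few lines of Python.  The operator
representation and the name symmetry_number are assumptions, and the tolerance
is given here in fractional coordinates rather than the 2 grid units used by
SPECR and SPECP.

```python
def symmetry_number(point, ops, tol=1e-3):
    """Count the operators that map the point onto itself (modulo the lattice)."""
    count = 0
    for rot, trans in ops:
        image = [sum(rot[i][j] * point[j] for j in range(3)) + trans[i]
                 for i in range(3)]
        # Fold the difference into [-0.5, 0.5) so whole cell shifts are ignored.
        diff = [((image[i] - point[i] + 0.5) % 1.0) - 0.5 for i in range(3)]
        if all(abs(d) <= tol for d in diff):
            count += 1
    return count


# Space group P2 (dyad along y): identity and (-x, y, -z).
P2 = [
    ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0)),
    ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.0, 0.0)),
]
print(symmetry_number((0.1, 0.2, 0.3), P2))   # general position
print(symmetry_number((0.0, 0.3, 0.0), P2))   # on the dyad
```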

     (D) The matrices of equivalent positions for this  space  group  are  read
from a SYMMETRY file (SEE SYMMETRY in the documentation)

     (E) INPUT PATTERSON MAP -- This map is assumed  to  have  been  calculated
with  X-across, Y-down, Z-sections, where X, Y, and Z are those used in the ma-
trices of equivalent positions.  The map must have 1 record for each line of  X
across,  with  a total of NY*NZ records.  The map is unformatted REAL*4 with no
header.  The total number of elements in the map must not exceed 200,000.
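
The record layout can be sketched in Python as below.  The sketch ASSUMES that
each unformatted record is framed by 4-byte little-endian length markers, which
is common for Fortran sequential files but depends on the compiler and machine;
check this against a file written by your own FFT program.  The function names
are hypothetical.

```python
import struct

MAX_ELEMENTS = 200000          # limit quoted in the text


def write_map(path, grid):
    """grid[z][y] is one line of NX values; one record is written per line."""
    with open(path, "wb") as fh:
        for section in grid:
            for row in section:
                payload = struct.pack(f"<{len(row)}f", *row)   # REAL*4 values
                marker = struct.pack("<i", len(payload))       # assumed framing
                fh.write(marker + payload + marker)


def read_map(path, nx, ny, nz):
    """Read NY*NZ records of NX REAL*4 values each, with no header."""
    if nx * ny * nz > MAX_ELEMENTS:
        raise ValueError("map has more than 200,000 elements")
    grid = []
    with open(path, "rb") as fh:
        for _ in range(nz):
            section = []
            for _ in range(ny):
                (nbytes,) = struct.unpack("<i", fh.read(4))
                if nbytes != 4 * nx:
                    raise ValueError("record length does not match NX")
                section.append(list(struct.unpack(f"<{nx}f", fh.read(nbytes))))
                fh.read(4)                                     # trailing marker
            grid.append(section)
    return grid
```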

     (F) SEARCH REGION -- Except for patterson peak searches, all searches  are
carried out one section at a time.  The grid searched is NX by NY where,
     NX = (XE - XS)* IPCELL(1) + 3
     NY = (YE - YS)* IPCELL(2) + 3

     where XS, XE, YS, YE are input by the user.  For single atom searches and
origin searches, however, if the region covered by the input patterson function
is smaller than this, the latter region is used.  NX*NY must not exceed 20000.
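
A quick check of the section size before running a search might look like the
Python sketch below.  The name search_grid is hypothetical, and rounding the
product to the nearest integer is an assumption; the program itself may
truncate.

```python
def search_grid(xs, xe, ys, ye, ipcell, limit=20000):
    """Return (NX, NY) for one search section and enforce NX*NY <= 20000."""
    nx = int(round((xe - xs) * ipcell[0])) + 3
    ny = int(round((ye - ys) * ipcell[1])) + 3
    if nx * ny > limit:
        raise ValueError(f"NX*NY = {nx * ny} exceeds {limit}")
    return nx, ny


# Search region 0-0.5 in x and y on a 48 x 64 x 32 patterson grid.
print(search_grid(0.0, 0.5, 0.0, 0.5, (48, 64, 32)))
```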

G.  NON-CRYSTALLOGRAPHIC (LOCAL) SYMMETRY.  If local symmetry elements exist in
the  crystal  structure  and  are  known, it is very useful to include them, at
least part of the time, in using this program.  If local symmetry is to be  in-
cluded,  you must specify the file to read it from on line 5 of the input file.
The local symmetry file has the format:

   RECORD 1:  NNCR (number of sets of local symmetry element and regions to
             follow)

   NEXT 2*NNCR RECORDS (in sets of 2, all REAL numbers, NOT INTEGERS)
        R11 R21 R31 R12 R22 R32 R13 R23 R33 T1 T2 T3
        XS,XE; YS,YE; ZS,ZE

        where R11, R21, etc. are rotation matrix elements just as for cryst-
        allographic symmetry, and T1, T2, T3 are translations in fractional
        coordinates, NOT divided by 12.
        XS, XE, etc. define the region in which this symmetry
        operation is valid.

    NOTE 1:   Only specify symmetry elements for ONE asymmetric unit of crystal.
     It is not necessary to include the identity.
     The maximum value allowed for NNCR is 24.
     The maximum number of local symmetry operations which may be
        applied to a PARTICULAR point in the unit cell is 12,
        not including the identity.
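
As an illustration of the record order, the Python sketch below builds the
lines of such a file.  The element order R11 R21 R31 R12 ... follows the
listing above; the field width, the number of decimals and the name
local_symmetry_records are assumptions, since the exact input format is not
given here.

```python
def local_symmetry_records(elements):
    """elements: list of (rot, trans, region); rot[row][col], region = 6 limits."""
    if len(elements) > 24:
        raise ValueError("NNCR may not exceed 24")
    lines = [str(len(elements))]                      # RECORD 1: NNCR
    for rot, trans, region in elements:
        # Columns first: R11 R21 R31, then R12 R22 R32, then R13 R23 R33.
        values = [rot[row][col] for col in range(3) for row in range(3)]
        values += list(trans)
        # Everything is written with a decimal point (REAL, not INTEGER).
        lines.append(" ".join(f"{v:.4f}" for v in values))
        lines.append(" ".join(f"{v:.4f}" for v in region))
    return lines


# Hypothetical local four-fold about z, valid in the lower half of the cell.
fourfold = ([[0, -1, 0], [1, 0, 0], [0, 0, 1]], (0.5, 0.0, 0.0),
            (0.0, 1.0, 0.0, 1.0, 0.0, 0.5))
print("\n".join(local_symmetry_records([fourfold])))
```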

     NOTE 2:  If local symmetry is specified,  during  single-atom  searches and
two-atom  searches,  only  the  UNIQUE positions will be listed;  you must apply
local symmetry yourself in order to generate the complete list  of  sites.   In
searches  for additional sites, however, all sites are listed (except those re-
lated by space-group symmetry, of course.) When sites are input (ITYPE = -6) in
searches  for  additional  sites,  it is not necessary to specify all the local
symmetry-related sites;  these will be generated automatically.  If you wish to
generate local symmetry-related sites from a set of unique sites, you may input
them with ITYPE = -6 and a search region of (0-0,0-0,0-0);  the  entire  set  of
sites will then be listed for you.

     H.  Obtaining an analysis of a solution that you input yourself.  In order
to  generate  a  list  of  local  symmetry-related points and minimum self- and
cross-vectors corresponding to a set of unique sites you specify, use  ITYPE  =
-6 with a zero search region (blank line).

     I.  Maximum number of sites in a solution set.  In the  absence  of  local
symmetry, this program will yield up to a 6-site solution in one pass.  If more
sites are to be located, run program again using ITYPE  =  -6  ;   specify  the
known  sites  on  lines 9a-9f.  Up to 4 additional sites may be located on each
pass.  If local symmetry is present, NSYM * 6 sites may be found on  the  first
pass  and NSYM * 4 on each additional pass.  A maximum of 40 sites may be found
in any one solution.
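
The arithmetic above gives a lower bound on the number of runs needed, which
the Python sketch below works out.  It assumes every pass finds its maximum
number of sites, which real data will not always allow; the name
passes_needed is hypothetical.

```python
import math


def passes_needed(n_sites, nsym=1):
    """Fewest HASSP passes that could yield n_sites (6*NSYM first, then 4*NSYM)."""
    if n_sites > 40:
        raise ValueError("a solution may contain at most 40 sites")
    first, later = 6 * nsym, 4 * nsym
    if n_sites <= first:
        return 1
    return 1 + math.ceil((n_sites - first) / later)


for n in (6, 7, 10, 11):
    print(n, passes_needed(n))
```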

     J.  Use of ITYPE=-6 to search for causes of questionable results.  If
ITYPE=-6 is specified along with a small (or zero) search region, an analysis
of the sites you input on lines 9a-... will be printed. This analysis
includes the minimum values of the self- and cross-vectors for this set
of sites.  If you know that the Patterson function is large at all these
vectors for a given group of sites, then this procedure will help you
determine if there is anything unusual about your map.

        VII.  DESCRIPTION OF INPUT PARAMETERS

     A.  IPCELL(1),IPCELL(2),IPCELL(3) are cell dimensions in  grid  units  for
input patterson map.  The map runs from 0 - NPATXE across, 0 - NPATYE down, and
0 - NPATZE in sections.  The map must be at least the asymmetric  unit  of  the
patterson function (1/2 the asymmetric unit of the real cell), but may be more.
If it is more than asymmetric unit, the estimates of  number  of  peaks+valleys
will be too high by the ratio of volume included to the actual asymmetric unit.
This will throw probability calculations off slightly (a factor  of  two  makes
little difference, however).
        NPATXE, NPATYE, and NPATZE must be positive or zero.

        B.  DISCRM is described above (section II)

     C.  XS, XE;  YS, YE;  ZS, ZE are boundaries of the search region in  frac-
tional  coordinates  in  the X, Y, and Z directions (defined by input patterson
map, see VI D).  Note that in many space groups it is not necessary  to  search
over  the  entire  asymmetric  unit  for single-site solutions.  Similarly, for
two-site origin searches, one-half of asymmetric  unit  is  always  sufficient.
For  searches  for  additional sites, the entire asymmetric unit must be speci-
fied.  This program does not do all this automatically at this time.

     D.  ICRMAX (default=1) is the maximum number of tries to make if ITYPE = 3
or 5.  The top ICRMAX isolated patterson peaks in general positions are used as
trial cross vectors, one at a time.  If there are fewer than ICRMAX isolated
peaks in general positions, those in special positions are used also for a
total of ICRMAX.

    E.  NOSPEC (default=0) is 1 if the difference between general and  special
positions is to be ignored in sorting peaks.

    F.  ITYPE controls the method used.
        1. If ITYPE is greater than or equal to 0, isolated patterson peaks
                will be identified. The top ICRMAX of them will be used
                   in 2-site searches if ITYPE = 3 or 5.

        2. If ITYPE is less than 0, lines 8a,8b,... will be read and
                used as:(a) trial cross-vectors, each considered separately
                                 (ITYPE = -3 or ITYPE = -5),or
                        (b) a starting solution, all sites read are part of
                                 the same solution (ITYPE = -6).

        3. If |ITYPE| is:

                2 ...Search for single-site solutions using Harker vectors
                        (section III).

                3 ...Search for two-site solutions related by a cross-vector
                        (an isolated patterson peak if ITYPE = 3, an input
                        cross-vector (line 8a) if ITYPE = -3 ).

           0 or 5 ...First do   3, then for each two-site solution, do  6.

                6 ...Search for additional sites given 1 or more starting
                        solutions.  ITYPE =6 is not allowed. IF ITYPE = -6
                        a list of starting sites is read on lines 8a,8b,...
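
The rules for ITYPE can be summarized in a small Python lookup.  The function
describe_itype and the wording of the step labels are hypothetical; the
sequence used for 0 and 5 follows the summary of input parameters (single-site
search, then two-site search, then additional sites).

```python
STEPS = {
    2: ["single-site search (SINGLE)"],
    3: ["two-site search (DOUBLE)"],
    5: ["single-site search (SINGLE)", "two-site search (DOUBLE)",
        "additional sites (TRIPLE, SIFT)"],
    6: ["additional sites (TRIPLE, SIFT)"],
}


def describe_itype(itype):
    """Return what a given ITYPE asks for and whether trial lines are read."""
    if itype == 6:
        raise ValueError("ITYPE = 6 is not allowed; use -6 and supply starting sites")
    mode = 5 if itype == 0 else abs(itype)     # 0 behaves like 5
    if mode not in STEPS:
        raise ValueError(f"unsupported ITYPE {itype}")
    # A negative ITYPE means cross-vectors or starting sites are read from input.
    return {"reads_trial_lines": itype < 0, "steps": STEPS[mode]}


print(describe_itype(-6))
print(describe_itype(5))
```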


     G.  NSIGNF (default =0) is 1 if significance tests are not to be carried
out.  Use NSIGNF=0 for difference Patterson functions, NSIGNF=1 for Patterson
functions with no noise.

     H.  SPAT, SSIN, SDUB, STRP, SSFT (default= 0., 0.0, 0.95, 0.0,  0.95)  are
minimum  values  of the probability that (no random peaks will be found greater
than this peak) in order to keep peaks.
        SPAT is not used at present.
        SSIN is for single-atom searches.
        SDUB is for two-atom origin searches.
        STRP is for calculation of map in searches for new sites
        given starting sites (see section V A).
        SSFT is for selecting sites which are consistent with each other
        in searches for new sites given starting sites (section V B).
        J.  X1, Y1, Z1, etc.  (lines 8a, 8b, etc) are fractional  coordinates
        for cross-vectors or starting solutions.  These are read if ITYPE < 0.


EXAMPLE OF USE OF PROGRAM HASSP:
SPACE GROUP C2.

! set keywords:
KEYWORD
SYMFILE C2.SYM
LOGFILE c2.log
FFTFILE c2.patt
SEARCHREGION    0. 0.5 0 0.5 0 0.5
IHASSPTYPE 5
ICRMAX  10
DONE

! go:
HASSP

RESULTS OF HASSP (ANNOTATED)

"HASSP -- PATTERSON SEARCH AND SUPERPOSITION PROGRAM.    13:40:48     12-MAY-94

INPUT PARAMETERS:
...[ LIST OF INPUT PARAMETERS AND DISCUSSION OF SPACE GROUP SYMMETRY
     IS PRODUCED BY PROGRAM HERE]...


INPUT PATTERSON MAP HAS    36465 ELEMENTS  AND AN RMS VALUE OF     0.21687E+06,
SCALED TO 1000.0
NUMBER OF DEGREES OF FREEDOM IN PATTERSON MAP WHICH IS ABOUT
EQUAL TO THE NUMBER OF PEAKS+VALLEYS IN MAP:        462

...[THIS NUMBER OF DEGREES OF FREEDOM IS USED IN ESTIMATING PROBABILITIES OF
    FINDING SOLUTIONS BY CHANCE]....


LIST OF ISOLATED PATTERSON PEAKS: GENERAL POSITIONS.

  PEAK         X         Y         Z         HEIGHT

    1         0.156     0.344     0.125     10135.
    2         0.125     0.344     0.313     10082.
    3         0.070     0.219     0.172     2869.9
    4         0.414     0.406     0.031     2356.5

....ETC...[ THESE PEAKS WILL BE USED AS POSSIBLE CROSS-VECTORS.
  THOSE ON SPECIAL POSITIONS (BELOW) WILL ONLY BE TRIED AFTER THOSE ON THIS
  LIST ARE EXHAUSTED.]

LIST OF ISOLATED PATTERSON PEAKS: SPECIAL POSITIONS.

  PEAK         X         Y         Z         HEIGHT    SYMM #

    1         0.469     0.500     0.188     9822.5        2
    2         0.281     0.000     0.438     9478.0        2
  
FOR EACH TRIAL SOLUTION IN SEARCHES, THE "HEIGHT" IS THE MINIMUM VALUE, OVER ALL
PREDICTED PATTERSON VECTORS, OF: (VALUE OF THE PATT FN)/(THE # OF PREDICTED
VECTORS WHICH FALL ON THIS POINT)

 THE SYMMETRY NUMBER IS THE SYMMETRY OF THIS POSITION IN THE PATT FN.


LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:GENERAL POSITIONS

  PEAK         X         Y         Z        HEIGHT     PROB THAT THIS IS CHANCE

    1         0.484     0.000     0.094     9822.5              0.000
    2         0.141     0.000     0.219     9478.0              0.000
    3         0.055     0.000     0.094     3322.7              0.669



LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:SPECIAL POSITIONS

  PEAK         X         Y         Z         HEIGHT    SYMM #   PROB THAT THIS IS BY CHANCE

   14         0.000     0.000     0.000     25739.         2          1.000


TWO-ATOM ORIGIN SEARCH.  TRY # 1 . CROSS-VECTOR BETWEEN SITES = (0.156,0.344,0.125)

MAXIMUM AGREEMENT FOUND IN SEARCH WITH THE TWO SITES:
(0.484,0.000,0.094) AND  (0.641,0.344,0.219).
PROBABILITY THAT THIS PAIR OF SITES CORRESPONDS TO CHANCE PEAKS =0.000

                             --------*ANALYSIS OF THIS SOLUTION:--------**

SITE   X       Y      Z       PROB     HAS SAME                 CORRESPONDS    
                                                             HARKER VECTORS     
                                                                   AS SITE:   
1   0.484   0.000   0.094     0.000     0     1        (X+.000 ,Y+.000, Z+.000)
2   0.641   0.344   0.219     0.000     0       2        (X+.500 ,Y+.344, Z+.

SOLUTIONS LISTED BELOW SHARE HARKER VECTORS WITH
AT LEAST ONE SITE ABOVE YET ARE COMPLETELY CONSISTENT
WITH ABOVE SITES


3   0.984   0.125   0.078     0.000     1       1        (X+.500 ,Y+.125, Z+.

LIST OF CROSS-VECTORS IN PATTERSON FUNCTION ; FIRST TEN ONLY :

       1           2           3

 1   9822.5     10082.2      2097.4
 2  10082.2      9478.0      2151.0
 3   2097.4      2151.0      5852.6

...[ ETC FOR OTHER TEST CROSS-PEAKS]...

NOTE:  SEE "USING HASSP AND HEAVY" FOR MORE DISCUSSION OF THIS OUTPUT.
-------------------------------------------------------------------------------

**** HEAVY -- Heavy atom refinement  Version 3.0
 

HEAVY is a general-purpose heavy atom refinement routine.  It can be used to
carry out either phase refinement or origin-removed Patterson refinement, as
well as to calculate coefficients for native Fourier and difference Fourier
maps.

   Major changes from versions 1.0 - 1.5:

        1.  Sigmas of anomalous differences read in explicitly
        2.  Program is now compatible with MAD data after conversion to
                MIR format with MADMRG.
        3.  All inputs now can be entered by KEYWORDing
        4.  Refinement and phasing can be carried out using isomorphous
                differences, anomalous differences, or both.

   The input parameters will be read from standard input file

KEYWORDING inputs.  These are most conveniently entered using a SCRIPT file.
NOTE: any values previously defined do not need to be specified.  If you
run HEAVY a second time without quitting the main program and do not specify
any new parameters, the routine will start where it left off and carry out
another set of refinements of the same type that you specified the last time
you ran it.

NOTE also: average residuals are maintained throughout.  This means that if
you want to refine a completely new set of data, you must start the program 
over.

KEYWORD         Values

LOGFILE         REQUIRED name of output file for summary of results.  
                If a logfile has already been defined, this is ignored.
CELL            cell parameters 
DMIN            minimum d-spacing to consider
DMAX            maximum d-spacing to consider
NEWFILE         file with updated inputs to be created
INFILE          input data file
KOUT            type of output, if any. DEFAULT = 0 (no binary output)

                KOUT  TYPE OF OUTPUT                    # of columns of data
                __  _____________________             ______________________

        0.... NONE
        2.... DIFFERENCE FOURIER FOR KDER                       2
           A=m(Fder-Fnat)cos(PhiBest)
           B=m(Fder-Fnat)sin(PhiBest)
        3.... ANOM DIFF FOURIER FOR KDER                        2
           A=m(DelAno)cos(PhiBest-90)
           B=m(DelAno)sin(PhiBest-90)
        4.... RESIDUAL MAP FOR KDER                             2
           A=m(Fder-|Fnat+FH|)cos(PhiBest)
           B=m(Fder-|Fnat+FH|)sin(PhiBest)
            (where Fnat+FH is the vector sum of Fnat 
              and the heavy atom FH)
        6.... NATIVE FOURIER                                    2
           A=m(Fnat)cos(PhiBest)
           B=m(Fnat)sin(PhiBest)
        7.... PHASES AND FIGURE OF MERIT                        3        
              PhiBest (in degrees), PhiMostProbable, 
                and figure of merit
        8.... HENDRICKSON-LATTMAN COEFFS                        4
        9.... HEAVY ATOM S. FACTORS FOR KDER                    4
           A, B= real and imaginary parts of normal 
              scattering from heavy atom.
           C, D= real and imaginary parts of anomalous 
              scattering from heavy atom.

          Here, m=the figure of merit, PhiBest is the "Best" phase, 
          PhiMostProbable is the most probable phase.
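
For a single reflection, the A and B coefficients for KOUT = 2, 3 and 6 can be
evaluated as in the Python sketch below.  The function map_coefficients and
its argument names are hypothetical; only the formulas come from the table
above, and the other KOUT values are left out.

```python
import math


def map_coefficients(kout, m, phi_best_deg, fnat=0.0, fder=0.0, del_ano=0.0):
    """Return (A, B) for one reflection; m is the figure of merit."""
    phi = math.radians(phi_best_deg)
    if kout == 2:                      # difference Fourier
        amplitude = m * (fder - fnat)
    elif kout == 3:                    # anomalous difference Fourier
        amplitude = m * del_ano
        phi -= math.pi / 2.0           # PhiBest - 90 degrees
    elif kout == 6:                    # native Fourier
        amplitude = m * fnat
    else:
        raise ValueError("only KOUT = 2, 3 and 6 are covered by this sketch")
    return amplitude * math.cos(phi), amplitude * math.sin(phi)


print(map_coefficients(6, 0.5, 0.0, fnat=100.0))
print(map_coefficients(2, 0.8, 45.0, fnat=100.0, fder=120.0))
```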

KDER            derivative to include in output

OUTFILE         output file name, required if KOUT is not 0
FILETITLE       optional title of this run

IANGLE          phasing angle, minimum=5, default=5
INANAL          PHASE ANALYSIS.  DEFAULT=0
                  1 for printing of extensive heavy atom statistics 
INRESD          RESIDUAL AND STATISTICS.  DEFAULT = 0
                -1  No residuals or statistics calculated.
                0   zeroth cycle added before first refinement
                    cycle. During zeroth cycle residuals and
                    statistics are calculated and printed.
                    No statistics are calculated on other cycles.
                1   Residuals and statistics calculated every
                    cycle and printed according to INPRNT.
                    Note: residuals are only calculated for
                    derivatives with INPHAS = 1.
INOSIG          USE OF SIGMAS.  DEFAULT = 0 (use sigmas).
                 1  if sigmas from input data file are not to be used.
INHEND          USE OF HENDRICKSON-LATTMAN COEFFICIENTS.  DEFAULT=0 (don't use)
                 1  if Hendrickson-Lattman coefficients are to be calculated,
                    useful for outputting phase probability distribution in
                    this form (KOUT=8).  HEAVY does not do phase combination.
INPRNT          PRINTING OF SHIFTS. DEFAULT =0 (don't print)
                 1  if shifts (and statistics, if any) are
                    to be printed on every cycle.  Default
                    is to print statistics on first cycle,
                    shifts and statistics on last.
JALT            USE OF PHASES IN REFINEMENT.  DEFAULT = 0 (Patterson refinement)
                   0 is to use origin-removed Patterson refinement.
                   1 is to use phase refinement at most probable phase
                   JALT and KALT are set automatically if you use a procedure
                   (IHEAVYPROC > 0)             
KALT            USE OF DERIVATIVE BEING REFINED IN PHASING. DEFAULT=0(don't use)
                   0 is not to use derivative being refined in phases
                   1 is to use all available derivatives in phasing

NCYCLE          Number of cycles of refinement to be carried out if a 
                PROCEDURE is NOT used (see IHEAVYPROC).  Maximum = 30
                Default = 0
IREFCY          List of derivative numbers to be refined during the NCYCLE
                cycles of refinement if a procedure is NOT used. Default = 0
                  e.g., 1,1,1,1,0 means refine deriv #1 on cycles 1-4 and
                  calculate phases, get residuals, figure of merit, etc 
                  on cycle #5.  Note that you don't get these statistics on
                  cycles in which you refine with Patterson refinement.

IHEAVYPROC      RUN a procedure with HEAVY.  Default =0 (no procedure)
                Available procedures:
                1 = NREP cycles of refinement of each deriv that has INPHASE
                  specified, refining only occupancy.
                2 = as 1, but refining only xyz.  Fixes coordinates of best
                  atom in each deriv in polar space group in polar directions
                  unless another atom is already fixed by user.
                3 = as 2, but refining xyz and occ
                4 = as 3, but refining xyz, occ, B
                5 = 1, then 2, then 3, then 4
                6 = phased refinement to obtain relative coordinates among
                  derivatives for polar directions in polar space groups.
                  Fix and phase with best derivative.  Refine just coordinates
                  in polar directions for all other derivatives with INPHASE
                  specified.  This should be followed by #5 again.  
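
The parameter sets refined by procedures 1 through 5 can be summarized in a
short Python sketch.  This is only an illustration of the list above, not code
from HEAVY; the names PROCEDURE_PARAMETERS and expand_procedure are
hypothetical, and procedure 6 (phased refinement of polar coordinates) is
deliberately left out.

```python
# Parameters refined by each simple procedure, as listed under IHEAVYPROC.
# Names here are illustrative only; they are not HEAVY keywords.
PROCEDURE_PARAMETERS = {
    1: ("occ",),
    2: ("xyz",),
    3: ("xyz", "occ"),
    4: ("xyz", "occ", "B"),
}


def expand_procedure(iheavyproc):
    """Return the ordered list of parameter sets refined by a procedure.

    Procedure 5 is documented as "1, then 2, then 3, then 4".
    Procedure 6 is not modelled in this sketch.
    """
    if iheavyproc == 5:
        return [PROCEDURE_PARAMETERS[p] for p in (1, 2, 3, 4)]
    if iheavyproc in PROCEDURE_PARAMETERS:
        return [PROCEDURE_PARAMETERS[iheavyproc]]
    raise ValueError("procedure not covered by this sketch: %r" % iheavyproc)


for stage in expand_procedure(5):
    print("refine:", ", ".join(stage))
```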

NREP            # of refinements of each deriv in procedures with
                 IHEAVYPROC > 0

SMALL           minimum ratio of derivative structure factor amplitude
                 (F Deriv) to RMS lack-of-closure for use in 
                 refinement or residuals. DEFAULT=0.
FMIN            minimum native F for any action. Default=0.
FOMMIN          minimum figure of merit for use in phased
                   portion of refinement. Default =0.
BMIN            minimum isotropic atomic B allowed. Default =0.

THR             Keywords to set threshold and damping factors for shifts:
ACL             if SHIFT > THR*sigma of SHIFT, SHIFT=SHIFT*ACL
                         Defaults are 0. and 0.5
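
The damping rule for THR and ACL can be written as a few lines of Python.
This is a sketch of the rule as stated, not the program's source; the function
name damp_shift is hypothetical, and comparing the magnitude of the shift
(rather than its signed value) is an assumption.

```python
def damp_shift(shift, sigma, thr=0.0, acl=0.5):
    """Scale a parameter shift by ACL when it exceeds THR * sigma.

    Assumption: the comparison uses the magnitude of the shift, so that
    large negative shifts are damped in the same way as positive ones.
    """
    if abs(shift) > thr * sigma:
        return shift * acl
    return shift


# With the defaults (THR = 0, ACL = 0.5) every non-zero shift is halved.
print(damp_shift(0.04, sigma=0.01))
# With THR = 5 the same shift is below threshold and is applied in full.
print(damp_shift(0.04, sigma=0.01, thr=5.0))
```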

FSIGMIN         MINIMUM ratio of F/sig to include. DEFAULT =1.0

NNATF           column number in input file for native F    
NNATS           column number for sigma of native F

NBST            optional column number for "best" phase in input file  
NMP             optional column number for most probable phase in input file
NFIGM           optional column number for figure of merit in input file

INOLD           Flag for using phases from input file in phasing when they
                  are not available from current data. default = 0.  To use
                  input phases, inold=1

ANATSCALE       Overall scale factor applied to ALL data before any 
                  other scaling.  DEFAULT = 1.0
SIGNATSCALE     Scale factor applied to native sigmas after overall scaling
                  DEFAULT = 1.0

NEWATOMTYPE     XXXX, where XXXX is the name of the new atom type.  The name
                  of an atom can have up to 4 characters.  This
                  keyword allows you to enter the scattering factor information
                  for an atom type that is not supplied with the program.  This
                  should be followed by the next 5 keywords (AVAL, BVAL,
                  CVAL,FPRIMV,FPRPRV)

AVAL            4 real numbers from the International Tables corresponding to
                  the "A1", "A2", "A3", and "A4"
                  values for scattering factors for a new atom type
BVAL            4 real numbers for "B" values
CVAL            1 real number for "C" value
FPRIMV          1 real number for f' value for the new atom type
FPRPRV          1 real number for f" value for the new atom type
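
As a reminder of how these values are normally combined, the International
Tables four-Gaussian form is f0(s) = sum of A(i)*exp(-B(i)*s**2) + C, with
s = sin(theta)/lambda, and the anomalous terms are added as f0 + f' + i*f".
The Python sketch below evaluates that expression.  The coefficients in the
example are made-up placeholders, NOT values for any real atom type, and the
function names are hypothetical.

```python
import math


def normal_scattering_factor(s, aval, bval, cval):
    """Four-Gaussian approximation: sum_i A_i * exp(-B_i * s**2) + C.

    s is sin(theta)/lambda in reciprocal Angstroms.
    """
    return sum(a * math.exp(-b * s * s) for a, b in zip(aval, bval)) + cval


def total_scattering_factor(s, aval, bval, cval, fprimv=0.0, fprprv=0.0):
    """Complex scattering factor f0(s) + f' + i*f"."""
    return complex(normal_scattering_factor(s, aval, bval, cval) + fprimv, fprprv)


# Placeholder coefficients for illustration only (not a real atom type).
AVAL = (10.0, 5.0, 3.0, 1.0)
BVAL = (1.0, 5.0, 20.0, 60.0)
CVAL = 1.0

print(normal_scattering_factor(0.0, AVAL, BVAL, CVAL))   # A1+A2+A3+A4+C at s = 0
print(total_scattering_factor(0.1, AVAL, BVAL, CVAL, fprimv=-2.0, fprprv=4.0))
```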

DERIVATIVE      1 (..2,3,4...derivative #)
                  Keyword indicating start of a new derivative.  When you
                  enter the command "KEYWORD", then as soon as the keyword 
                  "DERIVATIVE" appears the first time, all heavy atom 
                  information is zeroed, and the program assumes you are typing
                  in information on derivative #1.  The next time "DERIVATIVE"
                  appears, it assumes you are typing in information on
                  derivative #2, and so forth.  As long as you do not type
                  "DERIVATIVE" again, all heavy atom information will be
                  maintained and updated as the refinements progress, even if
                  you use other routines in the package, such as HASSP or MAPS.

GOTODERIV       Keyword that allows you to alter parameters for a particular
                  derivative already entered.  See notes below.
GOTOATOM        Keyword that allows you to alter parameters for a particular
                  atom already entered.  See notes below

LABEL           Title for this derivative
NCOLFBAR        Column number for derivative F
NCOLSFBAR       Column number for sigma of derivative F
NCOLDELF        Column number (optional) for anomalous difference
NCOLSDELF       Column number for sigma of anomalous difference

INPHASE         Keyword indicating this derivative is to be used in phasing
                  DEFAULT = not to include in phasing   
INANO           Keyword indicating the anomalous differences are to be used
                 for this derivative. DEFAULT = not to include anom diffs
ISOONLY         Use only isomorphous differences for phasing and refinement
                 (overrules INANO)
ANOONLY         Use only anomalous differences for phasing and refinement, if
                 available (you need to specify INANO also). 

DERSCALE        Dividing scale factor applied to all this derivative data 
                  after overall scale factor has been applied.  DEFAULT=1.0     
NOREFINESCALE   Do not refine overall scale factor. Default = refined
DERTEMP         Dividing B-factor to apply to deriv data.  DEFAULT =0.
REFINETEMP      Refine B-factor applied to deriv data. DEFAULT= not refined
SIGDERSCALE     Scale factor to apply to derivative sigmas after all above
                  scaling is applied. DEFAULT = 1.0

ATOMNAME        XXXX, where XXXX is the atom type of an atom to be refined.  
                This name must appear in the
                DATA statements at the start of routine "HEAVY" or you can
                enter them using the NEWATOMTYPE keyword.  Atoms 
                  supplied with the package are:
                 "I-  ", "IR+3", "PT+2", "AU+1", "HG  ","HG+2", and "U   "

                When you type "ATOMNAME" it assumes you are typing in a new
                 atom and it zeroes out all the parameters for this new atom.
                 If you want to go back to this atom later (i.e., in another
                 cycle) use the keywords GOTODERIV and GOTOATOM to identify
                 this atom.

                When you have multiple sites for a particular derivative, 
                  use ATOMNAME XXXX for the first, then input all the data on
                  that site, then start the next site with ATOMNAME YYYY, and
                  so forth.
 

OCCUPANCY       Fractional occupancy of this atom
BVALUE          Temperature factor for this atom.  Anisotropic temperature
                  factors are no longer supported.
XYZ             Fractional coordinates of this atom

                Control of refinement of this atom.  These are cumulative, 
                  so you can refine x and y using REFINEX and REFINEY.
REFINENONE      Don't refine anything...This is used to reset all the
                  refinement flags to zero if you have previously set them
                  up and want to change them.
REFINEALL       Refine x,y,z,occupancy, and B
REFINEOCCB      Refine occupancy and B
REFINEXYZ       Refine x,y,z
REFINEX         Refine x
REFINEY         Refine y
REFINEZ         Refine z
REFINEOCC       Refine occupancy        
REFINEB         Refine B

EIS             Optional list of estimated rms isomorphous lack-of-closure
                  residuals in 8 resolution ranges
EAD             Optional list of estimated rms anomalous lack-of-closure
                  residuals in 8 resolution ranges
FPHBAR          Optional list of estimated rms derivative F in 8 
                  resolution ranges
FHBAR           Optional list of estimated rms heavy atom F in 8 
                  resolution ranges
SIGBAR          Optional list of estimated rms derivative sigma in 8 
                  resolution ranges

Notes:  

0.  If you want to change which parameters for which atoms are refined after
you have already set up the atoms and refinement parameters, then you have
to use a special way to reset them.  The reason you have to do something
special is that if you say "DERIVATIVE" then the routine assumes you are
inputting data for a new derivative, so you can't go back to a previous one
with that command. Instead, you type:

GOTODERIV 2    ( to go to derivative #2)
GOTOATOM 3     (to atom #3 in deriv #2)
REFINENONE     (set all refinement flags back to zero)
REFINEXYZB     (or whatever you want to refine for this atom)

GOTOATOM 1    ( now do atom 1 in deriv 2)
GOTODERIV 1    (now do derivative 1)

Please note: you must reset all the refinement flags back to zero with
REFINENONE if you want to turn any refinement flags off.  Then turn the
ones you want back on.


1. The refinement against an origin-removed Patterson map is a way of
refining heavy atom parameters of each derivative independently, and is
particularly useful because the occupancies of heavy atom sites are
quite accurately estimated.  See the tutorial on USING HASSP and HEAVY at
the end of this document for hints on using this refinement method.
When using this package, the recommended refinement method is this one,
with JALT=0 and KALT=0.

This refinement minimizes the sum over all reflections of,

        R =  WGT * DEL**2

with respect to heavy atom parameters. WGT is a weighting factor, and
DEL is defined as:

DEL = (Fph-Fnat)**2 - K*FH**2 - < (Fph-Fnat)**2 - K*FH**2 >

where the average <> is taken in a shell of resolution and FH is the magnitude
of the calculated heavy atom structure factor. 

K is 1 for centric reflections, 1/2 for acentric reflections.
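
The residual defined above can be evaluated with a short Python sketch.  This
is an illustration of the formula only, not the program's code: the function
name, the dictionary keys, and the use of an unweighted average within each
resolution shell are all assumptions.

```python
from collections import defaultdict


def patterson_residual(reflections):
    """Sum of WGT * DEL**2 over all reflections.

    Each reflection is a dict with keys:
      fph, fnat  - derivative and native amplitudes
      fh         - calculated heavy atom amplitude
      centric    - True for centric reflections (K = 1), else K = 1/2
      shell      - resolution shell number
      wgt        - weighting factor
    """
    # First pass: (Fph-Fnat)**2 - K*FH**2 for each reflection, and shell totals.
    terms = []
    shell_sum = defaultdict(float)
    shell_count = defaultdict(int)
    for r in reflections:
        k = 1.0 if r["centric"] else 0.5
        term = (r["fph"] - r["fnat"]) ** 2 - k * r["fh"] ** 2
        terms.append(term)
        shell_sum[r["shell"]] += term
        shell_count[r["shell"]] += 1

    # Second pass: subtract the shell average (assumed unweighted) to get DEL.
    total = 0.0
    for r, term in zip(reflections, terms):
        average = shell_sum[r["shell"]] / shell_count[r["shell"]]
        delta = term - average
        total += r["wgt"] * delta ** 2
    return total


example = [
    {"fph": 102.0, "fnat": 100.0, "fh": 0.0, "centric": True, "shell": 1, "wgt": 1.0},
    {"fph": 100.0, "fnat": 100.0, "fh": 0.0, "centric": True, "shell": 1, "wgt": 1.0},
]
print(patterson_residual(example))
```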


2.  HINTS FOR INTERPRETING STATISTICAL OUTPUT FROM HEAVY:

    A.  Many of the values listed at the end of a set of refinements are
more-or-less self-explanatory.  These include the number of reflections
read, within resolution limits, and above the minimum figure of merit.
As these statistics are usually printed for a cycle in which refinement is
not carried out, the number of reflections used to refine is usually zero
in this listing.

   Other values listed at the end of a set of refinements include:

RMS HEAVY ATOM F:   The rms value of the calculated heavy atom F 
        in the resolution range
RMS PHASE AVG'D RESIDUAL:  This is the rms value of the difference between
        calculated and observed derivative F, where it is averaged not only
        over all reflections, but over all phases for each reflection, weighted
        by the phase probability
RMS(FH)/RMS(E): This is the ratio of the rms heavy atom F to the rms
        phase averaged residual
CENTRIC R FACTOR:  This is <| |Fder-Fnat| - |FH| | >/< |Fder-Fnat| >

RMS DERIVATIVE F:  This is the rms value of Fder
RMS SIGMA OF FPH:  This is the rms sigma of Fder
RMS SIGMA OF FP:   This is the rms sigma of Fnat

RMS OBSERVED DIFFERENCE:  For anomalous differences, this is the rms value
        of DelAno= (F+  -  F-)
RMS CALCULATED DIFFERENCE: This is the rms calculated anomalous difference

MEAN RATIO OF ISO TO ANO:  This is the ratio of calculated |FH| due to
        normal scattering relative to that due to anomalous scattering. If
        all anomalous scatterers are identical, this is equal to (f+f')/f"
        for that anomalous scatterer.

RMS(RES HA SF+LACK OF ISO SF):  This is an estimate of the total errors in
        the heavy atom model plus lack of isomorphism that remain.  It is
        obtained from the rms phase averaged residual and the rms native and
        derivative sigmas.
RMS LACK OF ISOMORPHISM SF:  This is an estimate of the remaining lack of
        isomorphism.  It is based on a comparison of the anomalous and
        isomorphous differences that remain
RMS RESIDUAL HEAVY ATOM SF:  This is an estimate of the remaining heavy atom
        structure factor, based on the anomalous differences and the errors
        in measurement.

CENTRIC LOC:  This is an estimate of the "centric" lack-of-closure residual,
        obtained using both centric and acentric reflections and correcting
        acentric lack-of-closure residuals by a factor of 2.  These residuals
        are all corrected for errors in measurement, so that if the derivative
        is "solved" and there is little lack of isomorphism, these values
        should all be near zero.

ANOMALOUS LOC: This is the lack-of-closure error for anomalous differences,
        corrected for errors in measurement.
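
As an illustration of one of these statistics, the CENTRIC R FACTOR defined
above can be computed as follows.  This Python sketch only restates the
formula; the function name and the plain-list inputs are hypothetical, and the
lists are assumed to hold centric reflections only.

```python
def centric_r_factor(fder, fnat, fh):
    """<| |Fder-Fnat| - |FH| |> / <|Fder-Fnat|> over centric reflections.

    Both averages run over the same reflections, so the ratio of means
    equals the ratio of sums.
    """
    numerator = sum(abs(abs(d - n) - abs(h)) for d, n, h in zip(fder, fnat, fh))
    denominator = sum(abs(d - n) for d, n in zip(fder, fnat))
    return numerator / denominator


# Two centric reflections: the first is fitted exactly, the second is not.
print(centric_r_factor([110.0, 90.0], [100.0, 100.0], [10.0, 5.0]))
```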

  The lack-of-closure residuals, miscellaneous statistics, and  structure 
factor tables are all calculated in shells of resolution defined by 
DMIN and DMAX.   These shells are designed to contain equal numbers of 
centric reflections and are thus equally spaced in 1/D**2.

The shells are determined by the relations:

        THMIN = 1./(2.*DMAX)**2
        THMAX = 1./(2.*DMIN)**2


For a reflection with SSQOLA = (sin theta/lambda)**2, the shell is:

        NORDER = 1 + INT (8 * (SSQOLA - THMIN)/(THMAX-THMIN) )
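
The same relations in Python, as a sketch.  SSQOLA is obtained from the
d-spacing as 1/(2d)**2.  The function name is hypothetical, and clamping a
reflection lying exactly at DMIN into shell 8 (the formula alone would give 9)
is an assumption, not documented behaviour.

```python
def shell_index(d, dmin, dmax, nshell=8):
    """Resolution shell (1..nshell) for a reflection with d-spacing d."""
    thmin = 1.0 / (2.0 * dmax) ** 2
    thmax = 1.0 / (2.0 * dmin) ** 2
    ssqola = 1.0 / (2.0 * d) ** 2          # (sin theta / lambda)**2
    # int() truncates toward zero, like the FORTRAN INT intrinsic.
    norder = 1 + int(nshell * (ssqola - thmin) / (thmax - thmin))
    return min(norder, nshell)             # assumption: clamp the d = DMIN edge


for d in (10.0, 6.0, 4.0, 3.0):
    print(d, shell_index(d, dmin=3.0, dmax=10.0))
```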



3.   Description of normal refinement/phasing cycles.


   A.   Refinement vs. origin-removed Patterson map.

   Input parameters: all defaults used
                     NCYCLE = 1 to 30
                     IREFCY(I) = 1,1,1,2,2,2.....6,6,6,0

   results:

   Zeroth cycle:  phases calculated for all derivatives identified with 
   INPHASE using input lack-of-closure residuals.  New lack-of-closure residuals
   are calculated for these derivatives.  Statistics are printed.

   Cycles 1 through NCYCLE-1: in this example, IREFCY(I) is zero on
   last cycle, but non-zero for all other cycles.  For each cycle when
   IREFCY(I) is non-zero: no phases are calculated
                          no new residuals are calculated
                          derivative IREFCY(I) is refined as described above


   Note that only 1 derivative is refined at a time and all are independent.
   Therefore in polar space groups, the coordinate(s) of at least one atom
   in each derivative must be fixed.  In space group P1 parameters for a
   single heavy atom may not be refined at all.  If two atoms are present,
   the occupancy, xyz, B of one of them only may be refined.

   Cycle NCYCLE:  IREFCY(NCYCLE)=0 in this example, so this cycle is
   like the zeroth cycle: phases are calculated, new residuals calculated.
   If KOUT is non-zero, output data are calculated as well.
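
   The cycle sequence described in this example can be sketched in Python.
   This is only a restatement of the rules above (a zeroth cycle unless
   INRESD = -1; refinement on cycles where IREFCY is non-zero; phasing and
   residuals on cycles where it is zero).  The function name plan_cycles is
   hypothetical and INRESD = 1 is not modelled.

```python
def plan_cycles(irefcy, inresd=0):
    """List (cycle number, action) pairs for a run without a procedure."""
    plan = []
    if inresd == 0:
        # A zeroth cycle is added before the first refinement cycle.
        plan.append((0, "calculate phases, residuals and statistics"))
    for cycle, deriv in enumerate(irefcy, start=1):
        if deriv == 0:
            plan.append((cycle, "calculate phases, residuals and statistics"))
        else:
            plan.append((cycle, "refine derivative %d" % deriv))
    return plan


# The example from the IREFCY description: refine deriv #1 on cycles 1-4,
# then calculate phases and statistics on cycle 5.
for cycle, action in plan_cycles([1, 1, 1, 1, 0]):
    print(cycle, action)
```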


   B.  Refinement by minimization of lack-of-closure at most probable phase.

   Input parameters: all default except JALT=1, KALT=0.

   Results:  identical to the above example except:

   (1) phases will be calculated every cycle
   (2) derivatives will be refined by minimization of (Fph-Fc)**2


   This is not the recommended manner of using HEAVY in this package.  In 
   most circumstances origin-removed Patterson refinement is much more 
   accurate.  There are some instances in which phase refinement may be
   useful, however.  One is when it is necessary to correlate the origins
   in different derivatives.  In space group C2, for example, the y-coordinate
   is indeterminate.  That means that if you have two derivatives and refine
   them independently, you will not have refined the relative y-coordinates
   of the atoms in the two derivatives (though you will have refined the
   relative y-coordinates of atoms within each derivative).  You might wish
   to use phase refinement to carry this out, using one derivative to phase
   and refining y-coordinates in the other derivative.  In practice, however,
   these relative y-coordinates can be obtained even more accurately by
   simply calculating a difference Fourier for one derivative, phasing with
   the other derivative.  The centroid of the peak corresponding to the
   heavy atom site (which can be found, for example, by PEAKSEARCH in this
   package) will give you the relative y-coordinate you need with very
   good accuracy, and refinement of this coordinate is unnecessary.

   Note that still only 1 derivative may be refined at one time.
   (If you really want to phase only once per refinement of all derivatives,
   calculate phases during one run and write them out with KOUT=7.  Then
   merge file containing phases with input DORGBN file (3 extra columns).
   Then run HEAVY with INPHAS=0 for each derivative and specifying INOLD=1.
   Also set INRESD=-1.  The program will then use the input phases during
   phase refinement if JALT=1.  It's probably faster
   to just phase each time.)


   C. Just calculating phases and a map or other output.

   Input: all default, except NCYCLE >0

   If the input lack-of-closure residuals are OK, set INRESD = -1 so
   that new residuals will not be calculated and a zeroth cycle will not
   be included.  Otherwise leave INRESD = 0.

   Specify the type of map with KOUT, the derivative (if applicable) with KDER.


   D.  Carrying out a procedure with IHEAVYPROC.  Heavy has the capability
   of carrying out an ordered sequence of refinements.  These are useful if
   you want to carry out refinement in a semi-automatic fashion.  When
   you specify a procedure with iheavyproc, you need to specify all the
   parameters that you want refined at all.  Then the procedure you choose
   decides which parameters to refine on which cycles.  Usually you will 
   specify REFINEALL for all atoms, then let the procedures decide which
   to refine.  If you use a procedure, the program will automatically fix
   all coordinates that cannot possibly be refined.  For example, in 
   space group C2 one atom in each derivative must have y fixed if
   origin-removed Patterson refinement is used, because the y-direction
   is polar.  The program will fix the coordinate(s) of the atom that 
   is the strongest in each derivative.  If you have already fixed the
   coordinate(s) of an atom in a derivative (by not specifying that they
   be refined) then the program will just fix the atom you chose and not
   fix any others. 

   Note that you can carry out any series of refinements that you want
   by setting up all your keywords for the first type of refinement, 
   initiating refinement with the command HEAVY, then going back to
   KEYWORD mode, specifying the next type of refinement without changing
   or setting any other parameters unless you want to, then initiating
   the next refinement cycles  with HEAVY, and so on.  For example, you
   might type in all your heavy atom parameters, finishing with

...
NREP 5
IHEAVYPROC 2
DONE
!  now refine 5 cycles with iheavyproc=2
HEAVY

KEYWORD
NREP 7
IHEAVYPROC 4
DONE
! now refine 7 cycles with iheavyproc=4
HEAVY

   This sequence of commands results in 5 cycles of refinement of xyz for
   all atoms for which you specified refinement of xyz, then 7 cycles of
   refinement of xyz, occ, and B for all atoms for which you specified that
   these parameters be refined.  You can do this sort of thing
   in any order and ad infinitum if you wish.

   Note that there is no procedure to refine just thermal factors.  With this
   package there is no need to alternately refine occupancies and thermal
   factors.  If there is insufficient data (i.e., very low resolution) to
   refine both occupancies and thermal factors, then set the thermal factors
   to any reasonable value and just refine the occupancies. 

------------------------------------------------------------------------------------


**** SYMMETRY     Symmetry files

    Space group symmetry is read in from a formatted file.  The first 
line is the number of symmetry equivalents (NSYM).  The next NSYM lines
are the symmetry equivalents in this space group, as illustrated below:

Space group P6122:

12              ! number of symmetry equivalents to follow
x,y,z
-y,x-y,z+1/3
-x+y,-x,z+2/3
-x,-y,z+1/2
y,-x+y,z+5/6
x-y,x,z+1/6
y,x,-z+1/3
-x,-x+y,-z+2/3
x-y,-y,-z
-y,-x,-z+5/6
x,x-y,-z+1/6
-x+y,y,-z+1/2


Note:  You can also use a MEP (matrices of equivalent positions) file in the
way that was used in previous versions of this program if you wish. 
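
For readers who want to check a symmetry file outside the package, the Python
sketch below reads the layout shown above (a count line, optionally followed
by a "!" comment, then NSYM operator lines) and applies an operator to a
point.  It is not part of HEAVY; the function names are hypothetical and only
operators built from x, y, z and simple fractions are handled.

```python
import re
from fractions import Fraction

# A term is an optional sign followed by a fraction, a number, or x/y/z.
TERM = re.compile(r"([+-]?)\s*(\d+/\d+|\d+(?:\.\d+)?|[xyz])")


def parse_operator(text):
    """Turn e.g. '-y,x-y,z+1/3' into (rotation rows, translation)."""
    rotation, translation = [], []
    for component in text.lower().split(","):
        row, shift = [0, 0, 0], Fraction(0)
        for sign, token in TERM.findall(component):
            factor = -1 if sign == "-" else 1
            if token in ("x", "y", "z"):
                row["xyz".index(token)] += factor
            else:
                shift += factor * Fraction(token)
        rotation.append(row)
        translation.append(shift)
    return rotation, translation


def read_symmetry(lines):
    """Read NSYM and the NSYM operators; text after '!' is ignored."""
    cleaned = [ln.split("!")[0].strip() for ln in lines]
    cleaned = [ln for ln in cleaned if ln]
    nsym = int(cleaned[0])
    return [parse_operator(ln) for ln in cleaned[1:1 + nsym]]


def apply_operator(op, xyz):
    """Apply one operator to fractional coordinates."""
    rotation, translation = op
    return tuple(sum(r * c for r, c in zip(row, xyz)) + t
                 for row, t in zip(rotation, translation))


P6122 = """12              ! number of symmetry equivalents to follow
x,y,z
-y,x-y,z+1/3
-x+y,-x,z+2/3
-x,-y,z+1/2
y,-x+y,z+5/6
x-y,x,z+1/6
y,x,-z+1/3
-x,-x+y,-z+2/3
x-y,-y,-z
-y,-x,-z+5/6
x,x-y,-z+1/6
-x+y,y,-z+1/2
""".splitlines()

operators = read_symmetry(P6122)
point = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
print(len(operators), apply_operator(operators[1], point))
```
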
 -------------------------------------------------------------------------------
**** DORGBN files

The files used in this package are binary files with data sorted by hkl.

Format of the binary (FORTRAN unformatted) data file:
   1.  INTEGER*4 NCOL - the number of columns of data in the file.
   2.  (LOGICAL*1) TITLE(80) - a title to describe the contents of the file.
   3.  NCOL more titles, one for each column of data.
   4.  Data records - IH,IK,IL,RES,(F(I),I=1,NCOL)
         1.  IH,IK,IL - INTEGER*4  The indices of the reflection.
         2.  RES - The d-spacing in Angstroms.
         3.  F(I) - Data.  These can be structure factors, sigmas, phase
             information stored as phase, figure-of-merit, etc.  When data
             are missing for one or more columns the value -1.0 is stored.
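
The layout of one data record can be illustrated with Python's struct module.
This is a sketch under several assumptions that the description above does not
settle: RES and F(I) are taken to be 4-byte reals, the byte order is taken to
be little-endian, and the payload is assumed to have had any FORTRAN
record-length markers removed already.  The function name is hypothetical.

```python
import struct

MISSING = -1.0   # value stored when a column has no data


def unpack_data_record(payload, ncol, byte_order="<"):
    """Decode IH,IK,IL,RES,(F(I),I=1,NCOL) from the bytes of one record.

    Assumptions: INTEGER*4 indices, 4-byte reals for RES and F(I), and a
    payload without FORTRAN record markers.
    """
    fmt = "%s3i%df" % (byte_order, ncol + 1)
    values = struct.unpack(fmt, payload)
    hkl = values[:3]
    res = values[3]
    data = [None if v == MISSING else v for v in values[4:]]
    return hkl, res, data


# Round trip on a synthetic record with two data columns, one of them missing.
record = struct.pack("<3i3f", 1, 2, 3, 4.5, 100.0, -1.0)
print(unpack_data_record(record, ncol=2))
```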

-------------------------------------------------------------------------------


****  USING HASSP and HEAVY in heavy atom searches and refinement


What is the goal in heavy atom searches and heavy atom refinement?

In the MIR method, one generally has available a single dataset corresponding 
to a "native" macromolecule, and one or more datasets corresponding to (more-
or-less) isomorphous "derivatives" containing a limited number of "heavy atoms" 
bound at specific locations.
The goal of the heavy atom searches and refinement in the MIR method is to 
obtain a list of sites of heavy atoms in each derivative of interest, along 
with measures of occupancy and disorder of these sites.  Of particular 
importance, of course, is that this list of sites contain only correct sites, 
as incorrect sites will bias both the phasing and the measures of quality of 
the phasing.

There are several steps in obtaining and refining heavy atom positions.  These 
include (with an example of a program or package that carries out each step):

        1.  calculation of a difference Patterson function  for each derivative 
                (FFT)
        2.  search for single or multiple site solutions to Pattersons (HASSP)
        3.  refining heavy atom parameters (HEAVY)
        4.  cross-Fourier or self-Fourier analysis to evaluate and identify 
                sites (FFT)


How will we know if a set of heavy atom sites is correct?

There are several tests that can help show whether a heavy atom site is 
correct.  Here are some that should usually be tried:

1.  The difference Patterson shows the self (Harker) peaks predicted for this 
site.
2.  If there is more than one site in the derivative, the difference Patterson 
shows the predicted cross-vectors.
3.  Refinement of occupancy and thermal factors yields an occupancy that is in 
the range of 0.2 to 1.0 (if the data is on an absolute scale), and a B in the 
range of about 10 to 100 (if the data has been measured to a resolution better 
than about 4 A).
4a.  A difference Fourier for the derivative of interest, phased only with 
other derivatives and removing any sites in common between the derivative of 
interest and the other derivatives, shows this site as one of the higher peaks 
in the Fourier.
-or-
4b.  If only SIR data are available, a difference Fourier phased only with the 
other sites in the derivative, shows this as one of the higher peaks in the 
Fourier.


What this tutorial will do.

In this tutorial, a simple example using model data for one derivative with 
anomalous information will be used to demonstrate the use of HASSP and HEAVY in 
heavy atom determination and refinement.  The data used here will actually be 
based on model MAD data that has been converted to MIR format using MADMRG (1), 
but the treatment is identical to that for any other SIR+anomalous data.

In the model data to be considered here, the space group is C2, with cell 
parameters of a=76.1, b=28.0, c=42.4, and beta=103.1 degrees.  
There are two "true" heavy atom sites.  They are at the fractional 
coordinates and have the occupancies and thermal factors listed below:

x       y         z       occupancy     B

0.141   0.344   0.219           1.0     20.0
0.484   0.500   0.093           1.0     20.0

This model data contains 1763 reflections from 10 to 3 A, and includes "random" 
errors added to the model structure factor amplitudes.  In the model, 
the "derivative" is perfectly isomorphous with the native.


**** I.  Searching for heavy atom positions in a derivative: HASSP

In the MIR method, it is usually only necessary to "solve" one derivative by 
Patterson methods.  This is because once one derivative is solved, or even one 
site in one derivative is found, the much more powerful difference Fourier 
method can be used to solve all the others.  For these other derivatives, the 
Patterson function is only needed as a check on the solution.   Nevertheless, 
finding even one derivative that can be solved by Patterson methods is often 
the hardest step in the MIR procedure.


What does HASSP do?

HASSP  (Heavy Atom Search and Superposition Program) is simply an automated 
procedure for identifying potential solutions to a difference Patterson 
function (4).  It has an additional and very useful feature of roughly 
estimating the probability that a particular solution could have occurred by 
chance alone, so you have an idea of the significance of a solution as well. 

The basic idea used in HASSP is quite straightforward:  place a "test" heavy 
atom or atoms at every possible position in the asymmetric unit of the unit 
cell, and predict all the self and cross vectors from this arrangement of 
sites.  Then look at the difference Patterson function at all these locations, 
and note the lowest value of the Patterson at all of these self and cross 
vectors.  A high value of this "minimum function" means all the peaks are 
present, a low value means at least some are missing.  This is just what you 
would do by hand, but here it's all done for you.
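
The idea can be sketched in Python for a single-site search in space group C2,
where the only unique Harker vector of an atom at (x,y,z) is (2x,0,2z).  This
is a toy illustration, not HASSP itself: the "Patterson" is a synthetic sum of
Gaussian peaks placed at the Harker vectors of the two model sites used later
in this tutorial, and all function names, peak heights and widths are
hypothetical.

```python
import math


def wrapped_distance(p, q):
    """Distance between fractional points using the nearest lattice repeat."""
    return math.sqrt(sum((((a - b + 0.5) % 1.0) - 0.5) ** 2 for a, b in zip(p, q)))


def make_toy_patterson(peaks, width=0.03):
    """Return P(uvw), a sum of Gaussian peaks standing in for a real map."""
    def patterson(uvw):
        return sum(h * math.exp(-(wrapped_distance(uvw, c) / width) ** 2)
                   for c, h in peaks)
    return patterson


def harker_vectors_c2(x, y, z):
    """Unique Harker vector for space group C2."""
    return [((2.0 * x) % 1.0, 0.0, (2.0 * z) % 1.0)]


def minimum_function(patterson, vectors):
    """Lowest Patterson value over all predicted vectors."""
    return min(patterson(v) for v in vectors)


def single_site_search(patterson, steps=64):
    """Score every (x, z) from 0 to 1/2 on a grid; best solutions first."""
    results = []
    for i in range(steps // 2 + 1):
        for k in range(steps // 2 + 1):
            x, z = i / steps, k / steps
            score = minimum_function(patterson, harker_vectors_c2(x, 0.0, z))
            results.append((score, x, z))
    results.sort(reverse=True)
    return results


site_1 = (0.141, 0.344, 0.219)
site_2 = (0.484, 0.500, 0.093)
toy_map = make_toy_patterson([(harker_vectors_c2(*site_1)[0], 100.0),
                              (harker_vectors_c2(*site_2)[0], 90.0)])
ranked = single_site_search(toy_map)
for score, x, z in ranked[:2]:
    print("x=%.3f z=%.3f height=%.1f" % (x, z, score))
```

With several predicted vectors per trial position (more Harker sections, or
cross-vectors in a multi-site search) only harker_vectors_c2 would change; the
minimum over the predicted vectors is what makes a single missing peak
disqualify a trial solution.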


Significance of a solution.

The significance of a solution found by HASSP is a function not just of how 
high the minimum function value is for that solution.  It also depends on how 
many peaks were predicted as well as on the noise level in the map and the 
number of different solutions that were tried.  For example, suppose we look 
for single-site solutions to a map at a resolution of 3 A in a space group with 
a single Harker vector, and where the cell dimensions are about 30 x 30 x 30A.
In this case we have a very good chance of coming up with a solution that is 2 
times the rms value in the map--and being wrong.  That is because we have 
effectively looked at about 100 possible solutions (even if we choose a finely 
spaced grid, the independent solutions we examine will be spaced by about the 
resolution of the map in each of two directions), and about 1 in 50 peaks in a 
(random) map will be greater than twice the rms value.  On the other hand, if 
we are in a space group with the same cell dimensions and the map was 
calculated to the same resolution, but the map has 3 Harker sections, the 
chance that we come up with a single-site solution that matches all 3 Harker 
sections with heights of twice the rms in the map is only about  0.008.  This 
is because we have looked at about 1000 possible solutions (10 along each of 3 
directions), and the chance that any one solution would give all 3 peaks 
greater than 2 times the rms would be about (1/50)**3.  Therefore the chance 
that at least one of the 1000 test solutions would, by chance, give this height 
at all 3 peaks is about 1-(1-(1/50)**3)**1000, or about 0.008.
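
The arithmetic in this paragraph can be reproduced with a few lines of Python.
The function name is hypothetical; the inputs are the numbers quoted above
(1 peak in 50 above twice the rms, about 100 or 1000 independent trial
positions, 1 or 3 Harker vectors).

```python
def chance_of_false_solution(p_peak, n_vectors, n_trials):
    """Probability that at least one of n_trials random trial solutions
    shows all n_vectors predicted peaks above the chosen height."""
    p_one_trial = p_peak ** n_vectors
    return 1.0 - (1.0 - p_one_trial) ** n_trials


# One Harker vector, about 100 independent trial positions.
print(round(chance_of_false_solution(1.0 / 50.0, 1, 100), 3))
# Three Harker sections, about 1000 independent trial positions.
print(round(chance_of_false_solution(1.0 / 50.0, 3, 1000), 3))
```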


Single-site searches.  

The simplest Patterson search is simply to look at the Harker sections and 
identify heavy atom positions consistent with the major peaks on these 
sections.  In HASSP, all of the positions in the unit cell that give different 
sets of Harker vectors are tested, and the positions that yield the 
highest values of the minimum function are listed.

Example in space group C2

4 equivalent positions:  (x,y,z); (-x,y,-z); (1/2+x,1/2+y,z); (1/2-x,1/2+y,-z)

Unique portion of real cell: 0 to 1 in x; 0 to 1/2 in y; 0 to 1/2 in z.

Symmetry of the Patterson cell: this is the symmetry of the real cell plus 
inversion:
8 equivalent positions: (u,v,w); (-u,-v,-w); (-u,v,-w); (u,-v,w) and each of 
these + (1/2,1/2,0)

Unique portion of Patterson cell: 0 to 1/2 in x; 0 to 1/2 in y; 0 to 1/2 in z.

An atom at (x,y,z) has 1 unique Harker (self) vector in the Patterson (all 
other self vectors are related to this one by inversion or symmetry of the 
space group):  (u,v,w)=(2x,0,2z)


Sample output from HASSP on space group C2:

THIS SPACE GROUP HAS   2 SETS OF   2 IDENTICAL GROUPS
OF EQUIVALENT POSITIONS RELATED BY CENTERING TRANSLATIONS:

GROUP 2 IS RELATED TO GROUP 1 BY THE TRANSLATION:  (0.500,0.500,0.000)


THE FUNDAMENTAL SET OF  2  ROTATION MATRICES AND TRANSLATION VECTORS FOR THIS 
SPACE GROUP IS:

 1  0  0  0.000         -1  0  0  0.000
 0  1  0  0.000          0  1  0  0.000
 0  0  1  0.000          0  0 -1  0.000

(ASIDE FROM CENTERING, IF ANY) THERE ARE  4 EQUIVALENT POSITIONS IN THE 
PATTERSON:

 1  0  0            -1  0  0            -1  0  0               1  0  0
 0  1  0             0 -1  0             0  1  0               0 -1  0
 0  0  1             0  0 -1             0  0 -1               0  0  1


THERE ARE  1 UNIQUE HARKER VECTORS:

 2  0  0  0.000
 0  0  0  0.000
 0  0  2  0.000

COORDINATES ALONG Y-AXIS ARE DETERMINED ONLY TO WITHIN A CONSTANT.
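
The Harker matrix printed above is consistent with taking the difference
between a site r and its symmetry mate R*r + t, which gives (I - R)*r - t.
The Python sketch below reproduces it for the second rotation matrix of the
fundamental set (translation zero).  This is an interpretation of the output,
not a description of HASSP internals, and the function names are hypothetical.

```python
def harker_matrix(rotation):
    """Return I - R for a 3x3 rotation matrix given as a list of rows."""
    return [[(1 if i == j else 0) - rotation[i][j] for j in range(3)]
            for i in range(3)]


def apply_matrix(matrix, xyz):
    """Multiply a 3x3 matrix by a coordinate triple."""
    return tuple(sum(matrix[i][j] * xyz[j] for j in range(3)) for i in range(3))


TWO_FOLD = [[-1, 0, 0],
            [0, 1, 0],
            [0, 0, -1]]          # (x,y,z) -> (-x,y,-z)

h = harker_matrix(TWO_FOLD)
print(h)
# Harker vector (2x, 0, 2z) for the first model site of this tutorial.
print(apply_matrix(h, (0.141, 0.344, 0.219)))
```

The zero row for y is the reason the y coordinate cannot be determined from
the Harker vector alone.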


Single-site search in C2:  just look at all values of x from 0 to 1/2, all z 
from 0 to 1/2.  For each, look at the value of the Patterson at (2x,0,2z) and 
note the highest peaks:

LIST OF MAJOR PEAKS IN SINGLE-ATOM SEARCH:GENERAL POSITIONS

  PEAK         X         Y         Z        HEIGHT     PROB THAT THIS           
                                                                IS BY CHANCE

    1         0.484     0.000     0.094     9822.5              0.000
    2         0.141     0.000     0.219     9478.0              0.000
    3         0.055     0.000     0.094     3322.7              0.669
    4         0.430     0.000     0.000     3132.8              0.794

Note that the top two solutions in this single-atom search correspond (in x and 
z) to the two "true" sites in the derivative. The value of "y" cannot be 
determined from the Patterson.  The heights of the Harker peaks for these two 
solutions are 9.8 and 9.5 times the rms of the map, and obtaining peaks this 
high by chance is of course very unlikely.  The next two possible solutions are 
3 times the rms of the map, and each of these has a good chance of appearing by 
chance (which they did).


Two-site solutions to a Patterson.  

In a two-site search,  two sites are considered together as a possible solution 
to the Patterson.  Their self-vectors and all cross-vectors are considered and 
the minimum height of all these peaks is noted.  As it is a bit time-consuming 
to test all possible pairs of atoms in the asymmetric unit, a shortcut is used 
here and only pairs of atoms that are related by a cross-vector found in the 
map are considered.  In this way it is certain that at least one of the cross-
vectors predicted from the pair of atoms will be a high peak in the map.  In 
practice, then, a high peak in the map (X,Y,Z) is noted, then atom A is placed 
at all possible positions in the asymmetric unit (x,y,z), and atom B is always 
placed at the coordinates (x+X,y+Y,z+Z).  Then all self and cross-vectors are 
calculated.
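
The cross-vectors predicted for a pair of sites can be worked out with a short
Python sketch.  Using the two model sites of this tutorial and the C2
equivalent positions given earlier, it reduces every cross-vector to the
unique portion of the Patterson cell (0 to 1/2 along each axis); the two
results agree, to within rounding, with the first two peaks in the list of
isolated Patterson peaks for the test case, (0.156,0.344,0.125) and
(0.125,0.344,0.313).  The function names are hypothetical and the code is
specific to C2.

```python
def c2_copies(site):
    """The 4 equivalent positions of space group C2."""
    x, y, z = site
    return [(x, y, z), (-x, y, -z),
            (0.5 + x, 0.5 + y, z), (0.5 - x, 0.5 + y, -z)]


def reduce_to_patterson_unit(uvw, eps=1e-6):
    """Map a vector into 0..1/2 on each axis using the 8 Patterson
    equivalents: (u,v,w), (-u,-v,-w), (-u,v,-w), (u,-v,w), each also
    shifted by (1/2,1/2,0)."""
    u, v, w = uvw
    for su, sv, sw in [(1, 1, 1), (-1, -1, -1), (-1, 1, -1), (1, -1, 1)]:
        for cu, cv in [(0.0, 0.0), (0.5, 0.5)]:
            candidate = ((su * u + cu) % 1.0, (sv * v + cv) % 1.0, (sw * w) % 1.0)
            if all(c <= 0.5 + eps for c in candidate):
                return candidate
    raise ValueError("no equivalent found in the unique portion")


def cross_vectors(site_a, site_b):
    """Unique cross-vectors between all symmetry copies of two sites."""
    found = set()
    for a in c2_copies(site_a):
        for b in c2_copies(site_b):
            difference = tuple(p - q for p, q in zip(a, b))
            reduced = reduce_to_patterson_unit(difference)
            found.add(tuple(round(c, 3) for c in reduced))
    return sorted(found)


vectors = cross_vectors((0.141, 0.344, 0.219), (0.484, 0.500, 0.093))
for vector in vectors:
    print(vector)
```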

Searches for additional sites.  

Once a two-site solution has been found, additional sites can be tested for 
consistency with the existing solution.  Atoms are placed at all possible 
positions in the asymmetric unit, and once again all self and cross vectors and 
the minimum functions are calculated, and the highest minimum function is 
chosen.  If the likelihood of obtaining this value of the minimum function is 
very low, the new site is included in the solution.

Example of a two-site search in space group C2:

LIST OF ISOLATED PATTERSON PEAKS: GENERAL POSITIONS.

  PEAK         X         Y         Z         HEIGHT

    1         0.156     0.344     0.125     10135.
    2         0.125     0.344     0.313     10082.
    3         0.070     0.219     0.172     2869.9

These peaks were found in a search of the difference Patterson function for our 
test case, considering (first) just locations in general positions in the map.  
Peaks in special positions are considered later, only after all peaks in 
general positions have been tried.  Each of these is considered, in turn, as a 
possible cross-vector in a two atom search. The first two are "real" cross-
vectors.  The first yields,


TWO-ATOM ORIGIN SEARCH.  TRY # 1 .
CROSS-VECTOR BETWEEN SITES = (0.156,0.344,0.125)


MAXIMUM AGREEMENT FOUND IN SEARCH WITH THE TWO SITES:
(0.484,0.000,0.094) AND