PROFIT Manual
 
Short Description of ProFIT

ProFIT (Protein Fold Identification Tool) combines an amino acid sequence with
a database of 3D structures and has the potential to detect a fold related
to the native structure of the input sequence. This is a beta release. It is
still experimental and will change over the coming months.

For a description of the program, the approaches used, the expected success
rate and the quality of information retrieved, please consult the accompanying 
manuscript (paper.ps; postscript). Please report bugs, problems and suggestions
to profit@agnes.came.sbg.ac.at.

There is no parameter set that is optimal for all proteins and protein classes.
When you use ProFIT, always vary the parameters, especially gapenalty, ss_gaps
and use_star (see below). Varying them is necessary to find good settings for
a particular sequence/structure alignment problem. When you search the
database, also try several parameter sets. Useful ranges of parameter values
are indicated in the descriptions below. A default set of parameters, which
works in many cases, is used in the examples below (see the files
rtp1hst-a.dlg and rtpsearchdb.dlg in the Example directory).
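
One way to vary the parameters systematically is to generate one command file
per parameter set. The following Python sketch is not part of ProFIT; it only
writes command-file text using the commands documented in this manual. The
grid values are picked from the suggested ranges (the gap elongation values
assume a lower bound of 0.1), and the output file naming scheme is a
hypothetical convention.

```python
from itertools import product


def make_jobfile(seqfile, backbone, gapenalty, gapelongation, ss_gaps, use_star):
    """Return the text of a ProFIT command file for one parameter set."""
    # Hypothetical naming scheme so that each run writes its own alignment file.
    outfile = f"run_gp{gapenalty}_ge{gapelongation}_ss{ss_gaps}_st{use_star}.ali"
    lines = [
        "# generated parameter set",
        f"ss_gaps = {ss_gaps}",
        f"use_star = {use_star}",
        f"gapenalty = {gapenalty}",
        f"gapelongation = {gapelongation}",
        f"read seqfile_a {seqfile}",
        f"read backbone_a {backbone}",
        # The gap vector must be initialized after the backbone is loaded.
        "gap st init",
        "align",
        f"show all {outfile}",
    ]
    return "\n".join(lines) + "\n"


def sweep(seqfile, backbone):
    """Return one command-file text per combination of parameter values."""
    grid = product([0.75, 1.5, 2.0], [0.1, 0.2, 0.5], [0, 1], [0, 1])
    return [make_jobfile(seqfile, backbone, gp, ge, ss, st)
            for gp, ge, ss, st in grid]


if __name__ == "__main__":
    jobs = sweep("rtp.nof", "1hst-a.bbn")
    print(len(jobs), "command files generated; first one:")
    print(jobs[0])
```

Each generated text can be saved to a file and run in batch mode with the
"-f" option.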
 
You can run ProFIT either interactively or in batch mode using a command
file. The default is interactive mode. Starting ProFIT with the option "-f"
followed by the name of your command file executes the commands in that file
and exits when they are completed. Lines with # in the first column of the
command file are treated as comments. Starting ProFIT with the "-s" switch and
a filename executes the commands in the given file and then leaves you in
interactive mode.
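
As an illustration of the two startup options and the comment rule, here is a
small Python sketch. It is not part of ProFIT. The executable path is passed
in by the caller because it depends on the installation, and skipping blank
lines is an assumption made for this sketch, not documented ProFIT behaviour.

```python
def active_commands(command_file_text):
    """Return the command lines of a command file, without comment lines."""
    commands = []
    for line in command_file_text.splitlines():
        if line.startswith("#"):  # '#' in the first column marks a comment
            continue
        if line.strip():  # assumption: blank lines carry no command
            commands.append(line.rstrip())
    return commands


def profit_argv(executable, command_file, stay_interactive=False):
    """Build the argument list for a batch (-f) or batch-then-interactive (-s) start."""
    flag = "-s" if stay_interactive else "-f"
    return [executable, flag, command_file]


if __name__ == "__main__":
    text = "# read the sequence\nread seqfile_a rtp.nof\nalign\n"
    print(active_commands(text))
    # "./profit" is a hypothetical path to the ProFIT executable.
    print(profit_argv("./profit", "rtp1hst-a.dlg"))
```

The argument list returned by profit_argv can be handed to
subprocess.run to start a batch job from a script.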

If you know ProsaII you will find many commands familiar; and the ProsaII manual 
will give you some additional information on several commands (ProsaII is 
available from our anonymous ftp server).


Short description of commands and variables:
===========================================

ProFIT currently supports two modes:
        1. Align a sequence with a structure 
           (command 'align')
        2. Align a sequence with a database of structures 
           (command 'search db')

The output produced by (1) contains the alignment and the scores for the
sequence/structure combination. The output of (2) is similar, but the
alignments are not included. If you want some of the alignments after a
database search, rerun the individual sequence/structure alignments using
'align'.

align
        Align sequence and structure (seqfile_a and backbone_a).

exit
        Exit ProFIT.

gap st init
        Initialize the gap vector. Every time a backbone is loaded, the gap
        vector has to be initialized; otherwise the program will dump
        core (sorry for that). This command is also necessary after
        ss_gaps is set from 1 to 0.
        
execute filename
        Execute commands given in the file filename. You can start
        command files from within ProFIT's interactive mode.

gap st set x y
        Don't open gaps between residues x and y of the backbone.
        This can be useful for avoiding gaps in certain
        regions of the structure.

gapelongation = x 
        Set gap elongation penalty. Suggested values for x are between
        0.1 and 0.5.

gapenalty = x 
        Set gap penalty. Suggested values for x are between 0.75 and 2.
        (Feel free to try others).

help
        Show available commands. 

read backbase listfile:
        Read the structure database defined in listfile. These structures
        form the database used for fold recognition. "standard.3.95.lst"
        lists all backbone files included in this package.

read backbone_a filename
        Read one backbone file. The program will produce an alignment
        combining seqfile_a and backbone_a, when the command 'align'
        is issued.

read seqfile_a filename:
        Read the sequence stored in filename.
        Formats of seqfile:
                .nof    no special format. The sequence is stored in one-letter
                        code without any blanks. Every character except
                        whitespace is interpreted as an amino acid in
                        one-letter code.
                .bbn    read one of the backbones of the supplied structure
                        database and use its sequence.
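
        The .nof rule above can be sketched in a few lines of Python. This is
        an illustration, not ProFIT's own reader. The check against the 20
        standard one-letter codes is an added convenience; how ProFIT itself
        treats other letters is not described here.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def read_nof(text):
    """Return the sequence in .nof text: every non-whitespace character, in order."""
    return "".join(text.split())


def unusual_letters(sequence):
    """Return the characters of the sequence that are not standard one-letter codes."""
    return sorted(set(sequence.upper()) - STANDARD_LETTERS)


if __name__ == "__main__":
    seq = read_nof("MKEEKRSSTG FLVKQRAFLK\nLYMITMTEQE\n")
    print(seq, len(seq))
    print(unusual_letters(seq))
```

        Running such a check before starting ProFIT helps to catch stray
        characters, since every non-whitespace character in a .nof file is
        read as a residue.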

search db outputfile:
        Start fold recognition. The result will be appended to outputfile.

sh
        Execute a UNIX shell command.

show all [filename]:
        Show the result of aligning seqfile_a with backbone_a. filename is 
        an optional parameter. If filename is given, the alignment will be
        appended to the file [filename], otherwise output is stdout.

ss_gaps = 1/0
        If ss_gaps is set to 1 no gaps are allowed in helices and strands.
        If set to 0 gaps are allowed in helices and strands.
        After changing this value, always reinitialize the gap vector
        using "gap st init".
       
use_star = 1/0
        Select one of the two distinct variants of the alignment algorithm
        used to align sequence and structure.


Most important output parameters:
================================
The scores NZcomb, NZpair and NZsurf are indicative of the quality of a model.
Normally they should be close to 1. Pay attention to the relations among
these scores. In example 2 below, 1fxd and 1hst-a both give high scores in
NZcomb, but 1fxd has a bad NZpair (0.43) and a bad NZsurf (1.38, i.e. too
high), indicating that the more balanced scores of 1hst-a are more favourable
(always be suspicious if NZsurf is higher than 1.2).
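
The NZsurf rule of thumb can be checked automatically. The Python sketch below
is not part of ProFIT; it assumes the score line has the layout shown in the
example output ("# NZcomb  : 0.88   NZpair  : 0.65   NZsurf  : 1.03"), and the
function names are made up for this illustration.

```python
import re

# Matches pairs such as "NZsurf  : 1.03" on the score line.
NZ_PATTERN = re.compile(r"(NZ\w+)\s*:\s*(-?\d+(?:\.\d+)?)")


def nz_scores(line):
    """Return the normalized scores found on one output line as a dict."""
    return {name: float(value) for name, value in NZ_PATTERN.findall(line)}


def is_suspicious(scores, nzsurf_limit=1.2):
    """Apply the rule of thumb: NZsurf above 1.2 deserves a closer look."""
    return scores["NZsurf"] > nzsurf_limit


if __name__ == "__main__":
    scores = nz_scores("# NZcomb  : 0.88   NZpair  : 0.65   NZsurf  : 1.03")
    print(scores, is_suspicious(scores))
```

The threshold only flags a model for inspection; the balance among all three
scores still has to be judged by eye.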

Following are two examples for using ProFIT. You can find the command files
in the "Example" directory.


Example 1:
=======================================
Aligning one sequence and one backbone:
=======================================

Sample job file for aligning the sequence of the replication terminating
protein from Bacillus subtilis (rtp) with the chicken histone H5 structure
1hst-a (see file rtp1hst-a.dlg in the Example directory):

#
# Sample job file for aligning the sequence of rtp
# and the structure of 1hst-a
# 
# disallow gaps in helices and strands
ss_gaps = 1
# set to alignment variant 1
use_star = 1
# set the gap penalty
gapenalty = 1.5
# set penalty for gap elongation
gapelongation = 0.2
# read the sequence
read seqfile_a rtp.nof
# read the backbone
read backbone_a 1hst-a.bbn
# initialize the gap vector
gap st init
# align sequence and structure
align
# show result on screen
show all
# and store result in file
show all rtp-1hst-a.ali


Output of aligning the sequence of rtp on 1hst-a:
================================================
#  Current settings are:
#
#   Sequence:           rtp.nof
#   Seqlength:          122
#   Conformation:       1hst-a
#   Conflength:         74
#
# Alignment parameters:
#   SEC flag:           1
#   Gap penalty:        1.50
#   Gap extension:      0.20
# 
rtp.nof
1hst-a

    1  MKEEKRSSTGFLVKQRAFLKLYMITMTEQERLYGLKLLEVLRSEFKEIGFKPNHTEVYRS    60
                                                       |          |       
    1  ....SHPT-----YSEMIAAAIRA--EKSRGGSSRQSIQKYIKSHYKVG-HNADLQIKLS    48
    1               HHHHHHHHHHT     SS EEHHHHHHHHHHHS    TTHHHHHHHH    48


   61  LHELLDDGILKQIKVKKEGAKLQEVVLYQFKDYEAAKLYKKQLKVELDRCKKLIEKALSD   120
          ||  |     |  |                                                  
   61  IRRLLAAG---VLKQTKGVGA-SGSFRLA--K............................    74
   61  HHHHHHTT   SEEEE  SS    EEEE                                    74


  121  NF   122
                
  121  ..    74
  121        74


# alg_len: 122   alg_num: 74   la: 122   lb: 74
# Idents :   7    pa:  5.74   pb:  9.46   pl:  5.74
#
# Energy  : -6.59e+01
# Zcomb   : -7.02  Zpair   : -3.58  Zsurf   : -5.68
# NZcomb  : 0.88   NZpair  : 0.65   NZsurf  : 1.03


Example 2:
========================================================================
Searching for possible folds for a sequence in a database of structures:
========================================================================

Sample job file for aligning the sequence of the replication terminating
protein from Bacillus subtilis with all backbones in a database in order to
identify possible folds (see file rtpsearchdb.dlg in the Example directory):

#
# Sample dialog file for searching possible folds for rtp
# in a database of structures.
#
# disallow gaps in helices and strands
ss_gaps = 1
# set to alignment variant 1
use_star = 1
# set the gap penalty
gapenalty = 1.5
# set penalty for gap elongation
gapelongation = 0.2
# read the sequence
read seqfile_a rtp.nof
# read database of structures.
# all structures given in listfile
read backbase standard.3.95.lst
# start the search and store result in rtp.out
search db rtp.out


Output of database search:
=========================

The output is not sorted yet. To find the best candidates, sort it, e.g. with
the Unix command "sort +3 -n", which uses Zcomb (the fourth field) as the
sort criterion. On current systems, where the "+3" form is obsolete, the
equivalent command is "sort -k 4 -n".


#  Current settings are:
#
#   Sequence:           rtp.nof
#   Seqlength:          122
#   Backbone database:  standard.3.95.lst
#
# Alignment parameters:
#   SEC flag:           1
#   Gap penalty:        1.50
#   Gap extension:      0.20
# 
PDB        len  Energy   Zcomb   Zpair   Zsurf  NZcomb  NZPair  NZsurf 
=======================================================================
1fxd       58  -52.40   -7.15   -2.27   -7.39    0.93    0.43    1.38  
1hst-a     74  -65.94   -7.02   -3.58   -5.68    0.88    0.65    1.03  
1enh       54  -38.86   -6.15   -3.41   -4.66    0.80    0.66    0.87  
1abk      211  -65.13   -5.86   -3.04   -5.24    0.69    0.50    0.90  
1ovo-d     56  -29.47   -5.82   -1.13   -6.83    0.77    0.22    1.30  
1tag      314  -76.66   -5.80   -2.81   -5.36    0.68    0.46    0.93  
1aka-a    401  -36.57   -5.75   -3.60   -4.45    0.67    0.58    0.76  
1lmb-3     87  -52.76   -5.72   -3.36   -4.51    0.70    0.58    0.80  


The lines in the shown list are truncated.

A full description of all the other fields will follow in a more detailed
manual.
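
Sorting and screening the search output can also be done in a short script.
The Python sketch below is not part of ProFIT. It assumes the column layout
shown in the listing above (PDB, len, Energy, Zcomb, Zpair, Zsurf, NZcomb,
NZpair, NZsurf as the first nine whitespace-separated fields) and ignores any
further fields, comment lines and the header.

```python
COLUMNS = ["PDB", "len", "Energy", "Zcomb", "Zpair", "Zsurf",
           "NZcomb", "NZpair", "NZsurf"]


def parse_hits(text):
    """Return one dict per data row of a 'search db' output file."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines and lines too short to be data rows.
        if line.startswith("#") or len(fields) < len(COLUMNS):
            continue
        try:
            numbers = [float(f) for f in fields[1:len(COLUMNS)]]
        except ValueError:
            continue  # the column header line is not numeric
        hits.append(dict(zip(COLUMNS, [fields[0]] + numbers)))
    return hits


def rank_hits(text):
    """Sort hits by Zcomb, most negative first (same order as a numeric sort)."""
    return sorted(parse_hits(text), key=lambda hit: hit["Zcomb"])


if __name__ == "__main__":
    with open("rtp.out") as handle:  # output file of the example search
        for hit in rank_hits(handle.read()):
            flag = "  <- NZsurf > 1.2" if hit["NZsurf"] > 1.2 else ""
            print(f'{hit["PDB"]:10s} {hit["Zcomb"]:7.2f} {hit["NZsurf"]:6.2f}{flag}')
```

Because "search db" appends to its output file, a file reused across several
searches will contain the rows of all of them.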