Short Description of ProFIT
ProFIT (Protein Fold Identification Tool) combines an amino acid sequence with
a database of 3D structures, and has the potential to detect a fold related
to the native structure of the input sequence. This is a beta release. It is
still experimental and will change during the following months.
For a description of the program, the approaches used, the expected success
rate and the quality of information retrieved, please consult the accompanying
manuscript (paper.ps; postscript). Please report bugs, problems and suggestions
to profit@agnes.came.sbg.ac.at.
There is no single parameter set that is optimal for all proteins and protein
classes. When you use ProFIT, always vary the parameters, especially gapenalty,
ss_gaps and use_star (see below). Varying the parameters is necessary to find
good settings for a particular sequence/structure alignment problem. If you
search the database, also try several parameter sets. Useful ranges of
parameter values are indicated in the description below. The examples below
use a default set of parameters that works in many cases (consult the files
rtp1hst-a.dlg and rtpsearchdb.dlg in the Example directory).
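One way to follow this advice systematically is to generate one command file
per parameter combination. The sketch below is only an illustration: the helper
name make_search_job, the chosen grid values and the output file naming are
assumptions. The commands themselves and the input file names (rtp.nof,
standard.3.95.lst) are taken from the database search example in this document.

```python
from itertools import product


def make_search_job(gapenalty, gapelongation, ss_gaps, use_star,
                    seqfile="rtp.nof", listfile="standard.3.95.lst"):
    """Return (job_text, outputfile) for one database search.

    Hypothetical helper: the command order mirrors the database search
    example of this document; the output file naming is an assumption.
    """
    tag = f"gp{gapenalty}_ge{gapelongation}_ss{ss_gaps}_st{use_star}"
    outputfile = f"rtp_{tag}.out"
    lines = [
        f"# parameter set {tag}",
        f"ss_gaps = {ss_gaps}",
        f"use_star = {use_star}",
        f"gapenalty = {gapenalty}",
        f"gapelongation = {gapelongation}",
        f"read seqfile_a {seqfile}",
        f"read backbase {listfile}",
        f"search db {outputfile}",
    ]
    return "\n".join(lines) + "\n", outputfile


# Illustrative grid inside the suggested ranges (values chosen arbitrarily).
GAP_PENALTIES = [0.75, 1.5, 2.0]
GAP_ELONGATIONS = [0.1, 0.2, 0.5]

jobs = [
    make_search_job(gp, ge, ss, st)
    for gp, ge, ss, st in product(GAP_PENALTIES, GAP_ELONGATIONS, [0, 1], [0, 1])
]

print(len(jobs), "command files")
print(jobs[0][0])
```

Each job text can be written to its own file and run in batch mode with the
"-f" option, so that every parameter set leaves its results in a separate
output file.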
You can run ProFIT either interactively or in batch mode using a command
file. The default startup is interactive mode. Starting ProFIT with the option
"-f" and the name of your command file executes the commands given in the file
and exits when they are completed. Lines beginning with # in the first column
of the command file are interpreted as comments. Starting ProFIT with the "-s"
switch and a file name executes the commands in the given file and then leaves
you in ProFIT's interactive mode.
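The comment rule can be mirrored in a few lines when command files are
inspected or generated by a script. This is a minimal sketch, not ProFIT's own
parser: the function name commands_in is hypothetical, and skipping blank lines
is an assumption. Only a # in the first column is treated as a comment, as
stated above.

```python
def commands_in(job_text):
    """Return the command lines of a command file, skipping comments."""
    commands = []
    for line in job_text.splitlines():
        if line.startswith("#"):   # '#' in the first column marks a comment
            continue
        if not line.strip():       # assumption: blank lines carry no command
            continue
        commands.append(line.rstrip())
    return commands


job = """# set the gap penalty
gapenalty = 1.5
# read the sequence
read seqfile_a rtp.nof
"""

print(commands_in(job))
```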
If you know ProsaII, you will find many commands familiar, and the ProsaII manual
will give you additional information on several commands (ProsaII is
available from our anonymous ftp server).
Short description of commands and variables:
===========================================
ProFIT currently supports two modes:
1. Align a sequence with a structure
(command 'align')
2. Align a sequence with a database of structures
(command 'search db')
The output produced by (1) contains the alignment and scores for the
sequence/structure combination. The output of (2) is similar, but the alignments
are not included. If you want some of the alignments after a database search,
rerun the individual sequence/structure alignments using 'align'.
align
Align sequence and structure (seqfile_a and backbone_a).
exit
Exit ProFIT.
gap st init
Initialize the gap vector. Every time a backbone is loaded, the gap
vector has to be initialized; otherwise the program will
dump core (sorry for that). This command is also necessary after
ss_gaps is set from 1 to 0.
execute filename
Execute the commands given in the file filename. You can start
command files from within ProFIT's interactive mode.
gap st set x y
Do not open gaps between residues x and y of the backbone.
This can be useful to avoid gaps in certain
regions of the structure.
gapelongation = x
Set gap elongation penalty. Suggested values for x are between
0.1 and 0.5.
gapenalty = x
Set gap penalty. Suggested values for x are between 0.75 and 2.
(Feel free to try others).
help
Show available commands.
read backbase listfile:
Read structure database defined in listfile. These structures comprise
the database used for fold recognition. "standard.3.95.lst" includes
all backbone files included in this package.
read backbone_a filename
Read one backbone file. The program will produce an alignment
combining seqfile_a and backbone_a, when the command 'align'
is issued.
read seqfile_a filename:
Read the sequence stored in filename.
Formats of seqfile:
.nof no special format. The sequence is stored in one-letter
code without any blanks. Every character except whitespace
is interpreted as an amino acid in one-letter code.
.bbn read one of the backbones of the supplied structure
database and use its sequence.
search db outputfile:
Start fold recognition. The result will be appended to outputfile.
sh
Execute a UNIX shell command.
show all [filename]:
Show the result of aligning seqfile_a with backbone_a. filename is
an optional parameter. If filename is given, the alignment will be
appended to the file [filename], otherwise output is stdout.
ss_gaps = 1/0
If ss_gaps is set to 1, no gaps are allowed in helices and strands.
If it is set to 0, gaps are allowed in helices and strands.
After changing this value, always reinitialize the gap vector
using "gap st init".
use_star = 1/0
Select one of the two distinct variants of the alignment algorithm used
to align sequence and structure.
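Because the .nof format described under 'read seqfile_a' treats every
non-whitespace character as an amino acid, stray characters (digits, header
text) would silently become part of the sequence. The following sketch cleans a
raw sequence and reports characters outside the 20 standard one-letter codes.
The function names are hypothetical, and whether ProFIT accepts other letters
(for example X) is not stated here, so the check is only advisory.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")   # the 20 standard one-letter codes


def to_nof(raw):
    """Remove all whitespace so the sequence is one unbroken string."""
    return "".join(raw.split())


def unexpected_letters(sequence):
    """Return the sorted characters that are not standard one-letter codes."""
    return sorted(set(sequence.upper()) - STANDARD_AA)


# First 30 residues of the rtp sequence shown in the alignment example.
raw = "MKEEKRSSTG FLVKQRAFLK\nLYMITMTEQE"
seq = to_nof(raw)

print(seq, len(seq), unexpected_letters(seq))
```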
Most important output parameters:
================================
The scores NZcomb, NZpair and NZsurf are indicative of the quality of a model.
Normally they should be close to 1. Pay attention to the relations
among these scores. In example 2 below, 1fxd and 1hst-a both give high scores
in NZcomb, but 1fxd has a poor NZpair (0.43) and a poor NZsurf (1.38, i.e. too
high), so the more balanced scores of 1hst-a are more favourable (always be
suspicious if NZsurf is higher than 1.2).
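The rule of thumb above can be written down as a small check. This is a sketch
under stated assumptions: the function name assess is hypothetical, and using
the spread (largest minus smallest of the three scores) as a measure of
"balance" is an illustrative choice, not a quantity defined by ProFIT. Only the
1.2 limit for NZsurf and the score values come from this document.

```python
def assess(nzcomb, nzpair, nzsurf, surf_limit=1.2):
    """Return (spread, suspicious) for one set of normalized scores.

    spread     : largest minus smallest score (illustrative balance measure)
    suspicious : True if NZsurf exceeds the limit named in the text
    """
    scores = (nzcomb, nzpair, nzsurf)
    spread = round(max(scores) - min(scores), 2)
    return spread, nzsurf > surf_limit


# Values taken from the database search example.
for name, scores in [("1fxd", (0.93, 0.43, 1.38)),
                     ("1hst-a", (0.88, 0.65, 1.03))]:
    spread, suspicious = assess(*scores)
    print(name, "spread:", spread, "suspicious NZsurf:", suspicious)
```

With these numbers 1fxd is flagged and shows the larger spread, which matches
the preference for 1hst-a expressed above.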
Following are two examples for using ProFIT. You can find the command files
in the "Example" directory.
Example 1:
=======================================
Aligning one sequence and one backbone:
=======================================
Sample job file for aligning the sequence of replication terminator protein
from Bacillus subtilis (rtp) to the chicken histone H5 structure 1hst-a
(see file rtp1hst-a.dlg in the Example directory):
#
# Sample jobfile for aligning the sequence of rtp
# and the structure of 1hst-a
#
# disallow gaps in helices and strands
ss_gaps = 1
# set to alignment variant 1
use_star = 1
# set the gap penalty
gapenalty = 1.5
# set penalty for gap elongation
gapelongation = 0.2
# read the sequence
read seqfile_a rtp.nof
# read the backbone
read backbone_a 1hst-a.bbn
# initialize the gap vector
gap st init
# align sequence and structure
align
# show result on screen
show all
# and store result in file
show all rtp-1hst-a.ali
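The order of commands in this job file matters: 'gap st init' has to follow
every 'read backbone_a', and has to be repeated after ss_gaps is changed. The
sketch below checks a list of commands for an 'align' that is not preceded by a
fresh initialization. It is an illustrative helper, not part of ProFIT; the
function name is hypothetical, and treating every ss_gaps assignment as a
change is a deliberately cautious assumption.

```python
def missing_gap_init(commands):
    """Return indices of 'align' commands lacking a fresh 'gap st init'."""
    fresh = False          # True while the gap vector is up to date
    problems = []
    for i, cmd in enumerate(commands):
        words = cmd.split()
        if words[:2] == ["read", "backbone_a"] or cmd.startswith("ss_gaps"):
            fresh = False  # a new backbone or ss_gaps value invalidates it
        elif words == ["gap", "st", "init"]:
            fresh = True
        elif words == ["align"] and not fresh:
            problems.append(i)
    return problems


good = ["ss_gaps = 1", "use_star = 1", "gapenalty = 1.5",
        "gapelongation = 0.2", "read seqfile_a rtp.nof",
        "read backbone_a 1hst-a.bbn", "gap st init", "align", "show all"]
bad = ["read backbone_a 1hst-a.bbn", "gap st init", "ss_gaps = 0", "align"]

print(missing_gap_init(good))
print(missing_gap_init(bad))
```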
Output of aligning the sequence of rtp on 1hst-a:
================================================
# Current settings are:
#
# Sequence: rtp.nof
# Seqlength: 122
# Conformation: 1hst-a
# Conflength: 74
#
# Alignment parameters:
# SEC flag: 1
# Gap penalty: 1.50
# Gap extension: 0.20
#
rtp.nof
1hst-a
1 MKEEKRSSTGFLVKQRAFLKLYMITMTEQERLYGLKLLEVLRSEFKEIGFKPNHTEVYRS 60
| |
1 ....SHPT-----YSEMIAAAIRA--EKSRGGSSRQSIQKYIKSHYKVG-HNADLQIKLS 48
1 HHHHHHHHHHT SS EEHHHHHHHHHHHS TTHHHHHHHH 48
61 LHELLDDGILKQIKVKKEGAKLQEVVLYQFKDYEAAKLYKKQLKVELDRCKKLIEKALSD 120
|| | | |
61 IRRLLAAG---VLKQTKGVGA-SGSFRLA--K............................ 74
61 HHHHHHTT SEEEE SS EEEE 74
121 NF 122
121 .. 74
121 74
# alg_len: 122 alg_num: 74 la: 122 lb: 74
# Idents : 7 pa: 5.74 pb: 9.46 pl: 5.74
#
# Energy : -6.59e+01
# Zcomb : -7.02 Zpair : -3.58 Zsurf : -5.68
# NZcomb : 0.88 NZpair : 0.65 NZsurf : 1.03
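The fields pa, pb and pl are not explained in this description. The printed
values are, however, consistent with reading them as the number of identities
(Idents) expressed as a percentage of la, lb and alg_len respectively. The
following lines only reproduce that arithmetic; the interpretation is an
inference from the numbers above, not a documented definition.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Values copied from the summary lines of the alignment output.
idents, la, lb, alg_len = 7, 122, 74, 122

pa = percent(idents, la)        # compare with the printed pa
pb = percent(idents, lb)        # compare with the printed pb
pl = percent(idents, alg_len)   # compare with the printed pl

print(pa, pb, pl)
```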
Example 2:
========================================================================
Searching for possible folds for a sequence in a database of structures:
========================================================================
Sample job file for aligning the sequence of replication terminator protein
from Bacillus subtilis to all backbones in a database in order to
identify possible folds (see file rtpsearchdb.dlg in the Example directory):
#
# Sample dialog file for searching possible folds for rtp
# in a database of structures.
#
# disallow gaps in helices and strands
ss_gaps = 1
# set to alignment variant 1
use_star = 1
# set the gap penalty
gapenalty = 1.5
# set penalty for gap elongation
gapelongation = 0.2
# read the sequence
read seqfile_a rtp.nof
# read database of structures.
# all structures given in listfile
read backbase standard.3.95.lst
# start the search and store result in rtp.out
search db rtp.out
Output of database search:
=========================
The output is not sorted. To find the best candidates, sort it, for example
with the Unix command "sort +3 -n", which uses Zcomb (the fourth column) as
the sort criterion. Newer versions of sort may require the equivalent
"sort -n -k 4".
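Where a script is more convenient than the sort command, the same ordering can
be obtained as follows. This is a minimal sketch: the function name is
hypothetical, and it assumes that data rows are whitespace-separated with Zcomb
in the fourth column, as in the listing of this example. Comment lines, the
column header and the separator line are skipped.

```python
def sort_by_zcomb(lines):
    """Return the data rows sorted by the fourth column (Zcomb), ascending."""
    rows = []
    for line in lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue                  # settings header or separator line
        try:
            zcomb = float(fields[3])
        except ValueError:            # column header line
            continue
        rows.append((zcomb, line))
    rows.sort(key=lambda row: row[0])
    return [line for _, line in rows]


output = [
    "# Gap penalty: 1.50",
    "PDB len Energy Zcomb Zpair Zsurf NZcomb NZPair NZsurf",
    "=====================================================",
    "1enh 54 -38.86 -6.15 -3.41 -4.66 0.80 0.66 0.87",
    "1fxd 58 -52.40 -7.15 -2.27 -7.39 0.93 0.43 1.38",
    "1hst-a 74 -65.94 -7.02 -3.58 -5.68 0.88 0.65 1.03",
]

for row in sort_by_zcomb(output):
    print(row)
```

Ascending order puts the most negative Zcomb first, which is the order
"sort -n" produces as well.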
# Current settings are:
#
# Sequence: rtp.nof
# Seqlength: 122
# Backbone database: standard.3.95.lst
#
# Alignment parameters:
# SEC flag: 1
# Gap penalty: 1.50
# Gap extension: 0.20
#
PDB       len    Energy   Zcomb   Zpair   Zsurf  NZcomb  NZPair  NZsurf
=======================================================================
1fxd       58    -52.40   -7.15   -2.27   -7.39    0.93    0.43    1.38
1hst-a     74    -65.94   -7.02   -3.58   -5.68    0.88    0.65    1.03
1enh       54    -38.86   -6.15   -3.41   -4.66    0.80    0.66    0.87
1abk      211    -65.13   -5.86   -3.04   -5.24    0.69    0.50    0.90
1ovo-d     56    -29.47   -5.82   -1.13   -6.83    0.77    0.22    1.30
1tag      314    -76.66   -5.80   -2.81   -5.36    0.68    0.46    0.93
1aka-a    401    -36.57   -5.75   -3.60   -4.45    0.67    0.58    0.76
1lmb-3     87    -52.76   -5.72   -3.36   -4.51    0.70    0.58    0.80
The lines in the list shown above are truncated.
A full description of all the other fields will follow in a more detailed
manual.