MAPS: an automated program for Multiple Alignment of Protein Structures

by Guoguang Lu

General Description

MAPS (Multiple Alignment of Protein Structures) is an automated program for comparing multiple protein structures. Given several homologous proteins with common structural similarities, the program automatically superimposes the 3D models, detects which residues are structurally equivalent among all the structures, and provides the residue-to-residue alignment. Structurally equivalent residues are defined according to the approximate positions of both main-chain and side-chain atoms in all the proteins. From the structural similarity, the program calculates a score of structure diversity, which can be used to build a phylogenetic tree. For a detailed description of how it works, please see the reference paper.


The program can be used either on Unix computers or via a Web server.

Reference

Lu, G. An Approach for Multiple Alignment of Protein Structures (1998, in manuscript)

Keyword Commands

The parameters of the MAPS program are controlled by lines of text, each of which is a "keyword command". Any command line that starts with "!" is ignored.
  • TOPFIT option
    If the option is "yes" (default), the program first performs superposition and alignment between all pairs of the compared structures using the TOP algorithm. The multiple structure alignment is carried out after this stage. With this option, the coordinates of the protein structures do not need to be superimposed beforehand.
    If the option is "no", the program assumes that all the compared structures have already been superimposed, so it carries out only the multiple structure alignment.
    If the option is "fast", the program superimposes all the proteins onto the first protein. This is similar to, but slightly different from, the "yes" option, and it should be faster.
    (default: TOPFIT yes)

  • PDB file_name ID zone
    After the keyword PDB, the first column is a file name. Separated by a blank, the second column is an ID name, which should not be longer than 10 characters. Users can optionally specify a zone in the third column, which should be in SCOP format.
    For example: PDB 1zpd.pdb ZmPDC a:2-190

  • RESIDUE number_of_residues
    This command specifies the shortest length of a consecutive fragment.
    (default 3)

  • DISTANCE distance
    This command specifies the maximum distance between Calpha atoms of structurally equivalent residues.
    (default 3.8 Angstrom)

  • CBETA angle1 angle2
    This command specifies the maximum angles between the Calpha-Cbeta directions of structurally equivalent residues. The program first uses angle1 as the criterion, then uses angle2 to extend the search at the N-terminal and C-terminal ends of the detected fragment (see the reference paper).
    (default: CBETA 45. 80.)

  • MATCH
    Same as the MATCH command in the TOP program.

  • LIST yes/no
    If the option is yes, the program lists more detailed information.
    (default no)

  • REORDER yes/no
    If "no" is given, the proteins in the alignment output are kept in the original input order. Otherwise, they are reordered according to the distance of structural diversity (i.e. closer proteins are placed together).

  • WRITE yes/no
    If the option is yes, the program writes out the coordinates of all the proteins (except the first) after superposition onto the first protein. The names of the output files end with the suffix _maps.
    For example, if the input file names are mol1.pdb mol2.pdb mol3.pdb ..., the output names will be mol2.pdb_maps, mol3.pdb_maps, ...
    (default no)
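
The DISTANCE and CBETA criteria can be illustrated with a small sketch. This is not the MAPS implementation (the reference paper describes the actual procedure); it only shows, assuming the two structures are already superimposed, how a Calpha distance cutoff and a Calpha-Cbeta direction angle could be tested for one pair of residues. The function names and the sample coordinates are hypothetical.

```python
import math

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))

def is_equivalent(ca1, cb1, ca2, cb2, max_dist=3.8, max_angle=45.0):
    """Assumed test: Calpha atoms close and Calpha->Cbeta directions similar.

    Defaults mirror the documented DISTANCE (3.8) and first CBETA angle (45);
    pass max_angle=80.0 to mimic the looser second angle.
    """
    dir1 = tuple(b - a for a, b in zip(ca1, cb1))
    dir2 = tuple(b - a for a, b in zip(ca2, cb2))
    return distance(ca1, ca2) <= max_dist and angle_deg(dir1, dir2) <= max_angle

# Hypothetical coordinates (Angstrom) for one residue in each structure.
ca_a, cb_a = (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)
ca_b, cb_b = (1.0, 0.5, 0.0), (2.4, 1.0, 0.0)
print(is_equivalent(ca_a, cb_a, ca_b, cb_b))
```

Residues without a Cbeta atom (glycine) are not handled in this sketch.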

Example Input (Unix)

    #
    # input:
    # PDB filename id zone
    #
    cat > t.inp << END
    !dump yes
    pdb $PDB/1pyd.pdb 1PYD_ScPDC2 a:
    pdb $PDB/1pox.pdb 1POX_POX a:
    pdb $PDB/1bfd.pdb 1BFD
    match rate 0.5 1.0
    residue 3
    distance 3.8
    END
    maps < t.inp
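
The same keyword input could also be generated from a script. The following is a minimal sketch in Python, assuming only what the shell example shows: one keyword command per line, with `pdb` lines made of a file name, an ID and an optional zone. The helper `build_maps_input` and the file names are hypothetical and not part of MAPS.

```python
def build_maps_input(structures, **keywords):
    """Return MAPS keyword commands as one text block.

    structures: list of (file_name, id, zone) tuples; zone may be None.
    keywords:   further keyword commands, e.g. residue=3, distance=3.8.
    """
    lines = []
    for file_name, ident, zone in structures:
        parts = ["pdb", file_name, ident]
        if zone:
            parts.append(zone)  # zone is optional
        lines.append(" ".join(parts))
    for key, value in keywords.items():
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"

text = build_maps_input(
    [
        ("1pyd.pdb", "1PYD_ScPDC", "a:"),
        ("1pox.pdb", "1POX_POX", "a:"),
        ("1bfd.pdb", "1BFD", None),
    ],
    residue=3,
    distance=3.8,
)
print(text)
```

The resulting text can be written to a file such as t.inp and passed to the program on standard input, as in the shell example.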

Example Output

    ========================================================
    M A P S
    Multiple Alignment OF Proteins Structures
    Version 0.1 Nov-12-1998
    ========================================================
    Author: Guoguang Lu
    Karolinska Institute, Stockholm, Sweden
    E-mail: guoguang@alfa.mbb.ki.se
    WWW: http://gamma.mbb.ki.se/~guoguang/maps.html
    ---------------------------------------------------------
    ....
    Total 4 will be used for 3d comparison
    4 models will be aligned
    Model 1 Residues 567 Name: 1ZPD_ZmPDC PDB file: zmA_z.pdb 1
    Model 2 Residues 537 Name: 1PYD_ScPDC PDB file: 1pyd_z.pdb 568
    Model 3 Residues 585 Name: 1POX_POX PDB file: 1poxA_z.pdb 1105
    Model 4 Residues 523 Name: 1BFD PDB file: /nfs/scratch/guoguan 1690
    Maxinium residue number 585
    ....
    Average fitting residues: 321.0
    -------------------------------------
    Matrix of fitting residues
    -------------------------------------
                1ZPD_ZmP 1PYD_ScP 1POX_POX 1BFD
    1ZPD_ZmPDC
    1PYD_ScPDC      397.
    1POX_POX        335.     279.
    1BFD            313.     281.     321.
    -------------------------------------
    Sequence identities of aligned residues
    -------------------------------------
                1ZPD_ZmP 1PYD_ScP 1POX_POX 1BFD
    1ZPD_ZmPDC
    1PYD_ScPDC       8.1
    1POX_POX        12.5     10.0
    1BFD             6.4      7.1      3.7
    -------------------------------------
    Matrix for fitting scores
    -------------------------------------
                1ZPD_ZmP 1PYD_ScP 1POX_POX 1BFD
    1ZPD_ZmPDC
    1PYD_ScPDC       0.9
    1POX_POX         1.2      1.7
    1BFD             1.3      1.7      1.3
    New Matrix
    1 1 1 1
    0.000 0.872 1.224 1.338
    0.872 0.000 1.739 1.709
    1.224 1.739 0.000 1.294
    1.338 1.709 1.294 0.000
    ....
    Total 2212 residues distance 3.800000 minimum number of residues 3
    4 models will be aligned
    Total 2898 equations with 18 parameters
    Cycle= 1 RMS 1.064 Mean-distance 0.955 with 161 aligned residues
    4 models will be aligned
    Total 3024 equations with 18 parameters
    Cycle= 2 RMS 1.073 Mean-distance 0.964 with 168 aligned residues
    4 models will be aligned
    Total 3132 equations with 18 parameters
    Cycle= 3 RMS 1.086 Mean-distance 0.976 with 174 aligned residues
    4 models will be aligned
    Total 3168 equations with 18 parameters
    Cycle= 4 RMS 1.096 Mean-distance 0.983 with 176 aligned residues
    4 models will be aligned
    Total 3168 equations with 18 parameters
    Cycle= 5 RMS 1.096 Mean-distance 0.983 with 176 aligned residues
    Total 18 fragment with 176 residues are equivalent in this set of the structures

    Fragment 1 length 13
    1ZPD_ZmPDC A22  FAVAGDYNLVLLD A35
    1PYD_ScPDC A23  FGLPGDFNLSLLD A36
    1POX_POX   A30  YGIPGGSINSIMD A43
    1BFD       21   FGNPGSNELPFLK 34
                        |
    Fragment 2 length 17
    1ZPD_ZmPDC A43  EQVYCCNELNCGFSAEG A60
    1PYD_ScPDC A44  RWAGNANELNAAYAADG A61
    1POX_POX   A52  HYIQVRHEEVGAMAAAA A69
    1BFD       40   RYILALQEACVVGIADG 57
                           |      |
    Fragment 3 length 14
    1ZPD_ZmPDC A67  AAAVVTYSVGALSA A81
    1PYD_ScPDC A68  SCIITTFGVGELSA A82
    1POX_POX   A77  GVCFGSAGPGGTHL A91
    1BFD       65   AFINLHSAAGTGNA 79
                             |
    Fragment 4 length 3
    1ZPD_ZmPDC A86  GAY A89
    1PYD_ScPDC A87  GSY A90
    1POX_POX   A96  DAR A99
    1BFD       84   NAW 87

    Fragment 5 length 12
    1ZPD_ZmPDC A92  LPVILISGAPNN A104
    1PYD_ScPDC A93  VGVLHVVGVPSI A105
    1POX_POX   A102 VPVLALIGQFGT A114
    1BFD       90   SPLIVTAGQQTR 102
                           |
    Fragment 6 length 8
    1ZPD_ZmPDC A131 ITAAAEAI A139
    1PYD_ScPDC A132 ISETTAMI A140
    1POX_POX   A133 VADYNVTA A141
    1BFD       122  LVKWSYEP 130

    Fragment 7 length 8
    1ZPD_ZmPDC A148 IDHVIKTA A156
    1PYD_ScPDC A149 IDRCIRTT A157
    1POX_POX   A150 IDEAIRRA A158
    1BFD       139  MSRAIHMA 147
                        |
    Fragment 8 length 14
    1ZPD_ZmPDC A159 KKPVYLEIACNIAS A173
    1PYD_ScPDC A160 QRPVYLGLPANLVD A174
    1POX_POX   A161 QGVAVVQIPVDLPW A175
    1BFD       151  QGPVYLSVPYDDWD 165

    Fragment 9 length 5
    1ZPD_ZmPDC A383 TVIAE A388
    1PYD_ScPDC A383 VVIAE A388
    1POX_POX   A389 IYSID A394
    1BFD       371  IYLNE 376

    Fragment 10 length 3
    1ZPD_ZmPDC A407 EYE A410
    1PYD_ScPDC A407 ISQ A410
    1POX_POX   A414 ITS A417
    1BFD       396  YFC 399

    Fragment 11 length 4
    1ZPD_ZmPDC A413 GHIG A417
    1PYD_ScPDC A413 GSIG A417
    1POX_POX   A420 ATMG A424
    1BFD       401  GGLG 405
                       |
    Fragment 12 length 6
    1ZPD_ZmPDC A418 SVPAAF A424
    1PYD_ScPDC A418 TTGATL A424
    1POX_POX   A425 GIPGAI A431
    1BFD       406  ALPAAI 412

    Fragment 13 length 3
    1ZPD_ZmPDC A427 VGA A430
    1PYD_ScPDC A427 FAA A430
    1POX_POX   A434 LNY A437
    1BFD       415  LAE 418

    Fragment 14 length 45
    1ZPD_ZmPDC A433 RNILMVGDGSFQLTAQEVAQMVRLKLPVIIFLINNYGYTIEVMIH A478
    1PYD_ScPDC A437 RVILFIGDGSLQLTVQEISTMIRWGLKPYLFVLNNDGYTIEKLIH A482
    1POX_POX   A440 QVFNLAGDGGASMTMQDLVTQVQYHLPVINVVFTNCQYGFIKDEQ A485
    1BFD       421  QVIAVIGDGSANYSISALWTAAQYNIPTIFVIMNNGTYGALRWFA 466
                          |||                         |  |
    Fragment 15 length 4
    1ZPD_ZmPDC A481 YNNI A485
    1PYD_ScPDC A487 YNEI A491
    1POX_POX   A494 GVEF A498
    1BFD       475  GLDV 479

    Fragment 16 length 5
    1ZPD_ZmPDC A486 NWDYA A491
    1PYD_ScPDC A492 GWDHL A497
    1POX_POX   A499 DIDFS A504
    1BFD       480  GIDFR 485
                      |
    Fragment 17 length 3
    1ZPD_ZmPDC A494 EVF A497
    1PYD_ScPDC A500 PTF A503
    1POX_POX   A507 DGV A510
    1BFD       488  KGY 491

    Fragment 18 length 9
    1ZPD_ZmPDC A532 PTLIECFIG A541
    1PYD_ScPDC A533 IRMIEIMLP A542
    1POX_POX   A537 PVLIDAVIT A546
    1BFD       516  PVLIEVSTV
                       |
    Total 14 residues are identical among all 4 structures
    Rate of overall identity 0.080
    Statistics for residues which share least identity
    1ZPD_ZmPDC  93 0.528
    1PYD_ScPDC 104 0.591
    1POX_POX    74 0.420
    1BFD        77 0.438

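
The identity summary at the end of the listing can be cross-checked from the fragments themselves. The sketch below assumes that a residue counts as identical when all four structures carry the same amino acid in an aligned column, and recomputes this for two fragments copied from the output above. `identical_columns` is a hypothetical helper, not a MAPS routine.

```python
def identical_columns(sequences):
    """Return 0-based column indices where every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) == 1]

# Fragment 1 and Fragment 14, in the order 1ZPD_ZmPDC, 1PYD_ScPDC, 1POX_POX, 1BFD.
fragment_1 = [
    "FAVAGDYNLVLLD",
    "FGLPGDFNLSLLD",
    "YGIPGGSINSIMD",
    "FGNPGSNELPFLK",
]
fragment_14 = [
    "RNILMVGDGSFQLTAQEVAQMVRLKLPVIIFLINNYGYTIEVMIH",
    "RVILFIGDGSLQLTVQEISTMIRWGLKPYLFVLNNDGYTIEKLIH",
    "QVFNLAGDGGASMTMQDLVTQVQYHLPVINVVFTNCQYGFIKDEQ",
    "QVIAVIGDGSANYSISALWTAAQYNIPTIFVIMNNGTYGALRWFA",
]

print(identical_columns(fragment_1))   # [4]
print(identical_columns(fragment_14))  # [6, 7, 8, 34, 37]

# Reported totals: 14 identical residues out of 176 aligned residues.
overall_identity = 14 / 176
print(f"{overall_identity:.3f}")       # 0.080
```

Summed over all 18 fragments, this count reproduces the 14 identical residues reported by the program.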
From the "structure diversity" values between each pair of protein structures, the program calculates a phylogenetic tree using the neighbor-joining method (Saitou & Nei, Mol. Biol. Evol. 4:406-425, 1987). It writes the tree in PHYLIP format to a file named "filetree". This tree file can be displayed graphically with DRAWGRAM from the PHYLIP package or with the NJPLOT program.
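
To make the tree-building step concrete, here is a textbook neighbor-joining sketch in Python. It is not the MAPS code, and its output may differ from "filetree" in branch order and number formatting. It assumes that the "New Matrix" block of the example output holds the pairwise structure diversity values and that the PHYLIP tree file uses the usual Newick notation. The function name `neighbor_joining` is hypothetical.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining; return a Newick string."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)                 # Newick text of each active cluster
    d = [list(row) for row in dist]     # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node.
        len_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        # Distances from the new node to every remaining cluster.
        keep = [k for k in range(n) if k not in (i, j)]
        to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        joined = f"({nodes[i]}:{len_i:.3f},{nodes[j]}:{len_j:.3f})"
        d = [[d[r][c] for c in keep] + [to_new[x]] for x, r in enumerate(keep)]
        d.append(to_new + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three clusters around a central node.
    a = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    b = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    c = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{a:.3f},{nodes[1]}:{b:.3f},{nodes[2]}:{c:.3f});"

names = ["1ZPD_ZmPDC", "1PYD_ScPDC", "1POX_POX", "1BFD"]
matrix = [
    [0.000, 0.872, 1.224, 1.338],
    [0.872, 0.000, 1.739, 1.709],
    [1.224, 1.739, 0.000, 1.294],
    [1.338, 1.709, 1.294, 0.000],
]
tree = neighbor_joining(names, matrix)
print(tree)
```

With only four structures, the two complementary pairings score the same in the Q criterion, so the sketch may join either pair first; the unrooted topology is the same in both cases.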

Distribution of the program

All rights to the program are reserved. No redistribution is allowed without permission from the author. If the program contributes to the work in any publication, the author should be acknowledged by citing a proper reference. The program is free for academic users (the current version is also free for industry users). So far, executable code is available for DEC/Alpha/OSF1, Linux/Intel, SGI IRIX5 and SGI IRIX6.5.

Web Server

A Web server is currently located at http://bioinfo1.mbfys.lu.se/TOP/webmaps.html, together with the TOP server. To use this server, users should first prepare the coordinates of several homologous proteins in PDB format. On the server page, users give the maximum number of proteins to be prepared, click "press for more", and then enter the file name, ID and zone of each protein. A title and an e-mail address are preferred. The server prints the results of the multiple structure alignment on screen and also sends them by e-mail if an e-mail address is given.
