
Fold Recognition Related Files and Docs

SEQUENCES

PROSITE
PROSITE is a method for determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns, and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
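As a rough illustration of how a pattern can be scanned against a new sequence, the sketch below translates a pattern written in the PROSITE notation into a Python regular expression. The function name prosite_to_regex and the example pattern are made up for illustration (the pattern is not an actual PROSITE entry), and only the common elements of the notation are handled: x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) repeats, and the < and > terminus anchors.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: anchors written inside brackets are not handled.
    """
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, count = m.group(1), m.group(2)
        if residue == "x":
            residue = "."                  # x matches any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # {..} lists excluded residues
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + residue + repeat + suffix)
    return "".join(parts)

# Hypothetical pattern, used only to exercise the translation.
regex = prosite_to_regex("<A-x(2)-[ST]-{P}-G>.")
print(regex)
print(bool(re.search(regex, "AKKSLG")))
```

Profiles, which PROSITE also provides, are scored position by position and cannot be expressed this way.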

Protein Identification Resource (PIR)
PIR Web version has hot links to GenBank - DNA Sequence Database, DDBJ - The DNA Data Bank of Japan, EC-Enzyme - The EC Enzyme Classification Database, GDB - The Genome Data Base, and Refbase - A Protein Sequence Citation Database.

SWISS-PROT
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

SBASE
SBASE is a searchable collection of protein domain sequences.

OWL
The OWL database is a non-redundant protein sequence database produced from the following source databases: SWISS-PROT, PIR (1-3), GenBank, and NRL-3D (a sequence-structure database derived from the Protein Data Bank (PDB)).
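The idea of building one non-redundant set from several source databases can be sketched as follows. This is only an assumption about how such a merge might work: it drops exact duplicate sequences and keeps the copy from whichever source is listed first. The function name merge_nonredundant and the entries are hypothetical, and the actual OWL redundancy criteria are not described here.

```python
def merge_nonredundant(sources):
    """Merge sequences from several databases, dropping exact duplicates.

    sources: list of (database_name, {entry_id: sequence}) pairs, given in
    priority order; the first database to supply a sequence keeps it.
    """
    seen = set()
    merged = []
    for db_name, entries in sources:
        for entry_id, sequence in entries.items():
            key = sequence.upper()      # compare sequences case-insensitively
            if key in seen:
                continue                # already supplied by an earlier source
            seen.add(key)
            merged.append((db_name, entry_id, sequence))
    return merged

# Hypothetical entries, for illustration only.
sources = [
    ("SWISS-PROT", {"sp1": "MKTAYIAK", "sp2": "MALWMRLL"}),
    ("PIR",        {"pir1": "MKTAYIAK", "pir2": "MGSSHHHH"}),
]
for record in merge_nonredundant(sources):
    print(record)
```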

Protein motif fingerprint database (PRINTS)
PRINTS Database, derived from the OWL Database, is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family. The diagnostic power of fingerprints is refined by iterative scanning of OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.
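The notion of a fingerprint as several separated, non-overlapping motifs can be illustrated with a small sketch. As a simplification, the motifs here are written as regular expressions and must all occur in the given order; real fingerprint scanning scores aligned motifs rather than matching exact expressions. The function name match_fingerprint, the motifs, and the sequence are all hypothetical.

```python
import re

def match_fingerprint(sequence, motifs):
    """Return the start position of each motif if all motifs occur in order
    without overlapping; return None if any motif is missing."""
    positions = []
    cursor = 0
    for motif in motifs:
        m = re.compile(motif).search(sequence, cursor)
        if m is None:
            return None
        positions.append(m.start())
        cursor = m.end()    # the next motif must begin after this one ends
    return positions

# Hypothetical three-motif fingerprint and sequence.
print(match_fingerprint("MKGAVLLTTPWCHERDE", ["G[AV]V", "PWC", "DE"]))
```

A single-motif search would accept any sequence containing one of these short motifs; requiring all of them in order is what makes the fingerprint more discriminating.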

BLAST
BLAST (Basic Local Alignment Search Tool) performs fast database searching combined with rigorous statistics for judging the significance of matches. Five BLAST programs search many different combinations of query and database sequences. The BLAST algorithm is described in S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, J. Mol. Biol. 215, 403-10 (1990).
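The significance of a match is usually summarized as an expected number of chance hits. The sketch below assumes the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the raw alignment score. The function names and the numeric values of lambda and K are placeholders chosen for illustration; real values depend on the scoring system in use.

```python
import math

def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Placeholder parameters (assumptions, not taken from any BLAST release).
LAMBDA, K = 0.3, 0.1
print(expect_value(80, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K))
print(bit_score(80, LAMBDA, K))
```

Because E grows with the size of the database, the same raw score is less significant when searching a larger database.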

GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. A five-page description is available. There are approximately 730,500,000 bases in 1,115,000 sequence records as of December 1996.

BLOCKS database
The blocks for the BLOCKS database are made automatically by looking for the most highly conserved regions in groups of proteins represented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the BLOCKS database.

The Institute for Genomic Research (TIGR)
The TIGR page holds the sequences for many published and soon-to-be-published genomes and proteomes. It is an invaluable resource for genomics databases.

IBC Databases
This is simply a mirror of the WUSTL site, containing vast amounts of sequence information.


STRUCTURES

Protein Data Bank (PDB)
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students.

PROCHECK
PROCHECK is a program for checking the quality of protein structures. It is the official quality check program from the Brookhaven National Laboratory Protein Data Bank and can be used to evaluate new X-ray structures and homology models. Both bonded and non-bonded contacts are listed in superb PostScript output.

Molecules R Us
Welcome to the NIH Molecules R Us Utility. This facility combines a full text search of the PDB database with a FORM interface to customize the format of the selected structure.

SCOP
Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The scop database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

DALI
The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is mailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. If you want to know the structural neighbours of a protein already in the Protein Data Bank, you can find them in the FSSP database.

CATH
The CATH database is a hierarchical domain classification of protein structures in the Brookhaven Protein Data Bank. Non-protein, model, and "C-alpha only" structures are not classified in CATH. Only crystal structures solved to a resolution better than 3.0 angstroms are considered, together with NMR structures. This filtering of the Brookhaven databank is performed using the program SIFT (Michie et al., 1996). There are four major levels in this hierarchy: Class, Architecture, Topology (fold family), and Homologous superfamily.
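The selection rules described above can be written out as a small filter. This is a sketch of the stated criteria only, not the SIFT program itself; the Entry fields, the method labels "xray" and "nmr", and the example identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    pdb_id: str
    method: str                   # "xray" or "nmr" (labels assumed for this sketch)
    resolution: Optional[float]   # in angstroms; None when not applicable (NMR)
    is_protein: bool = True
    is_model: bool = False
    ca_only: bool = False

def passes_filter(entry: Entry, max_resolution: float = 3.0) -> bool:
    """Apply the stated rules: exclude non-protein, model, and C-alpha-only
    structures; keep NMR structures and crystal structures solved to a
    resolution better (numerically lower) than max_resolution."""
    if not entry.is_protein or entry.is_model or entry.ca_only:
        return False
    if entry.method == "nmr":
        return True
    if entry.method == "xray":
        return entry.resolution is not None and entry.resolution < max_resolution
    return False

# Hypothetical entries, for illustration only.
entries = [
    Entry("xxx1", "xray", 1.8),
    Entry("xxx2", "xray", 3.5),
    Entry("xxx3", "nmr", None),
    Entry("xxx4", "xray", 2.0, ca_only=True),
]
print([e.pdb_id for e in entries if passes_filter(e)])
```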


PREDICTION

Protein Structure Prediction Center

PROSTAR: The protein potential site (CARB/UMBI/NIST)

PredictProtein Server

  • Generation of multiple sequence alignments (MaxHom)
  • Prediction of secondary structure (PHDsec)
  • Prediction of solvent accessibility (PHDacc)
  • Prediction of transmembrane helices (PHDhtm)
  • Prediction of topology for transmembrane proteins (PHDtopology)
  • Fold recognition by prediction-based threading (PHDthreader)
  • Evaluation of secondary structure prediction accuracy (EvalSec)

NNPREDICT: Secondary structure prediction (Cohen group, UCSF)

Homology modeling

Fold recognition