Fold Recognition Related Files and Docs
SEQUENCES
PROSITE
PROSITE is a method of determining the function of uncharacterized proteins translated from genomic or
cDNA sequences. It consists of a database of biologically significant sites, patterns
and profiles that help to reliably identify which known protein family (if any) a
new sequence belongs to.
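As a rough illustration of how a pattern can be used to test a new sequence, the sketch below translates a pattern written in PROSITE-style notation (elements joined by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats) into a Python regular expression. The function name `prosite_to_regex` and the example sequences are assumptions made for this example and are not part of PROSITE itself; profiles are not covered.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration only; it handles the common
    notation elements, not every corner of the pattern syntax.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # Single letters and [ABC] groups are already valid regex syntax.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Illustrative pattern and made-up sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
hit = re.search(regex, "MKCAACAAALAAAAAAAAHAAAHQ")
print(regex, hit.start() if hit else None)
```

A sequence that matches the pattern is a candidate member of the corresponding family; a match alone does not prove membership.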
Protein Identification Resource (PIR)
The PIR Web version has hot links to GenBank - DNA Sequence Database, DDBJ - The DNA Data Bank of Japan, EC-Enzyme - The EC Enzyme Classification Database, GDB - The Genome Data Base, and Refbase - A Protein Sequence Citation Database.
SWISS-PROT
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the
description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal
level of redundancy and a high level of integration with other databases.
SBASE
SBASE is a searchable collection of protein domain sequences.
OWL
The OWL database is a non-redundant protein sequence database produced from the following
source databases: SWISS-PROT, PIR (1-3), GenBank, and NRL-3D (a sequence-structure database derived
from the Protein Data Bank (PDB)).
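The idea of building a non-redundant set from several source databases can be sketched as follows. This is a minimal illustration, not the procedure actually used to build OWL: it assumes that "redundant" means an exact duplicate sequence and that sources are merged in the order they are listed, so a sequence already seen in an earlier source is skipped. The function name and the toy records are hypothetical.

```python
def merge_non_redundant(sources):
    """Merge records from several sources, keeping the first copy of each sequence.

    sources: list of (source_name, [(identifier, sequence), ...]) in priority order.
    Assumption for this sketch: redundancy means an identical sequence string.
    """
    seen = set()
    merged = []
    for source_name, records in sources:
        for identifier, sequence in records:
            key = sequence.upper()
            if key in seen:
                continue          # already contributed by a higher-priority source
            seen.add(key)
            merged.append((source_name, identifier, sequence))
    return merged

# Toy records with made-up identifiers and sequences.
toy_sources = [
    ("SWISS-PROT", [("sp1", "MKTAYIAK"), ("sp2", "MLLNQE")]),
    ("PIR",        [("pir1", "MKTAYIAK"), ("pir2", "MGSSHH")]),
]
for row in merge_non_redundant(toy_sources):
    print(row)
```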
Protein motif fingerprint database (PRINTS)
The PRINTS Database, derived from the OWL Database,
is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used
to characterise a protein family. The diagnostic power of fingerprints is refined by iterative scanning of
OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be
contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and
powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.
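To make the notion of a fingerprint concrete, the sketch below checks whether a set of motifs occurs in a sequence in order and without overlap. It is a simplified stand-in: each motif is written as a regular expression purely for illustration, which is an assumption of this example and not how PRINTS scores motifs, and the function name and sequences are hypothetical.

```python
import re

def match_fingerprint(sequence, motifs):
    """Greedy left-to-right scan for a series of non-overlapping motifs.

    Returns the start position of each motif if all of them are found
    in the given order, otherwise None.
    """
    positions = []
    cursor = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, cursor)
        if hit is None:
            return None
        positions.append(hit.start())
        cursor = hit.end()      # the next motif must start after this one ends
    return positions

# Made-up sequence and motifs.
seq = "AAGDSGGPAAAACWHAAA"
print(match_fingerprint(seq, ["GDSGGP", "C[WY]H"]))
print(match_fingerprint(seq, ["C[WY]H", "GDSGGP"]))
```

Requiring several separated motifs at once is what makes a fingerprint more selective than any single motif: a chance match to one short motif is common, a chance match to all of them in the right order is much less so.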
BLAST
Basic Local Alignment Search Tool (BLAST)
BLAST performs fast database searching combined with rigorous statistics for judging the significance of matches. Five
BLAST programs search many different combinations of query and database sequences. The BLAST algorithm is
described in S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, J. Mol. Biol. 215, 403-10 (1990).
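The five programs differ in the type of query and database they compare. The lookup below summarizes the standard combinations (blastp, blastn, blastx, tblastn, tblastx); the dictionary and the helper `choose_program` are illustrative names for this sketch and are not part of the BLAST software.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    # Nucleotide query translated in all six reading frames.
    ("translated nucleotide", "protein"): "blastx",
    # Nucleotide database translated in all six reading frames.
    ("protein", "translated nucleotide"): "tblastn",
    # Both query and database translated.
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}

def choose_program(query_type, database_type):
    """Return the program name for a query/database combination (hypothetical helper)."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query_type} vs {database_type}")

print(choose_program("translated nucleotide", "protein"))
```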
GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
A five-page description is available. There are approximately 730,500,000 bases in 1,115,000 sequence records
as of December 1996.
BLOCKS database
The blocks for the BLOCKS database are made automatically by looking for the most highly conserved regions in groups
of proteins represented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to
obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the BLOCKS database.
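The step of locating the most highly conserved region in a group of proteins can be illustrated with a toy calculation. The sketch assumes the sequences are already aligned to equal length, scores each column by the fraction of sequences sharing its most common residue, and picks the ungapped window with the highest total score. This is a simplification chosen for this example, not the procedure used to build BLOCKS, and the calibration against SWISS-PROT is not shown; function names and sequences are hypothetical.

```python
from collections import Counter

def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column."""
    n = len(aligned)
    return [Counter(column).most_common(1)[0][1] / n for column in zip(*aligned)]

def best_block(aligned, width):
    """Return (start, segments) for the most conserved ungapped window."""
    scores = column_conservation(aligned)
    starts = range(len(scores) - width + 1)
    best_start = max(starts, key=lambda i: sum(scores[i:i + width]))
    return best_start, [seq[best_start:best_start + width] for seq in aligned]

# Made-up, pre-aligned sequences.
family = ["MKTAHWGDLQ", "ARSAHWGDVE", "LQPAHWGDAT"]
print(best_block(family, 5))
```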
The Institute for Genomic Research (TIGR)
The TIGR page holds the sequences for many published and soon-to-be-published genomes and proteomes. It is an invaluable resource for genomics databases.
IBC Databases
This is simply a mirror of the WUSTL site, which contains vast amounts of sequence information.
STRUCTURES
Protein Data Bank (PDB)
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological
macromolecules, serving a global community of researchers, educators, and students.
PROCHECK
Procheck is a program to check the quality of protein structures. It is the official quality check program from the Brookhaven
National Laboratory Protein Data Bank and can be used to evaluate new X-ray structures and homology models. Both bonded and
non-bonded contacts are listed in superb PostScript output.
Molecules R Us
Welcome to the NIH Molecules R Us Utility. This facility combines a full text search of the PDB database with a
FORM interface to customize the format of the selected structure.
SCOP
Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common
evolutionary origin. The scop database, created by manual inspection and abetted by a battery of automated methods, aims
to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins
whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the
close relatives of any particular protein, and a framework for future research and classification.
DALI
The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein
structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is
mailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are
not detectable by comparing sequences. If you want to know the structural neighbours of a protein already in the Protein
Data Bank, you can find them in the FSSP database.
CATH
The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. All
non-protein, model, and "C-alpha only" structures are excluded from CATH. Only crystal structures solved to
a resolution better than 3.0 angstroms are considered, together with NMR structures. This filtering of the Brookhaven
databank is performed using the program SIFT (Michie et al., 1996). There are four major levels in this hierarchy:
Class, Architecture, Topology (fold family) and Homologous superfamily.
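The selection rules stated above can be written down as a small filter. This is only a sketch of the criteria as described in the text, not the SIFT program; the `Entry` fields, the method labels, and the example identifiers are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    """Hypothetical summary of one structure entry."""
    entry_id: str
    method: str                   # "X-RAY" or "NMR" in this sketch
    resolution: Optional[float]   # in angstroms; None when not applicable
    is_protein: bool
    is_model: bool
    calpha_only: bool

def eligible_for_cath(entry: Entry) -> bool:
    """Apply the stated rules: protein, not a model, full coordinates,
    and either NMR or a crystal structure better than 3.0 angstroms."""
    if not entry.is_protein or entry.is_model or entry.calpha_only:
        return False
    if entry.method == "NMR":
        return True
    return (entry.method == "X-RAY"
            and entry.resolution is not None
            and entry.resolution < 3.0)   # lower value means better resolution

# Made-up entries.
entries = [
    Entry("demo1", "X-RAY", 1.8, True, False, False),
    Entry("demo2", "X-RAY", 3.5, True, False, False),
    Entry("demo3", "NMR", None, True, False, False),
    Entry("demo4", "X-RAY", 2.0, True, False, True),
]
print([e.entry_id for e in entries if eligible_for_cath(e)])
```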
PREDICTION
Protein Structure Prediction Center
PROSTAR The protein potential site (CARB/UMBI/NIST)
PredictProtein Server
- Generation of multiple sequence alignments (MaxHom)
- Prediction of secondary structure (PHDsec)
- Prediction of solvent accessibility (PHDacc)
- Prediction of transmembrane helices (PHDhtm)
- Prediction of topology for transmembrane proteins (PHDtopology)
- Fold recognition by prediction-based threading (PHDthreader)
- Evaluation of secondary structure prediction accuracy (EvalSec)
NNPREDICT Secondary structure prediction (Cohen group, UCSF)
Homology modeling
Fold recognition