Structural databases

STRUCTURAL DATABASES
PDB, CSD, CATH

INTRODUCTION:
• Structural databases are essential tools for all
crystallographic work.
• They are used throughout the process of producing, solving,
refining and publishing the structure of a new material.

THE COMMON INFORMATION FOUND IN
STRUCTURAL DATABASES INCLUDES:
• Bibliographic information: author names and journal reference.
• The chemical compound name, formula and oxidation states
of the elements present.
• Number of formula units per unit cell (Z, the cell contents).
• Dimensions and symmetry of the unit cell.
• Symmetry of the structure (space group).
• Atomic coordinates, occupancies and thermal parameters.
• Any special features of the experiment used to collect the
diffraction data.
• The structures in the database have been determined by X-ray,
neutron or electron diffraction on samples, by NMR, or by
computational modelling.
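The common fields listed above map naturally onto a record type. The Python sketch below is a hypothetical container (the class name `StructureEntry` and all field names are illustrative assumptions, not the schema of the PDB, the CSD or any other real database). It also shows how the unit cell volume follows from the six cell parameters a, b, c, α, β, γ using the general triclinic formula.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StructureEntry:
    """Hypothetical record holding the common fields of a structural database entry.

    Field names are illustrative only; they are not the schema of any real database.
    """

    authors: list[str]         # bibliographic information
    journal_reference: str
    compound_name: str         # chemical identity
    formula: str
    z: int                     # formula units per unit cell
    a: float                   # cell lengths in angstroms
    b: float
    c: float
    alpha: float               # cell angles in degrees
    beta: float
    gamma: float
    space_group: str           # symmetry of the structure
    method: str = "unknown"    # experimental method, e.g. X-ray or neutron diffraction
    # Each atom: (label, x, y, z, occupancy, displacement parameter),
    # where x, y, z are fractional coordinates.
    atoms: list[tuple[str, float, float, float, float, float]] = field(
        default_factory=list
    )

    def cell_volume(self) -> float:
        """Unit cell volume in cubic angstroms (general triclinic formula)."""
        ca, cb, cg = (
            math.cos(math.radians(t)) for t in (self.alpha, self.beta, self.gamma)
        )
        return self.a * self.b * self.c * math.sqrt(
            1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
        )


# Placeholder values only, used to exercise the volume formula.
cubic = StructureEntry(
    authors=[], journal_reference="(placeholder)", compound_name="(placeholder)",
    formula="X", z=1, a=2.0, b=2.0, c=2.0, alpha=90.0, beta=90.0, gamma=90.0,
    space_group="Pm-3m",
)
print(cubic.cell_volume())  # approximately 8.0
```

For a cubic cell all three angles are 90°, so the formula reduces to a³; the general form is needed for monoclinic, triclinic and other lower-symmetry cells.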

PDB (PROTEIN DATA BANK):
• The Protein Data Bank contains information about the 3-D structures
of proteins and other biological macromolecules.
• The structure of a protein is determined mainly by
X-ray crystallography or nuclear magnetic resonance (NMR)
spectroscopy.
• The PDB is overseen by an organisation called the Worldwide
Protein Data Bank (wwPDB).
• It is available at:
• www.wwpdb.org
• www.pdbe.org
• www.pdbj.org
• Each entry in the PDB is given a unique identifier called the
PDB ID. It is a 4-character alphanumeric code whose first
character is a digit from 1 to 9 (e.g. 1CBS).
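As an illustration of the ID format described above, a minimal Python check might look like this. It covers only the classic 4-character IDs (a digit from 1 to 9 followed by three letters or digits), it checks format only, not whether the entry exists, and the function name `is_classic_pdb_id` is hypothetical.

```python
import re

# Classic PDB IDs: a digit from 1 to 9 followed by three letters or digits.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(candidate: str) -> bool:
    """Return True if `candidate` has the format of a classic PDB ID."""
    return _PDB_ID.fullmatch(candidate) is not None


for code in ("1CBS", "4hhb", "ABCD", "1CB"):
    print(code, is_classic_pdb_id(code))
```

`fullmatch` is used so that longer strings such as "1CBSX" are rejected instead of matching on their first four characters.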

PDB FILE FORMAT:
The PDB file format is a standard fixed-column text format for
protein structure files. It describes the 3-D structure of a protein:
the positions of its atoms and how they are bonded together.
• The file contains hundreds or thousands of lines called
records. Each record type, named in the first six columns of the
line, provides a different set of information, for example:
• HEADER: This record contains the classification of the molecule,
the date of deposition and the PDB ID.
• TITLE: This record contains the title of the PDB entry.
• COMPND: This record includes the protein name.
• SOURCE: This record contains the name of the organism from
which the protein was obtained.
• KEYWDS: This record contains keywords that describe
the protein.

• EXPDTA: This record contains the experimental method used to
determine the structure.
• AUTHOR: This record contains the names of the
contributors who deposited the data in the database.
• REVDAT: This record contains the revision history of the
entry (dates of modification).
• JRNL: This record contains the journal citation of the
literature describing the structure.
• REMARK: This record contains remarks about the
protein structure.
• DBREF: This record contains references to the protein
in sequence databases.

• SEQRES: This record contains the amino acid sequence
of each chain of the protein.
• HET: This record contains details about the non-protein
substances (hetero groups such as ligands and ions) in the entry.
• HETNAM: This record contains the chemical names of
the non-protein substances.
• HETSYN: This record contains synonyms (alternative
names) for the non-protein substances.
• FORMUL: This record contains the chemical formulas of
the non-protein substances.
• HELIX: This record identifies the helical
substructures (helices) in the protein.

• LINK: This record specifies inter-residue bonds that are not
implied by the primary sequence.
• ATOM: This record contains the atomic coordinates for the
standard residues of the structure.
• HETATM: This record contains the atomic coordinates for
non-protein substances (hetero groups).
• CONECT: This record contains the bonds (connectivity)
between atoms, mainly those of non-protein substances.
• MASTER: This record contains the number of REMARK, HET,
HELIX, CONECT, SEQRES and other records in the file.
• END: This record marks the end of the file.
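Because PDB files use fixed columns, an ATOM or HETATM record is read by slicing character positions rather than splitting on whitespace. The Python sketch below follows the column positions defined for these records in the PDB format specification; the names `AtomRecord` and `parse_atom_line` are hypothetical, and the sketch ignores details such as alternate locations, insertion codes and serial numbers above 99999.

```python
from typing import NamedTuple, Optional


class AtomRecord(NamedTuple):
    record: str         # "ATOM" or "HETATM"
    serial: int
    name: str           # atom name, e.g. "CA"
    res_name: str       # residue name, e.g. "PRO" or a ligand code
    chain_id: str
    res_seq: int
    x: float            # orthogonal coordinates in angstroms
    y: float
    z: float
    occupancy: float
    temp_factor: float  # B-factor (thermal parameter)
    element: str


def parse_atom_line(line: str) -> Optional[AtomRecord]:
    """Parse one ATOM/HETATM line; return None for any other record type."""
    record = line[0:6].strip()              # columns 1-6: record name
    if record not in ("ATOM", "HETATM"):
        return None
    line = line.rstrip("\r\n").ljust(80)    # tolerate stripped trailing spaces
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),             # columns 7-11
        name=line[12:16].strip(),           # columns 13-16
        res_name=line[17:20].strip(),       # columns 18-20
        chain_id=line[21].strip(),          # column 22
        res_seq=int(line[22:26]),           # columns 23-26
        x=float(line[30:38]),               # columns 31-38
        y=float(line[38:46]),               # columns 39-46
        z=float(line[46:54]),               # columns 47-54
        occupancy=float(line[54:60]),       # columns 55-60
        temp_factor=float(line[60:66]),     # columns 61-66
        element=line[76:78].strip(),        # columns 77-78
    )
```

Splitting on whitespace would break whenever adjacent fields run together, which the fixed-width layout allows (for example, large negative coordinates with no separating space).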

THE PDB FORMAT
123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+
HEADER RETINOIC-ACID TRANSPORT 28-SEP-94 1CBS 1CBS 2
COMPND CELLULAR RETINOIC-ACID-BINDING PROTEIN TYPE II COMPLEXED 1CBS 3
COMPND 2 WITH ALL-TRANS-RETINOIC ACID (THE PRESUMED PHYSIOLOGICAL 1CBS 4
COMPND 3 LIGAND) 1CBS 5
SOURCE HUMAN (HOMO SAPIENS) 1CBS 6
SOURCE 2 EXPRESSION SYSTEM: (ESCHERICHIA COLI) BL21 (DE3) 1CBS 7
SOURCE 3 PLASMID: PET-3A 1CBS 8
SOURCE 4 GENE: HUMAN CRABP-II 1CBS 9
AUTHOR G.J.KLEYWEGT,T.BERGFORS,T.A.JONES 1CBS 10
REVDAT 1 26-JAN-95 1CBS 0 1CBS 11
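Each line's record type sits in columns 1-6, so counting record types only needs the first six characters of every line. The hypothetical helper `count_record_types` below does this with `collections.Counter`; it is a rough analogue of the record counts kept in the MASTER record, not a reimplementation of it. The sample reuses record lines from the 1CBS example (without the trailing identifier columns).

```python
from collections import Counter


def count_record_types(pdb_text: str) -> Counter:
    """Count how many lines of each record type (columns 1-6) appear."""
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        record_name = line[:6].strip()  # record names occupy columns 1-6
        if record_name:                 # skip blank lines
            counts[record_name] += 1
    return counts


sample = """\
HEADER RETINOIC-ACID TRANSPORT 28-SEP-94 1CBS
COMPND CELLULAR RETINOIC-ACID-BINDING PROTEIN TYPE II COMPLEXED
COMPND 2 WITH ALL-TRANS-RETINOIC ACID (THE PRESUMED PHYSIOLOGICAL
COMPND 3 LIGAND)
SOURCE HUMAN (HOMO SAPIENS)
AUTHOR G.J.KLEYWEGT,T.BERGFORS,T.A.JONES
REVDAT 1 26-JAN-95 1CBS 0
"""
print(count_record_types(sample)["COMPND"])  # 3
```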

CATH:
• CATH stands for Class, Architecture, Topology and
Homologous superfamily; it is a hierarchical database that
classifies protein domain structures.
• It was created by Janet Thornton and colleagues at
University College London.
• It is available at:
• http://www.biochem.ucl.ac.uk/bsm/cath
• http://www.cathdb.info
• It is a protein structure classification tool.

IT CONSISTS OF FOUR LEVELS:
• Class: It is assigned from the secondary-structure content of
the domain (mainly alpha, mainly beta, alpha/beta, or few
secondary structures).
• Architecture: It describes the gross orientation of the
secondary structures in 3-D space, without considering how
they are connected along the polypeptide chain.
• Topology (fold): It groups structures by both the overall shape
and the connectivity of their secondary structures.
• Homologous superfamily: It groups domains that are thought to
share a common ancestor, based on comparison of their sequences
and structures. It helps to trace the evolutionary relationships
among proteins.
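Each CATH superfamily is identified by a dotted code with one number per level (C.A.T.H), so the four levels above can be read straight off the code. The sketch below splits such a code into its level prefixes; the function names `cath_levels` and `cath_class_name` and the example code 3.40.50.300 are used only for illustration, and the class table lists just the four classes described above.

```python
# Names of the four CATH classes, keyed by the first number of a code.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code: str) -> dict[str, str]:
    """Map each CATH level present in a dotted code to its prefix."""
    parts = code.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a C.A.T.H code: {code!r}")
    return {
        level: ".".join(parts[: i + 1])
        for i, level in enumerate(LEVELS[: len(parts)])
    }


def cath_class_name(code: str) -> str:
    """Return the class name for the first number of the code, if listed above."""
    return CATH_CLASSES.get(int(code.split(".")[0]), "unknown")


for level, prefix in cath_levels("3.40.50.300").items():
    print(f"{level:<24} {prefix}")
print(cath_class_name("3.40.50.300"))
```

Because each level's code is a prefix of the next, two domains sharing the prefix 3.40.50 share class, architecture and topology even if they sit in different homologous superfamilies.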

CATH
• CATH aims to publish an official release of its classification
every 12 months.
• It is a free, publicly available online resource.
• At the time of writing, the latest version of CATH contained
114,215 domains, 2,178 homologous superfamilies and 1,110 fold
groups.

THE CATH SERVER
• The CATH team has set up a server that allows
users to submit the coordinates of a newly
determined structure for automatic classification in
CATH.
• DOMAIN BOUNDARIES AND SEQUENCE COMPARISON:
• CATH uses a domain-boundary program, DETECTIVE, which is
good at identifying the domains in multidomain proteins.
• The results from DETECTIVE are returned to the user in
less than a minute.
• The identified domains are scanned against non-identical
representatives from CATH using a global sequence
alignment method.
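The CATH server's exact alignment program and scoring scheme are not described here, so the following is only a sketch of a generic global (Needleman-Wunsch) alignment in Python that reports percent sequence identity. The scoring values (match +1, mismatch -1, gap -2), the function name `global_identity` and the identity definition (identical aligned pairs divided by alignment length, gaps included) are assumptions chosen for simplicity; real tools usually use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def global_identity(seq_a: str, seq_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity of an optimal global alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)

    def pair_score(i: int, j: int) -> int:
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,   # gap in seq_b
                              score[i][j - 1] + gap)   # gap in seq_a

    # Trace back one optimal path, counting identical pairs and columns.
    i, j = n, m
    identical = length = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            if seq_a[i - 1] == seq_b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * identical / length if length else 0.0


print(global_identity("ACDEFG", "ACDXFG"))
```

A global method aligns the sequences end to end, which suits comparing a whole domain against whole representative domains rather than searching for a short local match.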

CATH SERVER
• If the sequence identity is 95% or more, the domain is identical
to one already in CATH.
• If the sequence identity is less than 30%, the structure is
compared with all the sequence families (S-level).
• ASSESSING STRUCTURAL SIMILARITY:
• TOPSCAN compares the secondary structures of the new structure
with those of each fold family to identify the fold families to
which the new structure might belong.
• Subsequently, a fast version of the structure comparison program
SSAP scans representatives from all the families.
• Structural pairs with an SSAP score above 80 are possible
homologues, while pairs scoring 70-80 share a similar fold but
have no sequence or functional similarity.
• Finally, the SSAP structural alignment is displayed using a
graphical display package.
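The decision rules quoted above can be written as two small functions. This is a hypothetical sketch: the function names `sequence_step` and `ssap_step` are illustrative, the 95% rule is read as "at least 95%", and the identity range between 30% and 95%, which the rules above do not cover, is simply reported as such rather than guessed.

```python
def sequence_step(identity_percent: float) -> str:
    """Apply the sequence-identity thresholds quoted for the CATH server."""
    if identity_percent >= 95.0:
        return "identical to an existing CATH domain"
    if identity_percent < 30.0:
        return "compare structure with all sequence families (S-level)"
    return "intermediate: not covered by the two rules above"


def ssap_step(ssap_score: float) -> str:
    """Interpret an SSAP score using the thresholds quoted above."""
    if ssap_score > 80.0:
        return "possible homologue"
    if ssap_score >= 70.0:
        return "similar fold, no sequence or functional similarity"
    return "below the quoted thresholds"


print(sequence_step(97.0))
print(ssap_step(75.0))
```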

CSD
• The Cambridge Structural Database (CSD) is both a repository
and a validated resource for 3-D structural data of
molecules containing carbon and hydrogen.
• It provides the structures of organic,
metal-organic and organometallic molecules.
• The entries in the CSD are complementary to those in the
PDB and the Inorganic Crystal Structure Database (ICSD).
• The data in the CSD are typically obtained by X-ray
crystallography and, less frequently, by neutron
diffraction.

• The data in the CSD is submitted by crystallographers and
chemists from all over the world.
• The CSD is maintained by the Cambridge Crystallographic
Data Centre (CCDC).
• Structures deposited with the CCDC are publicly available for
download at the point of publication.
• The CSD grows by about 50,000 new structures each
year, and a subset of it is freely available to support teaching
and other activities.
• The CSD is available at:
• www.ccdc.cam.ac.uk
• webcsd.ccdc.cam.ac.uk

Figure: Applications of structural databases: prediction, analysis, mining, comparison, classification, structure refinement and annotation.
