Protein Modeling: Protein Structure Prediction Other Topics

This document provides a summary of a lecture on protein modeling. It discusses the levels of protein structure from primary to quaternary structure. It also describes common secondary structures like alpha helices and beta strands. While the structure of a protein is determined by its amino acid sequence, computational methods are needed to predict structure due to the large sequence-structure gap. Approaches to protein structure prediction include predictions in 1D, 2D and 3D using techniques like homology modeling, fold recognition, and ab initio prediction. Mass spectrometry is also discussed as a method for analyzing proteins on a large scale.

Uploaded by uma-chen


Lecture 6 Protein Modeling

June 7, 2007

Protein Structure Prediction; Other Topics

Protein Architecture
proteins are polymers consisting of amino acids linked by peptide bonds
each amino acid consists of
  - a central carbon atom
  - an amino group (NH2)
  - a carboxyl group (COOH)
  - a side chain
differences in side chains distinguish different amino acids

Peptide Bonds
[Figure: amino acids joined by a peptide bond; labels: amino group, side chain, carboxyl group, α-carbon (common reference point for coordinates of a structure)]

Amino Acid Side Chains

side chains vary in: shape, size, charge, polarity

Levels of Description
protein structure is often described at four different scales
  - primary structure
  - secondary structure
  - tertiary structure
  - quaternary structure


Secondary Structure
secondary structure refers to certain common repeating structures
it is a local description of structure
two common secondary structures
  - α helices
  - β strands/sheets
a third category, called coil or loop, refers to everything else

Helices
[Figure: α-helix; labels: α-carbon, individual amino acid, hydrogen bond]
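The hydrogen bonds in a helix follow a regular pattern: in an ideal α-helix the backbone C=O of residue i bonds to the backbone N-H of residue i+4 (a standard structural fact, not spelled out on the slide). A minimal sketch, assuming the helix is given as an inclusive, 0-based residue range; the function name is hypothetical:

```python
def helix_hbond_pairs(start: int, end: int) -> list[tuple[int, int]]:
    """Return (i, i + 4) residue pairs for an ideal alpha-helix spanning residues
    start..end (inclusive, 0-based): the backbone C=O of residue i accepts a
    hydrogen bond from the backbone N-H of residue i + 4."""
    # The last acceptor is end - 4, so i runs from start up to end - 4 inclusive.
    return [(i, i + 4) for i in range(start, end - 3)]


print(helix_hbond_pairs(10, 17))  # [(10, 14), (11, 15), (12, 16), (13, 17)]
```

This regularity is what lets a predicted helix be turned into predicted short-range contacts.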

Sheets

Ribbon Diagram Showing Secondary Structures

The Protein Folding Problem

we know that the function of a protein is determined in large part by its 3D shape (fold, conformation)
can we predict the 3D shape of a protein given only its amino-acid sequence?
  - in general NO; current methods cannot do this accurately
  - but the methods can often provide a partial description of the 3D structure, which is often helpful

Motivation
Want to identify the function of the genes we find, and what different mutations/alleles do
  - One gene = one protein (sort of)
  - Function of protein = function of gene

Function can be determined in many ways
  - Gene expression, knockouts, etc.

But these take time and are prone to mistakes
Goal: if we can determine the structure of every protein, learning their functions isn't too far away

Thornton et al., 2000 (Nature)

Similar problems
Straight-up 3D prediction is hard (a Nobel awaits)
Subproblem 1: Identify patterns in sequence
  - Profile HMMs, multiple sequence alignments
Subproblem 2: Identify common motifs
  - Various methods
Subproblem 3: Identify classes of proteins
  - SCOP
Subproblem 4: Identify homologs
  - BLAST

http://www.ludwig.edu.au/course/course2002/

What Determines Conformation?

in general, the amino-acid sequence of a protein determines its 3D shape [Anfinsen et al., 1950s]
but there are some exceptions
  - all proteins can be denatured
  - some proteins are inherently disordered (i.e. lack a regular structure)
  - some proteins get folding help from chaperones
  - there are various mechanisms through which the conformation of a protein can be changed in vivo: post-translational modifications such as phosphorylation, prions, etc.

What Determines Conformation?

what physical properties of the protein determine its fold?
  - rigidity of the protein backbone
  - interactions among amino acids, including
      - electrostatic interactions
      - van der Waals forces
      - volume constraints
      - hydrogen and disulfide bonds
  - interactions of amino acids with water

Determining Protein Structures

protein structures can be determined experimentally (in many cases) by
  - x-ray crystallography
  - nuclear magnetic resonance (NMR)

DNA

Picture by Anthony North

Myoglobin

From www.inst.bnl.gov/GasDetectorLab/x-rays/SRI94.htm

Myoglobin

S.E.V. Phillips. "Structure and refinement of oxymyoglobin at 1.6 Å resolution." J. Mol. Biol. 1980, 142, 531.

NMR
Nuclear Magnetic Resonance Spectroscopy
  - Cannot handle proteins as large as X-ray crystallography can
  - Exploits the chemical environment of atoms to obtain distances between them
  - Can use knowledge of these distance restraints to identify the positions of the atoms that produce the peaks

Wüthrich K. Protein structure determination in solution by NMR spectroscopy. J Biol Chem. 1990 Dec 25;265(36):22059-62.

Experimental Methods
  - Very expensive and time-consuming
  - Computational methods can help with time (Frank DiMaio)
  - Many proteins still cannot be solved this way

More motivation
there is a large sequence-structure gap
  - 158K protein sequences in the SwissProt database
  - 27K protein structures in the PDB database
key question: can we predict structures by computational means instead?

Approaches to Protein Structure Prediction

prediction in 1D
  - secondary structure
  - solvent accessibility (which residues are exposed to water, which are buried)
  - transmembrane helices (which residues span membranes)
prediction in 2D
  - inter-residue/strand contacts
prediction in 3D
  - homology modeling
  - fold recognition (e.g. via threading)
  - ab initio prediction (e.g. via molecular dynamics)
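For the transmembrane-helix task in 1D, a classic baseline (not one the lecture describes) averages Kyte-Doolittle hydropathy over a sliding window and flags unusually hydrophobic stretches. A minimal sketch: the 19-residue window and 1.6 threshold are the values commonly quoted from Kyte & Doolittle (1982), the function names are hypothetical, and modern predictors use learned models instead.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Half-open (start, end) ranges covered by windows whose mean hydropathy
    exceeds `threshold`; overlapping or touching windows are merged."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # extend current segment
            else:
                segments.append((i, i + window))
    return segments


toy = "K" * 30 + "L" * 20 + "K" * 30  # hydrophobic core flanked by charged residues
print(candidate_tm_segments(toy))  # [(25, 55)]
```

The flagged range is wider than the 20 leucines because windows that are only partly hydrophobic still pass the threshold; this smearing is a known limitation of window averaging.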

Prediction in 1D, 2D and 3D

[Figure: predicted secondary structure and solvent accessibility, shown against known secondary structure (E = beta strand) and solvent accessibility]

Figure from B. Rost, "Protein Structure in 1D, 2D, and 3D", The Encyclopaedia of Computational Chemistry, 1998

2D Prediction Approaches
  - use secondary structure predictions to predict short-range contacts (e.g. hydrogen bonds in helices)
  - use secondary structure predictions to predict strand alignments
  - use correlated mutations to predict contacts
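The correlated-mutation idea is that two positions in contact tend to mutate together across homologs. The simplest score is the mutual information between two columns of a multiple sequence alignment. A minimal sketch, assuming a gapless alignment given as equal-length strings; practical methods add corrections for phylogeny and indirect coupling, and the function names are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(col_a: str, col_b: str) -> float:
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


def rank_contacts(msa: list[str], min_separation: int = 1) -> list[tuple[float, int, int]]:
    """Score every column pair (i, j) with j - i >= min_separation, highest MI first."""
    columns = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]
    scores = [
        (column_mi(columns[i], columns[j]), i, j)
        for i, j in combinations(range(len(columns)), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, reverse=True)


msa = ["ACDK", "ACEK", "GWDK", "GWEK"]  # toy alignment: columns 0 and 1 co-vary
print(rank_contacts(msa)[0])  # top pair is columns 0 and 1, MI = ln 2
```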

Prediction in 3D
homology modeling
  given: a query sequence Q, a database of protein structures
  do:
    - find protein P such that
        - the structure of P is known
        - P has high sequence similarity to Q
    - return P's structure as an approximation to Q's structure
fold recognition
  given: a query sequence Q, a database of known folds
  do:
    - find fold F such that Q can be aligned with F in a highly compatible manner
    - return F as an approximation to Q's structure
ab initio prediction
  given: a query sequence Q (assuming no similar sequence or fold is known)
  do: return a predicted structure S for Q

Homology Modeling
most pairs of proteins with similar structure are remote homologs (< 25% sequence identity)
homology modeling usually doesn't work for remote homologs; most pairs of proteins with < 25% sequence identity are unrelated

[Figure: pairwise sequence identity axis (marks at 20%, 30%, 100%) divided into regions labeled "probably unrelated", "remote homologs", and "homologs"]
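Template selection in homology modeling hinges on pairwise sequence identity. A minimal sketch, assuming the query has already been aligned to each candidate template (equal-length gapped strings); identity is computed over ungapped columns, which is only one of several conventions, the 25% cutoff is the figure quoted in this section, and the function names and template IDs are hypothetical:

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pick_template(alignments: dict[str, tuple[str, str]], cutoff: float = 25.0) -> Optional[tuple[str, float]]:
    """Return (template_id, identity) for the template most identical to the query,
    or None if even the best one falls below `cutoff` percent identity."""
    best: Optional[tuple[str, float]] = None
    for template_id, (query_aln, template_aln) in alignments.items():
        identity = percent_identity(query_aln, template_aln)
        if best is None or identity > best[1]:
            best = (template_id, identity)
    if best is None or best[1] < cutoff:
        return None  # only remote (or no) homologs: homology modeling is unreliable
    return best


alignments = {
    "templ_1": ("MKV-LITG", "MKVALLTG"),
    "templ_2": ("MKV-LITG", "WRDSEQPA"),
}
print(pick_template(alignments))  # templ_1 wins: 6 of 7 ungapped columns identical
```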

Threading
A form of fold recognition

[Figure source: ai.stanford.edu/~serafim/CS262_2006/Slides/prediction.ppt]
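Threading scores how well a query sequence fits a known fold. A heavily simplified sketch in the spirit of 3D-1D profile methods (not the method in the cited slides): each template position is reduced to an assumed environment class, buried (B) or exposed (E), the compatibility scores and gap penalty are made-up values, and Needleman-Wunsch dynamic programming finds the best sequence-to-structure alignment. Full threading with pairwise contact potentials is considerably harder than this.

```python
HYDROPHOBIC = set("AVILMFWC")


def env_score(residue: str, env: str) -> int:
    """Toy compatibility score: hydrophobic residues fit buried ("B") sites,
    other residues fit exposed ("E") sites."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 2 if hydrophobic else -1
    return 1 if not hydrophobic else -1


def thread_score(query: str, template_env: str, gap: int = -2) -> int:
    """Best global alignment score of a sequence onto a per-position environment string."""
    n, m = len(query), len(template_env)
    # dp[i][j] = best score aligning query[:i] to template_env[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + env_score(query[i - 1], template_env[j - 1]),
                dp[i - 1][j] + gap,  # query residue left unaligned
                dp[i][j - 1] + gap,  # template position left unaligned
            )
    return dp[n][m]


def best_fold(query: str, library: dict[str, str]) -> str:
    """Name of the library fold whose environment string fits the query best."""
    return max(library, key=lambda name: thread_score(query, library[name]))


library = {"fold_a": "BBBEEE", "fold_b": "EEEBBB"}
print(best_fold("LLLKKK", library))  # fold_a
```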

Proteomics
  - Microarrays are useful primarily because mRNA concentrations serve as a surrogate for protein concentrations
  - We would like to measure protein concentrations directly, but at present cannot do so in the same high-throughput manner
      - Proteins do not have obvious direct complements
      - Could build molecules that bind, but binding is greatly affected by protein structure

Time-of-Flight (TOF) Mass Spectrometry (thanks Sean McIlwain)

Measures the time an ionized particle takes to travel from the sample plate to the detector

[Figure: laser, sample plate at +V, detector]

Time-of-Flight (TOF) Mass Spectrometry 2

Matrix-Assisted Laser Desorption-Ionization (MALDI)
  - Crystalloid structures are made using a proton-rich matrix molecule
  - Hitting the crystalloid with a laser causes molecules to ionize and fly toward the detector

[Figure: laser, sample plate at +V, detector]

Time-of-Flight Demonstration
[Figure panels 0-3: sample plate held at +10 kV carrying matrix molecules and protein molecules; laser; detector; positive charge]
4. The laser is pulsed directly onto the sample; a proton is kicked off a matrix molecule onto another molecule.
5. Lots of protons are kicked off matrix ions, giving rise to more positively charged molecules.
6. The high positive potential under the sample plate causes positively charged molecules to accelerate toward the detector.
7. Smaller-mass molecules hit the detector first, while heavier ones are detected later.
8. The incident time is measured from when the laser is pulsed until the molecule hits the detector.
9. The experiment is repeated a number of times, counting the frequencies of flight times.
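Why lighter ions arrive first follows from energy conservation: an ion of mass m and charge ze accelerated through potential V gains kinetic energy zeV = mv²/2, so its drift time over a field-free length L is t = L·sqrt(m / (2zeV)). A minimal sketch under idealised assumptions: all flight time is spent in a drift tube of assumed length 1 m, ions start at rest, and V = 10 kV as in the demonstration; the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def flight_time(mz: float, voltage: float = 10_000.0, drift_length: float = 1.0) -> float:
    """Idealised drift time (s) for an ion with mass-to-charge ratio mz (Da per elementary charge)."""
    # zeV = m v^2 / 2  =>  t = L / v = L * sqrt(m / (2 z e V))
    kg_per_coulomb = mz * DALTON / E_CHARGE
    return drift_length * math.sqrt(kg_per_coulomb / (2.0 * voltage))


def mz_from_time(t: float, voltage: float = 10_000.0, drift_length: float = 1.0) -> float:
    """Invert flight_time: recover m/z (Da per elementary charge) from a measured drift time."""
    return 2.0 * voltage * (t / drift_length) ** 2 * E_CHARGE / DALTON


for mz in (1_000.0, 4_000.0):
    print(f"m/z {mz:.0f}: {flight_time(mz) * 1e6:.1f} microseconds")
```

Because t grows with the square root of m/z, quadrupling m/z only doubles the flight time; inverting the relation is how a histogram of flight times becomes the M/Z axis of a spectrum.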

Example Spectra from a Competition by Lin et al. at Duke

These are different fractions from the same sample.

[Figure: spectra plotted as intensity vs. M/Z]

Trypsin-Treated Spectra

[Figure: frequency vs. M/Z]

Many Challenges Raised by Mass Spectrometry Data

  - Noise: extra peaks from handling of the sample, from the machine and environment (electrical noise), etc.
  - M/Z values may not align exactly across spectra (resolution ~0.1%)
  - Intensities are not calibrated across spectra, so quantification is difficult
  - Cannot get all proteins; typically only several hundred. To improve the odds of getting the ones we want, we may fractionate the sample by 2D gel electrophoresis or liquid chromatography.

Challenges (Continued)
  - Results are better if proteins are partially digested (broken into smaller peptides) first
  - It can be difficult to determine which proteins are present from the spectrum
  - Isotopic peaks: varying numbers of C-13 and N-15 atoms cause multiple peaks for a single peptide
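The isotopic-peak problem can be quantified with a binomial model: if a peptide has n carbon atoms and each is C-13 with probability p of about 1.07% (approximate natural abundance), the chance of exactly k heavy carbons is C(n, k)·p^k·(1-p)^(n-k), and each extra neutron shifts the peak by about 1.00336/z on the M/Z axis. A minimal sketch that ignores N-15 and other elements for brevity; the function names are hypothetical:

```python
from math import comb

P_C13 = 0.0107       # approximate natural abundance of carbon-13
C13_SHIFT = 1.00336  # mass difference between C-13 and C-12, Da


def carbon_isotope_pattern(n_carbons: int, max_extra: int = 4) -> list[float]:
    """Probability of k C-13 atoms for k = 0..max_extra (binomial model, carbon only).
    k = 0 is the monoisotopic peak."""
    return [
        comb(n_carbons, k) * P_C13**k * (1 - P_C13) ** (n_carbons - k)
        for k in range(max_extra + 1)
    ]


def peak_positions(mono_mz: float, charge: int, n_peaks: int) -> list[float]:
    """M/Z of successive isotope peaks for an ion of the given charge."""
    return [mono_mz + k * C13_SHIFT / charge for k in range(n_peaks)]


print([round(p, 3) for p in carbon_isotope_pattern(100)])
print(peak_positions(500.0, charge=2, n_peaks=3))
```

With 100 carbons the first C-13 peak is already slightly taller than the monoisotopic one, which is why a single peptide appears as a cluster of peaks rather than one line.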

Handling Noise: Peak Picking

Want to pick peaks that stand out from the noise signal in a statistically significant way
Want to use these as features in our learning algorithms

Many Supervised Learning Tasks

  - Learn to predict proteins from spectra, when the organism's proteome is known
  - Learn to identify isotopic distributions
  - Learn to predict disease from either proteins, peaks or isotopic distributions as features
  - Construct pathway models

Using Mass Spectrometry for Early Detection of Ovarian Cancer [Petricoin et al., 2002]
  - Ovarian cancer is difficult to detect early, often leading to poor prognosis
  - Trained and tested on mass spectra from blood serum
  - 100 training cases, 50 with cancer
  - Held-out test set of 116 cases, 50 with cancer
  - 100% sensitivity, 95% specificity (63/66) on the held-out test set
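The reported figures can be checked from the counts: the held-out set has 116 - 50 = 66 non-cancer cases, so 100% sensitivity means 50 true positives and 0 false negatives, and 63/66 means 63 true negatives and 3 false positives. A minimal sketch (the function name is hypothetical):

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Held-out set: 50 cancer cases (all detected), 66 non-cancer cases (63 correct).
sens, spec = sensitivity_specificity(tp=50, fn=0, tn=63, fp=3)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")  # sensitivity=100.0%, specificity=95.5%
```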

Not So Fast
  - Data mining methodology seems sound
  - But Keith Baggerly argues that cancer samples were handled differently than normal samples, and perhaps the data were preprocessed differently too
  - If we run cancer samples on Monday and normals on Wednesday, we could get differences from machine breakdown or from nearby electrical equipment that's running on Monday but not Wednesday
  - Lesson: tell collaborators they must randomize samples for the entire processing phase, and of course all our preprocessing must be the same
  - Debate is still raging; results not replicated in trials
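The randomization lesson can be made concrete. A minimal sketch, assuming each sample carries a class label and there is a fixed number of processing days; stratified round-robin assignment is one reasonable design among several, and the function name and sample IDs are hypothetical:

```python
import random


def randomized_schedule(labels: dict[str, str], n_days: int, seed: int = 0) -> dict[int, list[str]]:
    """Assign samples to processing days so each day gets a near-equal share of every
    class (e.g. cancer vs. normal), then shuffle the run order within each day."""
    rng = random.Random(seed)
    schedule: dict[int, list[str]] = {day: [] for day in range(n_days)}
    k = 0
    for group in sorted(set(labels.values())):
        members = sorted(sample for sample, g in labels.items() if g == group)
        rng.shuffle(members)
        for sample in members:
            schedule[k % n_days].append(sample)  # deal round-robin across days
            k += 1
    for day_samples in schedule.values():
        rng.shuffle(day_samples)  # randomize run order within the day as well
    return schedule


labels = {**{f"c{i}": "cancer" for i in range(10)}, **{f"n{i}": "normal" for i in range(10)}}
for day, samples in randomized_schedule(labels, n_days=2).items():
    print(day, samples)
```

Because every day receives cancer and normal samples in equal proportion, a Monday-only artifact affects both classes alike instead of masquerading as a disease signal.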

Other Proteomics: Interactions

Figure from Ideker et al., Science 292(5518):929-934, 2001

  - each node represents a gene product (protein)
  - blue edges show direct protein-protein interactions
  - yellow edges show interactions in which one protein binds to DNA and affects the expression of another

Protein-Protein Interactions
  - Yeast 2-Hybrid
  - Immunoprecipitation
      - Antibodies ("immuno") are made by combinatorially combining certain proteins
      - Millions of antibodies can be made, to recognize a wide variety of different antigens ("invaders"), often by recognizing specific proteins

[Figure: antibody binding a protein]

Protein-Protein Interactions

[Figure panels: Immunoprecipitation (antibody); Co-Immunoprecipitation (antibody)]

Many Supervised Learning Tasks

  - Learn to predict protein-protein interactions: protein 3D structures may be critical
  - Use protein-protein interactions in construction of pathway models
  - Learn to predict protein function from interaction data
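Predicting function from interaction data has a simple "guilt by association" baseline: a protein probably shares the function held by most of its annotated interaction partners. A minimal sketch, treating interactions as an undirected edge list; this is a baseline rather than a specific published method, and the protein IDs and function labels are made up:

```python
from collections import Counter
from typing import Optional


def predict_function(protein: str, edges: list[tuple[str, str]], known: dict[str, str]) -> Optional[str]:
    """Most common known function among the protein's interaction partners
    (ties broken alphabetically); None if no partner is annotated."""
    neighbors = {b for a, b in edges if a == protein} | {a for a, b in edges if b == protein}
    votes = Counter(known[n] for n in neighbors if n in known)
    if not votes:
        return None
    # sorted() fixes the tie-breaking order; max() keeps the first label with the top count.
    return max(sorted(votes), key=votes.__getitem__)


edges = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P5", "P6")]
known = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}
print(predict_function("P1", edges, known))  # DNA repair
```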

ChIP-Chip Data
  - Immunoprecipitation can also be done to identify proteins interacting with DNA rather than with other proteins
  - Chromatin immunoprecipitation (ChIP): grab a sample of DNA bound to a particular protein (transcription factor)
  - ChIP-Chip: run this sample of DNA on a microarray to see which DNA was bound
  - Example of analysis of such new data: Keles et al., 2006

Metabolomics
  - Measures the concentration of each low-molecular-weight molecule in a sample
  - These typically are metabolites, i.e. small molecules produced or consumed by reactions in biochemical pathways
  - These reactions are typically catalyzed by proteins (specifically, enzymes)
  - These data are typically also obtained by mass spectrometry, though NMR can also be used

Lipomics
  - Analogous to metabolomics, but measuring concentrations of lipids rather than metabolites
  - Could help induce biochemical pathway information or help with disease diagnosis or treatment choice

To Design a Drug:
1. Identify target protein: knowledge of proteome/genome; relevant biochemical pathways
2. Determine target site structure: crystallography, NMR (difficult if membrane-bound)
3. Synthesize a molecule that will bind: imperfect modeling of structure; structures may change at binding

And even then...

Molecule Binds Target But May:

  - Bind too tightly or not tightly enough
  - Be toxic
  - Have other effects (side effects) in the body
  - Break down as soon as it gets into the body, or not leave the body soon enough
  - Not get to where it should in the body (e.g., crossing the blood-brain barrier)
  - Not diffuse from gut to bloodstream

And Every Body is Different:

Even if a molecule works in the test tube and works in animal studies, it may not work in people (will fail in clinical trials). A molecule may work for some people but not others. A molecule may cause harmful side-effects in some people but not others.

Typical Practice when Target Structure is Unknown

High-Throughput Screening (HTS): test many molecules (on the order of 1,000,000) to find some that bind to the target (ligands). Infer (induce) the shape of the target site from 3D structural similarities among the ligands. The shared 3D substructure is called a pharmacophore. This is a perfect example of a machine learning task with a spatial target.

An Example of Structure Learning

[Figure: example molecules labeled Active and Inactive]

Common Data Mining Approaches

  - Represent a molecule by thousands to millions of features and use standard techniques (e.g., KDD Cup 2001)
  - Represent each low-energy conformer by a feature vector and use multiple-instance learning (e.g., Jain et al., 1998)
  - Relational learning
      - Inductive logic programming (e.g., Finn et al., 1998)
      - Graph mining

Supervised Learning Task

Given: a set of molecules, each labeled by activity (binding affinity for the target protein), and a set of low-energy conformers for each molecule
Do: learn a model that accurately predicts activity (may be Boolean or real-valued)
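In the multiple-instance framing, each molecule is a bag of conformers, and a molecule is assumed active if at least one conformer fits the target. One of the simplest learners for that setting is nearest-neighbour classification with the minimal Hausdorff distance between bags (the closest pair of instances). The sketch below is illustrative, not the method of Jain et al.; it handles only the Boolean case, and the two-number conformer features stand in for hypothetical descriptors such as distances between pharmacophore points:

```python
import math


def bag_distance(bag_a: list[list[float]], bag_b: list[list[float]]) -> float:
    """Minimal Hausdorff distance: the closest pair of conformers across two molecules."""
    return min(math.dist(x, y) for x in bag_a for y in bag_b)


def predict_activity(query: list[list[float]], training: list[tuple[list[list[float]], bool]]) -> bool:
    """1-nearest-neighbour multiple-instance rule: copy the label of the training
    molecule that has a conformer closest to some conformer of the query."""
    _, label = min(training, key=lambda item: bag_distance(query, item[0]))
    return label


# Each molecule (bag) is a list of conformers; each conformer is a feature vector.
training = [
    ([[1.0, 5.0], [9.0, 9.0]], True),      # active molecule
    ([[20.0, 20.0], [30.0, 1.0]], False),  # inactive molecule
]
print(predict_activity([[8.5, 9.2], [40.0, 40.0]], training))  # True
```

Only one conformer of the query needs to resemble a conformer of an active molecule, which mirrors the assumption that a single low-energy shape is enough to bind.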

Clinical Databases of the Future (Dramatically Simplified)

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  ...  SNP500K
P1         AA    AB    ...  BB
P2         AB    BB    ...  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

Final Wrap-up
  - Molecular biology is collecting lots and lots of data in the post-genome era
  - Opportunity to connect molecular-level information to diseases and treatment
  - Need analysis tools to interpret it
  - Data mining opportunities abound
  - Hopefully this tutorial provided a solid start toward applying data mining to high-throughput biological data
