Proteomics &
Mass Spectrometry
Nathan Edwards
Center for Bioinformatics and Computational Biology
Outline
• Proteomics
• Mass Spectrometry
• Protein Identification
• Peptide Mass Fingerprint
• Tandem Mass Spectrometry
Proteomics
• Proteins are the machines that drive
much of biology
• Genes are merely the recipe
• The direct characterization of a
sample’s proteins en masse.
• What proteins are present?
• How much of each protein is present?
Systems Biology
• Establish relationships by
• Choosing related samples,
• Global characterization, and
• Comparison.
Gene / Transcript / Protein
Measurement       Predetermined      Unknown
Discrete (DNA)    Genotyping         Sequencing
Continuous        Gene Expression    Proteomics
Samples
• Healthy / Diseased
• Cancerous / Benign
• Drug resistant / Drug susceptible
• Bound / Unbound
• Tissue specific
• Cellular location specific
• Mitochondria, Membrane
2D Gel-Electrophoresis
• Protein separation
• Molecular weight (MW)
• Isoelectric point (pI)
• Staining
• Birds-eye view of
protein abundance
Bécamel et al., Biol. Proced. Online 2002;4:94-104.
Paradigm Shift
• Traditional protein chemistry assay
methods struggle to establish identity.
• Identity requires:
• Specificity of measurement (Precision)
• Mass spectrometry
• A reference for comparison
(Measurement → Identity)
• Protein sequence databases
Mass Spectrometer
[Diagram: sample passing through ionizer, mass analyzer, and detector]
Ionizer                            Mass Analyzer             Detector
• MALDI                            • Time-Of-Flight (TOF)    • Electron Multiplier (EM)
• Electro-Spray Ionization (ESI)   • Quadrupole
                                   • Ion-Trap
Mass Spectrometer (MALDI-TOF)
[Figure: MALDI-TOF schematic. Labels: UV (337 nm); source with analyte/matrix, pulse voltage, backing plate (grounded), extraction grid (source voltage -Vs), length = s; field-free drift zone, Ed = 0, length = D; detector grid (-Vs); microchannel plate detector.]
Mass Spectrum
Mass is fundamental
Peptide Mass Fingerprint
• Cut out 2D-gel spot
• Trypsin digest
• MS
Peptide Mass Fingerprint
• Trypsin: digestion enzyme
• Highly specific
• Cuts after K & R except if followed by P
• Protein sequence from sequence database
• In silico digest
• Mass computation
• For each protein sequence in turn:
• Compare computer generated masses with
observed spectrum
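The comparison loop above can be sketched in Python. This is only an illustration, not how any particular search engine scores: the function names, the 0.2 Da tolerance, the count-of-matches score, and the `database` layout are all assumptions, and the numbers in the usage lines are toy values.

```python
def count_matches(theoretical, observed, tol=0.2):
    """Count theoretical peptide masses that have an observed peak within tol (Da)."""
    return sum(
        any(abs(t - o) <= tol for o in observed)
        for t in theoretical
    )

def rank_proteins(database, observed, tol=0.2):
    """Rank proteins by how many of their in silico peptide masses are observed.

    database: dict mapping protein name -> list of peptide masses
    (a hypothetical layout chosen for this sketch).
    """
    scores = {name: count_matches(masses, observed, tol)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy usage, purely to show the call shape.
database = {
    "protein_A": [748.43, 1271.66, 1606.85],
    "protein_B": [650.30, 900.45, 1500.80],
}
observed = [748.44, 1271.65, 1606.90, 2100.00]
print(rank_proteins(database, observed))
```

A real scoring function would also weigh how many observed peaks remain unexplained and how large the database is; counting matches is just the simplest starting point.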
Protein Sequence
• Myoglobin - Plains zebra
GLSDGEWQQV LNVWGKVEAD IAGHGQEVLI
RLFTGHPETL EKFDKFKHLK TEAEMKASED
LKKHGTVVLT ALGGILKKKG HHEAELKPLA
QSHATKHKIP IKYLEFISDA IIHVLHSKHP
GDFGADAQGA MTKALELFRN DIAAKYKELG
FQG
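As a rough sketch of the in silico digest, the trypsin rule (cut after K or R unless followed by P) can be applied to this sequence in a few lines of Python. The function name is an assumption made for illustration, and missed cleavages are not handled here.

```python
def tryptic_digest(sequence):
    """Split a protein after every K or R, unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides

# Plains zebra myoglobin, as listed on the slide (spaces removed).
MYOGLOBIN = (
    "GLSDGEWQQVLNVWGKVEADIAGHGQEVLI"
    "RLFTGHPETLEKFDKFKHLKTEAEMKASED"
    "LKKHGTVVLTALGGILKKKGHHEAELKPLA"
    "QSHATKHKIPIKYLEFISDAIIHVLHSKHP"
    "GDFGADAQGAMTKALELFRNDIAAKYKELG"
    "FQG"
)

for peptide in tryptic_digest(MYOGLOBIN):
    print(peptide)
```

Note that the K in GHHEAELKPLAQSHATK is followed by P, so the peptide is not split there.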
Peptide Masses
1815.90 GLSDGEWQQVLNVWGK
1606.85 VEADIAGHGQEVLIR
1271.66 LFTGHPETLEK
1378.83 HGTVVLTALGGILK
1982.05 KGHHEAELKPLAQSHATK
1853.95 GHHEAELKPLAQSHATK
1885.01 YLEFISDAIIHVLHSK
1502.66 HPGDFGADAQGAMTK
748.43 ALELFR
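The mass computation behind a list like this can be sketched as follows: sum the monoisotopic residue masses, add one water for the free termini, and add one proton for the singly charged ion. The function name and the rounded constants for water and the proton are choices made for this sketch.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "A": 71.03712, "C": 103.00919, "D": 115.02695, "E": 129.04260,
    "F": 147.06842, "G": 57.02147, "H": 137.05891, "I": 113.08407,
    "K": 128.09497, "L": 113.08407, "M": 131.04049, "N": 114.04293,
    "P": 97.05277, "Q": 128.05858, "R": 156.10112, "S": 87.03203,
    "T": 101.04768, "V": 99.06842, "W": 186.07932, "Y": 163.06333,
}
WATER = 18.01056   # H2O accounting for the free N- and C-termini
PROTON = 1.00728   # charge carrier for the singly protonated ion

def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

for peptide in ["ALELFR", "LFTGHPETLEK", "HGTVVLTALGGILK"]:
    print(f"{mh_plus(peptide):.3f} {peptide}")
```

For these three peptides the computed values agree with the listed masses to within roughly 0.01 Da; the small offset depends on the exact constants used.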
Peptide Mass Fingerprint
[Figure: mass spectrum with peaks labeled ALELFR, LFTGHPETLEK, HGTVVLTALGGILK, HPGDFGADAQGAMTK, VEADIAGHGQEVLIR, GLSDGEWQQVLNVWGK, GHHEAELKPLAQSHATK, YLEFISDAIIHVLHSK, and KGHHEAELKPLAQSHATK.]
Mass Spectrometry
• Strengths
• Precise molecular weight
• Fragmentation
• Automated
• Weaknesses
• Best for a few molecules at a time
• Best for small molecules
• Mass-to-charge ratio, not mass
• Intensity ≠ Abundance
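To make the "mass-to-charge ratio, not mass" point concrete, here is a small sketch assuming positive-mode ionization in which each charge is one added proton. The function names and the example neutral mass of 1165.55 Da are illustrative assumptions.

```python
PROTON = 1.00728  # mass of a proton in Da

def mz(neutral_mass, charge):
    """m/z observed for a molecule carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge

def neutral_mass(mz_value, charge):
    """Invert mz(): recover the neutral mass once the charge is known."""
    return charge * (mz_value - PROTON)

# The same molecule appears at different m/z depending on its charge state.
for z in (1, 2, 3):
    print(z, round(mz(1165.55, z), 3))
```

The instrument reports only the m/z value, so the charge state has to be inferred before a mass can be compared with a database.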
Sample Preparation for MS/MS
• Enzymatic digest and fractionation
Single Stage MS
• MS
Tandem Mass Spectrometry (MS/MS)
• Precursor selection
• Collision induced dissociation (CID)
• MS/MS
Peptide Fragmentation
Peptides consist of amino-acids arranged in a linear backbone.
N-terminus                          C-terminus
H…-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH
    R(i-1)    R(i)    R(i+1)
AA residue i-1   AA residue i   AA residue i+1
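Peptide Fragmentation
[Figure: backbone segment -HN-CH(Ri)-CO-NH-CH(Ri+1)-CO-NH- with the cleavage points that produce the b(i) and b(i+1) ions (N-terminal fragments) and the y(n-i) and y(n-i-1) ions (C-terminal fragments).]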
Peptide Fragmentation
Peptide: S-G-F-L-E-E-D-E-L-K
MW    ion   fragment    fragment    ion   MW
88    b1    S           GFLEEDELK   y9    1080
145   b2    SG          FLEEDELK    y8    1022
292   b3    SGF         LEEDELK     y7    875
405   b4    SGFL        EEDELK      y6    762
534   b5    SGFLE       EDELK       y5    633
663   b6    SGFLEE      DELK        y4    504
778   b7    SGFLEED     ELK         y3    389
907   b8    SGFLEEDE    LK          y2    260
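A table like this can be generated from residue masses. The sketch below assumes singly charged ions and monoisotopic masses: a b ion is the prefix residue sum plus a proton, and a y ion is the suffix residue sum plus water plus a proton. The function name is an assumption, and only the residues needed for this peptide are listed.

```python
# Monoisotopic residue masses (Da) for the residues in SGFLEEDELK.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02147, "F": 147.06842, "L": 113.08407,
    "E": 129.04260, "D": 115.02695, "K": 128.09497,
}
WATER, PROTON = 18.01056, 1.00728

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged monoisotopic m/z lists.

    b_ions[i-1] is b_i (first i residues); y_ions[i-1] is y_i (last i residues).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions

b, y = fragment_ions("SGFLEEDELK")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:8.2f}    y{i} {y_mz:8.2f}")
```

The table uses whole-number masses, so the monoisotopic values computed here differ from it by up to about half a dalton for the larger fragments.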
Peptide Fragmentation
b ions:   88  145  292  405  534  663  778  907 1020 1166
           S    G    F    L    E    E    D    E    L    K
y ions: 1166 1080 1022  875  762  633  504  389  260  147
[Figure: MS/MS spectrum, % Intensity (0 to 100) vs. m/z (250 to 1000), with peaks labeled y2 through y9 and b3 through b9.]
Peptide Identification
Given:
• The mass of the precursor ion, and
• The MS/MS spectrum
Output:
• The amino-acid sequence of the peptide
Peptide Identification
Two paradigms:
• De novo interpretation
• Sequence database search
De Novo Interpretation
[Figure: MS/MS spectrum, % Intensity (0 to 100) vs. m/z (250 to 1000), annotated step by step with the amino-acid letters assigned to the gaps between peaks.]
De Novo Interpretation
Amino-Acid         Residue MW (Da)    Amino-Acid         Residue MW (Da)
A Alanine           71.03712          M Methionine       131.04049
C Cysteine         103.00919          N Asparagine       114.04293
D Aspartic acid    115.02695          P Proline           97.05277
E Glutamic acid    129.04260          Q Glutamine        128.05858
F Phenylalanine    147.06842          R Arginine         156.10112
G Glycine           57.02147          S Serine            87.03203
H Histidine        137.05891          T Threonine        101.04768
I Isoleucine       113.08407          V Valine            99.06842
K Lysine           128.09497          W Tryptophan       186.07932
L Leucine          113.08407          Y Tyrosine         163.06333
De Novo Interpretation
…from Lu and Chen (2003), JCB 10:1
De Novo Interpretation
• Find good paths in spectrum graph
• Can’t use same peak twice
• Simple peptide fragmentation model
• Usually many apparently good solutions
• Amino-acids have duplicate masses!
• “Best” de novo interpretation may have no
biological relevance
• Identifies relatively few peptides in high-
throughput workflows
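A toy version of the spectrum graph idea: treat each peak as a node, connect two peaks when their m/z difference matches a residue mass, and read paths as sequence tags. This sketch assumes a clean ladder from a single ion series and ignores the "same peak twice" constraint, intensities, and noise. The function names, the 0.02 Da tolerance, and the synthetic peak list (rounded theoretical b ions of SGFLEEDELK) are illustrative assumptions.

```python
# Monoisotopic residue masses (Da). I is omitted: it has the same mass as L.
RESIDUE_MASS = {
    "A": 71.03712, "C": 103.00919, "D": 115.02695, "E": 129.04260,
    "F": 147.06842, "G": 57.02147, "H": 137.05891, "K": 128.09497,
    "L": 113.08407, "M": 131.04049, "N": 114.04293, "P": 97.05277,
    "Q": 128.05858, "R": 156.10112, "S": 87.03203, "T": 101.04768,
    "V": 99.06842, "W": 186.07932, "Y": 163.06333,
}

def build_spectrum_graph(peaks, tol=0.02):
    """Edge i -> j labeled aa when peaks[j] - peaks[i] matches a residue mass."""
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))
    return edges

def sequence_tags(peaks, tol=0.02):
    """Read every maximal path through the graph as a string of residues."""
    edges = build_spectrum_graph(peaks, tol)
    has_incoming = {j for targets in edges.values() for j, _ in targets}

    def extend(node):
        if not edges[node]:
            return [""]
        return [aa + rest for nxt, aa in edges[node] for rest in extend(nxt)]

    tags = {tag for node in edges if node not in has_incoming
            for tag in extend(node)}
    return sorted((t for t in tags if t), key=len, reverse=True)

# Synthetic peak list: rounded theoretical b ions of SGFLEEDELK.
peaks = [88.04, 145.06, 292.13, 405.21, 534.26, 663.30, 778.33, 907.37, 1020.45]
print(sequence_tags(peaks))
```

Each L in the output could equally be I, which is the duplicate-mass ambiguity noted above; real spectra also contain missing and spurious peaks, which is why many apparently good paths usually exist.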
Sequence Database
Search
• Compares peptides from a protein
sequence database with spectra
• Filter peptide candidates by
• Precursor mass
• Digest motif
• Score each peptide against spectrum
• Generate all possible peptide fragments
• Match putative fragments with peaks
• Score and rank
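The filter-then-score steps above might look like this in miniature. Everything here is a simplifying assumption for illustration: the function names, the tolerances, the shared-peak-count score, the candidate list (GSFLEEDELK is a made-up scrambled variant), and the synthetic peak list, which is built from rounded theoretical fragment masses of SGFLEEDELK. Real search engines use more elaborate scoring.

```python
WATER, PROTON = 18.01056, 1.00728
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE_MASS = {
    "A": 71.03712, "D": 115.02695, "E": 129.04260, "F": 147.06842,
    "G": 57.02147, "K": 128.09497, "L": 113.08407, "R": 156.10112,
    "S": 87.03203,
}

def precursor_mz(peptide):
    """Singly protonated monoisotopic peptide mass, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def fragment_mzs(peptide):
    """All singly charged b and y ions for the peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)           # b_i
        ions.append(sum(masses[i:]) + WATER + PROTON)   # y_(n-i)
    return ions

def search(candidates, precursor, peaks, precursor_tol=0.5, fragment_tol=0.3):
    """Filter candidates by precursor mass, then rank by matched fragment count."""
    hits = []
    for peptide in candidates:
        if abs(precursor_mz(peptide) - precursor) > precursor_tol:
            continue  # fails the precursor mass filter
        matched = sum(any(abs(ion - p) <= fragment_tol for p in peaks)
                      for ion in fragment_mzs(peptide))
        hits.append((matched, peptide))
    return sorted(hits, reverse=True)

# Synthetic spectrum: y2..y9 and b3..b9 of SGFLEEDELK, rounded to 0.01.
peaks = [260.20, 389.24, 504.27, 633.31, 762.35, 875.44, 1022.50, 1079.53,
         292.13, 405.21, 534.26, 663.30, 778.33, 907.37, 1020.45]
candidates = ["SGFLEEDELK", "GSFLEEDELK", "ALELFR"]
for matched, peptide in search(candidates, 1166.56, peaks):
    print(matched, peptide)
```

The scrambled candidate passes the precursor filter because it has the same composition, and it loses only on the fragments that distinguish the two orderings, which is why the fragment-level score is needed at all.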
Sequence Database Search
• Sequence fills in gaps in the spectrum
• All candidates have biological relevance
• Practical for high-throughput peptide
identification
• Correct peptide might be missing from
database!
Peptide Candidate
Filtering
Digestion Enzyme: Trypsin
• Cuts just after K or R unless followed
by a P.
• Must allow for “missed” cleavage sites
• “Average” peptide length about 10-15
amino-acids
Peptide Candidate
Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFE
DHVKLVNEVTEFAK…
No missed cleavage sites
MK
WVTFISLLFLFSSAYSR
GVFR
R
DAHK
SEVAHR
FK
DLGEENFK
ALVLIAFAQYLQQCPFEDHVK
LVNEVTEFAK
Peptide Candidate
Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFE
DHVKLVNEVTEFAK…
One missed cleavage site
MKWVTFISLLFLFSSAYSR
WVTFISLLFLFSSAYSRGVFR
GVFRR
RDAHK
DAHKSEVAHR
SEVAHRFK
FKDLGEENFK
DLGEENFKALVLIAFAQYLQQCPFEDHVK
ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
…
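Allowing for missed cleavage sites amounts to joining runs of adjacent fully cleaved peptides. The sketch below reproduces the list above from the no-missed-cleavage peptides; the function name and the `(missed, peptide)` return shape are assumptions for illustration.

```python
def with_missed_cleavages(peptides, max_missed=1):
    """Join adjacent fully cleaved peptides, skipping up to max_missed sites.

    Returns (missed, peptide) pairs, where missed is the number of
    cleavage sites left uncut inside the peptide.
    """
    candidates = []
    for start in range(len(peptides)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end <= len(peptides):
                candidates.append((missed, "".join(peptides[start:end])))
    return candidates

# Fully cleaved peptides from the start of ALBU_HUMAN, as listed earlier.
cleaved = ["MK", "WVTFISLLFLFSSAYSR", "GVFR", "R", "DAHK", "SEVAHR",
           "FK", "DLGEENFK", "ALVLIAFAQYLQQCPFEDHVK", "LVNEVTEFAK"]
one_missed = [p for missed, p in with_missed_cleavages(cleaved) if missed == 1]
print(one_missed)
```

Each extra allowed missed cleavage adds roughly one more candidate per cleavage site, so the candidate list grows quickly with `max_missed`.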
Peptide Scoring
• Peptide fragments vary based on
• The instrument
• The peptide’s amino-acid sequence
• The peptide’s charge state
• Etc…
• Search engines model peptide
fragmentation to various degrees.
• Speed vs. sensitivity tradeoff
• y-ions & b-ions occur most frequently
Mascot Search Engine
Mascot MS/MS Ions Search
Mascot MS/MS Search Results
Summary
• Protein identification by mass
spectrometry is a key element of
proteomics and systems biology.
• Mass spectrometry + sequence
databases represent a huge leap for
protein (bio-)chemistry.
• Sample prep, instruments, and algorithms
are still maturing; much work remains.
Further Reading
• Matrix Science (Mascot) Web Site
• www.matrixscience.com
• Seattle Proteome Center (ISB)
• www.proteomecenter.org
• Proteomic Mass Spectrometry Lab at
The Scripps Research Institute
• fields.scripps.edu
• UCSF ProteinProspector
• prospector.ucsf.edu