Biological Data Science: Lecture 7

Dr Athanasios Tsanas (‘Thanasis’)

Associate Prof. in Data Science


Usher Institute, Medical School
University of Edinburgh
Day 1 • Introduction and overview; reminder of basic concepts
Day 2 • Data collection and sampling

Day 3 • Data mining: signal/image processing and information extraction

Day 4 • Data visualization: density estimation, statistical descriptors

Day 5 • Exploratory analysis: hypothesis testing and quantifying relationships

Day 6 • Feature selection and feature transformation

Day 7 • Statistical machine learning and model validation

Day 8 • Statistical machine learning and model validation

Day 9 • Practical examples: bringing things together

Day 10 • Revision and exam preparation


© A. Tsanas, 2020
Raw data (e.g. ECG, EEG, activity, location) are summarised as a design matrix X with N subjects (rows) and M features or characteristics (columns):

   Subjects   feature1   feature2   ...   feature M
   P1         3.1        1.3              0.9
   P2         3.7        1.0              1.3
   P3         2.9        2.6              0.6
   ...
   PN         1.7        2.0              0.7


Feature generation from raw data → Feature selection or transformation → Statistical mapping

Design matrix X (N subjects × M features) paired with the outcome vector y:

   Subjects   feature1   feature2   ...   feature M   result
   P1         3.1        1.3              0.9         1
   P2         3.7        1.0              1.3         2
   P3         2.9        2.6              0.6         1
   ...
   PN         1.7        2.0              0.7         3

• Depending on the problem, "features" can be demographics, genes, ...

• y = f(X);  f: mechanism,  X: feature set,  y: outcome


Data visualization (density estimation, scatter plots) → Exploratory analysis: hypothesis testing and statistical associations → Feature selection or transformation (e.g. PCA) → Statistical mapping (regression/classification)
• Understanding the setting of statistical mapping

• Assessing the accuracy of a statistical model

• Everything we have done in the course culminates in today's two lectures on statistical mapping
• Information has been collected and presented in the form of the design matrix X

• In the biomedical domain, experts typically provide the outcome of interest, y

• Having both X and y, determining the functional mapping y = f(X) is known as supervised learning

• When the outcome y is not available, we can still work in unsupervised learning mode, for example clustering
Outcome y is not available → Unsupervised learning:
• Visualization
• Transformation (e.g. PCA)
• Clustering (not covered here)

Outcome y is available → Supervised learning:
• Determine the functional mapping strategy: y = f(X)
Classification: discrete outcome (oftentimes binary)
• Learners f(X) = y: classifiers
• Examples: kNN, Logistic Regression (LR), Naïve Bayes, Support Vector Machines (SVM), Random Forests (RF), ...

Regression: continuous outcome (typically real numbers)
• Learners f(X) = y: regressors
• Examples: Ordinary Least Squares (OLS) regression (linear regression), Support Vector Machines (SVM), Random Forests (RF), ...
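To make one of the named classifiers concrete, here is a minimal pure-Python sketch of kNN (k-nearest neighbours); the two-feature toy data, labels, and k = 3 are illustrative assumptions, not from the lecture:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from x_new to every training sample
    dists = [(math.dist(x, x_new), label) for x, label in zip(X_train, y_train)]
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical design matrix X (two features) with a binary outcome y
X = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.1), (3.2, 2.9)]
y = [0, 0, 1, 1]
print(knn_predict(X, y, (1.1, 1.0)))  # near the class-0 samples -> 0
print(knn_predict(X, y, (3.1, 3.0)))  # near the class-1 samples -> 1
```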
• Determine the functional relationship in a simple linear model form: 𝑦 = 𝑎 + 𝑏𝑥

• Indicative regression model:

   𝑈𝑃𝐷𝑅𝑆 = 3 + 8.5 ∙ 𝐽𝑖𝑡𝑡𝑒𝑟

   where 3 is the intercept and 8.5 the coefficient (or slope) of the explanatory variable Jitter

• Coefficient: a unit increase in x leads to an increase of b in y
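A tiny sketch of using the indicative model for prediction; the helper name and test values are hypothetical:

```python
def predict_updrs(jitter, intercept=3.0, slope=8.5):
    """Indicative linear model from the slide: UPDRS = 3 + 8.5 * Jitter."""
    return intercept + slope * jitter

# A unit increase in Jitter raises the predicted UPDRS by the slope (8.5)
print(predict_updrs(1.0) - predict_updrs(0.0))  # 8.5
```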
[Figure: scatter plot of the outcome against X (explanatory variable), with the fitted line 𝑈𝑃𝐷𝑅𝑆 = 3 + 8.5 ∙ 𝐽𝑖𝑡𝑡𝑒𝑟; the intercept and coefficient (or slope) are annotated, and vertical bars mark indicative errors 𝑒ᵢ, e.g. 𝑒₁₃₂ for sample 132.]

OLS chooses the line that minimises the sum of squared errors:

   min ∑ 𝑒ᵢ² , summing over i = 1, …, N
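A minimal sketch of the closed-form OLS solution this minimisation yields for a single explanatory variable; the toy data are generated noise-free from the slide's indicative model, so the fit should recover intercept 3 and slope 8.5:

```python
def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x, minimising the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope from centred cross-products; intercept from the means
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical noise-free data from y = 3 + 8.5*x
x = [0.0, 1.0, 2.0, 3.0]
y = [3 + 8.5 * xi for xi in x]
a, b = ols_fit(x, y)
print(round(a, 6), round(b, 6))  # 3.0 8.5
```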
𝑦 = 𝑎 + 𝑏1 ∙ 𝑥1 + 𝑏2 ∙ 𝑥2 + ⋯ + 𝑏𝑀 ∙ 𝑥𝑀

𝑈𝑃𝐷𝑅𝑆 = 3 + 8.5 ∙ 𝐽𝑖𝑡𝑡𝑒𝑟 − 3.2 ∙ 𝑆ℎ𝑖𝑚𝑚𝑒𝑟 + ⋯

• Each coefficient expresses how much the corresponding variable contributes to the outcome

• The sign of a coefficient expresses the direction of its contribution
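A small sketch of reading off coefficient contributions in the slide's indicative multiple-regression model; the evaluation points are hypothetical:

```python
def predict(jitter, shimmer, a=3.0, b1=8.5, b2=-3.2):
    """Indicative model from the slide: UPDRS = 3 + 8.5*Jitter - 3.2*Shimmer."""
    return a + b1 * jitter + b2 * shimmer

base = predict(1.0, 1.0)
# Positive coefficient: a unit increase in Jitter raises the outcome by 8.5
print(round(predict(2.0, 1.0) - base, 6))   # 8.5
# Negative coefficient: a unit increase in Shimmer lowers the outcome by 3.2
print(round(predict(1.0, 2.0) - base, 6))   # -3.2
```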
• Many algorithms have been proposed for regression problems

• This is an area beyond the scope of this course

• We will now look into classification
▪ Find the optimal approach to separate the following two types:

▪ Given {(𝐱ᵢ, 𝑦ᵢ)}, i = 1…N, with data samples 𝐱ᵢ ∈ ℝᴹ and corresponding response 𝑦ᵢ ∈ {−1, +1}

▪ Design a classifier 𝑓(𝐱ᵢ):  𝑦ᵢ = −1 if 𝑓(𝐱ᵢ) < 0;  𝑦ᵢ = +1 if 𝑓(𝐱ᵢ) ≥ 0
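A minimal sketch of the thresholding rule above; the linear score f(x) = w·x + w0 and its weights are illustrative assumptions, since the slide leaves f unspecified:

```python
def classify(f_value):
    """Map the classifier score f(x_i) to a label y_i in {-1, +1}."""
    return -1 if f_value < 0 else +1

def f(x, w=(1.0, -1.0), w0=0.0):
    """Hypothetical linear score on 2-D samples: f(x) = w.x + w0."""
    return w[0] * x[0] + w[1] * x[1] + w0

print(classify(f((2.0, 1.0))))  # score  1.0 -> +1
print(classify(f((1.0, 2.0))))  # score -1.0 -> -1
```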
[Figure: the logistic curve plotted against x alongside a straight line labelled "Linear"; p rises smoothly from 0 towards 1.]

   p = 1 / (1 + e^(−(α + βx)))

• The logistic function takes values in the open range (0, 1).

• "Logistic regression" is something of a misnomer: it is a classification algorithm!
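The logistic function can be sketched directly; the default α and β are illustrative:

```python
import math

def logistic(x, alpha=0.0, beta=1.0):
    """Logistic function p = 1 / (1 + exp(-(alpha + beta*x))); output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

print(logistic(0.0))    # 0.5 at the decision midpoint
print(logistic(10.0))   # close to 1
print(logistic(-10.0))  # close to 0
```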
• A model was computed as follows:

   𝑝(𝑑𝑖𝑠𝑐ℎ𝑎𝑟𝑔𝑒) = 1 / (1 + 𝑒^(−(5 + 2 ∙ 𝑏𝑙𝑜𝑜𝑑_𝑡𝑒𝑠𝑡)))

• Find the probability that the patient should be discharged if 𝑏𝑙𝑜𝑜𝑑_𝑡𝑒𝑠𝑡 = 5

• Substitute values: 𝑝(𝑑𝑖𝑠𝑐ℎ𝑎𝑟𝑔𝑒) = 1 / (1 + 𝑒⁻¹⁵) ≈ 0.9999997, i.e. the patient should almost certainly be discharged
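Checking the substitution numerically, a sketch using the slide's model:

```python
import math

def p_discharge(blood_test):
    """Slide's model: p(discharge) = 1 / (1 + exp(-(5 + 2*blood_test)))."""
    return 1.0 / (1.0 + math.exp(-(5.0 + 2.0 * blood_test)))

p = p_discharge(5.0)  # exponent is -(5 + 2*5) = -15
print(p)  # approximately 0.9999997: discharge is almost certain
```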
Reading: G. James et al., An Introduction to Statistical Learning (pages 15-42, 59-83, 127-138)
https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf

OPTIONAL: G. James et al., An Introduction to Statistical Learning (pages 83-104)
