An Introduction to ROC (Receiver Operating Characteristic) Curves
Ming-Chang Lee
Department of Information Management, Yu Da College of Business
Outline
1. Introduction
2. Create an ROC curve
3. Area Under an ROC Curve (AUC)
4. R-package ROCR demo
5. References
1. Introduction
History: signal detection theory (hit rates and false alarm rates)
Development: diagnostic systems, medical decision making, machine learning
Classifier Performance
Problem: two-class classification
Input: (instance I, actual class in {p, n})
Classification model
Output: (instance I, predicted class in {Y, N})
PS: actual class {p: positive class, n: negative class}; predicted class {Y: classified as positive, N: classified as negative}
Confusion matrix (Contingency table)
Given a classifier and an instance:
                    Predicted class (classifier)
True class          Y                   N                   Total
p (positive)        True Positives      False Negatives     P
n (negative)        False Positives     True Negatives      N
P = True Positives + False Negatives
N = False Positives + True Negatives
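The four cells of the table can be counted directly from paired labels. The following is a minimal sketch, assuming actual classes are encoded as "p"/"n" and predictions as "Y"/"N"; the function name `confusion_counts` and the sample data are hypothetical and not from the slides.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN from actual classes ('p'/'n') and predictions ('Y'/'N')."""
    tp = fn = fp = tn = 0
    for a, y in zip(actual, predicted):
        if a == "p":
            if y == "Y":
                tp += 1   # positive instance classified Y
            else:
                fn += 1   # positive instance classified N
        else:
            if y == "Y":
                fp += 1   # negative instance classified Y
            else:
                tn += 1   # negative instance classified N
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Illustrative data only (made up for this sketch).
actual = ["p", "p", "n", "n", "p", "n"]
predicted = ["Y", "N", "Y", "N", "Y", "N"]
counts = confusion_counts(actual, predicted)
print(counts)
```

The row totals follow from the counts: P = TP + FN and N = FP + TN.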
Performance index
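Confusion matrix cells: TP, FN (true class p); FP, TN (true class n)
\[ \mathrm{TPR} = \frac{TP}{P} = \mathrm{Recall}, \qquad \mathrm{FPR} = \frac{FP}{N} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{P + N} \]
\[ \mathrm{Sensitivity} = \mathrm{Recall}, \qquad \mathrm{Specificity} = 1 - \mathrm{FPR} \]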
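As a quick check of these definitions, the indices can be computed from the four cell counts. This is a sketch only: the function name `performance_indices` and the counts passed to it are illustrative assumptions, and no guard against zero denominators is included.

```python
def performance_indices(tp, fn, fp, tn):
    """Compute the slide's performance indices from confusion matrix counts.

    Assumes P > 0, N > 0 and TP + FP > 0 (no division-by-zero handling).
    """
    p = tp + fn          # total actual positives
    n = fp + tn          # total actual negatives
    tpr = tp / p         # true positive rate = recall = sensitivity
    fpr = fp / n         # false positive rate
    return {
        "TPR": tpr,
        "FPR": fpr,
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (p + n),
        "specificity": 1 - fpr,
    }


# Illustrative counts (assumed for this sketch).
m = performance_indices(tp=5, fn=5, fp=1, tn=9)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```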
ROC curve
Y axis: TPR (benefits, TP)
X axis: FPR (costs, FP)
[Figure: ROC space, the unit square with corners (0,0), (1,0), (0,1) and (1,1)]
Compare ROC curve
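[Figure: classifiers plotted as points in ROC space, with the diagonal y = x]
(0,0): number of positive predictions = 0; no FP errors, but no TP either
(0,1): perfect classification (e.g. classifier D)
Northwest location is better (higher TPR, lower FPR)
Near the X axis and on the left side: conservative (e.g. A vs. B)
Near the upper right-hand side: liberal
Lower right triangle (below y = x): ?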
2. Create an ROC curve
A ranking or scoring classifier can be used with a threshold to produce a binary classifier. If the classifier output is above the threshold, the classifier produces a Y; otherwise it produces an N.
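The thresholding step can be sketched in a few lines. The function name `classify` and the sample scores are hypothetical; treating a score exactly equal to the threshold as a Y is an assumption of this sketch, not something the slide specifies.

```python
def classify(scores, threshold):
    """Turn classifier scores into binary predictions 'Y'/'N'.

    Assumption: a score equal to the threshold counts as 'Y'.
    """
    return ["Y" if s >= threshold else "N" for s in scores]


# Illustrative scores (made up for this sketch).
scores = [0.9, 0.54, 0.53, 0.1]
print(classify(scores, 0.54))
```

Each choice of threshold yields one binary classifier, and therefore one (FPR, TPR) point in ROC space.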
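Use thresholds to create an ROC curve
If threshold = 0.54, the number of instances with score ≥ 0.54 is 6:

True class    Y     N     Total
p             5     5     10
n             1     9     10
Total         6     14    20

\[ x = \mathrm{FPR} = \frac{1}{10} = 0.1, \qquad y = \mathrm{TPR} = \frac{5}{10} = 0.5 \]
This threshold gives the ROC point (0.1, 0.5).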
Conceptual Algorithm
Inputs:
L: the set of test instances;
f(i): the probabilistic classifier's estimate that instance i is positive;
min and max: the smallest and largest values returned by f;
increment: the smallest difference between any two f values.
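The algorithm listing itself is not preserved in this text. One plausible reading of the inputs, sketched below under that assumption, is to step a threshold from min to max by increment and recount TP and FP over all instances at every step. The function name `roc_points_conceptual` and the data are hypothetical; the example uses integer scores so that stepping by increment is exact, and it assumes at least two distinct scores and at least one instance of each class.

```python
def roc_points_conceptual(labels, scores):
    """Brute-force ROC points: one full pass over the instances per threshold.

    labels: actual classes 'p'/'n'; scores: f(i) values (integers in this sketch).
    """
    P = sum(1 for c in labels if c == "p")
    N = len(labels) - P
    distinct = sorted(set(scores))
    lo, hi = distinct[0], distinct[-1]
    # increment: the smallest difference between any two distinct f values
    increment = min(b - a for a, b in zip(distinct, distinct[1:]))

    points = []
    t = lo
    while t <= hi:
        tp = fp = 0
        for c, s in zip(labels, scores):
            if s >= t:              # instance is classified Y at threshold t
                if c == "p":
                    tp += 1
                else:
                    fp += 1
        points.append((fp / N, tp / P))   # (FPR, TPR) for this threshold
        t += increment
    return points


# Illustrative data (made up for this sketch).
labels = ["p", "p", "n", "p", "n", "n"]
scores = [90, 80, 70, 60, 40, 30]
for point in roc_points_conceptual(labels, scores):
    print(point)
```

The cost is one scan of L for every threshold step, which is why this version is mainly of conceptual interest.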
Practical Algorithm
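The listing for this slide is not preserved either. A common efficient approach, shown here as a sketch and not necessarily the slide's exact procedure, sorts the instances by decreasing score once and sweeps through them, emitting an ROC point each time the score changes. The function name `roc_points_sorted` and the data are hypothetical; the sketch assumes at least one instance of each class.

```python
def roc_points_sorted(labels, scores):
    """ROC points from a single sorted sweep (O(n log n) overall).

    labels: actual classes 'p'/'n'; scores: classifier scores.
    """
    P = sum(1 for c in labels if c == "p")
    N = len(labels) - P
    # Indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points = []
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so tied scores
        # are added together and produce a single step.
        if scores[i] != prev:
            points.append((fp / N, tp / P))
            prev = scores[i]
        if labels[i] == "p":
            tp += 1
        else:
            fp += 1
    points.append((fp / N, tp / P))   # final point, always (1.0, 1.0)
    return points


# Illustrative data (made up for this sketch).
labels = ["p", "p", "n", "p", "n", "n"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
for point in roc_points_sorted(labels, scores):
    print(point)
```

Lowering the threshold past one more instance can only add to TP or FP, so running counts replace the repeated full scans.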
3. Area Under an ROC Curve (AUC)
AUC (Bradley, 1997)
Equivalent to the Wilcoxon test of ranks
Area: classifier B > A
Average performance: B > A
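Two ways of computing the area can be sketched side by side: the trapezoidal rule over ROC points, and the rank-based view in which AUC is the fraction of (positive, negative) pairs where the positive instance scores higher, with ties counted as one half. The function names and the data below are hypothetical, chosen only for illustration.

```python
def auc_trapezoid(points):
    """Area under an ROC curve given as (FPR, TPR) points, by the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for c, s in zip(labels, scores) if c == "p"]
    neg = [s for c, s in zip(labels, scores) if c == "n"]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0      # positive ranked above negative
            elif sp == sn:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Illustrative data (made up for this sketch).
labels = ["p", "p", "n", "p", "n", "n"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
# ROC points for the same data, written out by hand.
points = [(0, 0), (0, 1 / 3), (0, 2 / 3), (1 / 3, 2 / 3),
          (1 / 3, 1), (2 / 3, 1), (1, 1)]
print(auc_trapezoid(points), auc_pairwise(labels, scores))
```

On this data the two values agree, which illustrates the link between the area and the ranking of positives above negatives.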
4. R-package ROCR demo
Package: ROCR
Plot an ROC curve
Plot SVM vs. Neural Network
5. References
1. Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.
2. Fawcett, T. (2003). ROC Graphs: Notes and Practical Considerations for Data Mining Researchers. HP Laboratories technical report.
3. Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann.
4. The magnificent ROC: http://www.anaesthetist.com/mnm/stats/roc/
THANKS
Q&A
Web: http://web.ydu.edu.tw/~alan9956/
Email: alan9956@webmail.ydu.edu.tw