Churn prediction in telecom
May, 2015
Bui Van Hong
Email: hongbv@fpt.com.vn
Agenda
Churn prediction in prepaid mobile telecommunication network
Machine Learning
Introduction to customer churn
Diagram of possible customer states
Churn prediction Model
Classification accuracy
Machine learning algorithm
Support vector machine
Nearest neighbour machine
Multilayer perceptron neural network machine
Decision tree machine
Naive Bayes machine
Feature presentation & Demo
List of attributes
Build Training & testing data
Demo
Machine Learning
Supervised learning
Supervised learning is the machine learning task of inferring a function from labeled
training data.
Unsupervised learning
Unsupervised learning is the machine learning task of inferring hidden structure from unlabeled
training data.
Semi-supervised learning
Semi-supervised learning falls between unsupervised learning (without any labeled
training data) and supervised learning (with completely labeled training data).
Reinforcement learning
Reinforcement learning is an area of machine learning inspired by behaviorist
psychology, in which an agent learns from the rewards its actions earn
(example: a chess player).
Deep learning
Deep learning (also called deep structured learning or hierarchical learning,
sometimes abbreviated DL) is a branch of machine learning based on a set of
algorithms that attempt to model high-level abstractions in data by using model
architectures composed of multiple non-linear transformations.
Introduction
- Customer churn is defined as the loss of customers.
- Churn in prepaid service is measured by the lack of activity on the
network over a period of time, because a prepaid customer gives no
formal notification of ending a contract term. Our goal is to infer,
for each active customer, when this lack of activity may happen in
the future.
- Churn prediction can be viewed as a supervised classification
problem in which the behavior of previously known churners and
non-churners is used to train a binary classifier.
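The inactivity-based definition above can be turned into training labels. The sketch below is only an illustration: the 90-day threshold, the function name `churn_label`, and the sample dates are assumptions, since the slides do not state how long a subscriber must be inactive to count as churned.

```python
from datetime import date, timedelta

# Hypothetical threshold: the slides do not say how many days of
# inactivity define churn, so 90 is an assumption for illustration.
INACTIVITY_DAYS = 90

def churn_label(last_activity, observed_on, inactivity_days=INACTIVITY_DAYS):
    """Return 1 (churn) if the subscriber showed no activity for at least
    `inactivity_days` before `observed_on`, otherwise 0 (non-churn)."""
    idle = observed_on - last_activity
    return 1 if idle >= timedelta(days=inactivity_days) else 0

observed_on = date(2015, 1, 31)
labels = {
    "sub_a": churn_label(date(2015, 1, 20), observed_on),  # recently active
    "sub_b": churn_label(date(2014, 9, 1), observed_on),   # idle for months
}
print(labels)
```

Labels produced this way give the classifier its target column; the features come from the usage history that precedes the observation date.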
Diagram of possible customer states
Each customer can be in one of the following states:
New, active, inactive or churn
Churn prediction Model
[Diagram labels: Training data, Testing data, Real, Model, Feature extraction, Learning algorithm & Prediction, ETL 1, ETL 2]
- Feature extraction: requires expert knowledge in the telco industry
- Learning algorithm & prediction: independent
- Steps: (1) Build, (2) Test, (3) Test = OK, (4) Predict, (5) Update
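The build, test, and predict steps can be sketched as a small loop around any learner. Everything named here is hypothetical: `fit_threshold` is a toy one-feature learner standing in for the real algorithms, `run_pipeline` and `min_accuracy` are illustrative, and the data is made up.

```python
def fit_threshold(rows):
    """Toy learner (an assumption, not from the slides): predict churn (1)
    when the single feature falls below the midpoint of the class means."""
    churn = [x for x, y in rows if y == 1]
    stay = [x for x, y in rows if y == 0]
    cut = (sum(churn) / len(churn) + sum(stay) / len(stay)) / 2
    return lambda x: 1 if x < cut else 0

def run_pipeline(train, test, real, fit, min_accuracy):
    model = fit(train)                            # (1) Build
    hits = sum(model(x) == y for x, y in test)    # (2) Test
    accuracy = hits / len(test)
    if accuracy < min_accuracy:                   # (3) Test = OK?
        return None, accuracy                     # not OK: rebuild first
    return [model(x) for x in real], accuracy     # (4) Predict

# Made-up rows: (number of recharges, label) with 1 = churn, 0 = non-churn.
train = [(0, 1), (1, 1), (5, 0), (6, 0)]
test = [(2, 1), (4, 0), (7, 0), (3, 1)]
real = [1, 8]  # unlabeled records to score
predictions, accuracy = run_pipeline(train, test, real, fit_threshold, 0.7)
print(predictions, accuracy)
```

Step (5), Update, would correspond to calling `run_pipeline` again once newly labeled records are available.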
Classification Accuracy
Criteria for evaluating the accuracy of the results

Actual class \ Predicted class | Churned | Non-churned
Churned | True Positive (TP) | False Negative (FN)
Non-churned | False Positive (FP) | True Negative (TN)

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Churn rate} = \frac{TP}{TP + FP}$$
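A minimal sketch of the two measures exactly as the slide defines them. The function names and the counts in the example are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all subscribers that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def churn_rate(tp, fp):
    """Churn rate as written on the slide: TP / (TP + FP)."""
    return tp / (tp + fp)

# Illustrative counts only, not taken from the demo results.
tp, fn, fp, tn = 40, 10, 20, 30
acc = accuracy(tp, tn, fp, fn)
rate = churn_rate(tp, fp)
print(f"accuracy={acc:.2f} churn rate={rate:.2f}")
```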
Agenda
Churn prediction in prepaid mobile telecommunication network
Introduction
Diagram of possible customer states
Churn prediction Model
Classification accuracy
Machine learning algorithm
Support vector machine
Nearest neighbor machine
Multilayer perceptron neural network machine
Decision tree machine
Naive Bayes machine
Feature presentation & Demo
List of attributes
Demo-POC
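Support Vector Machine Learning
Non-linear/Linear SVM
Linear functions:
$$g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$
Nonlinear functions
[Figure: linear and nonlinear decision boundaries separating two classes, labeled -1 and +1]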
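As a sketch of how the linear function g(x) = w^T x + b yields a class, the sign of g decides between +1 and -1. The weights and bias below are hand-picked assumptions, not the result of training an SVM.

```python
def g(w, x, b):
    """Linear decision function g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Assign +1 on or above the separating hyperplane, -1 below it."""
    return 1 if g(w, x, b) >= 0 else -1

# Hypothetical parameters; a real SVM would learn w and b from data.
w, b = [1.0, -1.0], -0.5
label_a = classify(w, [2.0, 0.0], b)  # g = 1.5
label_b = classify(w, [0.0, 2.0], b)  # g = -2.5
print(label_a, label_b)
```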
Nearest Neighbor Learning
k-Nearest Neighbor
[Figure: nearest-neighbor classification of a point between two classes, labeled -1 and +1]
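A minimal k-nearest-neighbor sketch: the query point takes the majority label of its k closest training points. The Euclidean distance, k = 3, and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label). Returns the majority label among
    the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-dimensional points with labels -1 and +1.
train = [((1, 1), -1), ((1, 2), -1), ((2, 1), -1),
         ((6, 6), 1), ((7, 6), 1), ((6, 7), 1)]
near_low = knn_predict(train, (2, 2))
near_high = knn_predict(train, (6, 5))
print(near_low, near_high)
```

With two classes, an odd k avoids tied votes.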
Multilayer Perceptron (MLP) Neural Network Learning
MLP algorithm
[Figure: MLP with inputs x1 and x2 and nodes numbered 1, 2, 3, separating two classes labeled -1 and +1]
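To show why a hidden layer helps, here is a forward pass through a tiny network with two inputs, two hidden units, and one output unit. The weights are hand-set assumptions (not learned, and not read off the slide's figure); they make the network separate the XOR pattern, which no single linear function can do.

```python
def step(z):
    """Threshold activation: 1.0 if the weighted sum is positive."""
    return 1.0 if z > 0 else 0.0

def mlp_forward(x1, x2):
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # hidden unit: x1 OR x2
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # hidden unit: x1 AND x2
    out = step(1.0 * h1 - 1.0 * h2 - 0.5)  # output unit: h1 AND NOT h2
    return 1 if out == 1.0 else -1

outputs = {(a, b): mlp_forward(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

In practice the weights are learned from the training data, typically by backpropagation with a differentiable activation instead of the step function used here.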
Decision Tree Learning
Decision Tree algorithm
[Figure: decision tree separating two classes labeled -1 and +1; leaves are marked Y: churn, N: non-churn]
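A decision tree is built from repeated single-feature splits. The sketch below learns just one such split (a one-level tree) by minimising weighted Gini impurity. The choice of Gini, the function names, and the data are assumptions, since the slides do not say which splitting criterion was used.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    return "Y" if sum(labels) * 2 >= len(labels) else "N"

def fit_stump(rows):
    """rows: list of (feature_value, label), label 1 = churn, 0 = non-churn.
    Returns (threshold, leaf for x <= threshold, leaf for x > threshold)."""
    best = None
    values = sorted({x for x, _ in rows})
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in rows if x <= cut]
        right = [y for x, y in rows if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, cut, majority(left), majority(right))
    return best[1:]

def predict(stump, x):
    cut, left_leaf, right_leaf = stump
    return left_leaf if x <= cut else right_leaf

# Made-up rows: (number of recharges, churn label).
rows = [(0, 1), (1, 1), (2, 1), (5, 0), (7, 0), (9, 0)]
stump = fit_stump(rows)
print(stump, predict(stump, 1), predict(stump, 8))
```

A full tree applies the same search recursively to each side of the split until the leaves are pure or a depth limit is reached.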
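Naïve Bayes Learning
Naïve Bayes algorithm
Naïve Bayes classification
– Assumption that all input features are conditionally independent!
$$
\begin{aligned}
P(X_1, X_2, \ldots, X_n \mid C) &= P(X_1 \mid X_2, \ldots, X_n, C)\, P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\, P(X_2, \ldots, X_n \mid C) \\
&= P(X_1 \mid C)\, P(X_2 \mid C) \cdots P(X_n \mid C)
\end{aligned}
$$
– MAP classification rule: for $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, assign the class $c^*$ such that
$$
[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\, P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\, P(c), \quad c \neq c^*, \; c = c_1, \ldots, c_L
$$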
Agenda
Churn prediction in prepaid mobile telecommunication network
Introduction
Diagram of possible customer states
Churn prediction Model
Classification accuracy
Machine learning algorithm
Support vector machine
Nearest neighbor machine
Multilayer perceptron neural network machine
Decision tree machine
Naive Bayes machine
Feature presentation & Demo
List of attributes
Logical design
Demo
Evaluation & comparison
List of attributes
Attribute Name | Description
Number Of Outgoing Calls | Number of outgoing calls
Number Of Messages Sent | Number of messages sent
Usage GPRS | Subscriber is provided with GPRS service (Yes, No)
Total MOU | Minutes of usage (MoU) per subscriber
Total Number Of Recharges | Total number of recharges
Total Credit Revenue | Credit revenue per subscriber (Voice, SMS, GPRS, VAS)
Total Bonus Revenue | Bonus revenue per subscriber (Voice, SMS, GPRS, VAS)
Customer Relationship Age | Classifies measures pertaining to customers according to the number of years for which the customer has had a relationship with the telecommunications services provider
Remaining Credit | Remaining credit per subscriber
Churn | Churning customer status (Yes, No)
Logical Design
Input | Description
Data | BI system
Feature | Domain expert in industry
Algorithm | SVM, NB, Decision tree, MLP, k-Nearest Neighbour

Output | Description
Churn/no-churn | Specifies churn/no-churn for each subscriber
Demo
Sampling for training data
Attribute | Description
NUM_OG_CALLS | Number of outgoing calls for Month -1
SUM_DURATION_OG | Minutes of usage (MoU) per subscriber
NUM_SMO | Number of messages sent for Month -1
NUM_DATA_UP | Subscriber is provided with GPRS service (Yes, No) for Month -1
NUM_OG_CALLS_1 | Number of outgoing calls for Month -2
SUM_DURATION_OG_1 | Minutes of usage (MoU) per subscriber for Month -2
NUM_SMO_1 | Number of messages sent for Month -2
NUM_DATA_UP_1 | Subscriber is provided with GPRS service (Yes, No) for Month -2
Recharge | Number of recharges for Month -1
Recharge_1 | Number of recharges for Month -2
RemainCredit | Remaining credit in Month -3
RemainCredit_1 | Remaining credit in Month -2
RemainCredit_2 | Remaining credit in Month -1
TOTAL_BONUS | Bonus revenue of Month -2
TOTAL_CREDIT | Credit revenue of Month -2
TOTAL_BONUS_1 | Bonus revenue of Month -1
TOTAL_CREDIT_1 | Credit revenue of Month -1
CLASS {1,0} | Churn/no-churn
Total records: 2876 rows
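One way to turn such a record into classifier input is sketched below. The attribute names come from the table; the CSV layout, the `encode` and `to_example` helpers, the Yes/No encoding, and every value in the sample row are assumptions for illustration.

```python
import csv
import io

FEATURES = [
    "NUM_OG_CALLS", "SUM_DURATION_OG", "NUM_SMO", "NUM_DATA_UP",
    "NUM_OG_CALLS_1", "SUM_DURATION_OG_1", "NUM_SMO_1", "NUM_DATA_UP_1",
    "Recharge", "Recharge_1", "RemainCredit", "RemainCredit_1",
    "RemainCredit_2", "TOTAL_BONUS", "TOTAL_CREDIT", "TOTAL_BONUS_1",
    "TOTAL_CREDIT_1",
]

def encode(value):
    """Map Yes/No flags to 1.0/0.0 and every other value to float."""
    text = value.strip().lower()
    if text in ("yes", "no"):
        return 1.0 if text == "yes" else 0.0
    return float(text)

def to_example(row):
    """Return (feature vector, label); the label is None when CLASS is '?'."""
    x = [encode(row[name]) for name in FEATURES]
    label = row["CLASS"].strip()
    return x, (None if label == "?" else int(label))

# Made-up sample row; the real data comes from the BI system.
values = ["12", "34.5", "3", "Yes", "10", "20", "1", "No", "2", "1",
          "5000", "3000", "1000", "0", "20000", "500", "15000", "1"]
sample = ",".join(FEATURES + ["CLASS"]) + "\n" + ",".join(values) + "\n"
x, y = to_example(next(csv.DictReader(io.StringIO(sample))))
print(len(x), y)
```

The same helper handles unlabeled input records, whose CLASS column holds "?" until a prediction is made.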
Demo
Sampling for input data
Attribute | Description
NUM_OG_CALLS | Number of outgoing calls for Month -1
SUM_DURATION_OG | Minutes of usage (MoU) per subscriber
NUM_SMO | Number of messages sent for Month -1
NUM_DATA_UP | Subscriber is provided with GPRS service (Yes, No) for Month -1
NUM_OG_CALLS_1 | Number of outgoing calls for Month -2
SUM_DURATION_OG_1 | Minutes of usage (MoU) per subscriber for Month -2
NUM_SMO_1 | Number of messages sent for Month -2
NUM_DATA_UP_1 | Subscriber is provided with GPRS service (Yes, No) for Month -2
Recharge | Number of recharges for Month -1
Recharge_1 | Number of recharges for Month -2
RemainCredit | Remaining credit in Month -3
RemainCredit_1 | Remaining credit in Month -2
RemainCredit_2 | Remaining credit in Month -1
TOTAL_BONUS | Bonus revenue of Month -2
TOTAL_CREDIT | Credit revenue of Month -2
TOTAL_BONUS_1 | Bonus revenue of Month -1
TOTAL_CREDIT_1 | Credit revenue of Month -1
CLASS {1,0} | ? (to be predicted)
Total records: 15175 rows
Demo
Result
[Chart: accuracy and churn rate per algorithm, Jan-15; vertical axis 0% to 80%]

Algorithm | Prediction | Count | Actual churn | Actual non-churn | Accuracy | Churn rate
NaiveBayes | Churn | 7353 | 2955 | 4398 | 47% | 45%
NaiveBayes | non-Churn | 7823 | 3596 | 4227 | |
MultilayerPerceptron | Churn | 6550 | 2956 | 3594 | 53% | 45%
MultilayerPerceptron | non-Churn | 8626 | 3595 | 5031 | |
Nearest neighbour | Churn | 5980 | 2906 | 3074 | 56% | 44%
Nearest neighbour | non-Churn | 9196 | 3645 | 5551 | |
Decision tree | Churn | 6137 | 2956 | 3181 | 55% | 45%
Decision tree | non-Churn | 9039 | 3595 | 5444 | |
SVM | Churn | 3161 | 2497 | 664 | 69% | 38%
SVM | non-Churn | 12015 | 4054 | 7961 | |
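The reported percentages can be checked against the counts. The sketch below assumes that, for each algorithm, the three numbers in a row are the prediction total, the actual churners, and the actual non-churners. Under that reading the accuracy figures are reproduced, while the reported churn rates agree with TP/(TP+FN), the share of actual churners that were caught, rather than with TP/(TP+FP).

```python
# (TP, FP, FN, TN) per algorithm, read from the result table:
# TP = predicted churn / actual churn, FP = predicted churn / actual non-churn,
# FN = predicted non-churn / actual churn, TN = predicted non-churn / actual non-churn.
results = {
    "NaiveBayes": (2955, 4398, 3596, 4227),
    "MultilayerPerceptron": (2956, 3594, 3595, 5031),
    "Nearest neighbour": (2906, 3074, 3645, 5551),
    "Decision tree": (2956, 3181, 3595, 5444),
    "SVM": (2497, 664, 4054, 7961),
}

summary = {}
for name, (tp, fp, fn, tn) in results.items():
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    over_predicted = tp / (tp + fp)  # TP / (TP + FP)
    over_actual = tp / (tp + fn)     # TP / (TP + FN)
    summary[name] = (round(accuracy * 100),
                     round(over_predicted * 100),
                     round(over_actual * 100))
    print(name, summary[name])
```

Read this way, SVM has the highest accuracy but identifies the smallest share of the actual churners, which matters if the goal is to reach customers before they leave.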