Data Science Process
CRISP-DM process
The methodical discovery of useful relationships and patterns in data is enabled by
a set of iterative activities collectively known as the data science process.
Understanding the process
Preparing the data samples
Developing the model
Applying the model on a dataset
Deploying and maintaining the model
[Figure: CRISP-DM cycle with the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, arranged around Data]
Process
[Figure: the data science process in five steps]
1. Prior Knowledge: Business Understanding, Data Understanding
2. Preparation: Prepare Data
3. Modeling: Training Data, Building Model using Algorithms; Test Data, Applying Model and performance evaluation
4. Application: Deployment
5. Knowledge: Knowledge and Actions
1. Prior Knowledge
Prior knowledge refers to information that is already known about a subject.
Gaining information on:
Objective of the problem
Subject area of the problem
Data
Example: a simple lending data set of ten data points
Terminologies used
A Dataset
A datapoint
An Attribute
A label
Identifiers
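To make the terminology concrete, here is a minimal Python sketch of a lending-style data set. The column names (`borrower_id`, `credit_score`, `interest_rate`), the helper `split_roles`, and all values are illustrative assumptions, not the data from the lending example.

```python
# Hypothetical lending data set: every value below is made up for illustration.
dataset = [
    {"borrower_id": 1, "credit_score": 500, "interest_rate": 7.3},
    {"borrower_id": 2, "credit_score": 600, "interest_rate": 6.7},
    {"borrower_id": 3, "credit_score": 700, "interest_rate": 6.0},
]

IDENTIFIER = "borrower_id"      # identifies a record; not used for prediction
ATTRIBUTES = ["credit_score"]   # inputs describing each data point
LABEL = "interest_rate"         # the value the model is meant to predict


def split_roles(datapoint):
    """Separate one data point into its identifier, attributes, and label."""
    identifier = datapoint[IDENTIFIER]
    attributes = {name: datapoint[name] for name in ATTRIBUTES}
    label = datapoint[LABEL]
    return identifier, attributes, label


for datapoint in dataset:       # each record (row) is one data point
    print(split_roles(datapoint))
```

Keeping the identifier separate from the attributes matters because an identifier carries no predictive information and should normally be excluded from modeling.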
2. Data Preparation
Data Exploration
Data quality
Handling missing values
Data type conversion
Transformation
Outliers
Feature selection
Sampling
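Several of the preparation steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not a complete pipeline: the records, the assumed valid score range of 300 to 850, the mean-imputation choice, and the helper names `to_float` and `prepare` are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical raw records: values are illustrative, with deliberate quality issues.
raw = [
    {"credit_score": "500", "interest_rate": "7.3"},
    {"credit_score": "600", "interest_rate": None},    # missing value
    {"credit_score": "700", "interest_rate": "6.0"},
    {"credit_score": "800", "interest_rate": "5.4"},
    {"credit_score": "9999", "interest_rate": "5.0"},  # implausible score (outlier)
]


def to_float(value):
    """Data type conversion: text to float, keeping missing values as None."""
    return None if value is None else float(value)


def prepare(records, score_range=(300, 850)):
    rows = [{key: to_float(value) for key, value in r.items()} for r in records]
    # Outliers: drop scores outside an assumed valid range.
    low, high = score_range
    rows = [r for r in rows if low <= r["credit_score"] <= high]
    # Missing values: impute with the mean of the observed values.
    observed = [r["interest_rate"] for r in rows if r["interest_rate"] is not None]
    fill = statistics.mean(observed)
    for r in rows:
        if r["interest_rate"] is None:
            r["interest_rate"] = fill
    # Transformation: min-max scale the credit score to [0, 1]
    # (assumes at least two distinct scores remain).
    scores = [r["credit_score"] for r in rows]
    smallest, largest = min(scores), max(scores)
    for r in rows:
        r["score_scaled"] = (r["credit_score"] - smallest) / (largest - smallest)
    return rows


prepared = prepare(raw)
# Sampling: draw a reproducible random subset of the prepared rows.
sample = random.Random(0).sample(prepared, k=2)
```

Dropping outliers and imputing with the mean are only two of several possible policies; which one is appropriate depends on the prior knowledge gathered in step 1.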
3. Modeling
[Figure: training data is used to build the model, test data is used to evaluate it, and the outcome is the final model]
3. Modeling
Splitting training and test data sets
Training Data
Test Data
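Splitting a data set into training and test portions can be sketched as follows. The function name `train_test_split`, the 70/30 ratio, and the seed are assumptions chosen for illustration; the slides do not prescribe a particular ratio.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))                   # stand-in for ten data points
training, test = train_test_split(records)
print(len(training), len(test))             # 7 3
```

Shuffling before the split guards against any ordering in the original data leaking into one of the two sets.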
3. Modeling
Evaluation of test dataset
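As one possible illustration of building a model on training data and evaluating it on test data, the sketch below fits a straight line by least squares and reports the mean absolute error on held-out points. The choice of linear regression, the error measure, the helper names, and all data values are assumptions made for the example.

```python
import statistics

# Hypothetical (credit_score, interest_rate) pairs; values are illustrative only.
training = [(500, 7.5), (600, 7.0), (700, 6.5), (800, 6.0)]
test = [(550, 7.3), (750, 6.2)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in points)
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean


def mean_absolute_error(model, points):
    """Average absolute gap between predicted and actual labels."""
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in points)


model = fit_line(training)                  # built from training data only
error = mean_absolute_error(model, test)    # measured on unseen test data
print(f"slope={model[0]:.4f}, intercept={model[1]:.2f}, test MAE={error:.3f}")
```

The model never sees the test points while it is being fitted, so the test error is an estimate of how it would perform on new data.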
4. Application
Product readiness
Technical integration
Model response time
Remodeling
Assimilation
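Model response time, one of the application concerns listed above, can be estimated with a simple timing loop. The `predict` function below is a hypothetical stand-in for a deployed model, and its coefficients are made up; a real measurement would also include data transfer and integration overhead.

```python
import time


def predict(credit_score):
    """Stand-in for a deployed model; the coefficients are hypothetical."""
    return 10.0 - 0.005 * credit_score


def average_response_time(model, inputs, repeats=1000):
    """Average wall-clock seconds per prediction over repeated calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        for value in inputs:
            model(value)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


latency = average_response_time(predict, [500, 650, 800])
print(f"average response time: {latency * 1e6:.2f} microseconds")
```

Averaging over many calls smooths out timer resolution and scheduling noise, which would dominate a single measurement of such a fast function.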
5. Knowledge
Posterior knowledge
Kotu, V., & Deshpande, B. (2014). Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.