What Is CRISP in Data Mining - Javatpoint

What is CRISP in Data Mining?

CRISP-DM stands for the Cross-Industry Standard Process for Data Mining. The CRISP-DM
methodology provides a structured approach to planning a data mining project. It is a
robust and well-proven methodology. We do not claim any ownership over it. We did not
invent it. We are advocates of its powerful practicality, flexibility, and usefulness when
using analytics to solve business issues. It is the golden thread that runs through almost
every client meeting.

This model is an idealized sequence of events. In practice, many tasks can be performed in a
different order, and it will often be necessary to backtrack to previous tasks and repeat
certain actions. The model does not try to capture all possible routes through the data
mining process.

How does CRISP Help?

CRISP-DM provides a roadmap, a set of best practices, and a structure for getting better and
faster results from data mining. A business can follow it while planning and carrying out a
data mining project.

Phases of CRISP-DM

CRISP-DM provides an overview of the data mining life cycle as a process model. The life
cycle model comprises six phases, with arrows indicating the most important and frequent
dependencies between phases. The sequence of the phases is not strict, and most projects
move back and forth between phases as necessary. The CRISP-DM model is flexible and can
be customized easily.

For example, if your organization aims to detect money laundering, you will likely sift
through large amounts of data without a specific modelling goal. Instead of modelling, your
work will focus on data exploration and visualization to uncover suspicious patterns in
financial data. CRISP-DM allows you to create a data mining model that fits your needs.

The model includes descriptions of the typical phases of a project, the tasks involved with
each phase, and an explanation of the relationships between these tasks.
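To make the idea of six phases connected by arrows concrete, here is a minimal Python sketch. The transition table is an assumption based on the arrows usually drawn in the standard CRISP-DM diagram, which is not reproduced in this text. The names `PHASES`, `TRANSITIONS`, `is_typical_move` and `check_path` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: the six CRISP-DM phases and the transitions usually
# drawn as arrows in the standard CRISP-DM diagram (an assumption; the text
# only says that arrows mark the most important and frequent dependencies).
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modelling",
    "Evaluation",
    "Deployment",
]

TRANSITIONS = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modelling"},
    "Modelling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Business Understanding", "Deployment"},
    "Deployment": set(),
}


def is_typical_move(current: str, nxt: str) -> bool:
    """Return True if moving from `current` to `nxt` follows one of the arrows."""
    return nxt in TRANSITIONS[current]


def check_path(path: list[str]) -> list[tuple[str, str]]:
    """Return the moves in `path` that are not among the typical transitions."""
    return [(a, b) for a, b in zip(path, path[1:]) if not is_typical_move(a, b)]
```

Because the sequence of phases is not strict, `check_path` only reports unusual moves and does not forbid them. A project that loops between Data Preparation and Modelling before Evaluation produces an empty list.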
Phase 1: Business Understanding

The first stage of the CRISP-DM process is understanding what you want to accomplish
from a business perspective. Your organization may have competing objectives and
constraints that must be properly balanced. This stage aims to uncover important
factors influencing the project's outcome. Neglecting this step can mean much effort is put
into producing the right answers to the wrong questions.

What are the desired outputs of the project?

1. Set objectives: Describe your primary objective from a business perspective. There may also
be other related questions that you would like to address. For example, your primary goal
might be to keep current customers by predicting when they are prone to move to a
competitor.
2. Produce project plan: Describe the plan for achieving the data mining and business goals.
The plan should specify the steps to perform during the rest of the project, including the
initial selection of tools and techniques.
3. Business success criteria: Here, you'll lay out the criteria you'll use to determine whether the
project has been successful from the business point of view. These should ideally be specific
and measurable, for example, reducing customer churn to a certain level. However, sometimes
it might be necessary to have more subjective criteria, such as giving useful insights into the
relationships.

Assess the current situation

This involves more detailed fact-finding about the resources, constraints, assumptions and
other factors you'll need to consider when determining your data analysis goal and project
plan.

1. Inventory of resources: List the resources available to the project, including:
   Personnel (business experts, data experts, technical support, data mining experts)
   Data (fixed extracts, access to live, warehoused, or operational data)
   Computing resources (hardware platforms)
   Software (data mining tools, other relevant software)
2. Requirements, assumptions and constraints: List all requirements of the project, including
the schedule of completion, the required comprehensibility and quality of results, and any
data security concerns and legal issues. Make sure that you are allowed to use the data. List
the assumptions made by the project. These may be assumptions about the data that can be
verified during data mining but may also include non-verifiable assumptions about the
business related to the project. It is important to list the latter if they affect the validity of the
results. List the constraints on the project. These may be constraints on the availability of
resources but may also include technological constraints such as the size of the data set that
it is practical to use for modelling.
3. Risks and contingencies: List the risks or events that might delay the project or cause it to
fail. List the corresponding contingency plans: what action will you take if these risks or
events occur?
4. Terminology: Compile a glossary of terminology relevant to the project. This will generally
have two components:
   A glossary of relevant business terminology, which forms part of the business
   understanding available to the project. Constructing this glossary is a useful
   "knowledge elicitation" and education exercise.
   A glossary of data mining terminology, illustrated with examples relevant to
   the business problem.
5. Costs and benefits: Construct a cost-benefit analysis for the project, which compares the
project's costs with the potential benefits to the business if it is successful. This comparison
should be as specific as possible. For example, you should use financial measures in a
commercial situation.

Determine data mining goals

A business goal states objectives in business terminology. A data mining goal states project
objectives in technical terms. For example, the business goal might be "Increase catalogue
sales to existing customers." A data mining goal might be "Predict how many widgets a
customer will buy, given their purchases over the past three years, demographic
information (age, salary, city, etc.), and the item's price."

1. Business success criteria: Describe the intended outputs of the project that enable the
achievement of the business objectives.
2. Data mining success criteria: Define the criteria for a successful project outcome, for
example, a certain level of predictive accuracy or a propensity-to-purchase profile with a given
degree of "lift." As with business success criteria, it may be necessary to describe these in
subjective terms, in which case the person or persons making the subjective judgment
should be identified.

Produce project plan

Describe the intended plan for achieving the data mining goals and business goals. Your
plan should specify the steps to perform during the rest of the project, including the initial
selection of tools and techniques.

1. Project plan: List the stages to be executed in the project, with their duration, resources
required, inputs, outputs, and dependencies. Where possible, try to make explicit the
large-scale iterations in the data mining process, for example, repetitions of the modelling
and evaluation phases.

As part of the project plan, it is important to analyze the dependencies between time
schedules and risks. Mark the results of these analyses explicitly in the project plan, ideally
with actions and recommendations if the risks are manifested. Decide which evaluation
strategy will be used in the evaluation phase.

Your project plan will be a dynamic document. At the end of each phase, you'll review
progress and achievements and update the project plan accordingly. Specific review points
for these updates should be part of the project plan.

2. Initial assessment of tools and techniques: At the end of the first phase, you should
undertake an initial assessment of tools and techniques. For example, you select a data
mining tool that supports various methods for different stages of the process. It is
important to assess tools and techniques early in the process since the selection of tools
and techniques may influence the entire project.

Phase 2: Data Understanding

The second phase of the CRISP-DM process requires you to acquire the data listed in the
project resources. This initial collection includes data loading if this is necessary for data
understanding. For example, if you use a specific tool for data understanding, it makes
perfect sense to load your data into this tool. If you acquire multiple data sources, you need
to consider how and when you will integrate these.

Initial data collection report: List the data sources acquired, their locations, and the
methods used to acquire them. Record any problems you encountered and any
resolutions achieved. This will help with future replication of this project and the
execution of similar future projects.

Describe data

Examine the "gross" or "surface" properties of the acquired data and report on the results.

Data description report: Describe the data that has been acquired, including its
format, its quantity, the identities of the fields and any other surface features
which have been discovered. Evaluate whether the data acquired satisfies your
requirements.

Explore data

During this stage, you'll address data mining questions using querying, data visualization
and reporting techniques. These may include:

   Distribution of key attributes
   Relationships between pairs or small numbers of attributes
   Results of simple aggregations
   Properties of significant sub-populations
   Simple statistical analyses

These analyses may directly address your data mining goals. They may contribute to or
refine the data description and quality reports and feed into the transformation and other
data preparation steps needed for further analysis.

Data exploration report: Describe the results of your data exploration, including
the first findings or initial hypothesis and their impact on the remainder of the
project. If appropriate, you could include graphs and plots here to indicate data
characteristics that suggest further examination of interesting data subsets.

Verify data quality

Examine the quality of the data, addressing questions such as:

   Is the data complete? Does it cover all the cases required?
   Is it correct, or does it contain errors? If there are errors, how common are
   they?
   Are there missing values in the data? If so, how are they represented, where do
   they occur, and how common are they?
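The question about missing values can be answered with a small count per field. The following is a minimal Python sketch using only the standard library. The set of missing-value markers, the function name `missing_value_report` and the customer records are all assumptions invented for illustration; real data may represent missing values differently.

```python
from collections import Counter

# Assumption: these markers are how missing values are represented in the
# hypothetical extract below; real data may use other conventions.
MISSING_MARKERS = {None, "", "NA", "?"}


def missing_value_report(records, fields):
    """Count missing values per field and return (count, share) for each field."""
    counts = Counter()
    for record in records:
        for field in fields:
            # A field that is absent from the record is treated as missing too.
            if record.get(field) in MISSING_MARKERS:
                counts[field] += 1
    total = len(records)
    return {f: (counts[f], counts[f] / total if total else 0.0) for f in fields}


# Hypothetical customer records, invented purely for illustration.
customers = [
    {"id": 1, "age": 34, "city": "Leeds"},
    {"id": 2, "age": None, "city": "York"},
    {"id": 3, "age": 51, "city": ""},
    {"id": 4, "age": "NA"},
]

report = missing_value_report(customers, ["id", "age", "city"])
for field, (count, share) in report.items():
    print(f"{field}: {count} missing ({share:.0%})")
```

A summary like this shows where missing values occur and how common they are. How they should be handled is a separate decision that depends on data and business knowledge.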

Data quality report

List the results of the data quality verification. If quality problems exist, suggest possible
solutions. Solutions to data quality problems generally depend heavily on data and business
knowledge.

Phase 3: Data Preparation

In this project phase, you decide on the data you will use for analysis. The criteria you might
use to make this decision include the relevance of the data to your data mining goals, the
data's quality, and technical constraints such as limits on data volume or data types.

Rationale for inclusion/exclusion: List the data to be included/excluded and
the reasons for these decisions.

Clean your data

This task involves raising the data quality to the level required by the analysis techniques
that you've selected. This may involve selecting clean subsets of the data, the insertion of
suitable defaults, or more ambitious techniques such as estimating missing data by
modelling.

Data cleaning report: Describe what decisions and actions you took to address
data quality problems. Consider any data transformations made for cleaning
purposes and their possible impact on the analysis results.

Construct required data

This task includes constructive data preparation operations such as producing derived
attributes, entire new records, or transformed values for existing attributes.

Derived attributes: These are new attributes constructed from one or more
existing attributes in the same record. For example, you might use the variables
of length and width to calculate a new variable of area.

Generated records: Here, you describe the creation of any completely new
records. For example, you might need to create records for customers who did
not purchase during the past year. There was no reason to have such records in
the raw data. Still, for modelling purposes, it might make sense to explicitly
represent the fact that particular customers made zero purchases.

Integrate data

These methods combine information from multiple databases, tables or records to create
new records or values.

Merged data: Merging tables refers to joining two or more tables that hold different
information about the same objects. For example, a retail chain might have one table
with information about each store's general characteristics (e.g., floor space, type of
mall), another table with summarized sales data (e.g., profit, percent change in sales
from the previous year), and another with information about the demographics of the
surrounding area. Each of these tables contains one record
for each store. These tables can be merged into a new table with one record for
each store, combining fields from the source tables.

Aggregations: Aggregations are operations in which new values are computed
by summarizing information from multiple records or tables. For example,
converting a table of customer purchases, with one record for each purchase,
into a new table with one record for each customer, with fields such as the
number of purchases, average purchase amount, percent of orders charged to
credit card, percent of items under promotion, etc.

Phase 4: Modelling

Select modelling technique: As the first step, you'll select the basic modelling technique
you will use. Although you may have already selected a tool during the business
understanding phase, at this stage, you'll be selecting the specific modelling technique, e.g.
decision-tree building with C5.0 or neural network generation with back propagation. If
multiple techniques are applied, perform this task separately for each technique.

Modelling technique: Document the basic modelling technique that is to be
used.

Modelling assumptions: Many modelling techniques make specific
assumptions about the data, for example, that all attributes have uniform
distributions, no missing values are allowed, the class attribute must be symbolic,
etc. Record any assumptions made.

Generate test design

Before you build a model, you need to generate a procedure or mechanism to test the
model's quality and validity. For example, in supervised data mining tasks such as
classification, it is common to use error rates as quality measures for data mining models.
Therefore, you typically separate the dataset into train and test sets, build the model on the
train set, and estimate its quality on the separate test set.

Test design: Describe the intended plan for training, testing, and evaluating the
models. A primary component of the plan is determining how to divide the
available dataset into training, test and validation datasets.

Build model

Run the modelling tool on the prepared dataset to create one or more models.

Parameter settings: With any modelling tool, there are often a large number of
parameters that can be adjusted. List the parameters, their values, and the
rationale for selecting parameter settings.

Models: These are the models produced by the modelling tool, not a report on
the models.

Model descriptions: Describe the resulting models, report on the interpretation
of the models and document any difficulties encountered with their meanings.

Assess model

Interpret the models according to your domain knowledge, data mining success criteria,
and desired test design. Judge the success of the application of modelling and discovery
techniques, and then contact business analysts and domain experts later to discuss the
data mining results in the business context. This task only considers models, whereas the
evaluation phase also considers all other results produced during the project.

At this stage, you should rank the models and assess them according to the evaluation
criteria. You should consider the business objectives and success criteria as far as you can
here. In most data mining projects, a single technique is applied more than once, and data
mining results are generated with several different techniques.
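The test design described in this phase (separate the dataset into train and test sets, build the model on the train set, estimate its error rate on the test set) can be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: the toy one-feature data, the function names, and the simple threshold "model", which stands in for whichever modelling technique a real project selects.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy model: predict class 1 when x exceeds the midpoint of the class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def error_rate(threshold, rows):
    """Share of rows whose predicted class differs from the true class."""
    wrong = sum(1 for x, y in rows if int(x > threshold) != y)
    return wrong / len(rows)


# Hypothetical data, invented for illustration: (feature value, class label).
data = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(10, 20)]

train, test = train_test_split(data, test_fraction=0.3, seed=42)
threshold = fit_threshold(train)
print(f"train size={len(train)}, test size={len(test)}")
print(f"test error rate={error_rate(threshold, test):.2f}")
```

The error rate is computed only on rows the model never saw during fitting, which is the point of keeping the test set separate. A fixed seed keeps the split reproducible, so the test design can be documented and repeated.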
Model assessment: Summarise the results of this task, list the qualities of your
generated models (e.g., in terms of accuracy) and rank their quality in relation to each
other.

Revised parameter settings: According to the model assessment, revise the parameter
settings and tune them for the next modelling run. Iterate model building and
assessment until you strongly believe that you have found the best model(s).
Document all such revisions and assessments.

Phase 5: Evaluation

Evaluate your results: Previous evaluation steps dealt with factors such as the accuracy and
generality of the model. During this step, you'll assess the degree to which the model meets
your business objectives and seek to determine if there is some business reason why this
model is deficient. Another option is to test the model on test applications in the real
application if time and budget constraints permit. The evaluation phase also involves
assessing any other data mining results you've generated. Data mining results involve
models that are necessarily related to the original business objectives and all other findings
that are not necessarily related to the original business objectives but might also unveil
additional challenges, information, or hints for future directions.

Assessment of data mining results: Summarise assessment results in terms of business
success criteria, including a final statement regarding whether the project
already meets the initial business objectives.

Approved models: After assessing models against business success criteria, the
generated models that meet the selected criteria become the approved models.

Review process

At this point, the resulting models appear to be satisfactory and satisfy business needs. It is
now appropriate for you to do a more thorough review of the data mining engagement to
determine if there is an important factor or task that has somehow been overlooked. This
review also covers quality assurance issues. For example: did we correctly build the model?
Did we use only the attributes that we are allowed to use and that are available for future
analyses?

Review of the process: Summarise the process review and highlight activities
that have been missed and those that should be repeated.

Determine next steps

You now decide how to proceed depending on the assessment results and the process
review. Do you finish this project and move on to deployment, initiate further iterations, or
set up new data mining projects? You should also take stock of your remaining resources
and budget, which may influence your decisions.

List of possible actions: List the potential further actions and the reasons for
and against each option.

Decision: Describe the decision on how to proceed, along with the rationale.

Phase 6: Deployment

Plan deployment: In the deployment stage, you'll take your evaluation results and
determine a strategy for their deployment. If a general procedure has been identified to
create the relevant model(s), this procedure is documented here for later deployment. It
makes sense to consider the ways and means of deployment during the business
understanding phase because deployment is crucial to the project's success. This is where
predictive analytics helps improve your business's operational side.

Deployment plan: Summarise your deployment strategy, including the
necessary steps and how to perform them.

Plan monitoring and maintenance

Monitoring and maintenance are important issues if the data mining result becomes part
of the day-to-day business and its environment. The careful preparation of a maintenance
strategy helps to avoid unnecessarily long periods of incorrect usage of data mining results.
The project needs a detailed monitoring process plan to monitor the deployment of the
data mining result(s). This plan takes into account the specific type of deployment.

Monitoring and maintenance plan: Summarise the monitoring and
maintenance strategy, including the necessary steps and how to perform them.

Produce final report

At the end of the project, you will write a final report. Depending on the deployment plan,
this report may be only a summary of the project and its experiences (if they have not
already been documented as an ongoing activity), or it may be a final and comprehensive
presentation of the data mining result.

Final report: This is the final written report of the data mining engagement. It
includes all of the previous deliverables, summarising and organizing the results.

Final presentation: There will often be a meeting after the project at which the
results are presented to the customer.

Review project

Assess what went right and wrong, what was done well and what needs improvement.

Experience documentation: Summarise important experience gained during
the project. For example, this documentation could include any pitfalls you
encountered, misleading approaches, or hints for selecting the best-suited data
mining techniques in similar situations. In ideal projects, experience
documentation also covers any reports that individual project members have
written during previous phases of the project.