CLASS: XII
ARTIFICIAL INTELLIGENCE
UNIT 2:
Data Science Methodology: An Analytic Approach to Capstone Project
DATA SCIENCE METHODOLOGY
• A Methodology gives the Data Scientist a framework for designing an AI Project. The framework helps the team decide on the methods, processes, and strategies that will be employed to obtain the correct output required from the AI Project.
• Data Science Methodology is a process with a prescribed sequence of iterative steps that data scientists
follow to approach a problem and find a solution.
• The foundation methodology of Data Science provides deep insight into how every AI project can be solved from beginning to end.
• It was put forward by John Rollins, a Data Scientist at IBM Analytics.
• It consists of 10 steps.
• The ten stages are grouped into five modules, each covering two stages of the methodology and explaining the rationale for why each stage is required.
1. From Problem to Approach
2. From Requirements to Collection
3. From Understanding to Preparation
4. From Modelling to Evaluation
5. From Deployment to Feedback
1. Business understanding –
“What is the problem that you are trying to solve?”
• In this stage, we first understand the customer’s problem by asking questions and trying to comprehend exactly what they require. With this understanding, we can figure out the objectives that support the customer’s goal. This is also known as Problem Scoping and Definition.
• The team can use the 5W1H Problem Canvas to deeply understand the issue. This stage also involves using the Design Thinking (DT) framework.
• To solve a problem, it's crucial to understand the customer's needs. This can be achieved by asking
relevant questions and engaging in discussions with all stakeholders.
2. Analytic approach –
“How can you use the data to answer the question?”
• When the business problem has been established clearly, the data scientist will be able to define the
analytical approach to solve the problem.
• This stage involves asking the stakeholders further questions so that the AI Project team can decide on the correct approach to solve the problem.
• The different questions that can be asked now are:
1. Do I need to find how much or how many? (Regression)
2. Which category does the data belong to? (Classification)
3. Can the data be grouped? (Clustering)
4. Is there any unusual pattern in the data? (Anomaly detection)
5. Which option should be given to the customer? (Recommendation)
• There are four main types of data analytics:
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
Descriptive Analytics: This summarizes past data to understand what has happened. It is the first step undertaken in data analytics, describing trends and patterns using tools like graphs and charts, and statistical measures like mean, median, and mode to understand the central tendency. This method also examines the spread of the data using range, variance, and standard deviation.
For example: calculating the average marks of students in an exam or analyzing sales data from the previous year.
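A minimal Python sketch of descriptive analytics, assuming a small hypothetical list of exam marks (all numbers are illustrative):

```python
import statistics

# Hypothetical exam marks for a class (illustrative data only)
marks = [72, 85, 60, 91, 78, 85, 66, 74]

# Central tendency
print("Mean:", statistics.mean(marks))
print("Median:", statistics.median(marks))
print("Mode:", statistics.mode(marks))

# Spread of the data
print("Range:", max(marks) - min(marks))
print("Variance:", statistics.variance(marks))
print("Std. deviation:", statistics.stdev(marks))
```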
Diagnostic Analytics: This helps to understand why something has happened. It is normally done by analyzing past data using techniques like root cause analysis, hypothesis testing, and correlation analysis. The main purpose is to identify the causes or factors that led to a certain outcome.
For example: If a company’s sales dropped, diagnostic analytics helps to find the cause by examining questions like “Is it due to poor customer service?” or “Is it due to low product quality?”
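A toy sketch of diagnostic analytics using correlation analysis in pandas; the monthly figures and column names are hypothetical:

```python
import pandas as pd

# Hypothetical monthly data: sales alongside two candidate causes
df = pd.DataFrame({
    "sales":      [120, 110, 95, 80, 70, 65],
    "complaints": [  5,   8, 12, 18, 25, 30],
    "ad_spend":   [ 40,  42, 41, 40, 39, 41],
})

# Correlation analysis: which factor moves together with sales?
print(df.corr()["sales"])
# A strong negative correlation with 'complaints' would point to
# poor customer service as a cause worth investigating further.
```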
Predictive Analytics: This uses past data to make predictions about future events or trends, using techniques like regression, classification, and clustering. Its main purpose is to foresee future outcomes and support informed decision-making.
For example: A company can use predictive analytics to forecast its sales, demand, inventory, customer purchase patterns, etc., based on previous sales data.
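A minimal predictive analytics sketch using linear regression from scikit-learn; the sales figures are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical past sales by month (illustrative data only)
months = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([100, 108, 115, 123, 130, 138])

# Fit a simple regression model on the historical data
model = LinearRegression().fit(months, sales)

# Forecast sales for the next month
print("Forecast for month 7:", model.predict([[7]])[0])
```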
Prescriptive Analytics: This recommends the action to be taken to achieve a desired outcome, using techniques such as optimization, simulation, and decision analysis. Its purpose is to guide decisions by suggesting the best course of action based on data analysis.
For example: designing the right strategy to increase sales during the festival season by analyzing past data, and thereby optimizing pricing, marketing, production, etc.
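A toy prescriptive sketch: pick the price that maximizes expected revenue under an assumed (hypothetical) demand relationship; in practice the demand model would be learned from past data:

```python
# Assumed relationship: demand falls as price rises (illustrative only)
def demand(price):
    return max(0, 500 - 4 * price)

# Simulate each candidate price and recommend the best one
candidate_prices = range(50, 121, 5)
best = max(candidate_prices, key=lambda p: p * demand(p))
print("Recommended price:", best)
print("Expected revenue:", best * demand(best))
```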
3. Data requirements
“What are the data requirements?”
• This step identifies the data contents, formats, and the sources for data collection.
• The 5W1H questioning method can be employed in this stage also to determine the data requirements.
The data selected should be able to answer all the ‘what’, ‘who’, ‘when’, ‘where’, ‘why’ and ‘how’ questions
about the problem.
• This stage involves defining our data requirements, including the type, format, source, and necessary
preprocessing steps to ensure the data is usable and accurate for our needs.
• identifying the types of data required, such as numbers, words, or images;
• considering the structure in which the data should be organized, whether in a table, text file, or database;
• identifying the sources from which we can collect the data.
• Data for a project can be categorized into three types: structured data (organized in tables, e.g., customer
databases), unstructured data (without a predefined structure, e.g., social media posts, images), and semi-
structured data (having some organization, e.g., emails, XML files).
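A short sketch of how the three data types are typically loaded in Python; the file names are placeholders, not real datasets:

```python
import json
import pandas as pd

# Structured data: rows and columns with a fixed schema
customers = pd.read_csv("customers.csv")      # e.g., a customer database

# Semi-structured data: some organization, but keys may vary per record
with open("orders.json") as f:
    orders = json.load(f)                     # e.g., exported order records

# Unstructured data: free text with no predefined structure
with open("reviews.txt") as f:
    reviews = f.read()                        # e.g., social media posts
```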
4. Data collection
“What occurs during data collection?”
• In this stage, the data scientist identifies all the data resources and collects data in all forms such as
structured, unstructured, and semi-structured data that is relevant to the problem. There are mainly
two sources of data collection:
• Primary Data Source - A primary data source refers to the original source of data, where the data is collected firsthand through direct observation, experimentation, surveys, interviews, or other methods. This data is raw and unprocessed, providing accurate and reliable first-hand information for research, analysis, or decision-making purposes. Examples include marketing campaigns, feedback forms, IoT sensor data, etc.
• Secondary Data Source - A secondary data source refers to data that is already stored and ready for use. Data given in books, journals, websites, internal transactional databases, etc. can be reused for data analysis. Some methods of collecting secondary data are social media data tracking, web scraping, and satellite data tracking. Some sources of online data are data.gov, World Bank Open Data, UNICEF, the Open Data Network, Kaggle, the World Health Organization, Google, etc.
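A minimal sketch of collecting secondary data in Python; the URL is a placeholder for any open-data CSV (e.g., from data.gov or Kaggle):

```python
import pandas as pd

# Placeholder URL - replace with a real open-data CSV link
url = "https://example.com/open-data/sales.csv"
df = pd.read_csv(url)

print(df.shape)   # how many rows and columns were collected
print(df.head())  # first few records, to confirm the data looks right
```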
• Once the data is collected, the data scientist will have a good understanding of what they will be
working with. The Data Collection stage may be revisited after the Data Understanding stage,
where gaps in the data are identified, and strategies are developed to either collect additional
data or make substitutions to ensure data completeness.
5. Data understanding
“Is the data collected representative of the problem to be solved?”
• Data Understanding encompasses all activities related to constructing the dataset. In this stage, we
check whether the data collected represents the problem to be solved or not.
• In this stage, the data scientist tries to understand the data collected. Techniques such as descriptive statistics and visualization (e.g., histograms) can be applied to the dataset to assess its content and quality and to gain initial insights about the data.
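A small sketch of data understanding with descriptive statistics and a histogram; the marks are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset of student marks (illustrative data only)
df = pd.DataFrame({"marks": [72, 85, 60, 91, 78, 85, 66, 74, 58, 88]})

# Descriptive statistics: count, mean, std, min, quartiles, max
print(df["marks"].describe())

# Histogram: a quick visual check of how the values are distributed
df["marks"].plot(kind="hist", bins=5, title="Distribution of marks")
plt.show()
```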
6. Data preparation
“What additional work is required to manipulate and work with the data?”
• This stage comprises all the activities needed to construct the dataset and make it suitable for use in the modeling stage.
• Data preparation includes-
➢ cleaning, i.e., managing missing data, deleting duplicates, changing the data into a uniform format, etc.
➢ combining data from multiple sources (archives, tables, and platforms)
➢ transforming data into meaningful input variables
• Feature Engineering is a part of Data Preparation. The preparation of data is the most time-consuming
step among the Data Science stages.
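A minimal data preparation sketch in pandas covering the cleaning tasks listed above; the records are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems (illustrative only)
df = pd.DataFrame({
    "name":  ["Asha", "Ravi", "Ravi", "Meena"],
    "marks": [85, np.nan, np.nan, 72],
    "city":  ["delhi", "MUMBAI", "MUMBAI", "Chennai"],
})

df = df.drop_duplicates()                             # delete duplicates
df["marks"] = df["marks"].fillna(df["marks"].mean())  # manage missing data
df["city"] = df["city"].str.title()                   # uniform format
print(df)
```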
7. Modeling
“In what way can the data be visualized to get to the required
answer?”
• The modelling stage uses the initial version of the prepared dataset and focuses on developing models according to the analytical approach previously defined. The modelling process is usually iterative, often leading to adjustments in the preparation of the data.
• Data Modelling focuses on developing models that are either descriptive or predictive.
1. Descriptive Modeling: It is a concept in data science and statistics that focuses on summarizing and
understanding the characteristics of a dataset without making predictions or decisions. The goal of
descriptive modeling is to describe the data rather than predict or make decisions based on it. This
includes summarizing the main characteristics, patterns, and trends that are present in the data.
Descriptive modeling is useful when you want to understand what is happening within your data and how
it behaves, but not necessarily why it happens.
Common Descriptive Techniques:
➢ Summary Statistics: measures such as:
o Mean (average), median, mode
o Standard deviation, variance
o Range (difference between the highest and lowest values)
o Percentiles (e.g., quartiles)
➢ Visualizations: graphs and charts to represent the data, such as:
o Bar charts
o Histograms
o Pie charts
o Box plots
o Scatter plots
2. Predictive Modeling: It involves using data and statistical algorithms to identify patterns and trends in
order to predict future outcomes or values. It relies on historical data and uses it to create a model that
can predict future behavior or trends or forecast what might happen next. It involves techniques like
regression, classification, and time-series forecasting, and can be applied in a variety of fields, from
predicting exam scores to forecasting weather or stock prices.
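A toy predictive modeling sketch: training a decision tree classifier on made-up data to predict pass/fail from hours studied:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data (illustrative only)
hours = [[1], [2], [3], [4], [5], [6], [7], [8]]
result = [0, 0, 0, 1, 1, 1, 1, 1]   # 0 = fail, 1 = pass

# Fit the model, then predict outcomes for unseen inputs
model = DecisionTreeClassifier().fit(hours, result)
print(model.predict([[2.5], [6.5]]))
```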
8. Evaluation
“Does the model used really answer the initial question or does it
need to be adjusted?”
• Evaluation measures how well the model predicts correct outcomes that match the labeled test data.
• It involves using test data to measure metrics like accuracy, precision, recall, or F1 score. This helps
determine if the model is reliable and effective before deploying it in real-world situations.
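A minimal sketch of computing the metrics named above with scikit-learn; the label lists are invented for illustration:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: what the test data says vs. what the model said
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```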
• Model evaluation can have two main phases.
➢ First phase – Diagnostic measures
The first phase is the diagnostic measures phase, which ensures that the model works as intended. If the model is predictive, a decision tree can be used to assess whether the response provided by the model matches the original design or requires any adjustments. If the model is a descriptive model that evaluates relationships, a set of tests with known results can be applied and the model refined as needed.
➢ Second phase – Statistical significance test
The second evaluation phase that can be used is the statistical significance test. This type of evaluation can be applied to the model to ensure that the data is processed and interpreted correctly by the model. It is designed to avoid unnecessary second-guessing when the answer is revealed.
9. Deployment
“How does the solution reach the hands of the user?”
• Deployment refers to the stage where the trained AI model is made available to the users in real-world
applications.
• Data scientists must familiarize the stakeholders with the tool produced and how it behaves in different scenarios.
• Once the model is evaluated and the data scientist is confident it will work, it is deployed and put to the
ultimate test.
• Depending on the purpose of the model, it may be rolled out to a limited group of users or in a test
environment, to build up confidence in applying the outcome for use across the board.
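One common way to deploy a model is behind a small web service. A minimal sketch, assuming a scikit-learn model saved earlier with joblib.dump(model, "model.joblib"); the route and file names are illustrative:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")   # the trained, evaluated model

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[2.5]]}
    features = request.json["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run()   # a limited test rollout; production would use a real server
```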
10. Feedback
“Is the problem solved?
Has the question been satisfactorily answered?”
• The last stage in the methodology is feedback. This includes results collected from the deployment of the
model, feedback on the model’s performance from the users and clients, and observations from how the
model works in the deployed environment. This process continues till the model provides satisfactory and
acceptable results.
• Feedback from the users will help to refine the model and assess its performance and impact. The process from modelling to feedback is highly iterative. Data scientists may automate some or all of the feedback collection so that the model refresh process speeds up and improved results are obtained more quickly.
• Feedback from users can be received in many ways, such as surveys, website feedback, local media, call centers, support tickets, social media likes, etc.