BUSINESS ANALYTICS
IN
MARKETING
SYLLABUS
problem-solving DATA wrangling VISUALIZATION EXPERT
using data & ANALYSIS & STORYTELLING SHARING
• Problem statement & goal • Data wrangling • Graphs, charts &
setting • Data analysis dashboards
• Data analytics roadmap • RFM analysis • Story-telling
• Analytics tools exercise • Common pitfalls
HOW TO SOLVE PROBLEMS USING DATA?
STEP 1 Step 2 Step 3
Problem statement Data wrangling Data analysis
Step 4 Step 5
Data visualization Communication
data
WRANGLING
HOW TO SOLVE PROBLEMS USING DATA?
5 steps of data wrangling
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA
DISCOVERY
Understand your data
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA
STRUCTURING
Remove duplicates.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA
STRUCTURING
Create unique ids for
the observations.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA Missing values:
CLEANING ● Drop the observation
DATA
altogether.
ENRICHING
● Impute the values using the
mean, median or max value
(for continuous values) or
most frequent value (for
categorical values) depending
on the situation.
● Ignore the values of those
variables only.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA Invalid value:
CLEANING ● Drop the observation
DATA
altogether.
ENRICHING
● Correct the value using most
reasonable methods.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA
CLEANING
DATA
ENRICHING
Remove, impute or
correct missing values
and invalid values.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
Value inconsistency: lengths
Data type inconsistency: text (“20”)
vs. number (19)
DATA
VERIFYING
Validate and ensure the correct data
types, data homogeneity and
constraints.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA WRANGLING
DATA
publishing
Store & manage data in suitable
format and system to deliver &
distribute the data to end-users
through platform and tools.
HOW TO SOLVE PROBLEMS USING DATA?
STEP 1 Step 2 Step 3
Problem statement Data wrangling Data analysis
Step 4 Step 5
Data visualization Communication
PRACTICE
DATA WRANGLING
Group
work!!!
Among 3 approaches to data
culture, which one relies the
least on available data?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
What does "relation" mean
in a relational database?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
"There will probably be an economic crisis
this winter"
This statement refers to which basic
components of the time series data?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
What is NOT a suitable way
to deal with missing data?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
The column "weight" in a dataset
"us_teenager" has the following values
[48,49.5,50,50,70,180,700]. At most, this
dataset violates which data quality
dimension(s)?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
What is NOT a valid data
type in spreadsheet apps
(e.g. Excel)?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
Given this data range
[2,3,4,8,9,10]? What is the
median?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
In spreadsheet app (e.g.
Excel), 1/24/180 is
equivalent to?
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
What would be the result?
=ISNUMBER(TODAY())
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
What is the value?
=TRUE*FALSE
Click Present with Slido or install our Chrome extension to activate this
ⓘ
poll while presenting.
Question 7:
REVIEw:
explorING data
with excel?
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA analysis: DATA TYPES IN EXCEL
There are 4 main data types for data wrangling and analysis using Excel:
TEXT NUMBER BOOLEAN ERROR
A, B, C 1, 2, 3 TRUE #DIV/0, #N/A,
apple, Banana 1.2, 1.999 FALSE #NAME?,
Who? -1, -0.9 #NULL!,
“10”, “2.1” *date, #NUM!, #REF!
“TRUE” *time, #VALUE!
“” *duration etc.
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA analysis: DATA TYPES IN EXCEL
WARNING: In Excel, the data type (what it is) and data format (how we see
it) of one value might be vastly different:
DATA TYPE DATA FORMAT
● Character: A, B, C
TEXT ●
●
Special character: !@#$%^&*()
Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”
● Number: 0, 1.2, -1, -3.5
number ● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
HOW TO SOLVE PROBLEMS USING DATA?
EXTRA INFO: DATA TYPES IN DATABASE
There are many main data types for a structured database. They are heavily
validated. PostgresQL has some data types similar to Excel including:
TEXT NUMBER TEMPORAL BOOLEAN NULL
TEXT FLOAT DATE TRUE
CHAR INTEGER TIME FALSE
VARCHAR DATETIME
TIMESTAMP
INTERVAL
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA analysis: BASIC STATS
Before delving into a dataset, an analyst should know some basic stats about the set:
Information SUPPORTING SPREADSHEET FUNCTION OR FORMULA
#of observations =COUNT(range): counting total number of records within a range
#of variables =COUNTA(range): counting total numbers of not-blank records within a range
Minimum, maximum =MAX(range): returning the biggest value of a numerical range
=MIN(range): returning the small value of a numerical range
Mean, median =AVERAGE(range): returning the average (arithmetic mean) of the dataset
=MEDIAN(range): returning the middle number in the dataset
Percentile =PERCENTILE(range,k): returning the k-th percentile of values in a range. P25,
P50 a.k.a median, P75 are common values.
Distribution Histogram Chart
(Numerical variables)
Histogram (2023) Wikipedia. Wikimedia Foundation. Available at: https://en.wikipedia.org/wiki/Histogram (Accessed: February 19, 2023).
HOW TO SOLVE PROBLEMS USING DATA?
Step 2 DATA analysis: BASIC STATS
Before delving into a dataset, an analyst should know some basic stats about the set:
Information SUPPORTING SPREADSHEET FUNCTION OR FORMULA
Distinct values = UNIQUE(range): returning list of unique values.
(categorical variables)
Data Type =ISTEXT(): returning TRUE when a cell contains a text value
=ISNUMBER(): returning TRUE when a cell contains a number value
Text in numerical =COUNTIF(range,"*"): returning the number of text values within a range;
variables return 0 if the range has all number values
Empty values =COUNTBLANK(range): returning the number of blank values within a range
Duplicated records =COUNT(range)-COUNTUNIQUE(range): returning the number duplicates
within a range, excluding the original value. Note: COUNTUNIQUE available on
Google Sheets only.
MId-tERM
REVIEW: working with spreadsheet
Dataset: BA_S1 - Mid-term Assignment (data file).xlsx
● Work in small team (max 4)
● Submit on eLearning by 22:00 Fri, Nov 8th, 2024
SYLLABUS
problem-solving DATA WRANGLING VISUALIZATION EXPERT
using data & ANALYSIS & STORYTELLING SHARING
• Problem statement & goal • Data wrangling • Graphs, charts &
setting • Data analysis dashboards
• Data analytics roadmap • RFM analysis • Story-telling
• Techniques & tools exercise • Common pitfalls
This presentation template was created by
Slidesgo, including icons by Flaticon,
infographics & images by Freepik
Please keep this slide for attribution