KEMBAR78
Session 3 - Data Wrangling, Data Manipulation With Excel | PDF | Data Analysis | Data
0% found this document useful (0 votes)
45 views36 pages

Session 3 - Data Wrangling, Data Manipulation With Excel

The document outlines a syllabus for a Business Analytics course focused on marketing, detailing steps for problem-solving using data, including problem statement, data wrangling, data analysis, visualization, and communication. It emphasizes the importance of data cleaning, structuring, and verifying, as well as basic statistics and data types relevant to Excel and databases. Additionally, it includes practical exercises and group work to enhance learning and application of the concepts.

Uploaded by

Momo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views36 pages

Session 3 - Data Wrangling, Data Manipulation With Excel

The document outlines a syllabus for a Business Analytics course focused on marketing, detailing steps for problem-solving using data, including problem statement, data wrangling, data analysis, visualization, and communication. It emphasizes the importance of data cleaning, structuring, and verifying, as well as basic statistics and data types relevant to Excel and databases. Additionally, it includes practical exercises and group work to enhance learning and application of the concepts.

Uploaded by

Momo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

BUSINESS ANALYTICS

IN
MARKETING
SYLLABUS

problem-solving DATA wrangling VISUALIZATION EXPERT


using data & ANALYSIS & STORYTELLING SHARING
• Problem statement & goal • Data wrangling • Graphs, charts &
setting • Data analysis dashboards
• Data analytics roadmap • RFM analysis • Story-telling
• Analytics tools exercise • Common pitfalls
HOW TO SOLVE PROBLEMS USING DATA?

STEP 1 Step 2 Step 3


Problem statement Data wrangling Data analysis

Step 4 Step 5
Data visualization Communication
data
WRANGLING
HOW TO SOLVE PROBLEMS USING DATA?
5 steps of data wrangling
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA
DISCOVERY

Understand your data


HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA
STRUCTURING

Remove duplicates.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA
STRUCTURING

Create unique ids for


the observations.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA Missing values:
CLEANING ● Drop the observation
DATA
altogether.
ENRICHING
● Impute the values using the
mean, median or max value
(for continuous values) or
most frequent value (for
categorical values) depending
on the situation.
● Ignore the values of those
variables only.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA Invalid value:
CLEANING ● Drop the observation
DATA
altogether.
ENRICHING
● Correct the value using most
reasonable methods.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


DATA
CLEANING
DATA
ENRICHING

Remove, impute or
correct missing values
and invalid values.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING


Value inconsistency: lengths

Data type inconsistency: text (“20”)


vs. number (19)
DATA
VERIFYING

Validate and ensure the correct data


types, data homogeneity and
constraints.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA WRANGLING

DATA
publishing

Store & manage data in suitable


format and system to deliver &
distribute the data to end-users
through platform and tools.
HOW TO SOLVE PROBLEMS USING DATA?

STEP 1 Step 2 Step 3


Problem statement Data wrangling Data analysis

Step 4 Step 5
Data visualization Communication
PRACTICE
DATA WRANGLING
Group
work!!!
Among 3 approaches to data
culture, which one relies the
least on available data?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
What does "relation" mean
in a relational database?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
"There will probably be an economic crisis
this winter"
This statement refers to which basic
components of the time series data?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
What is NOT a suitable way
to deal with missing data?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
The column "weight" in a dataset
"us_teenager" has the following values
[48,49.5,50,50,70,180,700]. At most, this
dataset violates which data quality
dimension(s)?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
What is NOT a valid data
type in spreadsheet apps
(e.g. Excel)?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
Given this data range
[2,3,4,8,9,10]? What is the
median?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
In spreadsheet app (e.g.
Excel), 1/24/180 is
equivalent to?

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
What would be the result?
=ISNUMBER(TODAY())

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
What is the value?
=TRUE*FALSE

Click Present with Slido or install our Chrome extension to activate this

poll while presenting.
Question 7:

REVIEw:
explorING data
with excel?
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA analysis: DATA TYPES IN EXCEL


There are 4 main data types for data wrangling and analysis using Excel:

TEXT NUMBER BOOLEAN ERROR


A, B, C 1, 2, 3 TRUE #DIV/0, #N/A,
apple, Banana 1.2, 1.999 FALSE #NAME?,
Who? -1, -0.9 #NULL!,
“10”, “2.1” *date, #NUM!, #REF!
“TRUE” *time, #VALUE!
“” *duration etc.
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA analysis: DATA TYPES IN EXCEL


WARNING: In Excel, the data type (what it is) and data format (how we see
it) of one value might be vastly different:

DATA TYPE DATA FORMAT

● Character: A, B, C
TEXT ●

Special character: !@#$%^&*()
Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”

● Number: 0, 1.2, -1, -3.5


number ● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
HOW TO SOLVE PROBLEMS USING DATA?

EXTRA INFO: DATA TYPES IN DATABASE


There are many main data types for a structured database. They are heavily
validated. PostgresQL has some data types similar to Excel including:

TEXT NUMBER TEMPORAL BOOLEAN NULL


TEXT FLOAT DATE TRUE
CHAR INTEGER TIME FALSE
VARCHAR DATETIME
TIMESTAMP
INTERVAL
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA analysis: BASIC STATS


Before delving into a dataset, an analyst should know some basic stats about the set:

Information SUPPORTING SPREADSHEET FUNCTION OR FORMULA


#of observations =COUNT(range): counting total number of records within a range
#of variables =COUNTA(range): counting total numbers of not-blank records within a range
Minimum, maximum =MAX(range): returning the biggest value of a numerical range
=MIN(range): returning the small value of a numerical range
Mean, median =AVERAGE(range): returning the average (arithmetic mean) of the dataset
=MEDIAN(range): returning the middle number in the dataset
Percentile =PERCENTILE(range,k): returning the k-th percentile of values in a range. P25,
P50 a.k.a median, P75 are common values.

Distribution Histogram Chart


(Numerical variables)

Histogram (2023) Wikipedia. Wikimedia Foundation. Available at: https://en.wikipedia.org/wiki/Histogram (Accessed: February 19, 2023).
HOW TO SOLVE PROBLEMS USING DATA?

Step 2 DATA analysis: BASIC STATS


Before delving into a dataset, an analyst should know some basic stats about the set:

Information SUPPORTING SPREADSHEET FUNCTION OR FORMULA


Distinct values = UNIQUE(range): returning list of unique values.
(categorical variables)

Data Type =ISTEXT(): returning TRUE when a cell contains a text value
=ISNUMBER(): returning TRUE when a cell contains a number value
Text in numerical =COUNTIF(range,"*"): returning the number of text values within a range;
variables return 0 if the range has all number values
Empty values =COUNTBLANK(range): returning the number of blank values within a range
Duplicated records =COUNT(range)-COUNTUNIQUE(range): returning the number duplicates
within a range, excluding the original value. Note: COUNTUNIQUE available on
Google Sheets only.
MId-tERM
REVIEW: working with spreadsheet

Dataset: BA_S1 - Mid-term Assignment (data file).xlsx

● Work in small team (max 4)


● Submit on eLearning by 22:00 Fri, Nov 8th, 2024
SYLLABUS

problem-solving DATA WRANGLING VISUALIZATION EXPERT


using data & ANALYSIS & STORYTELLING SHARING
• Problem statement & goal • Data wrangling • Graphs, charts &
setting • Data analysis dashboards
• Data analytics roadmap • RFM analysis • Story-telling
• Techniques & tools exercise • Common pitfalls
This presentation template was created by
Slidesgo, including icons by Flaticon,
infographics & images by Freepik

Please keep this slide for attribution

You might also like