The picture can't be displayed.
Principles of Data Visualization
Data Manipulation
Data Manipulation
• Why do we need to manipulate data?
• Data in the real world is rarely formatted
• Different visualization tools use different data formats
• Reformatting helps us re-purpose data
6
Jupyter Notebooks
Python based data
manipulation suite
Uses a web browser for
managing file and
programming functions
Very easy loading and
manipulation of large data
files
3
Importing Libraries
import – Used to import a python library
as – Used to define alias for library
pandas – Used for data manipulation and analysis
numpy – Used for handling large, multi-dimensional arrays
and matrices
4
Importing Libraries
matplotlib – Used for 2D plotting in Python
pyplot – Provides a MATLAB-like plotting framework
seaborn – Used to create attractive and informative statistical
graphics
os – Allows to use operating system dependent functionality
%matplotlib inline – Allows to produce inline graphs
5
Data Loading
path variable is used here to assign the dataset location
data_dir variable is used to concatenate the path with the
name of the excel file
6
Data Loading
df acts as a pandas object which reads and loads the excel file
sheet_names is an attribute of ExcelFile object
sheet_names displays the name of all the sheets present within
an excel workbook
7
Data Frame Definition
data.frame() - List of variables(vectors) with same number of rows
A data frame is used for storing data tables
Vectors
R Help, r-tutor.com 8
DataFrame
DataFrame is a 2-dimensional labeled data structure with
columns of potentially different types.
df_observation is used as a pandas dataframe to load excel data
Observations was one of the sheets from the Grocery Prices
excel workbook
9
Viewing Data
head() allows viewing the top 5 rows in the dataframe
head() can take parameters to view ‘n’ number of rows
head(10) will display the top 10 rows
10
Exploring Data
info() provides information like total number of rows and
columns, the name of each column, and their datatypes
info() can also be analyzed to determine columns with nulls or
missing values
11
Nulls in DataFrame
Nulls are any blank or missing value in the dataset
isnull() finds any NaN (not a number) or None values
isnull.sum() searches for nulls in each column
12
Nulls in DataFrame
isnull().sum().sum() provides total nulls across all rows and
columns in the df_observation dataframe
13
Dropping NA values
dropna() is used to drop na values (null, Nan, None)
Axis parameter defines the axis - 0 means rows, 1 means
columns
Running the above code will remove all rows where na values
exist
14
Plotting Histogram
figsize allows to set the dimension of the graphic
In figsize - 20 is width, 8 is height
hist() creates an histogram for Price using 27 bins (columns) as
a user specification
show() makes the histogram plot visible
15
Skewness
If the mean is greater than the median, the distribution is positively
skewed
If the mean is less than the median, the distribution is negatively
skewed
If the mean and the median are same, then it is symmetrical
16
Imputing NA (null) values
If imputation technique is selected as median then above code
will replace all na values in Price with the median Price value
If imputation technique is selected as mean then above code
will replace all na values in Price with the median Price value
17
Unique Values
unique() determines the unique or distinct values
list() gives out the values as a list
18
Mapping Key Values
Greece
India
China
Ireland
19
Mapping Key Values
The Country column contains mis-spelled names
To map correct values map() is used
Within map(), each existing value is assigned a new value
20
Handling Duplicates
Row 3 and 8 – India
Row 5 and 6 – China
Total Duplicate Values – 2 (India,
China)
21
Handling Duplicates
duplicated() returns a series of duplicate values
• Row 6 is 2nd Instance of China
• Row 8th is 2nd Instance of India
22
Handling Duplicates
duplicated().sum() returns total duplicate values
drop_duplicates() drops the duplicate values in the dataframe
23
Merging Datasets
DataFrame1 DataFrame 2
24
Merging Datasets
merge() allows to join two dataframes
DataFrame.merge(self, right, how='inner', on=None)
self : First Dataframe
right : Dataframe to merge with
how : determines the type of join
{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
on : Column/Index levels to join
on = None, It merges on intersection of column in both
dataframes
25
Merging Datasets With Parameters
Merging DataFrame1 with DataFrame2 on parameter ID
26
Merging Datasets Without Parameters
Merging DataFrame1 with DataFrame2 without passing any on
parameter
27
Sorting Values
sort_values sorts a column of values in a dataframe
sort_values has default parameters like (by, axis=0,
ascending=True, inplace=False, kind='quicksort',
na_position='last’)
by refers to the column by which the data has to be sorted
28
Workspace Area
Analysis worksheets are created first
Dashboards are constructed from worksheets
Stories are constructed from worksheets and dashboards
Worksheet Dashboard Story
29
Visualization Worksheet
View constructed of visually encoded data elements
Data elements are dragged to cards or shelves to forms rows
and columns
One or multiple data elements can be dragged to create single
or multiple axes
30
Visual Data Encoding
• Automatic adjusts to data selections
• Bar chart provides comparison
• Line chart provides trend
• Area provides trend and comparison
• Shape provides complex comparison
• Maps provide proximity in space
• Pie provides % contribution
• Gantt provides relationship of measures in
time
• Polygon creates data areas
31
Visualization Standard View
Image Source: New York Times
32
Best Practices Recommender
Data elements can be highlighted and the show me function
will recommend good visualizations
33
Ask Data Functionality
Automatic view building from asking questions
Clarifications like “as a bar chart” “by country”
34
Worksheet – Customer Scatter
Customer data from 2011 to 2017 across all US region is used
Sales data is plotted against the corresponding profit.
The scatter plot uses color to represent the profit ratio
Source: Tableau – Sample Store 35
Worksheet – Customer Rank
Customer data is sorted per the sum of sales to give customer rank
Horizontal bar chart uses color encoding to represent profit ratio
Tooltip provides additional info such as profit and sales rank
36
Worksheet – Customer Overview
For each of the measures, data is plotted against each region
The tooltip gives an overview of all the measure values
37
What is a Dashboard?
“A visual display of the most important information needed to achieve one or more
objectives; consolidated and arranged on a single screen so the information can be
monitored at a glance”
(Stephen Few)
38
Create a Dashboard
Fixed size (default): The dashboard remains
the same size, regardless of the size of the
window used to display it
Range: The dashboard scales between minimum
and maximum sizes. Scroll bars are displayed when
the dashboard is smaller than the minimum size
Automatic: The dashboard automatically resizes
to fill the window used to display it
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_organize_floatingandtiled.htm 39
Critical Aspects of the Dashboard
• Essential Strategic Metrics
• Single Visual Frame
• Monitored at a Glance
• Designed with Audience Context
40
Hospitality Design Metaphor
Image Source: hermentorcenter.com, www.sap.com.
41
Customer Acquisition Design Metaphor
Cold
Warm
Hot
Opportunity
42
Dashboard Layouts
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_organize_floatingandtiled.htm 43
Dashboard Layouts
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_organize_floatingandtiled.htm 44
Dashboard – Customer Analysis
It combines all the previous 3 worksheets to give an analytical
view for comparison using tiled layout
Source: Tableau – Sample Store 45
Dashboard – Website Analytics
The dashboard combines overview of Pageviews, topic areas,
map and weekdays using floating layout
Source: The Big Book of Dashboards (BigBookofDashboards.com) 46
Dashboard – Ease of Business
The dashboard combines 4 business tax rates worksheets
using horizontal layout
Source: Tableau – World Indicators 47
Dashboard – Economic Indicators
The dashboard combines 3 different returns worksheets using
vertical layout
Source: Tableau – Regional 48
Add Interactivity to Dashboard
Show filters –
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_create.htm 49
Add Interactivity to Dashboard
Enable highlighting –
A highlighter allows to highlight parts of a view based on what
one enters or selects
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_best_practices.htm 50
Dashboard Actions
Use a single view to filter other views in a dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_dashboards.htm 51
Dashboard Actions
Use multiple views to filter other views in a dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_dashboards.htm 52
Dashboard Actions
Navigate from one view to another view, dashboard, or story
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_dashboards.htm 53
Dashboard Actions
Interactively display a web page in a dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_dashboards.htm 54
Modify Layout for Mobile Devices
“Use automatic layout” option allows to automatically synchronize any
changes to the Default dashboard
“Edit layout myself ” makes the Phone layout fully independent, so
manually add and arrange items to reflect changes to the Default
dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_dsd_create.htm 55
Summary
• Define your titles and axis for what you want to say
• Reformat data files for the visualization tools you wish to use
• Clean and format data to a solid executable file
• Create knowledge of the data domain
• Define the user experience
• Visually encode the data