Basics of Data Visualization: A Necessity
This document covers the basics and concepts of data visualisation in the form of lecture notes. It is suitable for those pursuing data science as a career and for students looking for exam notes.
NOTE: Most of the material is in question-and-answer format, which suits exam preparation and is easy to follow.
1. What is data visualisation?
Ans:
Data visualisation is the graphical or pictorial representation of data.
It is a technique in which data is represented in the form of charts and graphs.
Data visualisation helps us understand and analyse massive amounts of data in a
short time.
In The Visual Display of Quantitative Information, Edward Tufte defines 'graphical displays'
and principles for effective graphical display in the following passage: "Excellence in
statistical graphics consists of complex ideas communicated with clarity, precision, and
efficiency. Graphical displays should:
* show the data
* induce the viewer to think about the substance rather than about methodology,
  graphic design, the technology of graphic production, or something else
* avoid distorting what the data has to say
* present many numbers in a small space
* make large data sets coherent
* encourage the eye to compare different pieces of data
* reveal the data at several levels of detail, from a broad overview to the fine structure
* serve a reasonably clear purpose: description, exploration, tabulation, or decoration
* be closely integrated with the statistical and verbal descriptions of a data set."
Data visualisation not only helps us understand the data but also brings out its crucial
aspects, which are nearly impossible to spot when the data is in tabular or
numerical format.
These aspects help a data analyst make better data-driven decisions, thus
helping the organisation progress.
Many tools help to visualise data, making the work easier
and saving time.
Eg: Microsoft Excel, Microsoft Power BI, Tableau, Cognos Impromptu by IBM, etc.
2. What is the need/importance of Data Visualisation?
Ans:
In these modern times we live in a data-driven world.
Every field in this world, be it finance, medicine, technology, politics, education, sports, etc.,
has data as its core entity.
According to commonly cited estimates, about 2.5 quintillion bytes of data were generated every day in 2020, and about 59
zettabytes were generated in total over the year (one zettabyte is a trillion gigabytes).
Data generation keeps growing rapidly: an estimated 74
zettabytes of data were generated in 2021.
Thus, it is important to clean, sort, analyse and visualise data.
Humans grasp information more easily through visualisation.
In a business context, visualisation helps convey a story to decision makers, allowing
them to act more quickly than if the data were presented as reports.
The following are some use cases that stress the importance of data visualisation:
* Helping decision makers understand how the business data is being interpreted
  to determine business decisions.
* Leading the target audience to focus on business insights to discover areas that
  require attention.
* Handling large amounts of data in a pictorial format to provide a summary of
  unseen patterns in the data, revealing insights and the story behind the data to
  establish a business goal.
* Visualising business data to manage growth and converting trends into business
  strategies by making sense of your information.
* Revealing previously unnoticed key points about the data sources to help
  decision makers compose data analysis reports.
3. What are data visualisation techniques? Explain each of them in detail.
Ans:
A. Data visualisation techniques are the types of graphs or plots that are used for data
visualisation.
B. The major types of visualisation techniques are:
1. Line graph:
A line graph, also known as a line plot or a line chart, is a graph that
uses lines to connect individual data points.
A line graph typically displays quantitative values over a specified time interval.
Line graphs consist of two axes: the x-axis (horizontal) and the y-axis
(vertical), with each point denoted as (x, y).
In investing, particularly in technical analysis, line graphs are quite
informative because they allow the user to visualise trends.
While line graphs are used across many different fields for different
purposes, their most common function is to create a graphical
depiction of changes in values over time.
In finance, line graphs are used to create visual representations of
values over time, including changes in the prices of securities.
2. Bar Graph:
A bar chart or bar graph is a chart that presents categorical
data with rectangular bars whose heights or lengths are proportional to the
values that they represent.
The bars can be plotted vertically or horizontally.
A vertical bar chart is sometimes called a column chart.
Categorical data is data grouped into discrete groups, such as
months of the year, age groups, shoe sizes, and animals.
These categories are usually qualitative.
In a column (vertical) bar chart, categories appear along the horizontal
axis and the height of the bar corresponds to the value of each
category.
Bar charts have a discrete domain of categories, and are usually
scaled so that all the data can fit on the chart. When there is no
natural ordering of the categories being compared, bars on the chart
may be arranged in any order.
Bar charts arranged from highest to lowest incidence are called Pareto
charts.
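The highest-to-lowest ordering described above can be sketched in a few lines of Python. This is only an illustration: the category names and counts are made-up data, and pareto_order and text_bar_chart are hypothetical helper names, not part of any charting library. The bars are drawn as text so that bar length stays proportional to the value.

```python
# Hypothetical complaint counts per category (illustrative data only).
counts = {"Late delivery": 12, "Damaged item": 30, "Wrong item": 7, "Billing": 18}


def pareto_order(data):
    """Return (category, value) pairs sorted from highest to lowest value."""
    return sorted(data.items(), key=lambda pair: pair[1], reverse=True)


def text_bar_chart(pairs, width=30):
    """Render horizontal bars whose lengths are proportional to the values."""
    largest = max(value for _, value in pairs)
    lines = []
    for category, value in pairs:
        bar = "#" * round(width * value / largest)
        lines.append(f"{category:<14} {bar} {value}")
    return lines


ordered = pareto_order(counts)
for line in text_bar_chart(ordered):
    print(line)
```

Because the categories here have no natural order, sorting by value is a free choice; for ordered categories such as months, the original order would normally be kept.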
3. Pie Charts:
A pie chart (or a circle chart) is a circular statistical graphic which
is divided into slices to illustrate numerical proportion.
In a pie chart, the arc length of each slice (and consequently its
central angle and area) is proportional to the quantity it represents.
While it is named for its resemblance to a pie which has been sliced,
there are variations on the way it can be presented.
The earliest known pie chart is generally credited to William Playfair's
Statistical Breviary of 1801.
Pie charts are very widely used in the business world and the mass
media. However, they have been criticized, and many experts recommend
avoiding them, as research has shown it is difficult to compare different
sections of a given pie chart, or to compare data across different pie
charts.
Pie charts can be replaced in most cases by other plots such as the
bar chart, box plot, dot plot, etc.
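The proportionality rule for pie slices can be checked with simple arithmetic: each slice's central angle is its share of the total multiplied by 360 degrees. The sketch below assumes a small made-up budget breakdown; slice_angles is a hypothetical helper name used only for illustration.

```python
def slice_angles(values):
    """Map each label to a central angle (in degrees) proportional to its value."""
    total = sum(values.values())
    return {label: 360 * value / total for label, value in values.items()}


# Illustrative data: a monthly budget split into three categories.
shares = {"Rent": 50, "Food": 30, "Travel": 20}
angles = slice_angles(shares)

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```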
4. Histogram:
A histogram is an approximate representation of the distribution of
numerical data.
It was first introduced by Karl Pearson.
To construct a histogram, the first step is to "bin" (or "bucket") the
range of values, that is, divide the entire range of values into a series
of intervals, and then count how many values fall into each interval.
The bins are usually specified as consecutive, non-overlapping
intervals of a variable.
The bins (intervals) must be adjacent and are often (but not required
to be) of equal size.
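The binning procedure above can be sketched as follows. This is a minimal version that assumes equal-width bins and at least two distinct values; histogram_counts is a hypothetical helper name, and the ages are made-up sample data. The largest value is placed in the last bin so that every value is counted exactly once.

```python
def histogram_counts(values, num_bins):
    """Split [min, max] into equal-width adjacent bins and count values per bin."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


ages = [21, 22, 25, 29, 31, 34, 35, 38, 40, 41]  # illustrative data
edges, counts = histogram_counts(ages, 4)
print(edges)
print(counts)
```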
5. Scatter Plots:
A scatter plot is a type of plot or mathematical diagram that uses Cartesian
coordinates to display values for, typically, two variables in a set of
data.
The most common use of the scatter plot is to display the relationship
between two variables and observe the nature of that relationship.
The relationships observed can be positive or negative,
linear or non-linear, and strong or weak.
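One way to put a number on "positive or negative" and "strong or weak" is the Pearson correlation coefficient, which is swapped in here as an illustration and is not something the notes prescribe. It only measures linear relationships, so a strong non-linear pattern can still give a value near zero. The study-hours data below is made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]        # illustrative x values
marks = [52, 55, 61, 64, 70]   # illustrative y values
r = pearson_r(hours, marks)
print(round(r, 3))
```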
6. Column Charts:
A column chart is a data visualisation where each category is
represented by a rectangle, with the height of the rectangle being
proportional to the values being plotted.
Column charts are also known as vertical bar charts.
They are particularly useful when:
* The data has a small number of discrete categories, with a
  single value for each category. Where there are multiple values
  per category, variants such as small multiples, clustered
  column charts, and stacked column charts are
  superior.
* The goal is to compare the values of each category.
* The intent is to make it simple for the viewer. Column charts
  are arguably sometimes the best of all visualisations, as they
  tap into our instinctive ability to understand heights, whereas
  most other data visualisations require some degree of training
  for the reader to decode.
4. What are data collection structures?
Ans:
* A data structure is a data organization, management, and storage format that enables
  efficient access and modification.
* More precisely, a data structure is a collection of data values, the relationships
  among them, and the functions or operations that can be applied to the data.
* Types of data structures are as follows:
a. List:
* A list is an ordered collection of items and is mutable.
* A single list can contain heterogeneous data types.
* Each element or value inside a list is called an item.
* Eg: list1 = [1, 2, 3.14, "pi", True]
b. Tuple:
* A tuple is also an ordered collection of items.
* The major difference between a list and a tuple is that a
  tuple is immutable, meaning that its items cannot
  be deleted or changed, although the tuple as a whole can be deleted.
* Eg: tuple1 = (0, 1, "Hello", 3.14)
c. Dictionary:
* It is a collection of key-value pairs.
* Keys and values are separated by a colon ":".
* Keys in a dictionary must be of an immutable (hashable) type, while values can be of any type and can be changed.
* Eg: dict1 = {"subject": "Data Visualisation", "Topic": "Unit 1", "Rollno": 3510}
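The mutability rules for lists, tuples and dictionaries can be tried out directly. The sketch below reuses the example values from these notes; the variable names and the replacement values (100, "new", 3511) are made up for illustration.

```python
# List: ordered and mutable, may mix data types.
items = [1, 2, 3.14, "pi", True]
items[0] = 100          # changing an item is allowed
items.append("new")     # so is adding one

# Tuple: ordered but immutable, so item assignment raises TypeError.
point = (0, 1, "Hello", 3.14)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Dictionary: values can be reassigned; keys must be hashable (immutable).
record = {"subject": "Data Visualisation", "Topic": "Unit 1", "Rollno": 3510}
record["Rollno"] = 3511
record[(1, 2)] = "a tuple works as a key"
try:
    record[[1, 2]] = "a list does not"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(items, tuple_changed, record["Rollno"], list_key_allowed)
```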
d. Dataframes:
* Dataframes hold data in a tabular format.
* Dataframes can hold any data type (bool, int, string).
* A dataframe is a 2-dimensional structure consisting of rows and columns.
* Besides data, you can also specify the index and column names for
  your DataFrame.
* The index labels the rows, while the
  column names label the columns.
* These two components of the DataFrame
  come in handy when you are manipulating your data.
5. Write a note on File I/O processing.
Ans:
a. File I/O processing mainly consists of the following steps:
* Opening a file
* Performing operations
* Closing a file
b. A file can be opened in the following modes, depending on the operation to be performed:
* "r": reading from a file
* "w": writing to a file (overwrites an existing file, or creates a new one in case the file does not
  exist)
* "a": appending data to the existing file
* "r+": both read and write
* "b": binary file
* Eg:
Chocolates = ["kitkat", "snickers", "mars", "munch"]  # Entering data
ChocolatesFile = open("Chocolate.txt", "w")  # Opening the file using open()
for i in Chocolates:
    ChocolatesFile.write(i + "\n")  # Writing data using "w" and a for loop, one item per line
ChocolatesFile.close()  # Closing the file using close()
ChocolatesFile = open("Chocolate.txt", "r")  # Reopening the file in read mode
MyFile = ChocolatesFile.read()  # Reading the file
print(MyFile)  # Printing the data in the file
ChocolatesFile.close()  # Closing read mode
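The same open, operate, close cycle is often written with a with block, which closes the file automatically even if an error occurs. The sketch below is one possible variant and not the only correct one: it writes to a temporary directory so that it does not touch existing files, and the appended item "twix" is made up to show the "a" mode.

```python
import os
import tempfile

chocolates = ["kitkat", "snickers", "mars", "munch"]
path = os.path.join(tempfile.mkdtemp(), "Chocolate.txt")

# "w" creates the file (or overwrites it) and writes one item per line.
with open(path, "w") as f:
    for name in chocolates:
        f.write(name + "\n")

# "a" keeps the existing content and adds to the end.
with open(path, "a") as f:
    f.write("twix\n")

# "r" reads the file back; iterating over the file yields one line at a time.
with open(path, "r") as f:
    lines = [line.strip() for line in f]

print(lines)
```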
6. Write a code to read and write data in a csv file (you can give your own example).
Ans:
Reading data from CSV:
import csv
myfile = open('filename.csv', 'r', newline='')  # newline='' is recommended for csv files
mycsvdata = csv.reader(myfile)
for i in mycsvdata:
    print(i)  # each row is a list of strings
myfile.close()
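When a CSV file has a header row, csv.DictReader lets each row be read by column name instead of by position. The sketch below uses an in-memory io.StringIO in place of a real file so that it runs on its own; the sample rows are assumed for illustration. Note that every value comes back as a string and has to be converted explicitly.

```python
import csv
import io

# In-memory stand-in for a CSV file with a header row (illustrative data).
sample = io.StringIO("Sr.No,Name,Age\n1,John Doe,32\n2,Jane Doe,28\n")

# DictReader uses the first row as keys for every following row.
rows = list(csv.DictReader(sample))

# Values are read as strings, so numeric columns need converting.
ages = [int(row["Age"]) for row in rows]

for row in rows:
    print(row["Name"], row["Age"])
```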
Writing to csv files:
import csv
myfile = open('Filename.csv', 'w', newline='')  # newline='' avoids blank lines between rows on Windows
mydata = csv.writer(myfile)
mydata.writerow(["Sr.No", "Name", "Age"])  # header row
mydata.writerow([1, "John Doe", 32])
mydata.writerow([2, "Jane Doe", 28])
myfile.close()
7. Write a note on RegEx.
Ans:
A regular expression is a sequence of characters that defines a search pattern.
These expressions are often used for pattern matching, string matching or replacing.
RegEx, short for regular expressions, is widely used for validating emails and
passwords and for replacing strings of data in a dataset.
In Python, we import the re module to use regular expressions.
Regex consists of 3 parts:
1. Quantifiers:
In regular expressions, quantifiers match the preceding
characters or character sets a number of times.
Symbol   Meaning
?        Matches the preceding element zero or one time
+        Matches the preceding element one or more times
*        Matches the preceding element zero or more times
{n}      Matches exactly n times
{n,}     Matches n or more times
{m,n}    Matches at least m and at most n times
()       Groups the pattern to be matched
|        OR: specifies either of the patterns to be matched
\        Escape character
[]       Specifies a set of characters to be matched
^        Caret: checks if a string starts with a certain character
$        Dollar: checks if a string ends with a certain character
.        Matches any single character, including a space. Use re.DOTALL to also match a new line.
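A few of the symbols in the table can be tried with re.findall. The sketch below is illustrative only: the sample strings are made up. The last two lines contrast the default (greedy) behaviour of + with the lazy form obtained by appending ?.

```python
import re

# ? : the preceding "u" may appear zero or one time.
optional = re.findall(r"colou?r", "color colour colouur")

# + : one or more digits in a row.
one_or_more = re.findall(r"\d+", "room 12, floor 3")

# {n} : exactly four digits between word boundaries.
exact = re.findall(r"\b\d{4}\b", "2020 59 2021")

# {m,n} : words of two to three lowercase letters.
ranged = re.findall(r"\b[a-z]{2,3}\b", "a to the data")

# Greedy .+ runs to the last ">", lazy .+? stops at the first one.
greedy = re.findall(r"<.+>", "<b>bold</b>")
lazy = re.findall(r"<.+?>", "<b>bold</b>")

print(optional, one_or_more, exact, ranged, greedy, lazy)
```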
* Lazy and Greedy Quantifiers:
1. Quantifiers are greedy by default.
2. When we add a ? after any greedy quantifier it
   becomes a lazy quantifier.
3. A lazy quantifier matches an element as few
   times as possible, whereas a greedy quantifier
   matches as many times as possible.
4. A lazy quantifier stops at the
   first possible match, whereas a greedy quantifier
   extends to the last possible match.
2. Special Sequence:
What are special sequences in Python?
Special sequences represent the basic
predefined character classes, each of which has a
unique meaning. Each special sequence makes
specific common patterns more convenient to
use.
Sequence  Meaning
\d        Matches digits [0-9]
\D        Matches non-digits
\w        Matches letters and digits [a-zA-Z0-9_]
\W        Negation of \w
\A        Matches the specified characters at the beginning of the string
\b        Matches the specified characters at the beginning or at the end of a word
\B        Negation of \b
\s        Matches a whitespace character
\Z        Matches the specified characters at the end of the string
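The special sequences can be tried one at a time. The sketch below is illustrative only: the sample strings are made up and are not the examples from the original table.

```python
import re

text = "Order 66 shipped to hello_world on day 7"

digits = re.findall(r"\d", text)      # single digits
words = re.findall(r"\w+", text)      # runs of letters, digits and underscores

# \A anchors at the start of the string, \Z at the end.
starts_with_order = re.search(r"\AOrder", text) is not None
ends_with_7 = re.search(r"7\Z", text) is not None

# \b matches at a word boundary, \B only inside a word.
whole_word = re.findall(r"\bday\b", "day daylight today")
inside_word = re.findall(r"\Bday", "day daylight today")

# \s finds whitespace.
has_space = re.search(r"\s", "hello there") is not None
no_space = re.search(r"\s", "hellothere") is not None

print(digits, words, whole_word, inside_word)
```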
Eg:
Validating a mobile number. The first digit should
be 7, 8 or 9. The entire number should be 10
digits.
Solution: ^[789]\d{9}$
Validating a landline number. The first set of
digits should be 022, followed by the remaining
8 digits.
Solution: ^(022)[0-9]{8}$
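The two number patterns, ^[789]\d{9}$ and ^(022)[0-9]{8}$, can be checked against a few sample inputs. This is a sketch under assumptions: is_valid is a hypothetical helper name and the test numbers are made up. It uses fullmatch because $ on its own also matches just before a trailing newline.

```python
import re

MOBILE = re.compile(r"^[789]\d{9}$")        # 10 digits, first digit 7, 8 or 9
LANDLINE = re.compile(r"^(022)[0-9]{8}$")   # 022 followed by 8 digits


def is_valid(pattern, number):
    """Return True only if the whole string matches the pattern."""
    return pattern.fullmatch(number) is not None


print(is_valid(MOBILE, "9876543210"))
print(is_valid(MOBILE, "6876543210"))
print(is_valid(LANDLINE, "02212345678"))
```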
Validating a password with the following
rules: it should contain at least one
uppercase letter and a special character (*, &, etc.).
The length should be 8 or more but less than
15.
Solution (this pattern also requires a lowercase letter and a digit):
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&]).{8,14}$
3. Functions:
* The re module offers a set of functions that
  allow us to search a string for a match:
1. findall():
* This method returns a list
  of strings containing all
  matches.
* eg.
import re
string = ...   # the text to search
pattern = ...  # the regular expression to look for
output = re.findall(pattern, string)
print(output)
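A runnable version of the findall() example needs concrete values. The string and pattern below are assumed for illustration and are not the values from the original notes.

```python
import re

string = "Order 12 shipped on day 7 of 30"  # illustrative text
pattern = r"\d+"                             # one or more digits

# findall returns every non-overlapping match, in order, as a list of strings.
output = re.findall(pattern, string)
print(output)

# With no match at all the result is an empty list, not None.
nothing = re.findall(pattern, "no numbers here")
print(nothing)
```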
2. search():
* This method takes two
  arguments: a pattern and
  a string. It
  looks for the first location
  where the RegEx pattern
  produces a match with
  the string.
* eg.
import re
string = "Hello There"
pattern = r"\s"  # any whitespace character
output = re.search(pattern, string)
if output:
    print("The characters have spaces in between")
else:
    print("The characters do not have spaces in between")
3. sub():
* This method returns a
  string where matched
  occurrences are replaced
  with the content of the
  replace variable.
subn():
* Same as sub(), but it returns a tuple that also
  contains how many
  substitutions were
  made.
* eg.
import re
# pattern, replace and string must be defined before this call
output = re.sub(pattern, replace, string)
print(output)
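The sub() snippet leaves pattern, replace and string undefined, so the sketch below fills them in with assumed values: collapsing runs of whitespace into a single space. It also shows that subn() returns the new string together with the number of substitutions.

```python
import re

string = "data   visualisation  with   python"  # illustrative text
pattern = r"\s+"                                 # one or more whitespace characters
replace = " "                                    # replacement text

# sub returns only the new string.
output = re.sub(pattern, replace, string)
print(output)

# subn returns a (new_string, number_of_substitutions) tuple.
result, count = re.subn(pattern, replace, string)
print(result, count)
```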
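4. match():
* This method checks for a match only at the
  beginning of the string. It returns a match object
  if there is a match, otherwise None.
* eg.
import re
pattern = r'^a\w*t$'  # starts with "a", ends with "t"
string = 'arrest'
result = re.match(pattern, string)
if result:
    print("String matches with the pattern.")
else:
    print("String does not match with the pattern.")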
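The difference between match() and search() is easy to miss: match() only looks at the start of the string, while search() scans the whole string. The sketch below uses an assumed sample string to show both.

```python
import re

string = "arrest the data"  # illustrative text

# match succeeds only when the pattern fits at position 0.
at_start = re.match(r"arr", string)
not_at_start = re.match(r"data", string)

# search finds the first occurrence anywhere in the string.
anywhere = re.search(r"data", string)

print(at_start is not None)      # pattern found at the beginning
print(not_at_start is None)      # "data" is not at the beginning
print(anywhere.start())          # index where "data" begins
```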
5. match.group():
* Returns one or more
  subgroups of the match.
* If there is a single
  argument, the result is a
  single string.
* If there are multiple
  arguments, the result is a
  tuple with one item per
  argument.
* eg.
import re
string = "hell0 have a gr3at day!"
pattern = r'(\d) (\D)'  # a digit, a space, then a non-digit
match = re.search(pattern, string)
if match:
    print(match.group())  # the whole match; group(1) and group(2) give the subgroups
else:
    print("pattern not found")
6. split():
* The split method splits
  the string wherever there is a
  match and returns a list of
  the resulting
  substrings.
* eg.
import re
string = 'Hello There'
pattern = r'\s'
output = re.split(pattern, string)
print(output)
7. Raw string (r):
* When the r or R prefix is
  used before a regular
  expression, it means a raw
  string. For example, '\n' is
  a new line, whereas r'\n'
  means two characters: a
  backslash \ followed by n.
* Backslash \ is used to
  escape various
  characters, including all
  metacharacters.
  However, using the r prefix
  makes Python treat \
  as a normal character.
* eg.
import re
string = "Test for printing \n escape sequence"
print(string)
result = re.findall(r'\n', string)
print(result)
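The effect of the r prefix can be seen by comparing lengths and by using \b, which means "backspace" in a normal string but "word boundary" once it reaches the re module. The sketch below is illustrative; the sample strings are made up.

```python
import re

normal = "a\nb"    # three characters: a, newline, b
raw = r"a\nb"      # four characters: a, backslash, n, b
print(len(normal), len(raw))

# The re module itself understands the two characters \n as "newline",
# so both forms of the pattern find the newline in the text.
plain_pattern = re.findall("\n", normal)
raw_pattern = re.findall(r"\n", normal)

# Without r, "\b" is a backspace character and the pattern finds nothing;
# with r, the re module receives \b and treats it as a word boundary.
without_prefix = re.findall("\bcat", "a cat")
with_prefix = re.findall(r"\bcat", "a cat")

print(without_prefix, with_prefix)
```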