Module 3: Data Visualization
Introduction
In the previous chapters, we learned how data is collected and how we can interpret it by asking different types of questions of the data. In this chapter, we will learn to visualize data and make predictions.
What is data visualization?
Data visualization is the representation of data or information in a graph, chart, or other
visual formats.
Data visualization provides a way to see and understand trends, outliers, and patterns in
data. Charts and graphs make communicating data findings easier even if you can identify
the patterns without them.
The goal of data visualization is to communicate information clearly and efficiently to users.
Common types of data visualizations are:
• Charts
• Graphs
• Tables
• Maps
• Histograms
Examples of data visualization
Example 1: Using a pie chart to display the food items preferred by students.
We have the food item preferences of 50 students. Let us now visualize the data using a pie chart and find the most preferred and the least preferred food item.
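Here is a minimal Python sketch of this chart using matplotlib (assuming it is installed). The food items and counts below are illustrative values for the 50 students, chosen so that pizza comes out on top and pasta last:

```python
import matplotlib.pyplot as plt

# Illustrative preference counts for 50 students (hypothetical values).
foods = ["Pizza", "Burger", "Sandwich", "Noodles", "Pasta"]
counts = [18, 12, 9, 7, 4]  # adds up to 50 students

plt.pie(counts, labels=foods, autopct="%1.0f%%")
plt.title("Food items preferred by 50 students")
plt.show()
```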
The most preferred food item is pizza and the least preferred food item is pasta.
Example 2: Using a line chart to display the number of students present in the class over one week.
Let us now visualize the data using a line chart, and then visualize the same data using a bar graph.
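Here is a minimal matplotlib sketch of both charts; the attendance numbers below are illustrative values for one school week:

```python
import matplotlib.pyplot as plt

# Illustrative attendance for one school week (hypothetical values).
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
present = [42, 45, 40, 44, 38]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: shows how attendance changes from day to day.
ax1.plot(days, present, marker="o")
ax1.set_title("Students present (line chart)")
ax1.set_ylabel("Number of students")

# Bar graph: the same data, easier for comparing individual days.
ax2.bar(days, present)
ax2.set_title("Students present (bar graph)")

plt.tight_layout()
plt.show()
```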
Importance of data visualization
To make sure that we get the required outcome from the data, we must collect the right
and relevant data.
It is essential to have correct, good-quality data to perform an analysis or to build algorithms that can have an impact. Without relevant data, your analyses will not only be irrelevant, they can also be misleading.
You cannot expect to find perfectly pre-processed raw data that can be used directly for your needs. Hence, you need to understand how the data was gathered and what sources it was collected from.
Therefore it is essential to understand how to collect relevant data for analysis.
Let us understand what steps we need to take to make sure that we collect the right set of
data for analysis.
• Quality of the data - The first and most vital point to consider while collecting data is the quality of the data being collected. If we collect incomplete data, build an unreliable database, and run analysis on skewed data sets, we are obviously not going to arrive at the required output. The quality of the collected data should always be the top priority.
• Completeness of data - We need to make sure that the data being collected forms a complete set. Incomplete data sets may cause many discrepancies and wrong outputs on analysis.
• Format of data - The data collected for analysis should be in the right format, accessible and readable for analysis. If the collected data is not in the right format, we should convert it to the required format, as sketched below.
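As a minimal sketch of these checks in Python with pandas, assuming the collected data sits in a hypothetical CSV file named students.csv with a text column called date:

```python
import pandas as pd

# Load the collected data (students.csv is a hypothetical file name).
df = pd.read_csv("students.csv")

# Completeness: count the missing values in each column.
print(df.isna().sum())

# Format: convert a date column stored as text into a proper datetime type;
# unreadable values become NaT instead of stopping the analysis.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Quality: drop the rows that are still incomplete after conversion.
df = df.dropna()
```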
Asking the right question
Once we have the required data ready with us, the next step is to ask the right questions of the data. It is important to understand that if we don't ask the right questions, we will never get the right answers. To make sure we perform a proper analysis of the data, we should ask the right set of questions.
Below are specific questions that you need to ask of your data set to get the right answers:
• What do you wish to find?
It is essential to consider what your goal is and what decision-making it will facilitate. What
outcome from the analysis would you consider a success?
These initial analysis questions are important to guide you through the process and help you focus on valuable insights. You can start by brainstorming and preparing a draft list of the specific questions you want the data to answer. This will help you dive deeper into the more specific insights you want to achieve.
• Which statistical techniques are applicable?
There are several statistical analysis techniques that you can use for analyzing data.
However, in real-life scenarios, three statistical techniques are mostly used for analysis:
a. Regression Analysis is a process for finding out the relationships and correlations among the different variables in the data. Regression analysis helps us figure out how the value of the dependent variable varies when one of the independent variables changes. Thus, regression analysis helps to identify which independent variables affect the dependent variable (a minimal sketch of this technique follows this list).
b. Cohort Analysis – it enables you to easily compare how different groups, or cohorts, of customers behave over time. For example, you can create a cohort of
customers based on the date when they made their first purchase. Subsequently,
you can study the spending trends of cohorts from different periods in time to
determine whether the quality of the average acquired customer is increasing or
decreasing over time.
c. Predictive Analysis – Predictive analytics involves the analysis of historical
datasets to predict future possibilities. It can also be used for generating alternative
scenarios and risk assessments.
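As a minimal sketch of regression analysis, assuming scikit-learn is available (it is not named in this chapter), the snippet below fits a line to a small, hypothetical data set of advertising spend versus sales and reports how the dependent variable responds when the independent variable changes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend (independent) vs. sales (dependent).
spend = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
sales = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

model = LinearRegression().fit(spend, sales)

# The slope tells us how sales vary when spend changes by one unit.
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("prediction for spend = 6:", model.predict([[6.0]])[0])
```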
• Who will be using the final results?
An important aspect of your data analytics is the end users of your analysis. Who are they, and how will they be using the reports you create? You must get to know your final users, including:
a. What do they expect to learn from the data?
b. What do they need?
c. How advanced are their technical skills?
d. How much time do they have?
If you know these answers, you can decide on how detailed your data visualizations should
be and what areas of data your report should be focused on.
You should keep in mind that technical and non-technical users have different needs. If the reports are designed for executives and non-technical staff, you need to know which insights will be useful for them and what level of data complexity they can handle.
If external parties will be using your reports, the reports you make should provide them with
an easy-to-use and actionable interface. The final users should be able to read and interpret
them independently, with no support needed.
• Which visualizations should you pick?
Once your data is clean and your statistical analysis is done, you need to pick your visualizations. You can have impactful and valuable insights from the data, but if they are presented badly, the end users will not be able to understand them.
It is essential to convince executives and decision-makers that the data you have gathered and analyzed is:
a. Correct
b. Important
c. Urgent to act upon
Effective presentation aids in all these areas. There are several kinds of charts to pick from.
You can improve your chances of getting good feedback by choosing the right data
visualization type.
There are several data visualization tools, such as Power BI, that can also perform most of the tedious aspects of data cleaning. They can not only help to prepare the data but also interpret the insights. Because they make it easy to test data hypotheses without the need for intensive training, these tools have become an invaluable resource in today's data management practice.
These tools are very flexible for the end user and can easily adapt to your prepared questions for analyzing data; they can also help to perform analysis on large volumes of data.
Module 4: Data Science and AI
Introduction
In the last chapter, we saw how we can visualize and make predictions from the data. In this
chapter, we will learn about the applications of data science and the basics of artificial
intelligence.
Applications of data science
Looking at the advantages of data science, it is obvious that many companies have applied
data science to help their business grow. If we look around us, data science is everywhere.
In numerous ways, it is affecting our day-to-day life.
Digital Advertisements - You must have noticed that if you do an internet search for a thing, say a handbag, and close the browser once your search is complete, then later, when you open other applications or websites, you see advertisements for handbags from various brands. Ever wondered how this new application or website knows that you are looking to buy a handbag? Well, the answer is data science. Algorithms in data science track your searches and learn your preferences from them.
Speech Recognition - Speech recognition is now part of our everyday lives. Speech
recognition, or speech-to-text, is the ability of a machine or program to identify words
spoken aloud and convert them into readable text.
Speech recognition has now become a part of phones, game consoles, and even
smartwatches. Have you heard of Microsoft's Cortana? It uses speech recognition behind
the scenes to take inputs from the user.
Speech recognition can also be found on many devices that can be used to automate our
homes.
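As a minimal sketch, assuming the third-party SpeechRecognition package for Python is installed (this library is not named in the chapter, and microphone input also requires PyAudio), a short program can convert speech to text:

```python
import speech_recognition as sr  # pip install SpeechRecognition

recognizer = sr.Recognizer()

# Capture one phrase from the default microphone.
with sr.Microphone() as source:
    print("Say something...")
    audio = recognizer.listen(source)

# Send the audio to a free web recognizer and print the transcript.
print("You said:", recognizer.recognize_google(audio))
```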
Analytics on text data
Text analytics can be defined as the process of collecting unstructured text from various sources, then analyzing it and extracting relevant information from it. It can also transform the text into structured information that can then be used in various other ways.
There are several ways to analyze unstructured text. Most of these techniques fall under three technical areas - Natural Language Processing (NLP), data mining, and information retrieval.
Typically, we use text analytics technologies for four basic tasks – querying data, mining data, searching data, and analyzing data to get insights.
For example, if we have a database with customer data, an end-user could query the
database to find out how many customers have started using the company's services in the
last quarter and how many have stopped using the service. They can do so by just entering
a query in plain English instead of a query language like SQL.
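Behind such a plain-English question, the system still runs a structured query. Here is a hedged sketch of what that might look like, assuming a SQLite database file company.db with a hypothetical customers table holding start_date and end_date columns:

```python
import sqlite3

conn = sqlite3.connect("company.db")  # hypothetical database file

# How many customers started in the last quarter?
started = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE start_date >= date('now', '-3 months')"
).fetchone()[0]

# How many stopped using the service in the same period?
stopped = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE end_date >= date('now', '-3 months')"
).fetchone()[0]

print("Started:", started, "Stopped:", stopped)
```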
Chatbots are also an important area that uses text analytics for both querying and searching data. Chatbots can use text analytics to query a database and give a reply based on the question. They can also use text-analytics-based search to help retrieve a document based on what end users are looking for.
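As a minimal sketch of the mining and searching tasks mentioned above, using only the Python standard library, the snippet below counts the most frequent words in a piece of unstructured text (the sample sentence is made up):

```python
import re
from collections import Counter

# Made-up unstructured text standing in for customer feedback.
text = ("The service was great and the staff were great; "
        "the delivery was slow but the food was great.")

# Tokenize into lowercase words, then count how often each occurs.
words = re.findall(r"[a-z']+", text.lower())
print(Counter(words).most_common(5))
```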
Analytics on image data
Image recognition can be described as the process of analyzing images to identify people, patterns, logos, objects, or places.
Many machine learning tools can assist users with recognizing faces and objects in a picture. These tools can scan the objects in the picture and attempt to identify and name them based on a large database of images.
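As a minimal sketch of such a tool, assuming PyTorch and torchvision are installed (neither is named in this chapter), a network pretrained on a large labelled image database can name the most likely objects in a photo; photo.jpg is a placeholder file name:

```python
import torch
from PIL import Image
from torchvision import models

# Load a network pretrained on a large database of labelled images.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# Preprocess the photo the same way the network was trained.
image = Image.open("photo.jpg")
batch = weights.transforms()(image).unsqueeze(0)

# Predict and print the three most likely object names.
with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]
for p, idx in zip(*probs.topk(3)):
    print(weights.meta["categories"][idx], f"{float(p):.2f}")
```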
Mobile phones, for example, make use of computer vision technologies in combination with
a camera to achieve image recognition. This advanced technology has a variety of
applications like accessibility for the visually impaired and interactive advertising.
Facial recognition is also used by many organizations to check the attendance of workers and by government organizations for identification purposes.
Besides identifying faces and detecting objects in images, AI is also capable of recognizing specific patterns in images and matching them with its database.
Another growing application of image analytics is in searching content based on images.
Some search engines now allow users to upload images and search based on that.
Overview of AI
Artificial Intelligence is defined as the science and engineering of making intelligent machines. AI is a branch of Computer Science that deals with the research and design of intelligent systems that can take inputs from their environment and take actions based on them, as a human being would.
In technology, Artificial Intelligence, Machine Learning, and Deep Learning are widely used terms. While you may have seen these terms used interchangeably, each carries its own significance and application.
Machine Learning is a subset of Artificial Intelligence, and Deep Learning is in turn a subset of Machine Learning. Artificial Intelligence aims at making machines as smart as a human.
The main goal of Artificial Intelligence can be explained using the sub-goals below:
a. Logical Reasoning: AI aims at making computers capable of doing all the intelligent and sophisticated tasks that we humans can do. For example, solving problems that require logical reasoning, like switching on the fan because it is hot (a trivial sketch of such a rule follows this list) or solving complex mathematical problems.
b. Knowledge Representation: Make computers capable of describing objects. For
example, describing a car that just violated the traffic norms.
c. Planning and Navigation: Making computers capable of traveling from Point X to Point Y.
For example, a self-driving robot.
d. Natural Language Processing: Make computers capable of understanding and processing
a language. For example, a web translator that translates one language to another.
e. Perception: Make computers capable of interacting with real-world objects through the senses of touch, sound, smell, and sight.
f. Emergent Intelligence: Make computers capable of Intelligence that is not explicitly
programmed but is derived from AI capabilities. The basic vision for this goal is to enable
machines to exhibit emotional intelligence, moral reasoning, and more.
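As a trivial sketch of the logical-reasoning goal in item (a), the fan rule can be written as an explicitly programmed condition (the 30 °C threshold is an arbitrary assumption); the broader aim of AI is for machines to reach such decisions without every rule being hand-coded:

```python
# A hand-written rule of the kind AI tries to automate:
# explicit, programmed reasoning rather than learned behaviour.
def should_switch_on_fan(temperature_celsius: float) -> bool:
    return temperature_celsius > 30.0  # "it is hot" threshold (assumed)

print(should_switch_on_fan(34.5))  # True: it is hot, so switch on the fan
```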