Module 3: Data Visualization
Introduction
In the previous chapters, we learned how data is collected and how we can interpret it by asking different types of questions of the data. In this chapter, we will learn to visualize data and make predictions.
What is data visualization?
Data visualization is the representation of data or information in a graph, chart, or other
visual formats.
Data visualization provides a way to see and understand trends, outliers, and patterns in
data. Charts and graphs make communicating data findings easier even if you can identify
the patterns without them.
The goal of data visualization is to communicate information clearly and efficiently to users.
Common types of data visualizations are:
• Charts
• Graphs
• Tables
• Maps
• Histograms
Examples of data visualization
Example 1: Using a pie chart to display the food items preferred by students.
We have the food item preferences of 50 students. Let us now visualize the data using a pie chart and find the most preferred and the least preferred food item.
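Here is a minimal Python sketch of this chart using matplotlib (assuming it is installed). The food items and counts below are illustrative values for the 50 students, chosen so that pizza comes out on top and pasta last:

```python
import matplotlib.pyplot as plt

# Illustrative preference counts for 50 students (hypothetical values).
foods = ["Pizza", "Burger", "Sandwich", "Noodles", "Pasta"]
counts = [18, 12, 9, 7, 4]  # adds up to 50 students

plt.pie(counts, labels=foods, autopct="%1.0f%%")
plt.title("Food items preferred by 50 students")
plt.show()
```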
The most preferred food item is pizza and the least preferred food item is pasta.
Example 2: Using a line chart to display the number of students present in the class over one week.
Let us now visualize the data using a line chart, and then visualize the same data using a bar graph.
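Here is a minimal matplotlib sketch of both charts; the attendance numbers below are illustrative values for one school week:

```python
import matplotlib.pyplot as plt

# Illustrative attendance for one school week (hypothetical values).
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
present = [42, 45, 40, 44, 38]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: shows how attendance changes from day to day.
ax1.plot(days, present, marker="o")
ax1.set_title("Students present (line chart)")
ax1.set_ylabel("Number of students")

# Bar graph: the same data, easier for comparing individual days.
ax2.bar(days, present)
ax2.set_title("Students present (bar graph)")

plt.tight_layout()
plt.show()
```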
Importance of data visualization
To make sure that we get the required outcome from the data, we must collect the right
and relevant data.
It is essential to have correct, good-quality data to perform an analysis or to build algorithms that can have an impact. Without relevant data, your analyses will not only be irrelevant, they can also be misleading.
You cannot expect to find perfectly pre-processed raw data that can be used directly for your needs. Hence, you need to understand how the data was gathered and what sources it was collected from.
Therefore it is essential to understand how to collect relevant data for analysis.
Let us understand what steps we need to take to make sure that we collect the right set of
data for analysis.
• Quality of the data - The first and most vital point to consider while collecting data is the quality of the data being collected. If we collect incomplete data, build an unreliable database, and run analysis on skewed data sets, we are obviously not going to arrive at the required output. The quality of the collected data should always be the top priority.
• Completeness of data - We need to make sure that the data being collected forms a complete set. Incomplete data sets may cause many discrepancies and wrong outputs on analysis.
• Format of data - The data collected for analysis should be in the right format, accessible and readable for analysis. If the collected data is not in the right format, we should convert it to the required format, as sketched below.
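As a minimal sketch of these checks in Python with pandas, assuming the collected data sits in a hypothetical CSV file named students.csv with a text column called date:

```python
import pandas as pd

# Load the collected data (students.csv is a hypothetical file name).
df = pd.read_csv("students.csv")

# Completeness: count the missing values in each column.
print(df.isna().sum())

# Format: convert a date column stored as text into a proper datetime type;
# unreadable values become NaT instead of stopping the analysis.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Quality: drop the rows that are still incomplete after conversion.
df = df.dropna()
```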
Asking the right question
Once we have the required data ready with us, the next step is to ask the right questions of the data. It is important to understand that if we don't ask the right questions, we will never get the right answers. To make sure we perform a proper analysis of the data, we should ask the right set of questions.
Below are specific questions that you need to ask of your data set to get the right answers:
• What do you wish to find?
It is essential to consider what your goal is and what decision-making it will facilitate. What
outcome from the analysis would you consider a success?
These initial analysis questions are important to guide you through the process and help you focus on valuable insights. You can start by brainstorming and preparing a draft list of the specific questions you want the data to answer. This will help you dive deeper into the more specific insights you want to achieve.
• Which statistical techniques are applicable?
There are several statistical analysis techniques that you can use for analyzing data.
However, in real-life scenarios, three statistical techniques are mostly used for analysis:
a. Regression Analysis is a process for finding out the relationships and correlations among the different variables in the data. Regression analysis helps us figure out how the value of the dependent variable varies when one of the independent variables changes. Thus, regression analysis helps to identify which independent variables affect the dependent variable (a minimal sketch of this technique follows this list).
b. Cohort Analysis – it enables you to easily compare how different groups, or cohorts, of customers behave over time. For example, you can create a cohort of
customers based on the date when they made their first purchase. Subsequently,
you can study the spending trends of cohorts from different periods in time to
determine whether the quality of the average acquired customer is increasing or
decreasing over time.
c. Predictive Analysis – Predictive analytics involves the analysis of historical
datasets to predict future possibilities. It can also be used for generating alternative
scenarios and risk assessments.
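As a minimal sketch of regression analysis, assuming scikit-learn is available (it is not named in this chapter), the snippet below fits a line to a small, hypothetical data set of advertising spend versus sales and reports how the dependent variable responds when the independent variable changes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend (independent) vs. sales (dependent).
spend = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
sales = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

model = LinearRegression().fit(spend, sales)

# The slope tells us how sales vary when spend changes by one unit.
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("prediction for spend = 6:", model.predict([[6.0]])[0])
```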
• Who will be using the final results?
An important aspect of your data analytics is the end users of your analysis. Who are they, and how will they be using the reports you create? You must get to know your final users, including:
a. What do they expect to learn from the data?
b. What do they need?
c. How advanced are their technical skills?
d. How much time do they have?
If you know these answers, you can decide on how detailed your data visualizations should
be and what areas of data your report should be focused on.
You should keep in mind that technical and non-technical users have different needs. If the reports are designed for executives and non-technical staff, you need to know which insights will be useful for them and what level of data complexity they can handle.
If external parties will be using your reports, the reports you make should provide them with
an easy-to-use and actionable interface. The final users should be able to read and interpret
them independently, with no support needed.
• Which visualizations should you pick?
Once your data is clean and your statistical analysis is done, you need to pick your visualizations. You can have impactful and valuable insights from the data, but if they are presented badly, the end users will not be able to understand them.
It is essential to convince executives and decision-makers that the data you have gathered and analyzed is:
a. Correct
b. Important
c. Urgent to act upon
Effective presentation aids in all these areas. There are several kinds of charts to pick from.
You can improve your chances of getting good feedback by choosing the right data
visualization type.
There are several data visualization tools, such as Power BI, that can also perform most of the tedious aspects of data cleaning. They can not only help to prepare the data but also interpret the insights. Because they make it easy to test data hypotheses without the need for intensive training, these tools have become an invaluable resource in today's data management practice.
These tools are very flexible for the end user and can easily adapt to your prepared questions for analyzing data; they can also help to perform analysis on large volumes of data.
Module 4: Data Science and AI
Introduction
In the last chapter, we saw how we can visualize and make predictions from the data. In this
chapter, we will learn about the applications of data science and the basics of artificial
intelligence.
Applications of data science
Looking at the advantages of data science, it is obvious that many companies have applied
data science to help their business grow. If we look around us, data science is everywhere.
In numerous ways, it is affecting our day-to-day life.
Digital Advertisements - You must have noticed that if you do an internet search for a thing, say a handbag, and close the browser once your search is complete, then later, when you open other applications or websites, you see advertisements for handbags from various brands. Ever wondered how this new application or website knows that you are looking to buy a handbag? Well, the answer is data science. Algorithms in data science track your searches and learn your preferences from them.
Speech Recognition - Speech recognition is now part of our everyday lives. Speech
recognition, or speech-to-text, is the ability of a machine or program to identify words
spoken aloud and convert them into readable text.
Speech recognition has now become a part of phones, game consoles, and even
smartwatches. Have you heard of Microsoft's Cortana? It uses speech recognition behind
the scenes to take inputs from the user.
Speech recognition can also be found on many devices that can be used to automate our
homes.
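As a minimal sketch, assuming the third-party SpeechRecognition package for Python is installed (this library is not named in the chapter, and microphone input also requires PyAudio), a short program can convert speech to text:

```python
import speech_recognition as sr  # pip install SpeechRecognition

recognizer = sr.Recognizer()

# Capture one phrase from the default microphone.
with sr.Microphone() as source:
    print("Say something...")
    audio = recognizer.listen(source)

# Send the audio to a free web recognizer and print the transcript.
print("You said:", recognizer.recognize_google(audio))
```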
Analytics on text data
Text analytics can be defined as the process of collecting unstructured text from various sources, then analyzing it and extracting relevant information from it. It can also transform the text into structured information that can then be used in various other ways.
There are several ways to analyze unstructured text. Most of these techniques fall under three technical areas - Natural Language Processing (NLP), data mining, and information retrieval.
Typically, we use text analytics technologies for four basic tasks – querying data, mining data, searching data, and analyzing data to get insights.
For example, if we have a database with customer data, an end-user could query the
database to find out how many customers have started using the company's services in the
last quarter and how many have stopped using the service. They can do so by just entering
a query in plain English instead of a query language like SQL.
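Behind such a plain-English question, the system still runs a structured query. Here is a hedged sketch of what that might look like, assuming a SQLite database file company.db with a hypothetical customers table holding start_date and end_date columns:

```python
import sqlite3

conn = sqlite3.connect("company.db")  # hypothetical database file

# How many customers started in the last quarter?
started = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE start_date >= date('now', '-3 months')"
).fetchone()[0]

# How many stopped using the service in the same period?
stopped = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE end_date >= date('now', '-3 months')"
).fetchone()[0]

print("Started:", started, "Stopped:", stopped)
```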
Chatbots are also an important area that uses text analytics for both querying and searching data. Chatbots can use text analytics to query a database and give a reply based on the question. They can also use text-analytics-based search to help retrieve a document based on what end users are looking for.
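As a minimal sketch of the mining and searching tasks mentioned above, using only the Python standard library, the snippet below counts the most frequent words in a piece of unstructured text (the sample sentence is made up):

```python
import re
from collections import Counter

# Made-up unstructured text standing in for customer feedback.
text = ("The service was great and the staff were great; "
        "the delivery was slow but the food was great.")

# Tokenize into lowercase words, then count how often each occurs.
words = re.findall(r"[a-z']+", text.lower())
print(Counter(words).most_common(5))
```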
Analytics on image data
Image recognition can be described as the process of analyzing images to identify people, patterns, logos, objects, or places.
Many machine learning tools can assist users with recognizing faces and objects in a picture. These tools can scan the objects in the picture and attempt to identify and name them based on a large database of images.
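As a minimal sketch of such a tool, assuming PyTorch and torchvision are installed (neither is named in this chapter), a network pretrained on a large labelled image database can name the most likely objects in a photo; photo.jpg is a placeholder file name:

```python
import torch
from PIL import Image
from torchvision import models

# Load a network pretrained on a large database of labelled images.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# Preprocess the photo the same way the network was trained.
image = Image.open("photo.jpg")
batch = weights.transforms()(image).unsqueeze(0)

# Predict and print the three most likely object names.
with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]
for p, idx in zip(*probs.topk(3)):
    print(weights.meta["categories"][idx], f"{float(p):.2f}")
```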
Mobile phones, for example, make use of computer vision technologies in combination with
a camera to achieve image recognition. This advanced technology has a variety of
applications like accessibility for the visually impaired and interactive advertising.
Facial recognition is also used by many organizations to check the attendance of workers and by government organizations for identification purposes.
Besides identifying faces and detecting objects in images, AI is also capable of recognizing specific patterns in images and matching them with its database.
Another growing application of image analytics is in searching content based on images.
Some search engines now allow users to upload images and search based on that.
Overview of AI
Artificial Intelligence is defined as the science and engineering of making intelligent machines. AI is a branch of Computer Science that deals with the research and design of intelligent systems that can take inputs from their environment and take actions based on them, as a human being would.
In technology, Artificial Intelligence, Machine Learning, and Deep Learning are widely used terms. While you may have seen these terms used interchangeably, each carries its own significance and application.
Machine Learning is a subset of Artificial Intelligence, and Deep Learning is in turn a subset of Machine Learning. Artificial Intelligence aims at making machines as smart as a human.
The main goal of Artificial Intelligence can be explained using the sub-goals below:
a. Logical Reasoning: AI aims at making computers capable of doing all the intelligent and sophisticated tasks that we humans can do. For example, solving problems that require logical reasoning, like switching on the fan because it is hot (a trivial sketch of such a rule follows this list) or solving complex mathematical problems.
b. Knowledge Representation: Make computers capable of describing objects. For
example, describing a car that just violated the traffic norms.
c. Planning and Navigation: Making computers capable of traveling from Point X to Point Y.
For example, a self-driving robot.
d. Natural Language Processing: Make computers capable of understanding and processing
a language. For example, a web translator that translates one language to another.
e. Perception: Make computers capable of interacting with real-world objects through the senses of touch, sound, smell, and sight.
f. Emergent Intelligence: Make computers capable of Intelligence that is not explicitly
programmed but is derived from AI capabilities. The basic vision for this goal is to enable
machines to exhibit emotional intelligence, moral reasoning, and more.
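As a trivial sketch of the logical-reasoning goal in item (a), the fan rule can be written as an explicitly programmed condition (the 30 °C threshold is an arbitrary assumption); the broader aim of AI is for machines to reach such decisions without every rule being hand-coded:

```python
# A hand-written rule of the kind AI tries to automate:
# explicit, programmed reasoning rather than learned behaviour.
def should_switch_on_fan(temperature_celsius: float) -> bool:
    return temperature_celsius > 30.0  # "it is hot" threshold (assumed)

print(should_switch_on_fan(34.5))  # True: it is hot, so switch on the fan
```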