Introduction to Data Science
Data
We often use the term data to refer to computer information that is either transmitted or
stored. Data comes in numerous forms: any kind of information, whether numbers, text or
pictures, is termed data. Data is gathered and translated for some purpose, usually
analysis. However, if data is not put into context, it is of no use to humans or
computers.
Some of the common forms of data include:
• Text
• Image
• Video
• Numbers
• Spreadsheets
• Sound
Internally, computers store all data as a series of bits, each of which has a value of
either one or zero.
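To see this in practice, the short Python sketch below prints the bits behind a character and a small number. It assumes the text is encoded as UTF-8, in which the letter "A" is stored as a single byte with the value 65; the helper name `to_bits` is made up for this illustration.

```python
def to_bits(data: bytes) -> str:
    """Return the bits of each byte as groups of eight ones and zeros."""
    return " ".join(format(byte, "08b") for byte in data)

# The letter "A" encoded as UTF-8 is one byte with the value 65.
print(to_bits("A".encode("utf-8")))     # 01000001

# The whole number 5 stored in a single byte.
print(to_bits((5).to_bytes(1, "big")))  # 00000101
```

Whether the bits stand for a letter, a number or a pixel depends entirely on how a program chooses to interpret them.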
Data can be of two types:
1. Qualitative - Qualitative data is descriptive information. For
example, "What a nice day it is".
2. Quantitative - Quantitative data is numerical information. For example,
1, 3.65, etc.
Data Science
Every day, a tremendous amount of data is generated through many parts of our daily
lives. When you buy something from your local grocery store, someone somewhere keeps
track of what you purchased and in what quantity. Let's say you are withdrawing money
from an ATM for your monthly expenses. You might withdraw various amounts of cash from
multiple ATMs at different locations, and the banks that run these ATMs generate data
to manage your bank account correctly. On social media, you click the 'Like' button
for the singer you like; in your browser, you watch various tutorial videos on a
video-sharing site. All these activities create data. This data can be organized and
investigated, and careful analysis of it gives a clearer picture of what you do and
what might be offered to you to enrich your daily life. This is what data science is
about: extracting meaningful interpretations from data. The insight gained by
processing data with data science is meant to improve our decision-making. Data
science has various applications: for example, it helps industries serve us better,
helps authorities catch criminals, and even helps us understand cricket better.
Careers in Data science
There are many careers in data science, such as Data Scientist, Data Engineer and Data
Analyst. Data Architect and Senior Data Scientist are two roles for experienced
professionals.
What does Data Science help us achieve?
In simple words, data science helps us answer different types of questions that help us
achieve various objectives. Broadly, these questions can be divided into five types.
Which class does this belong to - A or B?
Some questions can only be answered with one of a fixed number of options. For example:
Q: Will it rain today?
A: Yes/No
Q: Will the weather be hot or cold?
A: Hot/Cold
To make such predictions, we use a family of algorithms called classification
algorithms.
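One simple member of this family is the k-nearest neighbours classifier, which answers "A or B?" by looking at the past examples most similar to the new one and taking a vote. The sketch below is only an illustration: the weather readings are invented, the choice of humidity and cloud cover as inputs is an assumption, and a real forecast would use far more data.

```python
import math

# Hypothetical past observations: (humidity %, cloud cover %) -> did it rain?
history = [
    ((85, 90), "Yes"),
    ((80, 75), "Yes"),
    ((90, 95), "Yes"),
    ((40, 20), "No"),
    ((35, 10), "No"),
    ((50, 30), "No"),
]

def classify(today, examples, k=3):
    """k-nearest neighbours: vote among the k past days most similar to today."""
    nearest = sorted(examples, key=lambda ex: math.dist(today, ex[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print("Will it rain today?", classify((78, 80), history))
```

Using an odd value of `k` with two classes means the vote can never end in a tie.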
Is this an outlier?
In some cases, the objective is to find outliers, or anomalies, in data that is otherwise
mostly consistent. Such anomalies can be a cause for concern, especially when we need
the data to stay within a specific range at all times. An unexpected change in data
patterns is often a sign that something has gone wrong, or of possible fraud. For
example, if a transaction made with your debit card does not match your regular
transactions, it could be a case of fraud. Banks track these records and alert the
customer when an unexpected transaction happens, which helps protect the customer's
money. Some other examples of anomaly detection are:
Q: Is this email normal or spam?
Q: You are checking your car tyre pressure. Is the reading normal?
The algorithms that are used for these types of questions are called anomaly
detection algorithms.
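A common and simple anomaly detection technique is the z-score test: measure how many standard deviations a new value lies from the average of past values, and flag it if that distance is unusually large. The sketch below applies it to the debit card example. The transaction amounts and the threshold of 3 are assumptions chosen for illustration, not values used by any real bank.

```python
import statistics

# Hypothetical amounts of a customer's usual debit card transactions.
regular = [450, 520, 480, 610, 390, 550, 500, 470]

def is_anomaly(amount, history, threshold=3.0):
    """Flag an amount lying more than `threshold` standard deviations
    away from the mean of past transactions (a z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = abs(amount - mean) / stdev
    return z > threshold

for amount in [530, 25000]:
    status = "unexpected - alert customer" if is_anomaly(amount, regular) else "normal"
    print(amount, status)
```

The same function could check tyre pressure readings by passing a history of normal readings instead.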
How much or how many?
There are scenarios in which we must predict the numerical value of a variable based on
historical data. Some examples are:
Q: How much rainfall will we receive this year?
A: 100 mm
Q: How many runs will the winning team score?
A: 320
Algorithms that predict such numerical values are called regression algorithms.
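The simplest regression algorithm is linear regression, which fits a straight line through past data using the least squares method and reads a prediction off that line. In the sketch below, the cricket figures (runs after 30 overs and final totals) are invented purely for illustration, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: find slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past matches: runs after 30 overs -> final total after 50 overs.
runs_at_30 = [150, 170, 190, 210, 160]
final_total = [260, 290, 320, 350, 275]

m, c = fit_line(runs_at_30, final_total)
print(f"Predicted final total for 180 runs at 30 overs: {m * 180 + c:.0f}")  # 305
```

Real match predictions would use many more matches and factors such as wickets lost, but the idea of learning a relationship from history and extending it is the same.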
How is the data grouped?
Sometimes data can be separated into distinct groups based on some parameters. This
approach is called clustering. For example, consider the heights and weights of cats
belonging to three species. When we perform clustering, the output shows the cats
divided into three groups, one for each species.
What should be done now?
Questions of this type arise in problems such as autonomous robots and self-driving
cars, which must decide what to do next as external conditions change. These
questions are usually tackled with a family of algorithms called reinforcement
learning algorithms.
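The cat-species clustering example can be sketched with k-means, one widely used clustering algorithm (the text does not name a specific one). k-means repeatedly assigns each point to its nearest group centre and then moves each centre to the average of its points. The measurements below are invented, and the starting centres are picked in a simple fixed way only to keep the sketch short and repeatable.

```python
import math

# Hypothetical (height cm, weight kg) measurements for three cat species.
cats = [
    (23, 3.5), (24, 3.8), (22, 3.2),   # a small species
    (30, 5.5), (31, 6.0), (29, 5.2),   # a medium species
    (40, 9.0), (41, 9.5), (39, 8.8),   # a large species
]

def k_means(points, k, iterations=10):
    """Group points into k clusters with the k-means algorithm."""
    centres = points[:: len(points) // k][:k]  # simple deterministic start
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the average of its points.
        centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

for number, group in enumerate(k_means(cats, 3), start=1):
    print(f"Group {number}: {group}")
```

Note that the algorithm is never told which species each cat belongs to; the groups emerge from the measurements alone.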
Data Visualization
Data visualization is the representation of data or information in a graph, chart, or
other visual format. It provides a way to see and understand trends, outliers and
patterns in data. Even when you can identify the patterns without them, charts and
graphs make it easier to communicate your findings. The goal of data visualization is
to communicate information clearly and efficiently to users.
Common types of data visualizations are:
• Charts
• Graphs
• Tables
• Maps
• Histograms
Examples of data visualization
Example 1: Using a pie chart to display the food preferences of students.
We have the food item preferences of 50 students. Let us visualize the data using a
pie chart and find the most preferred and the least preferred food items.
The pie chart shows that the most preferred food item is pizza and the
least preferred food item is pasta.
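A pie chart gives each category a slice whose angle is proportional to its share of the total: angle = count / total × 360 degrees. The sketch below computes these shares and picks out the most and least preferred items. The original survey counts are not given here, so the numbers are hypothetical values for 50 students chosen to match the stated result (pizza most, pasta least), and the food items other than pizza and pasta are assumptions.

```python
# Hypothetical counts for 50 students; not the original survey figures.
preferences = {"Pizza": 18, "Burger": 12, "Noodles": 11, "Pasta": 9}

total = sum(preferences.values())
for food, count in preferences.items():
    percent = count / total * 100
    angle = count / total * 360  # size of this food's slice in degrees
    print(f"{food:8} {count:3} students  {percent:5.1f}%  {angle:5.1f} degrees")

most = max(preferences, key=preferences.get)
least = min(preferences, key=preferences.get)
print("Most preferred:", most)
print("Least preferred:", least)
```

To draw the chart itself, a plotting library can be used; for example, matplotlib's `pyplot.pie` takes the list of counts and computes the slice angles in the same way.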
Collecting the right data
To make sure that we get the required outcome from the data, we must collect the
right and relevant data. Correct, good-quality data is essential for any analysis or
algorithm that is meant to have an impact. Without relevant data, your analyses will
not only be irrelevant but can also be misleading. You cannot expect raw data to come
perfectly preprocessed and ready for your needs, so you need to understand how the
data was gathered and which sources it was collected from. Let us look at the steps
we need to take to make sure that we collect the right set of data for analysis.
• Quality of data - The primary and most vital point to consider while collecting data
is the quality of the data being collected. If we collect incomplete data, build an
unreliable database, and run analysis on skewed data sets, we will not arrive at the
required output. The quality of the collected data should always be the top priority.
• Completeness of data - We need to make sure that the data being collected is a
complete set. Incomplete data sets can cause discrepancies and lead to wrong output
during analysis.
• Format of data - The data collected for analysis should be in the right format, so
that it is accessible and readable. If the collected data is not in the right format,
we should convert it to the required format before analysis.
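Two of these checks, completeness and format, can be sketched in a few lines of Python. The records, field names and helper functions below are hypothetical; the sketch assumes a missing value is stored as `None` or an empty string, and that numbers which arrived as text should be converted to numeric values before analysis.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"name": "Asha", "age": "14", "marks": "88"},
    {"name": "Ravi", "age": None, "marks": "75"},
    {"name": "Meena", "age": "15", "marks": "92.5"},
]

def missing_fields(rows):
    """Completeness check: list (row number, field) pairs that have no value."""
    return [(i, field) for i, row in enumerate(rows)
            for field, value in row.items() if value in (None, "")]

def to_numbers(rows, fields):
    """Format fix: convert numeric fields stored as text into floats."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy, so the original records stay unchanged
        for field in fields:
            if new_row[field] not in (None, ""):
                new_row[field] = float(new_row[field])
        converted.append(new_row)
    return converted

print("Missing values:", missing_fields(records))
print(to_numbers(records, ["age", "marks"]))
```

Whether a missing value should be filled in, or the whole record dropped, is a decision that depends on the analysis.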
Asking the right question
Once we have the required data, the next step is to ask the right questions of the
data. If we don't ask the right questions, we will never get the right answers, so to
analyse the data properly we must ask it the right set of questions.
Below are specific questions that you need to ask of your data set to get the right
answers:
• What do you wish to find? Consider what your goal is and which decisions the
analysis will support. What outcome from the analysis would you consider a success?
These initial questions guide you through the process and help you focus on valuable
insights. You can start by brainstorming and drafting a list of specific questions you
want the data to answer. This will help you dive deeper into more specific insights.
Data Science and AI
Applications of data science
Digital Advertisements - You may have noticed that if you search the internet for
something, say a handbag, and then close the browser, you later see advertisements for
handbags from various brands when you open other applications or websites. Data
science algorithms track your searches and learn your preferences from them.
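A very simplified way to "learn a preference" from searches is to count which product category appears most often in a user's search history. The sketch below does only that; the searches, the keyword-to-category mapping and the function name are hypothetical, and real advertising systems use far richer models than simple keyword counting.

```python
from collections import Counter

# Hypothetical search history of one user.
searches = ["handbag", "leather handbag", "shoes", "handbag sale", "phone cover"]

# Hypothetical mapping from keywords to ad categories.
categories = {"handbag": "Handbags", "shoes": "Footwear", "phone": "Mobile accessories"}

def top_interest(history, keyword_map):
    """Count how often each category matches a search; return the most frequent."""
    counts = Counter(
        category
        for query in history
        for keyword, category in keyword_map.items()
        if keyword in query
    )
    return counts.most_common(1)[0][0] if counts else None

print("Show ads for:", top_interest(searches, categories))
```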
Speech Recognition - Speech recognition is now part of our everyday lives: it is built
into phones, game consoles and even smartwatches. Have you heard of Microsoft's
Cortana? It uses speech recognition behind the scenes to take input from the user.
Other applications include:
• Self-Driving Cars
• Fraud Detection
• Website Recommendations
Overview of AI
Artificial Intelligence is defined as the science and engineering of making
intelligent machines. AI is a branch of computer science that deals with the research
and design of intelligent systems that can take inputs from their environment and take
actions based on them, as a human being would. Artificial Intelligence, Machine
Learning and Deep Learning are widely used terms in technology. While you may have
seen them used interchangeably, each has its own meaning and applications: machine
learning is a subset of AI, and deep learning is in turn a subset of machine learning.
Worksheet (Data Science)
Please choose the correct option in the questions below.
1. A school named ABC has recorded the total marks of every student in the class.
This is an example of:
a. Qualitative data b. Quantitative data
c. Both qualitative and quantitative data d. None of the above
2. A food delivery app has asked for your feedback on the quality of the food. You
have written two paragraphs to describe the food. This is an example of:
a. Qualitative Data b. Quantitative Data
c. Both qualitative and quantitative data d. None of the above
3. You need to predict what the temperature will be next Friday.
Which algorithm will you use?
a. Clustering b. Regression
c. Anomaly detection d. Binary classification
4. You need to predict whether your car tyre will last for the next 1000 km. Which algorithm
will you use?
a. Clustering b. Regression
c. Anomaly detection d. Binary classification
5. You want to build a way to segregate spam emails from good emails. Which
algorithm will you use?
a. Clustering b. Regression
c. Anomaly detection d. Binary classification
1. Discrete data can take any value in a range.
a. True b. False
2. Continuous data cannot take decimal values.
a. True b. False
3. Information stored in a PDF is not considered data.
a. True b. False
4. Quantitative data cannot take numerical values.
a. True b. False
5. Qualitative data is descriptive in nature.
a. True b. False
6. “What is the weather like?” is what kind of data?
a. Quantitative b. Qualitative
7. Which of the following is considered data?
a. Speech b. Video c. Messages d. All of the above
8. How is data used in the entertainment industry?
a. Predicting interests b. Targeting ads c. Both of the above
9. The number of days in a week is an example of:
a. Discrete Data b. Continuous Data Answer – a.
1. Data can be visualized using:
a. Graphs b. Maps c. Charts d. All of the above
2. Which of the following statements is false?
a. Data visualization helps us absorb information quickly.
b. Data visualization reduces insight and leads to slower decisions.
c. Data visualization is a type of visual art.
d. None of the above
3. Which of the following is a use case of data visualization?
a. Healthcare b. Sales and Marketing c. Politics/Campaigning d. All of the above
4. Which format of data is easiest for analysis?
a. Tabular data b. Text data in a PDF c. Data in an image d. Speech data
1. Data Science can help with:
a. Speech Recognition b. Digital Advertising c. Both of the above
2. Which of the following is a use case of data science?
a. Facial recognition b. Text analytics c. Sentiment analysis d. All of the above