Lesson 1: Introduction to Data Science
Today, data rules the world, and this has created a huge demand
for Data Scientists.
A Data Scientist helps companies make data-driven decisions that
improve their business.
Data Science is a combination of multiple disciplines that uses
statistics, data analysis, and machine learning to analyze data
and to extract knowledge and insights from it.
What is Data?
Data is a collection of information.
One purpose of Data Science is to structure data, making it
interpretable and easy to work with.
Data can be categorized into two groups:
Structured data
Unstructured data
Unstructured Data
Unstructured data is not organized. We must organize the data
for analysis purposes.
Structured Data
Structured data is organized and easier to work with.
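As a minimal sketch of what organizing data can look like, the
Python snippet below turns a few free-text notes into structured
records. The notes, the field names (name, pulse), and the
patterns used to find them are all hypothetical and chosen only
for illustration.

```python
import re

# Hypothetical unstructured notes (illustration only, not real data).
notes = [
    "Anna had a pulse of 80 after training",
    "pulse 95 was measured for Ben",
    "Cara: pulse of 110",
]

# Assumed patterns: a capitalized word is the name, and the first
# number after the word "pulse" is the pulse value.
name_pattern = re.compile(r"\b([A-Z][a-z]+)\b")
pulse_pattern = re.compile(r"pulse\D*(\d+)")

# Build one structured record (a dictionary) per note.
records = []
for note in notes:
    name = name_pattern.search(note)
    pulse = pulse_pattern.search(note)
    if name and pulse:
        records.append({"name": name.group(1), "pulse": int(pulse.group(1))})

print(records)
```

Once every record has the same fields, the data can be sorted,
filtered, and analyzed far more easily than the original text.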
How to Structure Data?
We can use an array or a database table to structure or present
data.
Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
The following example shows how to create an array in Python,
using a list:
Example
# A Python list holding the values of the array
array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(array)
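A database table can be sketched in a similar way. The snippet
below is only an illustration: it stores the same values in an
in-memory SQLite table using Python's built-in sqlite3 module.
The table name (measurements) and the column names (id, value)
are assumptions made for this example.

```python
import sqlite3

values = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# In-memory database; the table and column names are hypothetical.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, value INTEGER)"
)
connection.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in values],
)

# Read the rows back, ordered by their id.
rows = connection.execute(
    "SELECT id, value FROM measurements ORDER BY id"
).fetchall()
for row in rows:
    print(row)

connection.close()
```

Each row is returned as a tuple of (id, value), so every value
keeps a fixed position and a named column.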
It is common to work with very large data sets in Data Science.
In this tutorial we will try to make it as easy as possible to
understand the concepts of Data Science. We will therefore work
with a small data set that is easy to interpret.
What is Data Science?
Data Science is about data gathering, analysis, and decision-making.
Data Science is about finding patterns in data through analysis
and using them to make predictions about the future.
By using Data Science, companies are able to make:
Better decisions (should we choose A or B)
Predictive analysis (what will happen next?)
Pattern discoveries (find patterns, or even hidden
information, in the data)
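To make "finding a pattern and predicting what happens next"
concrete, here is a minimal sketch that fits a straight line to
the small array used earlier in this lesson and estimates the
next value. Treating the position in the array as the input is an
assumption made for illustration, and least-squares line fitting
is just one simple technique among many.

```python
values = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
positions = list(range(len(values)))  # 0, 1, ..., 9

mean_x = sum(positions) / len(positions)
mean_y = sum(values) / len(values)

# Least-squares slope and intercept of a straight line through the points.
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(positions, values)
) / sum((x - mean_x) ** 2 for x in positions)
intercept = mean_y - slope * mean_x

# Pattern found: the values grow by `slope` per step.
# Prediction: the estimated value at the next position.
next_value = intercept + slope * len(values)
print(slope, next_value)
```

Real data is rarely this regular, so a prediction like this
should be read as an estimate, not a certainty.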
Where is Data Science Needed?
Data Science is used in many industries in the world today, e.g.
banking, consultancy, healthcare, and manufacturing.
Examples of where Data Science is needed:
For route planning: to discover the best shipping routes
To foresee delays for flights, ships, trains, etc. (through
predictive analysis)
To create promotional offers
To find the best-suited time to deliver goods
To forecast the next year's revenue for a company
To analyze the health benefits of training
To predict who will win elections
Data Science can be applied in nearly every part of a business
where data is available. Examples are:
Consumer goods
Stock markets
Industry
Politics
Logistics companies
E-commerce
How Does a Data Scientist Work?
A Data Scientist requires expertise in several fields:
Machine Learning
Statistics
Programming (Python or R)
Mathematics
Databases
A Data Scientist must find patterns within the data. Before
they can find the patterns, they must organize the data in a
standard format.
Here is how a Data Scientist works:
Ask the right questions - To understand the business
problem.
Explore and collect data - From databases, web logs,
customer feedback, etc.
Extract the data - Transform the data to a standardized
format.
Clean the data - Remove erroneous values from the data.
Find and replace missing values - Check for missing
values and replace them with a suitable value (e.g. an
average value).
Normalize data - Scale the values to a practical range (e.g.
140 cm is shorter than 1.8 m, yet the number 140 is larger
than 1.8, so scaling is important).
Analyze data, find patterns and make future
predictions.
Represent the result - Present the result with useful
insights in a way the "company" can understand.
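The cleaning, missing-value, and normalization steps above can
be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the height measurements
are hypothetical, a non-positive height is assumed to be
erroneous, missing values are replaced with the average, and
min-max scaling to the range 0 to 1 is one of several possible
ways to scale.

```python
# Hypothetical raw heights: mixed units, one missing value (None),
# and one erroneous value (a negative height).
raw_heights = [(140, "cm"), (1.8, "m"), (None, "cm"), (-5, "cm"), (165, "cm")]

# Bring all values to the same unit (centimeters).
heights_cm = []
for value, unit in raw_heights:
    if value is None:
        heights_cm.append(None)
    elif unit == "m":
        heights_cm.append(value * 100)
    else:
        heights_cm.append(value)

# Clean: treat non-positive heights as erroneous and mark them as missing.
heights_cm = [h if h is not None and h > 0 else None for h in heights_cm]

# Replace missing values with the average of the known values.
known = [h for h in heights_cm if h is not None]
average = sum(known) / len(known)
filled = [h if h is not None else average for h in heights_cm]

# Scale the values to the range 0..1 (min-max scaling).
low, high = min(filled), max(filled)
scaled = [(h - low) / (high - low) for h in filled]

print(filled)
print(scaled)
```

After the unit conversion, 140 cm and 1.8 m compare correctly,
because both are expressed in centimeters before any scaling is
applied.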