Essential Data Science Notes
A Concise PDF Guide
Table of Contents
Introduction to Data Science
Key Concepts and Terminologies
Essential Tools and Technologies
Basic Data Manipulation Techniques
Exploratory Data Analysis (EDA)
Summary
INTRODUCTION TO DATA SCIENCE
Data Science combines statistical, analytical, and
programming expertise to derive valuable insights
from data. As one of the most rapidly growing fields, its
applications range from straightforward data analysis
to sophisticated Machine Learning algorithms, making it
an essential skill in numerous industries. This
introductory chapter offers fundamental knowledge for
those beginning their journey in Data Science or looking
to refresh their skills. Our "Data Science Notes PDF" is a
concise resource filled with crucial information.
The adaptability and high demand for Data Science
skills have made it prominent in sectors such as
healthcare, finance, and technology. This eBook will help
you gain a better understanding of essential Data
Science concepts and practices. As you move through
the subsequent sections, keep these introductory notes
in mind as a foundation for the more advanced topics
that will be covered.
03 | www.theknowledgeacademy.com
KEY CONCEPTS AND TERMINOLOGIES
Data Science rests on a set of foundational concepts and terms that
every Data Scientist should know. Understanding these definitions is
crucial for carrying out Data Science tasks soundly and for reporting
results reproducibly.
Data Science
In essence, Data Science is a transdisciplinary
field that applies scientific principles,
methods, statistics, algorithms, and
computer systems to manage, analyse and
model data in order to uncover hidden
patterns and make predictions.
Algorithm
An Algorithm is a finite sequence of
well-defined instructions given to a computer
program, such as an Artificial Intelligence
system, to perform a computation or solve a
problem and arrive at a specific result.
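As an illustration, Euclid's method for finding the greatest common
divisor is a classic algorithm: a short list of steps that always ends
with a specific result. The sketch below is written in Python, and the
function name gcd_euclid is chosen here purely for illustration.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Return the greatest common divisor of two non-negative integers."""
    # Repeat one simple step until the remainder is zero.
    while b != 0:
        a, b = b, a % b
    return a


result = gcd_euclid(48, 18)
print(result)  # 6
```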
Big Data
This term refers to the massive volume of
information, both structured data sets and
less-structured information, that constantly
floods a business. Analysing Big Data gives a
deeper understanding of data sets and
trends, which in turn helps a company make
better decisions and deploy the right
strategies.
Machine Learning (ML)
A field of AI in which a system learns from
past data and improves its performance
through experience.
Artificial Intelligence (AI)
A broad branch of computer science that aims
to build computers able to solve problems
that ordinarily only people can solve.
Neural Networks
Neural networks are a class of computing
models loosely patterned after the structure of
the human brain. They are used to train
Artificial Intelligence systems from
observational data.
Supervised Learning
A Machine Learning approach in which the
model is trained on a data set in which each
input is paired with the correct output
(its label).
Unsupervised Learning
In contrast to supervised learning, this type of
Machine Learning works with data that does
not contain labels, leaving the algorithm to
discover structure, such as groups or
patterns, on its own.
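To make learning from unlabelled data concrete, here is a minimal
sketch of one-dimensional k-means clustering in plain Python. K-means
is only one of many unsupervised methods; the function name kmeans_1d,
the starting centres and the sample values are assumptions made for
this example.

```python
def kmeans_1d(values, centres, iterations=10):
    """Assign each number to its nearest centre, then move each centre to its group's mean."""
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]  # no labels attached
centres, groups = kmeans_1d(values, centres=[0.0, 5.0])
print(groups)
```

The algorithm is never told which group a value belongs to; the two
groups emerge from the distances between the values alone.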
Regression Analysis
A statistical technique used to determine how
the variables under consideration relate to
each other. It is widely applied in data
analysis and in predictive modelling and
forecasting.
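A minimal sketch of the simplest case, fitting a straight line through
data by ordinary least squares, is shown below. The helper name
fit_line and the study-hours data are made up for illustration; real
projects would normally use a statistics or Machine Learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # forecast for 6 hours of study
print(slope, intercept, predicted)  # 2.0 50.0 62.0
```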
Classification
A Machine Learning technique that assigns
each data point to one of a set of predefined
categories (labels).
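The sketch below shows classification with a nearest-neighbour rule,
which is also a small example of supervised learning because the
training data carries labels. The function name nearest_neighbour and
the measurements are hypothetical; this is one simple classifier, not
the only way to classify data.

```python
import math


def nearest_neighbour(train, point):
    """Return the label of the training example closest to `point`."""
    features, label = min(train, key=lambda example: math.dist(example[0], point))
    return label


# Labelled training data: (height_cm, weight_kg) -> category. Values are made up.
train = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

print(nearest_neighbour(train, (22.0, 4.5)))   # cat
print(nearest_neighbour(train, (58.0, 28.0)))  # dog
```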
ESSENTIAL TOOLS AND TECHNOLOGIES
Data Science relies on a range of tools and technologies that
increase the efficiency and quality of your work. This section
introduces the basic tools that are central to the work of any
Data Scientist.
Programming Languages
Python
With its simplicity and rich ecosystem of
libraries, Python dominates ML and AI work
and is the primary language of many Data
Scientists.
R
R is another language that is extremely
important to Data Science, favoured for
statistical computation and graphics.
SQL
A basic understanding of SQL is crucial for
the management and extraction of data in
relational databases.
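As a small sketch of extracting data with SQL, the example below uses
the sqlite3 module from the Python standard library with an in-memory
database. The orders table and its rows are invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 120.0), ("Ben", 35.5), ("Ana", 80.0), ("Cy", 15.0)],
)

# Extract the total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)  # [('Ana', 200.0), ('Ben', 35.5), ('Cy', 15.0)]
```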
Key Libraries and Frameworks
Pandas
This is a fundamental Python library for data
organisation and processing. It provides the
data structures and operations needed to
manipulate numerical tables and time series.
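A brief sketch of working with a table and a time series in pandas
follows. It assumes pandas is installed; the column names and sales
figures are made up for the example.

```python
import pandas as pd

# A small table of made-up daily sales.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
        "units": [10, 12, 9, 15],
        "price": [2.5, 2.5, 3.0, 3.0],
    }
)

# Add a derived column and filter rows.
df["revenue"] = df["units"] * df["price"]
busy_days = df[df["units"] > 10]

# Index by date to treat the revenue column as a time series.
series = df.set_index("date")["revenue"]
rolling_mean = series.rolling(window=2).mean()  # two-day moving average

print(df)
print(rolling_mean)
```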
NumPy
NumPy is the core library for scientific
computing in Python. It supports large
multi-dimensional arrays and matrices, along
with a broad collection of standard and
high-level mathematical functions that operate
on these arrays.
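The following sketch shows a two-dimensional array and a few of the
mathematical operations NumPy applies to it. It assumes NumPy is
installed; the numbers are arbitrary.

```python
import numpy as np

# A 2-D array (matrix): 3 rows, 2 columns.
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

print(a.shape)               # (3, 2)
col_means = a.mean(axis=0)   # mean of each column
scaled = a * 10              # element-wise arithmetic, no explicit loop
product = a.T @ a            # matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)

print(col_means)
print(product)
```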
TensorFlow and PyTorch
These frameworks are widely used
for building and training Machine
Learning models, and each has its
own strengths depending on the
type and complexity of the model.
Integrated Development Environments (IDEs) and Tools
Jupyter Notebook
Jupyter is well suited to Data Science projects.
A notebook combines live code, mathematical
equations, diagrams and narrative text in one
document, which is very useful when data needs
to be visualised or the project is collaborative.
GitHub
A hosting platform for managing revisions to
projects, coordinating work with other
developers, and archiving projects in
repositories using the Git version control
software.
BASIC DATA MANIPULATION TECHNIQUES
In Data Science, data handling is crucial for transforming
raw data into valuable insights and knowledge. This part of
the Data Science Notes PDF introduces two fundamental
steps, data cleaning and data pre-processing, which shape
the data right after it has been collected.
Data Cleaning
Data cleaning is one of the primary procedures
for preparing data before any further operations
are carried out on it. It includes dealing with
missing, inaccurate and incomplete values and
removing outliers.
Effective cleaning relies on techniques such as
imputation, where missing data is replaced by the
mean, median or mode, on removing problematic
records, and on algorithms that detect and correct
errors. The result is clean data for building
subsequent models, which helps ensure that they are
accurate and do not misinform the business.
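Median imputation and outlier removal can be sketched with pandas as
follows. The ages are invented, and the 1.5 x IQR cut-off used to
decide what counts as an outlier is a common rule of thumb assumed
here, not a rule prescribed by this guide.

```python
import pandas as pd

# Made-up ages with two missing entries and one implausible value.
ages = pd.Series([26, 31, None, 29, 240, None, 27], dtype="float64")

# Imputation: replace missing values with the median of the observed ones.
median_age = ages.median()  # missing values are ignored by default
filled = ages.fillna(median_age)

# Outlier removal with the interquartile range (IQR) rule.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
keep = filled.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = filled[keep]

print(cleaned.tolist())
```

Whether an extreme value should be dropped, corrected or kept depends
on the data; here 240 is clearly not a plausible age.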
Pre-processing Techniques
Data pre-processing is the process of
preparing data for analysis: the collected data
is refined and put into an appropriate format.
Two common pre-processing techniques are
normalisation and encoding. In normalisation,
data attributes are rescaled to a range between
0 and 1, while in encoding, categorical data is
converted into a numerical format.
Another important component is feature space
reduction: features that have no strong relation
to the target variable, or that are only partially
relevant, are excluded. This makes the models
simpler and, in theory, improves their
performance.
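Rescaling values to the 0 to 1 range and turning categories into
numbers can be sketched in plain Python as below. The helper names
min_max_scale and one_hot are chosen for this example, and one-hot
encoding is just one of several possible encoding schemes.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Turn each category into a list of 0/1 flags, one flag per distinct value."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]


incomes = [20_000, 35_000, 50_000, 80_000]
colours = ["red", "green", "red", "blue"]

scaled = min_max_scale(incomes)     # [0.0, 0.25, 0.5, 1.0]
levels, encoded = one_hot(colours)  # levels == ['blue', 'green', 'red']
print(scaled)
print(encoded)
```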
Analysis Techniques
Once the data is cleaned and pre-processed,
simple analysis can begin. This can involve, for
instance, sorting, grouping, and aggregating the
data in order to find patterns or anomalies.
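Sorting, grouping and aggregating might look like this in pandas. The
regions and amounts are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South", "North"],
        "amount": [120, 80, 200, 50, 95, 150],
    }
)

# Sorting: largest individual sales first.
top_sales = sales.sort_values("amount", ascending=False)

# Grouping and aggregating: total and average amount per region.
summary = sales.groupby("region")["amount"].agg(["sum", "mean"])

print(top_sales.head(3))
print(summary)
```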
For example, data can be described through
measures of central tendency, such as the mean
and median, and measures of dispersion, such as
the standard deviation, which give a sense of
how a particular dataset behaves. In addition,
correlation analysis measures the strength of
relationships between variables, which is useful
for forming hypotheses and for forecasting,
although correlation alone does not establish a
causal effect.
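The measures described above can be computed with the Python standard
library, as in this sketch. The pearson helper and the temperature and
sales figures are assumptions made for illustration.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


temperature = [14, 16, 20, 24, 26]  # made-up daily temperatures
ice_creams = [30, 34, 45, 52, 59]   # made-up units sold

mean_temp = statistics.mean(temperature)     # central tendency
median_temp = statistics.median(temperature)
spread_temp = statistics.stdev(temperature)  # sample standard deviation
r = pearson(temperature, ice_creams)         # close to 1: strong positive relationship

print(mean_temp, median_temp, round(spread_temp, 2), round(r, 3))
```

A value of r near 1 says the two series rise together; it does not by
itself show that one causes the other.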
EXPLORATORY DATA ANALYSIS (EDA)
Exploratory Data Analysis (EDA) is an important step in
the data analysis process because it links the data
collection phase with the modelling phase. At its core,
EDA is about discovering how the data is distributed,
looking for outliers, testing conjectures, and checking
hypotheses with descriptive statistics and graphical
means. Descriptive analysis gives an initial view of the
data set and can highlight interesting areas for further
study and model development.
More specifically, EDA uses charts ranging from the
most basic graphs, such as histograms, to
multi-variable scatter plots. Pareto charts identify the
most significant factors, line graphs show trends and
fluctuations, scatter plots show relationships between
variables, and pie charts show the proportions of a
whole.
Every graphical display aids in identifying the distribution
of the data, the association between variables and the
existence of unusual values or data points. EDA serves
not only to inform the choice of data modelling
strategies but also to reveal shortcomings in the data
acquisition and preparation phases.
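The numbers behind a histogram, and a simple check for unusual values,
can be sketched with NumPy as shown here. The response times are made
up, and the 1.5 x IQR fences are a common convention assumed for this
example; a plotting library would normally be used to draw the chart.

```python
import numpy as np

# Made-up response times in milliseconds, with one unusual value.
times = np.array([102, 98, 110, 105, 99, 101, 97, 104, 400, 103])

# Distribution: count how many values fall into each of 4 equal-width bins.
counts, bin_edges = np.histogram(times, bins=4)

# Unusual values: flag points outside the 1.5 x IQR fences.
q1, q3 = np.percentile(times, [25, 75])
iqr = q3 - q1
is_outlier = (times < q1 - 1.5 * iqr) | (times > q3 + 1.5 * iqr)

print(counts)             # [9 0 0 1]
print(times[is_outlier])  # [400]
```

Nine of the ten values land in the first bin, which already hints that
the remaining value deserves a closer look.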
Summary
In this Data Science Notes PDF, we've distilled the
essential elements that every aspiring Data Scientist
needs to begin their journey. This eBook presents
complex information in an accessible manner so that
you can quickly grasp key concepts and practical
techniques. Whether your interest is statistical
analysis, Machine Learning fundamentals, or the
crucial tools that facilitate data manipulation and
visualisation, these pages serve as a foundational
stepping stone in your Data Science education.
The landscape of Data Science is ever-evolving, with
new technologies, methodologies, and areas of
application emerging regularly. Keep this guide handy
for quick reference, and always seek out further
resources to ensure your skills remain sharp and your
knowledge is current.