Unit 2 - Data Science
CONTENTS
1. Basics of Data Analytics
2. Advanced Data Analytics
01
Basics of Data Analytics
What is Data Science
Data Science is a field that deals with extracting meaningful information and insights by applying various algorithms, preprocessing, and scientific methods on structured and unstructured data.
This field is related to Artificial Intelligence and is currently one of the most demanded skills.
Data science comprises mathematics, computations, statistics, programming, etc., to gain meaningful insights from the large amount of data provided in various formats.
What is Data Analytics
Data Analytics is used to get conclusions by processing the raw data.
It is helpful in various businesses as it helps the company to make decisions based on the conclusions
from the data.
Data analytics helps to convert a large number of figures in the form of data into plain English, i.e.,
conclusions which are further helpful in making in-depth decisions.
Data Science vs. Data Analytics
Feature | Data Science | Data Analytics
Coding Language | Python is the most commonly used language for data science, along with the use of other languages such as C++, Java, Perl, etc. | The knowledge of Python and the R language is essential for data analytics.
Programming Skills | In-depth knowledge of programming is required for data science. | Basic programming skills are necessary for data analytics.
Use of Machine Learning | Data science makes use of machine learning algorithms to get insights. | Data analytics does not use machine learning to get the insight of data.
Other Skills | Data science makes use of data mining activities for getting meaningful insights. | Hadoop-based analysis is used for getting conclusions from raw data.
Scope | The scope of data science is large. | The scope of data analysis is micro, i.e., small.
Life Cycle Phases of Data Analytics
The data analytics lifecycle is designed for Big Data problems
and data science projects.
The cycle is iterative, reflecting how real projects proceed.
To address the distinct requirements for performing analysis on
Big Data, a step-by-step methodology is needed to organize
the activities and tasks involved with acquiring, processing,
analyzing, and repurposing data.
Phase 1: Discovery
1. The data science team learns and investigates the problem.
2. Develop context and understanding.
3. Come to know about data sources needed and available for the project.
4. The team formulates initial hypotheses that can be later tested with data.
Phase 2: Data Preparation
1. Steps to explore, preprocess, and condition data prior to modeling and analysis.
2. It requires the presence of an analytic sandbox; the team performs extract, load, and transform (ELT) steps to get data into the sandbox.
3. Data preparation tasks are likely to be performed multiple times and not in a predefined order.
4. Several tools commonly used for this phase are: Hadoop, Alpine Miner, OpenRefine, etc.
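The sandbox loading described in Phase 2 could look roughly like the sketch below. It assumes an in-memory SQLite database stands in for the analytic sandbox. The `RAW_CSV` data, the `raw_orders` table, and both function names are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real project would pull this from a source system.
RAW_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,
4,south,45.25
"""


def load_into_sandbox(raw_csv: str) -> sqlite3.Connection:
    """Extract rows from CSV text and load them unchanged into the sandbox."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite plays the sandbox
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :region, :amount)", rows
    )
    return conn


def transform(conn: sqlite3.Connection):
    """Transform inside the sandbox: skip rows without an amount, total by region."""
    query = """
        SELECT region, SUM(CAST(amount AS REAL))
        FROM raw_orders
        WHERE amount <> ''
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


sandbox = load_into_sandbox(RAW_CSV)
totals = transform(sandbox)
print(totals)
```

Loading the raw rows first and transforming them afterwards keeps the original extract available in the sandbox, which suits preparation steps that are repeated in no fixed order.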
Phase 3: Model Planning
1. Team explores data to learn about relationships between variables and subsequently selects key variables and the most suitable models.
2. In this phase, the data science team develops datasets for training, testing, and production purposes.
3. Team builds and executes models based on the work done in the model planning phase.
4. Several tools commonly used for this phase are: Matlab, STATISTICA.
Phase 4: Model Building
• Team develops datasets for testing, training, and production purposes.
• Team also considers whether its existing tools will suffice for running the models or if they need a more robust environment for executing models.
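One simple way to produce the separate training and testing datasets mentioned in Phase 4 is a seeded random split. The sketch below is a minimal illustration under that assumption; the function name `train_test_split`, the 25% test fraction, and the integer records are hypothetical and not prescribed by the lifecycle.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(20))  # stand-in for real observations
train, test = train_test_split(records)
print(len(train), len(test))
```

Keeping the test portion untouched during model fitting is what lets it serve as an honest check on the model later.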
Phase 5: Communicate Results
1. After executing a model, the team needs to compare the outcomes of modeling to criteria established for success and failure.
2. Team considers how best to articulate findings and outcomes to various team members and stakeholders, taking into account warnings and assumptions.
3. Team should identify key findings, quantify business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6: Operationalize
The team communicates benefits of the project more broadly and sets up a pilot project to deploy work in
a controlled way before broadening the work to the full enterprise of users.
This approach enables the team to learn about performance and related constraints of the model in the
production environment on a small scale and make adjustments before full deployment.
The team delivers final reports, briefings, and code.
Free or open source tools – Octave, WEKA, SQL, MADlib.
Review of Data Analytics
Many analyses fail to examine the data completely, and their insights become difficult for stakeholders to comprehend. Therefore, it becomes necessary for a data analyst to define and understand data with the right set of initial questions and a standardized workflow for the different types of analysis they need to perform.
Aggregated raw data is data that has not yet been organized. It requires thoughtful understanding, as well as the appropriate questions, in order to make sense of it.
Types of Data Analysis
Descriptive Data Analysis
• As the name suggests, this kind of analysis offers basic "descriptions" or summaries of the accumulated raw data set and the observations added to it.
• Example: Data on the breakdown of students enrolled in the same course at a college.
Exploratory Data Analysis
• The output of descriptive analysis is studied further to discover patterns, trends, correlations, or inter-relations among different areas of the data, in order to develop an interpretation, an idea, or a hypothesis.
Inferential/Quantified Data Analysis
• Inferential analysis can be distinguished from exploratory analysis by determining whether it offers consistent information across various samples, not only the one at hand.
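The difference between descriptive and exploratory analysis can be made concrete with a small sketch. The data below (hours studied and exam scores for eight students) is invented for illustration, and `describe` and `pearson` are hypothetical helper names. The first summarizes one variable; the second looks for a relationship between two.

```python
import math
import statistics

# Hypothetical observations for eight students.
hours = [2, 3, 5, 6, 8, 9, 11, 12]
scores = [52, 55, 61, 64, 70, 74, 82, 85]


def describe(values):
    """Descriptive analysis: basic summaries of a single variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Exploratory analysis: Pearson correlation between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(describe(hours))
print(round(pearson(hours, scores), 3))
```

A strong correlation here would only suggest a hypothesis. Whether it holds across other samples is the question inferential analysis takes up.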
Predictive Data Analysis
• Predictive analysis estimates the outcomes that could be expected, working from a small subset of data drawn from the initial population set.
Causal Data Analysis
• Causal analysis is founded on modifying one dimension or measurement and observing the conclusive change this produces in a different dimension.
Mechanistic Data Analysis
• Although causal analysis provides an accurate average result, the aim here isn't just to establish that an effect exists but also to understand how that effect shapes the outcome.
02
Advanced Data Analytics
Key Components
Data Collection
• Definition: The process of gathering information from various sources.
• Sources: Databases, APIs, web scraping, sensors, social media, transaction logs.
• Importance: High-quality data collection is critical for accurate analysis.
Data Cleaning
• Definition: The process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset.
• Techniques: Handling missing values, correcting errors, removing duplicates, standardizing formats.
• Importance: Ensures the reliability and validity of the data analysis.
Data Integration
• Definition: Combining data from different sources to create a unified view.
• Techniques: ETL (Extract, Transform, Load), data warehousing, APIs.
• Importance: Provides a comprehensive dataset that reflects all relevant information.
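The Data Cleaning techniques listed above (handling missing values, removing duplicates, standardizing formats) can be sketched in a few lines. The records, the `clean` function, and the choice to fill a missing age with the rounded mean are all assumptions made for illustration. Other fill strategies may suit real data better.

```python
# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent casing, a missing value, and a duplicate id.
raw_records = [
    {"id": "1", "city": " Pune ", "age": "34"},
    {"id": "2", "city": "MUMBAI", "age": ""},
    {"id": "3", "city": "pune", "age": "30"},
    {"id": "1", "city": "Pune", "age": "34"},
]


def clean(records):
    """Remove duplicates, standardize formats, and fill missing ages."""
    cleaned = []
    seen_ids = set()
    for record in records:
        # Removing duplicates: keep only the first record for each id.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        # Standardizing formats: typed id, trimmed and title-cased city.
        age_text = record["age"].strip()
        cleaned.append({
            "id": int(record["id"]),
            "city": record["city"].strip().title(),
            "age": int(age_text) if age_text else None,
        })
    # Handling missing values: fill with the rounded mean of known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = round(sum(known) / len(known)) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```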
Data Transformation
• Definition: The process of converting data into a format suitable for analysis.
• Techniques: Normalization, aggregation, feature engineering.
• Importance: Enhances data quality and prepares it for effective analysis.
Data Storage
• Definition: Storing data in databases, data warehouses, or data lakes.
• Technologies: SQL databases (MySQL, PostgreSQL), NoSQL databases (MongoDB, Cassandra), data lakes (Amazon S3).
• Importance: Efficient storage solutions are crucial for managing large datasets and ensuring fast access.
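Two of the Data Transformation techniques named above, normalization and aggregation, are shown in the sketch below. It assumes min-max scaling as the normalization method, which is one common choice among several. The `sales` records and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "north", "amount": 100.0},
    {"region": "south", "amount": 300.0},
    {"region": "north", "amount": 200.0},
    {"region": "south", "amount": 500.0},
]


def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def total_by(records, key, value):
    """Aggregation: sum one field grouped by another."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


normalized = min_max_normalize([s["amount"] for s in sales])
region_totals = total_by(sales, "region", "amount")
print(normalized)
print(region_totals)
```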
Techniques
Descriptive Analytics
• Purpose: To describe the main features of a dataset.
• Methods: Summary Statistics, Data Visualization.
Predictive Analytics
• Purpose: To make predictions about future outcomes based on historical data.
• Methods: Regression Analysis, Time Series Analysis, Machine Learning Models.
Prescriptive Analytics
• Purpose: To suggest actions that can optimize outcomes.
• Methods: Optimization, Simulation.
Exploratory Data Analysis (EDA)
• Purpose: To explore data without making any assumptions.
• Methods: Hypothesis Testing, Cluster Analysis.
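As an illustration of Predictive Analytics through regression analysis, the sketch below fits a straight line to historical values by ordinary least squares and extrapolates one step ahead. The monthly sales figures are invented, and `fit_line` is a hypothetical helper. A straight-line trend is an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
monthly_sales = [110, 125, 138, 155, 170, 182]

slope, intercept = fit_line(months, monthly_sales)
forecast = slope * 7 + intercept  # extrapolate to month 7
print(round(forecast, 1))
```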
Big Data Analytics
• Purpose: To analyze large and complex datasets that traditional data processing tools cannot handle.
• Methods: Distributed Computing, Real-Time Analytics.
Text Analytics
• Purpose: To extract meaningful information from text data.
• Methods: Natural Language Processing (NLP), Text Mining.
Advanced Machine Learning and AI
• Purpose: To develop models that can learn from data and make intelligent decisions.
• Methods: Deep Learning, Reinforcement Learning.
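A very small instance of Text Analytics is counting the most frequent terms in a set of documents. The sketch below assumes simple regex tokenization and a tiny hand-picked stop-word list. The reviews, `STOP_WORDS`, and `top_terms` are hypothetical, and production text mining would normally rely on a dedicated NLP library.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list, just enough for this example.
STOP_WORDS = {"the", "a", "is", "and", "was", "but", "it", "to"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical customer reviews.
reviews = [
    "The delivery was fast and the packaging was great.",
    "Great product, but the delivery was late.",
    "Fast delivery, great price.",
]

print(top_terms(reviews))
```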
Tools and Technologies
1. Programming Languages
• Python
• R
• SQL
2. Statistical Software
• SAS
• SPSS
3. Data Visualization Tools
• Tableau
• Power BI
• D3.js
4. Machine Learning Libraries
• TensorFlow
• PyTorch
• Scikit-Learn
5. Big Data Platforms
• Apache Hadoop
• Apache Spark
6. Databases
• SQL Databases
• NoSQL Databases
Applications
Business Intelligence
• Objective: To transform data into actionable insights for strategic decision-making.
• Tools: BI tools like Tableau, Power BI.
• Examples: Sales performance analysis, market trend analysis.
Healthcare
• Objective: To improve patient outcomes and optimize healthcare operations.
• Applications: Predictive modeling for patient risk, personalized treatment plans.
• Examples: Predicting disease outbreaks, analyzing patient data for better diagnosis.
Finance
• Objective: To enhance financial decision-making and risk management.
• Applications: Fraud detection, algorithmic trading, credit scoring.
• Examples: Detecting fraudulent transactions, optimizing investment strategies.
Marketing
• Objective: To understand customer behavior and optimize marketing efforts.
• Applications: Customer segmentation, targeted advertising, sentiment analysis.
• Examples: Identifying potential customer segments, analyzing customer feedback.
Manufacturing
• Objective: To increase efficiency and reduce costs in production processes.
• Applications: Predictive maintenance, supply chain optimization.
• Examples: Predicting equipment failures, optimizing inventory levels.
Data Analytics Tools
Tableau
Description
Tableau is an easy-to-use Data Analytics tool with a drag-and-drop interface to create interactive visuals and dashboards.
Features
• Easy Drag and Drop Interface
• Mobile support for both iOS and Android
• Data Discovery feature
• Various Data sources like SQL Server, Oracle, etc.
Power BI
Description
Microsoft’s Data Analysis Tool providing enhanced interactive visualization and capabilities of Business Intelligence.
Features
• Great connectivity with Microsoft products
• Powerful Semantic Models
• Ability to create beautiful paginated reports
Apache Spark
Description
An open-source engine known for its speed in Data Processing, achieved through in-memory processing.
Features
• Incredible Speed and Efficiency
• Great connectivity with support of Python, Scala, R, and SQL shells
TensorFlow
Description
An open-source library developed by Google for building and training machine learning models.
Features
• Supports various programming languages
• Can scale as needed
Hadoop
Description
A distributed processing and storage solution
for Big Data.
Features
• Free to use as it is Open Source
• Can run on commodity hardware
R
Description
An Open Source Programming language widely used for Statistical Computing and Data Analysis.
Features
• Ability to handle large datasets
• Flexibility for various areas like Data Visualization, Data Processing
Python
Description
A popular programming language for Data Analysis and Machine Learning.
Features
• Easy to learn and user-friendly
• Extensive packages and libraries
SAS
Description
A software package designed for predictive
modeling, data management, and advanced
analytics.
Features
• Ability to handle large datasets
• Wide range of tools for predictive and
statistical analysis
QlikSense
Description
A Business and data analysis tool that
provides support for Data Visualization and
Data Analysis.
Features
• Tools for stunning and interactive Data
Visualization
KNIME
Description
An Analytics Platform and a data analysis tool that is Open Source and features an intuitive User Interface.
Features
• Intuitive User Interface with drag and drop function
Data Analytics Trends You Must Know in 2024
• Smarter and Scalable Artificial Intelligence
• Agile and Composed Data & Analytics
• Hybrid Cloud Solutions and Cloud Computing
• Data Fabric
• Edge Computing For Faster Analysis
• Augmented Analytics
• The Death of Predefined Dashboards
• XOps
• Engineered Decision Intelligence
• Data Visualization
Thank You