ToolKit 1 | Unit 1 | Introduction to Data Analytics
What is Data?
Data is a formalized representation of facts, concepts, or instructions that is
suitable for transmission, interpretation, or processing by a human or an
electronic system. Data is represented by characters such as alphabets
(A-Z, a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.).
A collection of information obtained through observation, measurement,
study, or analysis is also referred to as data.
Types of Data Classification
Data can essentially be classified into four types, namely:
1 Geographical Data 2 Chronological Data
3 Quantitative Data 4 Qualitative Data
Phases of Data Processing Cycle
1 Collection 2 Preparation 3 Input
4 Processing 5 Output 6 Storage
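To make the cycle concrete, here is a minimal Python sketch that walks a toy set of readings through all six phases; the values and the output file name are invented for illustration.

```python
import csv
import statistics

# Collection: gather raw observations (a hard-coded stand-in here).
raw = ["12", "15", " 9", "n/a", "21"]

# Preparation: clean and validate the collected values.
prepared = [v.strip() for v in raw if v.strip().isdigit()]

# Input: convert the cleaned values into a machine-usable form.
values = [int(v) for v in prepared]

# Processing: derive something useful from the data.
mean_value = statistics.mean(values)

# Output: present the result.
print(f"mean of {len(values)} readings: {mean_value:.2f}")

# Storage: persist the result for later use (file name is hypothetical).
with open("readings_summary.csv", "w", newline="") as f:
    csv.writer(f).writerows([["n", "mean"], [len(values), mean_value]])
```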
Data Analytics
Data analytics is the practice of studying raw data in order to draw
conclusions from it. Data analytics is critical because it allows firms to improve
their performance. Companies that incorporate it into their business models can
cut costs by developing more efficient methods of doing business and by
storing massive volumes of data more effectively.
Data Analysis Steps
Step 1 Establish your aim
Step 2 Gather the data
Step 3 Organize the data for analysis
Step 4 Analyze the data
Step 5 Create a model or representation
Step 6 Validate the results
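A minimal sketch of the six steps on synthetic data, assuming only NumPy is available; the ad-spend/sales scenario is invented.

```python
import numpy as np

# Step 1: establish the aim -- quantify the ad-spend/sales relationship.
rng = np.random.default_rng(42)

# Step 2: gather data (synthetic stand-in for real observations).
ad_spend = rng.uniform(1, 10, 120)
sales = 3.0 * ad_spend + rng.normal(0, 1, 120)

# Step 3: arrange the data -- sort it and split off a holdout portion.
order = np.argsort(ad_spend)
ad_spend, sales = ad_spend[order], sales[order]
fit_idx = np.arange(len(sales)) % 4 != 0   # three quarters for fitting
hold_idx = ~fit_idx                        # one quarter held out

# Step 4: analyze the data -- basic summary.
print("correlation:", np.corrcoef(ad_spend, sales)[0, 1].round(3))

# Step 5: create a model -- fit a straight line.
slope, intercept = np.polyfit(ad_spend[fit_idx], sales[fit_idx], 1)

# Step 6: validation -- check the error on the held-out portion.
pred = slope * ad_spend[hold_idx] + intercept
print("holdout RMSE:", np.sqrt(np.mean((sales[hold_idx] - pred) ** 2)).round(3))
```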
Components of Data Analytics
1 Roadmap and operating model 2 Data Acquisition
3 Data Security 4 Data Governance and Standards
5 Insights and analysis 6 Data Storage
7 Data Visualization 8 Data Optimisation
Data Analytics Life Cycle
It involves six phases, namely:
1 Discovery
2 Data Prep
3 Plan Model
4 Build Model
5 Communicate Results/Publish Insights
6 Measure Effectiveness
4 types of Data Analytics
Descriptive Analytics
Descriptive analytics answers the question of what is happening in the
business: it transforms raw information from numerous data sources into
meaningful insight into the past.
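As a small illustration, descriptive analytics often begins with summary statistics over historical records; a minimal pandas sketch, with an invented sales table:

```python
import pandas as pd

# Hypothetical historical sales records pulled from several sources.
sales = pd.DataFrame({
    "region":  ["North", "South", "North", "South", "East"],
    "revenue": [120.0, 95.5, 140.2, 88.0, 102.3],
})

# "What happened?" -- summarize the past.
print(sales["revenue"].describe())               # count, mean, std, quartiles
print(sales.groupby("region")["revenue"].sum())  # revenue by region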
Diagnostic Analytics
At this stage, historical data can be compared against other data to answer
the question of why something happened. Diagnostic analytics provides
in-depth insight into a specific issue.
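One simple diagnostic move is to classify historical figures against each other and inspect the differences; a minimal pandas sketch with invented monthly figures:

```python
import pandas as pd

# Hypothetical monthly revenue, used to ask "why did February dip?"
df = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb"],
    "region":  ["North", "South", "North", "South"],
    "revenue": [140.0, 120.0, 135.0, 60.0],
})

# Classify historical figures against each other: pivot by month and region.
pivot = df.pivot_table(index="month", columns="region",
                       values="revenue", aggfunc="sum")
pivot = pivot.reindex(["Jan", "Feb"])  # keep chronological order

# The per-region change isolates where the drop came from (South, here).
print(pivot.diff())
```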
Predictive Analytics
As the name suggests, predictive analytics is concerned with the future: it
tells you what is likely to happen. It uses the findings of descriptive and
diagnostic analytics to identify clusters and exceptions and to predict future
trends, which makes it a valuable tool for forecasting.
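A minimal forecasting sketch, assuming scikit-learn is installed; the demand history is invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly demand history (units sold in months 1..8).
months = np.arange(1, 9).reshape(-1, 1)
demand = np.array([100, 104, 110, 113, 120, 125, 129, 136])

# Learn the trend from the past...
model = LinearRegression().fit(months, demand)

# ...and predict what is going to happen in months 9 and 10.
print(model.predict(np.array([[9], [10]])).round(1))
```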
Prescriptive Analytics
The purpose of prescriptive analytics is to prescribe what action to take to
eliminate a future problem or to take full advantage of a promising trend.
Prescriptive analytics uses advanced tools and technologies, such as machine
learning, business rules, and algorithms, which makes it complex to
implement and manage.
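Prescriptive logic can be as simple as business rules layered on top of a forecast; a toy sketch in which the thresholds and actions are invented:

```python
# A toy prescriptive rule layered on top of a demand forecast: the output
# is a recommended action, not just a number. Thresholds are invented.
def recommend(forecast_demand: float, stock_on_hand: float) -> str:
    gap = forecast_demand - stock_on_hand
    if gap > 50:
        return "place a rush order"            # exploit the trend early
    if gap > 0:
        return "schedule a standard reorder"
    return "hold: current stock covers forecast demand"

print(recommend(forecast_demand=140.0, stock_on_hand=70.0))
```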
PwC’s Global Data and
Analytics Survey 2016
Over 250 executives in the UK were surveyed on what they expected to be making
major decisions about before 2020. The most likely proactive decisions concern
developing or launching new products or services (25% envisage having to do this);
investment in IT (20%); and entering new markets with existing products (18%).
Executives in the UK are motivated by market leadership and the need to survive.
Data Collection
Data collection is the procedure of gathering, measuring, and analyzing accurate
insights for research using standard validated techniques.
The most important goal of data collection is to gather information-rich,
accurate data for statistical analysis, so that data-driven research decisions
can be made.
Data Collection Methods
1 Primary
This is original, first-hand data collected by the researchers themselves.
Primary data is highly accurate, provided the researcher collects the
information personally.
2 Secondary
Secondary data is second-hand data collected by other parties
and already having undergone statistical analysis. This data is
either information that the researcher has tasked other people to
collect or information the researcher has looked up.
Methods of Primary Data Collection
1 Direct personal interviews
2 Indirect Oral Interviews
3 Information from correspondents
4 Mailed questionnaire method
5 Schedules sent through Enumerators
Sources of Secondary Data
1 Published Sources 2 Unpublished Sources
Data Collection Tools
1 Interviews 2 Questionnaires 3 Case Studies
4 Checklists 5 Surveys 6 Observations
7 Documents and records 8 Focus groups 9 Oral histories
Factors to be considered before
choosing a Data Collection tool
Variable type: Consider the type of information you want to collect, your
research specialty, and the overall goals of the study.
Study design: Choose the method you'll use to gather the data.
Data collection technique: Determine which strategies and technologies you
prefer for data collection.
Sample data: Decide where you want to collect data and how to sample it. This
refers to the sampled population: determine which segments of the population
will be included in your inquiry.
Sample size: Consider the number of subjects you wish to include in your study.
Sample design: Also think about how you will choose the sample.
Time factor: When selecting a data-gathering technique, the availability
of time must also be considered.
Availability of funds: The availability of funds for the research topic dictates,
to a considerable extent, the approach to be employed for data collection.
Nature, scope and object of enquiry: This is the most essential aspect
influencing technique selection. The approach used should be appropriate for
the sort of investigation the researcher intends to carry out.
Precision required: Another key issue to consider when deciding on a data
gathering strategy is the precision required.
How to deliver value with analytics?
• Enable self-service analytics
• Provide specific goals and their related KPIs to help teams
measure success
• Democratize advanced analysis with intuitive AI
• Support development of data literacy or confidence when working
with data
• Identify subject matter experts in each department
The Data and Analytics Framework
A framework matrix is a table of rows and columns that summarizes and
analyzes qualitative data. It supports both cross-case and theme-based data
sorting. Individual instances are typically organized by row, while themes to
which the data has been coded constitute the matrix's columns. The source
material relating to the intersecting case and theme is described in each
intersecting cell.
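A framework matrix can be mocked up as a pivot over coded excerpts; in this minimal pandas sketch, the cases, themes, and quotes are all invented:

```python
import pandas as pd

# Coded qualitative excerpts: one row per (case, theme) observation.
coded = pd.DataFrame({
    "case":  ["Interview 1", "Interview 1", "Interview 2", "Interview 2"],
    "theme": ["Cost", "Trust", "Cost", "Usability"],
    "note":  ["worried about fees", "trusts the brand",
              "prefers a cheaper plan", "found the app confusing"],
})

# Cases as rows, themes as columns, summarized source material per cell.
matrix = coded.pivot_table(
    index="case", columns="theme", values="note",
    aggfunc=lambda notes: "; ".join(notes),
)
print(matrix)
```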
Aspects of Framework
1 Discovery 2 Insights
3 Actions 4 Outcomes
6 layers in Data and Analytics Framework
1 Use Cases 2 Datasets 3 Data Collection
4 Data Preparation 5 Intelligent Learning 6 Actions
Techniques of Framework
The big data analytics framework is primarily based on two fundamental
frameworks, namely:
1 SQL frameworks 2 NoSQL frameworks
Data analytics frameworks and tools widely used around the world include:
• Apache Cassandra
• Knime
• Datawrapper
• Lumify
• Apache Storm
• Rapidminer
• Flink
Big Data
Big data is, as the term implies, a "large" quantity of data. It refers to a data
collection that is both huge in volume and complex. Traditional data
processing software cannot manage big data due to its vast volume and
increased complexity. Big data simply refers to datasets that contain a
significant quantity of varied data, both structured and unstructured.
5 Vs of Big Data
Volume: Volume refers to the huge amount of data.
Velocity: Velocity refers to the high speed of accumulation of data. In Big
Data, velocity data flows in from sources like machines, networks, social
media, mobile phones, etc.
Variety: It refers to the nature of the data: structured, semi-structured,
and unstructured. It also refers to data arriving from heterogeneous sources.
Value: Bulk data with no value is of no good to the company unless it is
turned into something useful.
Veracity: It refers to inconsistencies and uncertainty in data; available
data can sometimes get messy, and quality and accuracy are difficult to
control.
Application of Big Data in Real World
1 Customer Experience 2 Machine Learning 3 Demand Forecasting
Big Data Storage
Big data storage is a storage system that is especially built to store,
handle, and retrieve huge volumes of data, often known as big data. Big
data storage allows for the storing and sorting of large amounts of data
so that it may be quickly accessible, consumed, and processed by big
data applications and services.
Big data storage is a compute-and-storage architecture that allows you
to collect and manage massive datasets as well as execute real-time data
analytics. The results of these analyses can then be utilized to produce
intelligence from metadata.
Types of Big Data
1 Structured 2 Unstructured 3 Semi-Structured
Big Data Life-cycle
There are 9 phases involved in the Big Data Life Cycle. They are as
follows:
• Business Case/Problem Definition
• Data Identification
• Data Acquisition and filtration
• Data Extraction
• Data Munging (Validation and Cleaning)
• Data Aggregation & Representation (Storage)
• Exploratory Data Analysis
• Data Visualization (Preparation for Modeling and Assessment)
• Utilization of analysis results.
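The middle phases (munging, aggregation, and a first exploratory look) in miniature, using pandas on an invented event log:

```python
import pandas as pd

# A hypothetical raw event log entering the pipeline.
raw = pd.DataFrame({
    "user":  ["a", "b", None, "a", "b"],
    "bytes": ["512", "1024", "128", "bad", "2048"],
})

# Data munging: validate and clean (drop incomplete rows, coerce types).
clean = raw.dropna(subset=["user"]).copy()
clean["bytes"] = pd.to_numeric(clean["bytes"], errors="coerce")
clean = clean.dropna(subset=["bytes"])

# Aggregation & representation: summarize per user for storage.
per_user = clean.groupby("user")["bytes"].agg(["count", "sum"])

# Exploratory analysis: a quick look at the aggregate.
print(per_user)
```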
Big Data Tools
Big Data requires a set of tools and techniques for analysis to gain
insights from it.
There are a number of big data tools available in the market: Hadoop helps in
storing and processing large data, Storm helps in faster processing of
unbounded data, Apache Cassandra provides high availability and scalability
of a database, and so on; every big data tool has its own function.
1 Hadoop 2 Atlas.ti 3 HPCC
4 Storm 5 Cassandra 6 Stats iQ
7 CouchDB 8 RapidMiner
Data Warehouse
A data warehouse is an analytics-focused type of data management system
designed to support and facilitate business intelligence (BI) operations.
Data warehouses are used solely to conduct queries and analyses on vast
amounts of historical data. Data for a data warehouse is frequently drawn
from a variety of sources, such as transactional applications and
application log files.
Advantages of Data Warehouse
• Provides quick access to crucial data from numerous sources
• Gives consistent information on a variety of cross-functional
operations; ad hoc reporting and querying are also possible
• Helps to integrate a number of data sources to decrease workload on
production system
• Reduces the amount of time it takes for analysis and reporting to
be completed
• Enables access to crucial data from several sources in one place, so the
user saves time while gathering data from various sources.
Drawbacks of Data Warehouse
• Ineffective at handling unstructured data
• Building and implementing a data warehouse takes time.
• Possibility of getting outdated quickly
• Challenging to make changes to data types, ranges, data source
structure, indexes, and searches.
Data Warehouse Components
1 ETL- Extract/Transform/Load:
A variety of tasks are performed by ETL such as:
• Logical data conversion
• Domain verification
• Converting from one DBMS to another
• Generating default values when required
• Summarizing the data
• Adding time values to the data key
• Restructuring the data key
• Integrating records
• Getting rid of extraneous or duplicate data.
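A few of these tasks in a minimal pandas sketch; the table, the allowed status domain, and the rules are invented:

```python
import pandas as pd

# Hypothetical extracted records on their way into the warehouse.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "status":   ["shipped", "SHIPPED", "SHIPPED", "unknown!"],
    "amount":   [250.0, None, None, 99.0],
})

# Logical data conversion: normalize codes to one representation.
orders["status"] = orders["status"].str.lower()

# Domain verification: keep only values from the allowed domain.
orders = orders[orders["status"].isin(["shipped", "pending", "cancelled"])].copy()

# Default value generation, where required.
orders["amount"] = orders["amount"].fillna(0.0)

# Getting rid of duplicate records.
orders = orders.drop_duplicates()

# Adding a time value to the data key (a load timestamp).
orders["load_ts"] = pd.Timestamp.now(tz="UTC")

# Summarizing the data for the warehouse layer.
print(orders.groupby("status")["amount"].sum())
```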
2 ODS- Operational Data Store
In the ODS, online updates of integrated data are carried out with OLTP
(Online Transaction Processing) response times. The ODS is a hybrid
environment in which application data is placed into an integrated format
(often via ETL). Once data is placed in the ODS, it can be used for
high-performance processing, including update processing.
3 Data Mart
A data mart is designed around a single department-wide set of expectations
for how data should appear and is typically arranged by department; finance,
for example, has its own data mart. Compared to the data warehouse, each data
mart typically contains much less data. Additionally, data marts frequently
include a sizable amount of summarized and aggregated data.
4 Exploration Warehouse
End users who wish to undertake discovery processing go to the exploration
warehouse, where much of the statistical analysis is performed.
Approaches to building a Warehouse
Inmon’s Approach
Bill Inmon developed this technique for building a data warehouse. The
starting point for this strategy is a corporate data model that takes into
account the key subject areas, such as customers, products, and vendors.
This model is used to produce a thorough logical model for each significant
process, and a physical model is then created from these detailed models.
The normalized nature of this approach reduces data redundancy.
Kimball’s Approach
This approach to designing a data warehouse was introduced by Ralph
Kimball. The first step in this strategy is recognizing the business
processes and the questions that the data warehouse must answer. These data
sets are then carefully evaluated and documented.
Steps to build a warehouse
1 To extract the (transactional) data from different data sources
2 To transform the transactional data
3 To load the (transformed) data into the dimensional database
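The three steps in miniature, with Python's built-in sqlite3 standing in for both the source and the dimensional target; the schema and data are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A stand-in transactional source table.
db.execute("CREATE TABLE src_orders (id INTEGER, customer TEXT, amount REAL)")
db.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
               [(1, "acme", 250.0), (2, "acme", 99.0), (3, "globex", 10.0)])

# The dimensional target: one dimension table plus one fact table.
db.execute("CREATE TABLE dim_customer "
           "(customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE)")
db.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")

# Step 1: extract the transactional rows.
rows = db.execute("SELECT customer, amount FROM src_orders").fetchall()

for name, amount in rows:
    # Step 2: transform -- resolve each customer to a surrogate key.
    db.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (name,))
    key = db.execute("SELECT customer_key FROM dim_customer WHERE name = ?",
                     (name,)).fetchone()[0]
    # Step 3: load into the dimensional fact table.
    db.execute("INSERT INTO fact_sales VALUES (?, ?)", (key, amount))

print(db.execute("""SELECT name, SUM(amount) FROM fact_sales
                    JOIN dim_customer USING (customer_key)
                    GROUP BY name""").fetchall())
```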
Data warehouse can be mapped into
different types of architecture as follows:
Shared memory architecture: The standard method for putting an RDBMS on
SMP hardware is to implement it in shared-memory or shared-everything form.
The main benefit of this method is that a single RDBMS server can likely
access all memory, all CPUs, and the whole database, giving the client a
consistent single system image.
Shared disk architecture: The idea of shared ownership of the complete
database between RDBMS servers, each of which is executing on a node of a
distributed memory system, is implemented via shared-disk architecture. Each
RDBMS server can access the same shared database to read, write, update,
and delete data, necessitating the implementation of a distributed lock
manager (DLM).
Shared nothing architecture: Shared-nothing systems are often loosely
coupled. In shared-nothing systems, only one CPU is connected to a given
disk. Access to any tables or databases stored on that disk depends
entirely on the CPU that owns it.