Introduction to Big Data
Ref book: Hadoop Essentials by Shiva Achari, ISBN 978-1-78439-668-8
Available on ProQuest Ebook Central
Outcome #1
Describe the concepts of Big Data, its characteristics, and big data domains.
Understand the need for Big Data.
What does Big Data mean?
Big Data is all about finding the needle of value in a haystack of structured, semi-structured and
unstructured information.
Big Data capabilities over traditional systems:
Big data systems can run analytics not only faster but also more efficiently on large data sets; they
widen the scope of research and development analysis and can produce more meaningful insights,
faster, than other analytics or BI systems.
Big data systems emerged because of issues and limitations in traditional systems. Traditional
systems are good for Online Transaction Processing (OLTP) and Business Intelligence (BI), but they
are not easily scalable in terms of cost, effort, and manageability. Heavy computations are difficult
to process, prone to memory issues, or very slow. Traditional systems also fall short in data science
analysis, which is what makes big data systems powerful.
Some examples of big data use cases:
Predictive analytics
Fraud analytics
Machine learning
Identifying patterns
Data analytics
Semi-structured and unstructured data processing and analysis.
V's of big data
Image from Achari, Shiva, Hadoop Essentials, Packt Publishing Ltd, 2015, ProQuest Ebook Central.
Volume
Big data systems are designed to store petabytes, or even zettabytes, of data.
The cost per terabyte of storage in big data systems is much lower than in other systems.
Data is distributed and replicated across multiple nodes (see the sketch after this list).
Storage is easily scalable, and nodes can be added without much maintenance effort.
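To make the distribution and replication idea concrete, here is a minimal Python sketch; it is not how HDFS or any particular system actually places blocks. The node names, the replication factor of 3, and the hash-based placement are illustrative assumptions only.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]   # illustrative cluster nodes
REPLICATION_FACTOR = 3                          # each block stored on 3 nodes

def place_block(block_id):
    """Pick REPLICATION_FACTOR distinct nodes for a block using a hash of its id.

    A toy placement scheme to show distribution and replication; real systems
    (e.g. HDFS) use rack-aware placement policies instead.
    """
    start = int(hashlib.md5(block_id.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

# A file is split into blocks; each block lands on several nodes.
for block in ["file1-block0", "file1-block1", "file1-block2"]:
    print(block, "->", place_block(block))
```

Because each block lives on several nodes, losing one node does not lose data, and adding a node simply gives the placement function more targets.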
Velocity
The rate at which data is processed should be equal to the rate at which data is
generated.
Big data systems can run complex algorithms on huge data sets much more quickly because they
leverage parallel processing across a distributed environment (a small sketch follows).
Big data systems execute multiple processes in parallel, so a job can be completed much faster.
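As a rough illustration of the parallelism point, the following Python sketch splits a heavy computation across worker processes. The data, the chunk count of 4, and the squared-sum "computation" are made up for illustration; it only mimics, on one machine, what a cluster does across nodes.

```python
from multiprocessing import Pool

def heavy_computation(chunk):
    """Stand-in for a complex per-record computation."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(10_000_000))
    # Split the data into chunks and process them in parallel,
    # mirroring how big data systems spread work across nodes.
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(heavy_computation, chunks)
    # Combine the partial results into the final answer.
    print(sum(partials))
```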
Variety
Another big challenge for traditional systems is handling a variety of semi-structured and
unstructured data such as e-mails, audio and video, images, social media, gene, geospatial,
and 3D data.
Big data systems can not only store such data, but also utilize and process it with
algorithms much more quickly and efficiently.
Big data systems can efficiently handle complex semi-structured and unstructured data
processing with minimal or no preprocessing, unlike other systems (a small example follows).
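A small, hypothetical example of working with semi-structured data: the JSON records below have varying fields with no fixed schema, and the code handles each record as it comes instead of forcing a relational schema up front. The record contents are invented for illustration.

```python
import json

# Semi-structured records: the fields vary from record to record (no fixed schema).
raw_records = [
    '{"user": "alice", "likes": 12, "tags": ["travel", "food"]}',
    '{"user": "bob", "comment": "great post!"}',
    '{"user": "carol", "likes": 3, "video_length_sec": 95}',
]

for line in raw_records:
    record = json.loads(line)
    # Missing fields are handled per record instead of being forced into one schema.
    user = record.get("user", "unknown")
    likes = record.get("likes", 0)
    extra_fields = set(record) - {"user", "likes"}
    print(user, likes, "extra fields:", extra_fields)
```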
Who is creating big data?
Some of the sources creating big data are listed as follows:
Monitoring sensors: Climate or ocean wave monitoring sensors generate data continuously and
in large volumes, and there are millions of such sensors capturing data.
Posts to social media sites: Social media websites such as Facebook, Twitter, and
others have a huge amount of data in petabytes.
Digital pictures and videos posted online: Websites such as YouTube, Netflix, and
others process huge amounts of digital video and data that can run into petabytes.
Who is creating big data? (continued)
Transaction records of online purchases: E-commerce sites such as eBay, Amazon,
Flipkart, and others process thousands of transactions at a time.
Server/application logs: Applications generate log data that grows consistently, and
analyzing this data becomes difficult.
CDR (call data records): Roaming data and cell phone GPS signals to name a few.
Science, genomics, biogeochemical, biological, and other complex and/or
interdisciplinary scientific research.
Understanding big data
Big data is a term that refers to the challenges we face due to the exponential growth of data,
expressed in terms of the V problems. The challenges can be subdivided into the following phases:
• Capture
• Storage
• Search
• Sharing
• Analytics
• Visualization
Strategy to solve Big Data Problems
Big data systems also refer to technologies that can process and analyze data exhibiting the
volume, velocity, and variety problems discussed earlier. Technologies that solve big data
problems should use the following architectural strategies (a small map-shuffle-reduce sketch
follows the list):
• Distributed computing system
• Massively parallel processing (MPP)
• NoSQL (Not only SQL)
• Analytical database
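To connect the distributed computing and MPP strategies to something concrete, here is a toy single-machine simulation of the map-shuffle-reduce flow (a word count). It is only a sketch of the idea, not a real distributed framework, and the input lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) pairs for a line, as a mapper on one node would."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between nodes."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's values (here, a word count)."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data systems scale out", "big data systems process data in parallel"]
pairs = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(shuffle(pairs)))
```

In a real cluster, the map and reduce functions run on many nodes in parallel and the shuffle moves data between them; the flow of the data, however, is the same.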
Why NoSQL
A NoSQL database is a widely adopted technology because of its schema-less
design and because it can scale both vertically and horizontally fairly
simply and with much less effort. SQL and RDBMS have ruled for more
than three decades and perform well within the limits of their
processing environment; beyond that, RDBMS performance degrades,
cost increases, and manageability decreases. NoSQL provides an edge
over RDBMS in these scenarios.
Types of NoSQL databases
As NoSQL databases are non-relational, they have different sets of
possible architectures and designs. There are four general types of
NoSQL databases, based on how the data is stored, plus analytical
databases, covered here as a fifth category:
1. Key-value store
2. Column store
3. Document database
4. Graph database
5. Analytical database
1. Key-value store
• These databases are designed for storing data as key-value pairs.
Some popular key-value databases are DynamoDB, Azure Table
Storage (ATS), Riak, and BerkeleyDB.
• Information is stored as matched pairs with only two columns
permitted: the key (a hashed key) and the value.
• The values can be simple text or complex data types such as sets of
data.
• Data must be retrieved via an exact match on the key.
• The advantage of this type of NoSQL database is that new types of
data can easily be added as new key-value pairs (a minimal sketch follows).
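A minimal in-memory sketch of the key-value idea, not the API of DynamoDB, Riak, or any real product: two "columns" (key and value), exact-match lookup by key, and values that can be simple or complex. The keys and values shown are invented.

```python
class KeyValueStore:
    """A toy key-value store: two 'columns' only, the key and the value."""

    def __init__(self):
        self._data = {}          # a Python dict is itself a hash-keyed store

    def put(self, key, value):
        self._data[key] = value  # value can be simple text or a complex structure

    def get(self, key):
        # Retrieval works only by exact match on the key.
        return self._data.get(key)

store = KeyValueStore()
store.put("user:42", {"name": "Alice", "interests": ["hadoop", "nosql"]})
store.put("page:home:hits", 1024)
print(store.get("user:42"))
```

New kinds of data need no schema change: they are simply added as new key-value pairs.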
2. Column store
These databases are designed for storing data as groups
of column families. Read/write operations are done on
columns rather than rows.
They deliver high performance on aggregation queries
such as SUM, COUNT, AVG, and MIN, because the data is readily
available in a column (see the sketch below).
Column-based NoSQL databases are widely used to
manage data warehouses, business intelligence, CRM,
and library card catalogs.
Some popular column store databases are HBase,
BigTable, Cassandra, Vertica, and Hypertable.
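The following toy comparison illustrates why a column layout helps aggregation: the same records are shown row-oriented and column-oriented, and the SUM/AVG touch only the amount column. The table and values are illustrative, not taken from any real column store.

```python
# Row-oriented layout: one record per row.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 230.0},
]

# Column-oriented layout: each column's values are stored together.
columns = {
    "order_id": [1, 2, 3],
    "region":   ["east", "west", "east"],
    "amount":   [120.0, 75.5, 230.0],
}

# Aggregations such as SUM and AVG read a single contiguous column,
# instead of scanning every field of every row.
amounts = columns["amount"]
print("SUM:", sum(amounts), "AVG:", sum(amounts) / len(amounts))
```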
3. Document database
These databases are designed for storing, retrieving, and managing document-oriented
information. A document database expands on the idea of a key-value store: the values, or
documents, are stored with some structure and encoded in formats such as XML.
Each key is paired with a complex data structure known as a document.
A document store is designed for storing, retrieving, and managing document-oriented
information, also known as semi-structured data.
Document-oriented databases are inherently a subclass of the key-value store.
Documents can contain many different key-value pairs, key-array pairs, or even
nested documents (a small sketch follows).
Some popular document databases are MongoDB and CouchDB.
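A toy sketch of the document idea: each key maps to a nested document, and fields inside the document can be queried. The documents and the find helper are invented for illustration; real document databases such as MongoDB have their own query languages.

```python
# A toy document store: key -> document (nested key-value / key-array pairs).
documents = {
    "doc1": {"type": "post", "author": "alice",
             "tags": ["hadoop", "nosql"], "stats": {"likes": 12}},
    "doc2": {"type": "comment", "author": "bob", "text": "Nice write-up"},
}

def find(store, field, value):
    """Return documents whose top-level field matches a value."""
    return [doc for doc in store.values() if doc.get(field) == value]

print(find(documents, "author", "alice"))
# Unlike a plain key-value store, the structure inside the value is queryable.
print(documents["doc1"]["stats"]["likes"])
```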
4. Graph database
These databases are designed for data whose relationships are well represented as a tree or a
graph, with interconnected elements, usually nodes and edges. Relational databases are not well
suited to graph-based queries, as these require many complex joins.
Graph databases are used to store information about networks of data, such as social
connections.
A graph database uses graph structures for semantic queries, with nodes, edges, and
properties to represent and store data.
A key concept of the system is the graph (or edge, or relationship), which directly relates
data items in the store (a small traversal sketch follows).
Some popular graph databases are Neo4j and Polyglot.
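A small sketch of nodes, edges, and properties, with a two-hop traversal (friends of friends) that a relational database would need self-joins to answer. The people and relationships are made up, and a real graph database such as Neo4j would use a query language like Cypher rather than Python.

```python
# Nodes with properties, and edges ("knows" relationships) between them.
nodes = {"alice": {"city": "Oslo"}, "bob": {"city": "Pune"}, "carol": {"city": "Lima"}}
edges = {"alice": ["bob"], "bob": ["carol"], "carol": []}

def friends_of_friends(person):
    """Two-hop traversal: neighbours' neighbours, excluding the start node."""
    result = set()
    for friend in edges.get(person, []):
        for fof in edges.get(friend, []):
            if fof != person:
                result.add(fof)
    return result

# In an RDBMS this needs self-joins on a relationships table;
# here it is a direct walk over the edge lists.
print(friends_of_friends("alice"))   # {'carol'}
```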
5. Analytical database
An analytical database is a type of database built to store,
manage, and consume big data. Analytical databases are vendor-managed
DBMSs optimized for advanced analytics involving highly complex
queries on terabytes of data, complex statistical processing, data
mining, and NLP (natural language processing).
Examples of analytical databases are Vertica (acquired by HP),
Aster Data (acquired by Teradata), Greenplum (acquired by EMC),
and so on.
Big data use case patterns
There are many technological scenarios, and some of them are
similar in pattern. It is a good idea to map scenarios to
architectural patterns.
Once these patterns are understood, they become the fundamental
building blocks of solutions. We will discuss five types of patterns.
Big data use case patterns
1. Big data as a storage pattern
Big data systems can be used as a storage pattern or as a data warehouse, where
data from multiple sources, even of different types, can be stored and
utilized later.
2. Big data as a data transformation pattern
Big data systems can be designed to perform transformation as part of data loading
and cleansing, and many transformations can be done faster than in traditional
systems because of parallelism. Transformation is the middle phase of the Extract-Transform-Load
(ETL) flow for data ingestion and cleansing (a small sketch of the transform step follows).
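As a hypothetical illustration of the transform step, the sketch below cleanses a couple of raw extracted records (trimming text, normalising case, coercing types). The field names and cleansing rules are assumptions, not a prescribed ETL recipe; in a big data system each record would be transformed in parallel.

```python
raw_records = [
    {"name": " Alice ", "amount": "120.50", "country": "no"},
    {"name": "BOB", "amount": "n/a", "country": "IN"},
]

def transform(record):
    """Cleanse one extracted record: trim text, normalise case, coerce types."""
    try:
        amount = float(record["amount"])
    except ValueError:
        amount = 0.0  # simple cleansing rule for bad values
    return {
        "name": record["name"].strip().title(),
        "amount": amount,
        "country": record["country"].upper(),
    }

cleaned = [transform(r) for r in raw_records]   # this step parallelises per record
print(cleaned)
```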
Big data use case patterns
3. Big data for a data analysis pattern
Data analytics is of wide interest in big data systems, where a huge amount of data can be
analyzed to generate statistical reports and insights about the data, which can be useful in
business and in understanding patterns.
4. Big data for a real-time data pattern
Big data systems integrating with streaming libraries and systems are capable of handling
large-scale real-time data processing. Real-time processing for large and complex requirements
poses many challenges, such as performance, scalability, availability, resource management,
and low latency. Streaming technologies such as Storm and Spark Streaming can be
integrated with YARN.
5. Big data for a low-latency caching pattern
Big data systems can be tuned as a special case for a low-latency workload where reads are much
more frequent than updates: frequently read data can be kept in memory, which fetches it faster,
further improves performance, and avoids overheads (a minimal cache sketch follows).
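For the low-latency caching pattern, here is a minimal read-through cache sketch for a read-heavy, rarely-updated workload. The "slow store" is simulated with a sleep, and the keys and values are invented; it only illustrates why keeping hot data in memory avoids repeated trips to a slower backend.

```python
import time

slow_store = {"product:1": {"name": "widget", "price": 9.99}}   # stands in for a slow backend

cache = {}

def get(key):
    """Read-through cache: serve hot keys from memory, fall back to the slow store."""
    if key in cache:
        return cache[key]            # fast path: in-memory read
    time.sleep(0.1)                  # simulate the slower backing lookup
    value = slow_store.get(key)
    cache[key] = value               # keep it in memory for later reads
    return value

print(get("product:1"))   # slow first read
print(get("product:1"))   # served from memory
```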