Web Based Data Management of Apache Hive

This document provides an overview of Apache Hive, including: 1. Hive is a data warehouse infrastructure built on Hadoop that allows SQL queries to be compiled as MapReduce jobs and run on a Hadoop cluster. It is suitable for semi-structured and structured data. 2. Hive stores schema information in a metastore database and processes and stores data in HDFS. It compiles HiveQL (SQL-like) queries into MapReduce jobs. 3. The Hive architecture includes interfaces, a metastore, a query compiler, an execution engine, and HDFS storage. It allows users to interact with data stored in HDFS using SQL-like queries.

Uploaded by

Krupa Patel


3161607 – Big Data Analytics

WEB BASED DATA MANAGEMENT OF APACHE HIVE
Submitted To: Prof. Pooja Bhatt

Presented By:
Krupa Patel (190633116001)
Maitri Patel (190633116003)
Riya Soni (190633116004)

Outline:
Origin
What is Hive?
How Hive Works
Hive Architecture
Working of Hive
Execution of Hive
Limitations
Apache Hive vs. Apache Pig
Hive Table
Summary

Origin:
Hive was initially developed by Facebook.
Data was loaded into an Oracle database every night.
ETL (Extract, Transform, Load) was performed on the data.
The data growth was exponential:
– By 2006: 1 TB/day
– By 2010: 10 TB/day
– By 2013: about 5,000,000,000 per day, etc.
There was a need to find some way to manage the data “effectively”.

What is Hive?
Hive is a data warehouse infrastructure built on top of Hadoop that compiles SQL queries into MapReduce jobs and runs the jobs in the cluster.
Suitable for structured and semi-structured data.
Capable of dealing with different storage and file formats.
Provides HQL (an SQL-like query language).
What Hive is not:
Does not use complex indexes, so it does not respond in seconds.
But it scales very well; it works with data on the order of petabytes.
It is not independent and its performance is tied

How Hive Works
Hive is built on top of Hadoop: think HDFS and MapReduce.
Hive stores data in HDFS.
Hive compiles SQL queries into MapReduce jobs and runs the jobs in the Hadoop cluster.
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
We need reports to make operations better, not to conduct the operations.
We use ETL to populate data in the data warehouse (DW).
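
To make the "SQL query becomes a MapReduce job" idea concrete, here is a minimal Python sketch of how a query such as SELECT page, COUNT(*) FROM page_views GROUP BY page can be split into a map step, a shuffle, and a reduce step. The table name, column names, and function names are assumptions made for illustration; this models the idea only and is not the plan Hive's compiler actually generates.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair for every input row."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three phases in order, like one MapReduce job."""
    return reduce_phase(shuffle(map_phase(rows)))


# Hypothetical rows of a page_views table.
page_views = [
    {"user": "u1", "page": "home"},
    {"user": "u2", "page": "home"},
    {"user": "u1", "page": "cart"},
]

print(count_by_page(page_views))  # {'home': 2, 'cart': 1}
```

In a real cluster the map and reduce steps run in parallel on many nodes; the sketch runs them in one process only to show the data flow.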
Hive Architecture

User Interface – Hive is data warehouse infrastructure software that enables interaction between the user and HDFS.

Meta Store – Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine – HiveQL is similar to SQL and is used for querying against the schema information in the Metastore. It is one of the replacements for the traditional approach of writing a MapReduce program.

Execution Engine – The Hive Execution Engine is the link between the HiveQL Process Engine and MapReduce. It uses the flavor of MapReduce to run the query.

HDFS or HBase – The Hadoop Distributed File System (HDFS) or HBase are the data storage techniques used to store data in the file system. They offer extreme scalability (up to 100 PB) and self-healing storage.

Working of Hive:

Execution of Hive:
Execute Query: A Hive interface such as the command line or Web UI sends the query to the driver (any database driver such as JDBC, ODBC, etc.) to execute.
Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan or the requirements of the query.
Get Metadata: The compiler sends a metadata request to the Metastore (any database).
Send Metadata: The Metastore sends the metadata as a response to the compiler.

Send Plan: The compiler checks the requirements and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
Execute Plan: The driver sends the execution plan to the execution engine.
Execute Job: The execution engine sends the job to the JobTracker, which is in the NameNode, and it assigns this job to a TaskTracker, which is in a DataNode. Here, the query executes as a MapReduce job.

Metadata Ops: Meanwhile, during execution, the execution engine can execute metadata operations with the Metastore.

Fetch Result: The execution engine receives the results from the DataNodes.

Send Results: The execution engine sends those resultant values to the driver. The driver sends the results to the Hive interfaces.
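
The order of these steps can be traced with a small Python sketch. Everything in it is a hypothetical stand-in: the class names, the dictionary-based "query", and the in-memory storage are assumptions for illustration and are not Hive's real classes or APIs. The optional Metadata Ops step is left out to keep the sketch short.

```python
class Metastore:
    """Hypothetical in-memory stand-in for the Metastore."""

    def __init__(self, schemas):
        self.schemas = schemas  # table name -> list of column names

    def get_metadata(self, table):
        return self.schemas[table]


class Compiler:
    """Checks the query against the metadata and builds a plan."""

    def __init__(self, metastore, trace):
        self.metastore = metastore
        self.trace = trace

    def get_plan(self, query):
        self.trace.append("Get Metadata")
        schema = self.metastore.get_metadata(query["table"])
        self.trace.append("Send Metadata")
        for column in query["columns"]:
            if column not in schema:
                raise ValueError(f"unknown column: {column}")
        self.trace.append("Send Plan")
        return {"table": query["table"], "columns": query["columns"]}


class ExecutionEngine:
    """Runs the plan against the stored rows (stands in for MapReduce)."""

    def __init__(self, storage, trace):
        self.storage = storage  # table name -> list of row dicts
        self.trace = trace

    def execute(self, plan):
        self.trace.append("Execute Job")
        rows = self.storage[plan["table"]]
        result = [tuple(row[c] for c in plan["columns"]) for row in rows]
        self.trace.append("Fetch Result")
        return result


class Driver:
    """Receives the query from an interface and coordinates the steps."""

    def __init__(self, compiler, engine, trace):
        self.compiler = compiler
        self.engine = engine
        self.trace = trace

    def run(self, query):
        self.trace.append("Execute Query")
        self.trace.append("Get Plan")
        plan = self.compiler.get_plan(query)
        self.trace.append("Execute Plan")
        result = self.engine.execute(plan)
        self.trace.append("Send Results")
        return result


trace = []
metastore = Metastore({"users": ["id", "name"]})
storage = {"users": [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]}
driver = Driver(Compiler(metastore, trace), ExecutionEngine(storage, trace), trace)

result = driver.run({"table": "users", "columns": ["name"]})
print(result)
print(" -> ".join(trace))
```

The printed trace lists the step names in the same order as the slides, from Execute Query to Send Results.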

Limitations:
The biggest limitation of Hadoop is that one has to use the M/R (MapReduce) model. Other limitations are as stated below:
* Not reusable
* Error-prone
* Multiple stages of Map/Reduce functions for complex jobs.
* It is just like asking a developer to write the physical execution plan in the DB.

Apache Hive vs. Apache Pig

Hive Table:
A Hive table consists of:

Data: a file or group of files in HDFS.

Schema: metadata stored in a relational database.

You have to define a schema if you have existing data in HDFS that you want to use in Hive.

Schema and data are separate.
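
A minimal Python sketch of this separation, assuming a comma-delimited text file: the schema lives in one structure (standing in for the relational metastore) and the raw lines live in another (standing in for files in HDFS). The table name, path, column names, and the read_table helper are hypothetical; the schema is applied only when the data is read.

```python
# Stand-in for the metastore: schema and file location only, no data.
metastore = {
    "employees": {
        "location": "/warehouse/employees",
        "delimiter": ",",
        "columns": [("id", int), ("name", str), ("salary", float)],
    }
}

# Stand-in for HDFS: raw lines only, no schema.
hdfs = {
    "/warehouse/employees": ["1,Asha,52000.0", "2,Ravi,48000.5"],
}


def read_table(name):
    """Apply the stored schema to the raw lines at read time."""
    meta = metastore[name]
    rows = []
    for line in hdfs[meta["location"]]:
        fields = line.split(meta["delimiter"])
        rows.append(
            {col: cast(value) for (col, cast), value in zip(meta["columns"], fields)}
        )
    return rows


for row in read_table("employees"):
    print(row)

# Removing the schema entry leaves the files untouched.
del metastore["employees"]
print(len(hdfs["/warehouse/employees"]))  # 2
```

Because the two structures are independent, a schema can also be defined later for files that already exist, which is the case described on the slide.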

Defining a Table

Managing Table

Loading Data
Use LOAD DATA to import data into a Hive table.
Use the keyword OVERWRITE to replace the data already in the table; without it, the loaded files are added to the table.

Insert Data
Use the INSERT statement to populate a table with data from another Hive table.
OVERWRITE replaces the data in the table; otherwise the data is appended to the table.
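
The difference between overwriting and appending can be sketched in a few lines of Python. The insert_rows helper and the sample tables are hypothetical; the flag only mirrors the behaviour of INSERT OVERWRITE TABLE versus INSERT INTO TABLE and does not talk to Hive.

```python
def insert_rows(target, rows, overwrite=False):
    """Return the new contents of `target` after inserting `rows`.

    overwrite=True  -> existing rows are replaced (like INSERT OVERWRITE)
    overwrite=False -> new rows are appended      (like INSERT INTO)
    """
    new_rows = list(rows)
    return new_rows if overwrite else list(target) + new_rows


# Hypothetical source table.
orders = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": 40},
    {"id": 3, "amount": 900},
]

# Hypothetical target table that already holds one row.
large_orders = [{"id": 0, "amount": 500}]

# Rows selected from the source table, as a SELECT ... WHERE would do.
selected = [row for row in orders if row["amount"] >= 200]

appended = insert_rows(large_orders, selected)
replaced = insert_rows(large_orders, selected, overwrite=True)

print(len(appended))  # 3: the old row plus two selected rows
print(len(replaced))  # 2: only the selected rows remain
```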

Performing Queries (HiveQL):
SELECT

Summary

Thank You…!
