
Selvarani Mylsamy: "We Swim in A Sea of Data and The Sea Level Is Rising Rapidly."

The document discusses big data, including: 1) Big data refers to the massive amounts of data being generated from various sources like scientific instruments, mobile devices, and social media. 2) Big data is characterized by its volume, variety, and velocity - it is growing exponentially in size and being generated in many different formats at high speeds. 3) Hadoop is an open source framework that can be used to efficiently store and process big data across large clusters of commodity hardware. It utilizes MapReduce for distributed processing of large datasets in parallel.

BIG DATA

“We swim in a sea of data … and the sea level is rising rapidly.”

Selvarani Mylsamy
1/25/2016
Agenda

Why Big Data


What is Big Data
Classification of Big Data
Who’s generating Big Data
Characteristics of Big Data
OLTP Vs OLAP
What’s driving Big Data
Predictive Analytics
Big Data Architecture
Tools used in Big Data
Applications of Big Data

Why Big Data?

What is Big Data?

Definition of Big Data

Classification of Big Data

Example of Big Data in Airline Domain

Model has changed

The model of generating and consuming data has changed:

Old Model: Few companies are generating data, all others are consuming data

New Model: All of us are generating data, and all of us are consuming data

Who’s Generating Big Data?

• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Sensor technology and networks (measuring all kinds of data)

The challenge: the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion

Who’s Generating Big Data?

Explosion in Quantity of Data

Characteristics of Big Data

Volume
1-Scale (Volume)

• Data volume
  • 44x increase from 2009 to 2020
  • From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
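As a quick check of the figures above, the sketch below computes the end volume and the average yearly growth factor implied by a 44x increase between 2009 and 2020. It assumes smooth compound growth, which the slide does not state, and the variable names are illustrative only.

```python
# Figures taken from the slide: 0.8 ZB in 2009, growing 44x by 2020.
start_zb = 0.8
growth_factor = 44
years = 2020 - 2009

# End volume implied by the 44x factor (the slide rounds this to 35 ZB).
end_zb = start_zb * growth_factor

# Assumption: growth compounds evenly each year.
annual_factor = growth_factor ** (1 / years)

print(f"Implied 2020 volume: {end_zb:.1f} ZB")
print(f"Implied yearly growth: about {(annual_factor - 1) * 100:.0f}% per year")
```

Under that assumption the data volume grows by roughly two fifths every year, which is what "increasing exponentially" means in practice.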

Variety
2-Complexity (Variety)

• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data
• A single application can be generating/collecting many types of data
• To extract knowledge, all these types of data need to be linked together

Velocity
3-Speed (Velocity)

• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions → missed opportunities

Examples

• E-promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
• Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires immediate reaction
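The healthcare example can be sketched as a small stream filter that reacts to each reading as it arrives instead of waiting for a batch. This is only an illustration: the function name, the heart-rate readings, and the thresholds are all made up, and a real monitoring system would read from a live sensor feed.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, measured value)


def abnormal_readings(readings: Iterable[Reading],
                      low: float, high: float) -> Iterator[Reading]:
    """Yield each reading outside [low, high] as soon as it is seen."""
    for timestamp, value in readings:
        if value < low or value > high:
            yield timestamp, value


# Hypothetical heart-rate stream and hypothetical alert thresholds.
heart_rate_stream = [(0, 72.0), (1, 75.0), (2, 190.0), (3, 74.0), (4, 35.0)]
alerts = list(abnormal_readings(heart_rate_stream, low=40.0, high=180.0))

for timestamp, value in alerts:
    print(f"t={timestamp}s: abnormal reading {value}")
```

Because the function is a generator, an alert is produced the moment an abnormal value is read, which is the point of the velocity requirement.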

3Vs

3Vs
Harnessing Big Data

• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)

OLTP vs. OLAP

                 OLTP                                      OLAP
Data source      Operational: ERP, CRM, legacy apps, ...   Management Information System, Decision Support System
Typical users    Staff                                     Managers, Executives
Horizon          Weeks, Months                             Years
Refresh          Immediate                                 Periodic
Data model       Entity-relationship                       Multi-dimensional
Schema           Normalized                                Star
Focus            Update                                    Data Retrieval (Reporting Data)
Space            Small                                     Large
Queries          Simple                                    Complex
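To make the "Focus" and "Queries" rows concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and figures are hypothetical: `orders` stands in for an operational table, while `fact_sales` and `dim_store` form a tiny star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: `orders` plays the OLTP role; `fact_sales` and
# `dim_store` form a minimal star schema for the OLAP role.
cur.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO orders VALUES (1, 'pending'), (2, 'pending');

CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (store_id INTEGER, year INTEGER, amount REAL);
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES
    (1, 2014, 100.0), (1, 2015, 150.0), (1, 2015, 50.0),
    (2, 2014, 80.0), (2, 2015, 120.0);
""")

# OLTP-style: a simple update that touches a single row.
cur.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")
conn.commit()
oltp_status = cur.execute(
    "SELECT status FROM orders WHERE order_id = 1").fetchone()[0]

# OLAP-style: a retrieval query that joins the fact table to a dimension
# and aggregates over several years of history.
olap_rows = cur.execute("""
    SELECT d.region, f.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS d ON f.store_id = d.store_id
    GROUP BY d.region, f.year
    ORDER BY d.region, f.year
""").fetchall()

print(oltp_status)
for row in olap_rows:
    print(row)
conn.close()
```

The update changes current state one record at a time, while the aggregate reads many rows to produce a report, which is the contrast the table draws.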

OLAP Overview

What’s driving Big Data

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- Closer to real time

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

What’s Predictive Analytics?

• Data analysis with mathematical techniques from statistics, data mining, and machine learning
• Used to uncover hidden patterns that yield a competitive advantage
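One of the simplest statistical techniques of this kind is a least-squares line fit: learn a trend from past observations, then use it to predict an unseen value. The sketch below is illustrative only; the function name and the data points (say, advertising spend vs. sales) are made up.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: x = advertising spend, y = units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [4.0, 7.0, 10.0, 13.0, 16.0]

slope, intercept = fit_line(spend, sales)

# Predict the outcome for a spend level not seen in the history.
predicted = slope * 6.0 + intercept
print(f"sales ≈ {slope:.1f} * spend + {intercept:.1f}; prediction at 6.0: {predicted:.1f}")
```

Real predictive models use many more variables and far more data, but the pattern of fitting on history and then scoring new cases is the same.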

Common customer scenarios for predictive analytics

Advantages of Big Data

Drive incremental revenue


• Predict customer behavior across all channels
• Understand and monetize customer behavior

Improve operational effectiveness


• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security

Reduce data warehouse cost


• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’

Big Data Architecture

Big Data Features

What is Hadoop?
• Open-source framework created by Doug Cutting in 2006
• Maintained by the Apache Software Foundation
• Distributed under the Apache License
• Linux-based set of tools
• Designed to store and process huge volumes of data efficiently
Core Components

Hadoop cluster

Example : Word Count
Count the occurrences of each word in a data set
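The word-count job can be sketched in plain Python to show the three MapReduce stages: map, shuffle, and reduce. This is a single-process imitation for illustration, not the Hadoop API; the function names and the two input lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted counts by word."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical data set: each string stands for one input line.
documents = ["big data is big", "data is everywhere"]
counts = word_count(documents)
print(counts)
```

On a real cluster the map calls run in parallel on different nodes and the framework performs the shuffle over the network, but the per-word logic is the same.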

Example Contd

Applications of Big Data

