BIG DATA
“We swim in a sea of data … and the sea level is rising rapidly.”
Selvarani Mylsamy
1/25/2016
Agenda
Why Big Data
What is Big Data
Classification of Big Data
Who’s generating Big Data
Characteristics of Big Data
OLTP Vs OLAP
What’s driving Big Data
Predictive Analytics
Big Data Architecture
Tools used in Big Data
Applications of Big Data
Why Big Data?
What is Big Data?
Definition of Big Data
Classification of Big Data
Example of Big Data in Airline Domain
Model has changed
The Model of Generating/Consuming Data has Changed
Old Model: A few companies generate data; everyone else consumes it
New Model: All of us generate data, and all of us consume data
Who’s Generating Big Data?
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Social media and networks (all of us are generating data)
Sensor technology and networks (measuring all kinds of data)
What is needed is the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
Who’s Generating Big Data?
Explosion in Quantity of Data
Characteristics of Big Data
Volume
1-Scale (Volume)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes (ZB) to 35 ZB
Data volume is increasing exponentially
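The two figures above imply a compound growth rate. The short Python sketch below works it out, assuming smooth exponential growth between the 2009 and 2020 values quoted on the slide; the function names are illustrative only.

```python
# Figures from the slide: about 0.8 ZB in 2009 and about 35 ZB in 2020.
START_ZB = 0.8
END_ZB = 35.0
YEARS = 2020 - 2009  # 11 years


def growth_factor(start: float, end: float) -> float:
    """Overall multiplication factor between two data volumes."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming purely exponential growth."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    factor = growth_factor(START_ZB, END_ZB)
    rate = annual_growth_rate(START_ZB, END_ZB, YEARS)
    print(f"Growth factor: {factor:.1f}x")  # 43.8x, the slide's "44x"
    print(f"Implied annual growth: {rate:.1%}")
```

With these inputs the factor is 43.75 and the implied growth is roughly 41% per year, which means the volume roughly doubles about every two years.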
Variety
2-Complexity (Variety)
Various formats, types, and structures
Text, numerical, images, audio, video,
sequences, time series, social media data,
multi-dim arrays, etc…
Static data vs. streaming data
A single application can be
generating/collecting many types of data
To extract knowledge, all these types of data need to be linked together
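A minimal sketch of what "linking" can mean in practice: records of three different types (structured purchases, free-text reviews, and a time-stamped click stream) are merged into one profile per customer. The data, the field names, and the use of customer_id as the join key are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records of three different types, all produced by one application.
purchases = [  # structured, tabular rows
    {"customer_id": 1, "item": "tent", "amount": 120.0},
    {"customer_id": 2, "item": "stove", "amount": 45.5},
    {"customer_id": 1, "item": "lantern", "amount": 20.0},
]
reviews = [  # unstructured free text: (customer_id, text)
    (1, "The tent kept us dry all weekend."),
    (2, "Stove works, but the manual is confusing."),
]
clicks = [  # event stream: (customer_id, unix_time, page)
    (1, 1453700000, "/tents"),
    (2, 1453700042, "/stoves"),
    (2, 1453700099, "/manuals"),
]


def link_by_customer(purchases, reviews, clicks):
    """Merge heterogeneous records into one profile per customer_id."""
    profiles = defaultdict(lambda: {"spend": 0.0, "reviews": [], "pages": []})
    for row in purchases:
        profiles[row["customer_id"]]["spend"] += row["amount"]
    for cid, text in reviews:
        profiles[cid]["reviews"].append(text)
    for cid, _timestamp, page in clicks:
        profiles[cid]["pages"].append(page)
    return dict(profiles)


if __name__ == "__main__":
    for cid, profile in sorted(link_by_customer(purchases, reviews, clicks).items()):
        print(cid, profile)
```

Once linked, a question such as "do customers who read the manual pages leave more negative reviews?" can be asked across all three data types at once.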
Velocity
3-Speed (Velocity)
Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions mean missed opportunities
Examples
E-Promotions:
Based on your current location, purchase history, and preferences,
send promotions right now for the store next to you
Healthcare monitoring:
Sensors monitor your activities and body;
any abnormal measurement requires an immediate reaction
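The healthcare example can be sketched as a streaming check: each new reading is compared with a sliding window of recent readings as it arrives, instead of being stored and analyzed later in a batch. This is a minimal illustration assuming a simple rule (flag values more than `threshold` standard deviations from the window mean); the window size, threshold, and sample heart-rate values are hypothetical, and real monitoring systems use far more careful models.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def detect_anomalies(readings: Iterable[float], window: int = 5,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the recent sliding-window mean.

    Each reading is checked as soon as it arrives, against only the last
    `window` readings, so memory use stays constant however long the stream runs.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent) or 1.0  # avoid a zero spread on a flat signal
            if abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


if __name__ == "__main__":
    heart_rate = [72, 74, 73, 75, 74, 73, 140, 74, 72]  # hypothetical feed (bpm)
    for index, bpm in detect_anomalies(heart_rate):
        print(f"Alert: reading {index} = {bpm} bpm")
```

Only the 140 bpm reading is flagged. The flagged value then enters the window and widens it; a production design might keep alerts out of the baseline.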
3Vs
Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
OLTP Vs OLAP
Feature | OLTP | OLAP
Data source | Operational: ERP, CRM, legacy apps, ... | Management Information System, Decision Support System
Typical users | Staff | Managers, Executives
Horizon | Weeks, Months | Years
Refresh | Immediate | Periodic
Data model | Entity-relationship | Multi-dimensional
Schema | Normalized | Star
Focus | Update | Data Retrieval (Reporting Data)
Space | Small | Large
Queries | Simple | Complex
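The difference in workload can be illustrated with Python's built-in sqlite3 module: an OLTP-style transaction updates one row by key and commits immediately, while an OLAP-style query scans and aggregates many rows for a report. This is only a sketch; the table, columns, and values are hypothetical, and a real OLAP system would run against a separate warehouse with a star schema rather than the operational table itself.

```python
import sqlite3

# In-memory database with a tiny, hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, "
    "year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, product, year, amount) VALUES (?, ?, ?, ?)",
    [
        ("East", "laptop", 2014, 1200.0),
        ("East", "phone", 2015, 600.0),
        ("West", "laptop", 2015, 1100.0),
        ("West", "phone", 2015, 650.0),
    ],
)

# OLTP style: a short transaction that touches one row by key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (1250.0, 1))

# OLAP style: a read-only query that aggregates many rows for reporting.
report = conn.execute(
    "SELECT region, year, SUM(amount) AS total "
    "FROM sales GROUP BY region, year ORDER BY region, year"
).fetchall()

for region, year, total in report:
    print(region, year, total)
```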
OLAP Overview
What’s driving Big Data
Big Data analytics:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, from many sources
- Very large datasets
- More real-time processing
Traditional analytics:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data from typical sources
- Small to mid-size datasets
What’s Predictive Analytics?
Data analysis with mathematical techniques from statistics, data
mining, and machine learning.
Used to uncover hidden patterns that yield a competitive advantage
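As a minimal example of a statistical prediction technique, the sketch below fits an ordinary least-squares line to past observations and uses it to forecast the next value. Linear regression is chosen here only because it is the simplest such model; the data (weekly order counts) and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    weeks = [1, 2, 3, 4, 5]     # hypothetical history
    orders = [3, 5, 7, 9, 11]
    model = fit_line(weeks, orders)
    print(model)                # (1.0, 2.0): orders = 1 + 2 * week
    print(predict(model, 6))    # 13.0
```

Here the fit gives an intercept of 1.0 and a slope of 2.0, so the forecast for week 6 is 13.0 orders. Real predictive analytics combines many more variables and validates the model on held-out data.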
Common customer scenarios for predictive analytics
Advantages of Big Data
Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Big Data Architecture
Big Data Features
What is Hadoop?
• Open-source framework created by Doug Cutting in 2006
• Maintained by the Apache Software Foundation
• Distributed under the Apache License
• Set of Java-based tools that run mainly on Linux
• Designed to store and process huge volumes of data efficiently
Core Components
Hadoop cluster
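A Hadoop cluster stores data in its distributed file system (HDFS) by splitting each file into fixed-size blocks and replicating every block across several nodes. The sketch below estimates that footprint, assuming the HDFS defaults in Hadoop 2.x and later (128 MB blocks and a replication factor of 3, configurable through the dfs.blocksize and dfs.replication settings); the helper function is hypothetical and not part of Hadoop.

```python
import math

# Assumed HDFS defaults (Hadoop 2.x and later); clusters may override them.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> dict:
    """Estimate how a file is split into blocks and how much raw disk it uses."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # The last block only occupies as much space as the data it holds,
        # so raw usage is the file size times the replication factor.
        "raw_storage_mb": file_size_mb * replication,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1000))  # a hypothetical 1000 MB file
```

Under these assumptions a 1000 MB file becomes 8 blocks and 24 block replicas using about 3000 MB of raw disk. Because each block can typically be read by a separate map task, the same split is what lets the cluster process the file in parallel.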
Example: Word Count
Count the occurrences of each word in a data set
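Word count is the classic MapReduce example. The plain-Python sketch below imitates the three stages on a single machine: map emits (word, 1) pairs, shuffle groups the pairs by word, and reduce sums each group. It is not Hadoop code; in Hadoop these stages run in parallel across the cluster, and the sample input lines here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Sort keys, as Hadoop sorts map output by key before reducing.
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    data = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(data))
```

For the sample input this yields {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}. Words are only lower-cased and split on whitespace; punctuation is not stripped.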
Example (continued)
Applications of Big Data