Introduction To Big Data

Big Data refers to large volumes of structured and unstructured data that traditional processing methods cannot handle. It encompasses four key dimensions: volume, variety, velocity, and veracity, highlighting the challenges and opportunities in data management. Technologies like Hadoop and MapReduce are essential for analyzing Big Data, which has applications across various sectors including healthcare, security, and manufacturing.

Uploaded by hassanali2415
Copyright © All Rights Reserved

Introduction to Big Data

By: Faizan Irshad


What is Big Data?

Big Data refers to a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.
In most enterprise scenarios, the data is too big, moves too fast, or exceeds current processing capacity.
The Information Continuum
Types of Data

Quantitative Data
• Measurable
• Collected through measuring things that have a fixed reality
• Close-ended

Qualitative Data
• Descriptive
• Collected through observation, field work, focus groups, interviews, or recording or filming conversations
• Open-ended
The 4 V’s of Big Data
• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
Volume: scale of data

Commonly cited figures (from the early 2010s):
• 90% of the data that existed at the time had been created in the preceding 2 years
• Every day we create 2.5 quintillion bytes of data
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
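The volume figures above use decimal byte units. The following Python sketch converts them so they can be compared. It assumes decimal (SI) prefixes, where 1 TB = 1,000 GB, and the short-scale reading of "quintillion" as 10^18. The `convert` helper and the `UNITS` table are illustrative names only.

```python
# Decimal (SI) byte units, assumed here so that 1 TB = 1,000 GB as in the slide.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}

def convert(num_bytes: float, unit: str) -> float:
    """Express a byte count in the given decimal unit."""
    return num_bytes / UNITS[unit]

# 2.5 quintillion bytes per day (short scale: 1 quintillion = 10**18).
daily_bytes = 2.5 * 10**18
print(f"Per day: {convert(daily_bytes, 'EB'):.1f} EB")               # Per day: 2.5 EB
print(f"Per second: {convert(daily_bytes / 86_400, 'TB'):.1f} TB")   # Per second: 28.9 TB

# 100 terabytes expressed in gigabytes.
print(f"100 TB = {convert(100 * UNITS['TB'], 'GB'):,.0f} GB")        # 100 TB = 100,000 GB
```

At that daily rate, the data arriving works out to roughly 29 TB every second.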
Variety: different forms of data
Velocity: analysis of streaming data
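The slide gives only a heading for velocity. One common way to analyze streaming data is a sliding time window that keeps only recent events in memory. The Python sketch below illustrates that idea without using any specific streaming framework. The function `sliding_window_average` and the `(timestamp, value)` event format are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

def sliding_window_average(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average of values in the last window_seconds).

    Assumes events arrive in timestamp order as (timestamp, value) pairs.
    Only events inside the current window are held in memory.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)

# Hypothetical sensor readings: (seconds, value).
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
for ts, avg in sliding_window_average(readings, window_seconds=3):
    print(ts, avg)  # 0 10.0 / 1 15.0 / 2 20.0 / 5 40.0
```

Each result is available as soon as its event arrives, which is the point of streaming analysis. The whole data set never has to be loaded first.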
Veracity: trustworthiness of data

• Origin
• Authenticity
• Trustworthiness
• Completeness
• Integrity
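Two of the veracity criteria listed above, completeness and integrity, can be checked in code. The Python sketch below makes some assumptions:
• The record fields in `REQUIRED_FIELDS` are hypothetical.
• Integrity is checked against a SHA-256 digest assumed to have been stored when the data was first collected.

Origin and authenticity need more than this (for example, digital signatures), so the sketch does not cover them.

```python
import hashlib
import json
from typing import Dict, List

REQUIRED_FIELDS = ("sensor_id", "timestamp", "reading")  # hypothetical schema

def missing_fields(record: Dict) -> List[str]:
    """Completeness: list required fields that are absent or None."""
    return [f for f in REQUIRED_FIELDS if record.get(f) is None]

def fingerprint(record: Dict) -> str:
    """Integrity: SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_unchanged(record: Dict, expected_digest: str) -> bool:
    """True if the record still matches the digest taken at collection time."""
    return fingerprint(record) == expected_digest

original = {"sensor_id": "s-17", "timestamp": 1700000000, "reading": 21.5}
digest = fingerprint(original)              # stored alongside the data at ingestion
received = dict(original, reading=99.9)     # simulated alteration in transit

print(missing_fields({"sensor_id": "s-17", "reading": None}))  # ['timestamp', 'reading']
print(is_unchanged(original, digest), is_unchanged(received, digest))  # True False
```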
The Structure of Big Data

❖ Structured
• Most traditional data sources

❖ Semi-structured
• Many sources of big data

❖ Unstructured
• Video data, audio data
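The difference between structured, semi-structured and unstructured data shows up in how each is parsed. In this Python sketch the sample data is made up for illustration:
• CSV rows share one fixed schema.
• JSON records carry their own keys, which can vary from record to record.
• Free text has no schema at all, so even a simple extraction needs a heuristic, such as the naive regex used here.

```python
import csv
import io
import json
import re

# Structured: fixed columns; every row follows the same schema.
csv_text = "id,name,city\n1,Ayesha,Lahore\n2,Omar,Karachi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but fields vary or nest per record.
json_text = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "address": {"city": "Multan"}}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; structure has to be inferred,
# here with a naive regex for capitalized words (illustrative only).
text = "Customer called from Lahore about a delayed order."
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

print(rows[1]["city"])                      # Karachi
print([sorted(r.keys()) for r in records])  # [['id', 'tags'], ['address', 'id']]
print(capitalized)                          # ['Customer', 'Lahore']
```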
What is Unstructured Data?

Typical human-generated unstructured data includes:


• Text files: Word processing, spreadsheets, presentations, email, logs.
• Email: Email has some internal structure thanks to its metadata, and we sometimes refer to it as semi-structured. However, its message field is unstructured and traditional analytics tools cannot parse it.
• Social media: Data from Facebook, Twitter, LinkedIn.
• Websites: YouTube, Instagram, photo sharing sites.
• Mobile data: Text messages, locations, phone recordings.
• Media: MP3, digital photos, audio and video files.
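Email illustrates the semi-structured case described above. Its headers are named fields, while its body is free text. Python's standard `email` module can read the headers directly. The message below is a made-up example.

```python
from email import message_from_string

raw = """From: alice@example.com
To: bob@example.com
Subject: Q3 shipment delays

Hi Bob, the Karachi warehouse is behind schedule again.
Can we talk tomorrow?
"""

msg = message_from_string(raw)

# Semi-structured part: header fields can be looked up by name.
print(msg["From"])     # alice@example.com
print(msg["Subject"])  # Q3 shipment delays

# Unstructured part: the body is plain text with no schema,
# so without text analytics we can do little more than count words.
body = msg.get_payload()
print(len(body.split()))  # 13
```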

Typical machine-generated unstructured data includes:


• Satellite imagery: Weather data, land forms, military movements.
• Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data.
• Digital surveillance: Surveillance photos and video.
• Sensor data: Traffic, weather, oceanographic sensors.
Why Big Data

• The growth of Big Data is driven by:
– Increased storage capacity
– Increased processing power
– Availability of data (different data types)


Big Data sources
• Users
• Applications
• Systems
• Sensors
These sources produce large and growing files (big data files).
Data generation points: examples
• Mobile devices
• Microphones
• Readers/scanners
• Science facilities
• Programs/software
• Social media
• Cameras
Sensing devices
• Smartwatches
• Smart jewelry
• Fitness trackers
• Sport watches
• Smart glasses
• Smart clothing…
Technologies of Big Data

• Traditionally, data is stored in relational databases and extracted periodically from operational databases according to the needs of the organization. Traditional processing systems and tool sets fall short when it comes to dealing with big data, so new processes and technologies are required.
• Additional technologies applied to big data include massively parallel processing (MPP) databases, Hadoop and MapReduce, data mining, search-based applications, and distributed databases and file systems.
RDBMS vs. Hadoop
Big Data Analytics
Benefits of Big Data

• Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time.

• Today, technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.

• Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.
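Storing data before deciding how to process it is often called schema-on-read. The Python sketch below illustrates the idea with hypothetical event records stored as raw JSON lines:
• Nothing is validated when the data is written.
• Each query picks the fields it needs when it reads the data.

This is a conceptual illustration only, not how Hadoop, Hive or Impala are implemented.

```python
import json
from typing import Iterator, List

# Raw events stored as-is, one JSON object per line, with no fixed schema.
RAW_LOG = """\
{"user": "u1", "action": "view", "ms": 120}
{"user": "u2", "action": "buy", "amount": 30.0}
{"user": "u1", "action": "buy", "amount": 12.5, "coupon": "X1"}
"""

def read_events(raw: str) -> Iterator[dict]:
    """Parse stored lines lazily; no schema was enforced when they were written."""
    for line in raw.splitlines():
        if line.strip():
            yield json.loads(line)

def total_purchases(raw: str) -> float:
    """One query: decides at read time that it needs 'action' and 'amount'."""
    return sum(e.get("amount", 0.0) for e in read_events(raw) if e.get("action") == "buy")

def users_with_coupons(raw: str) -> List[str]:
    """Another query over the same raw data, using a field most records lack."""
    return [e["user"] for e in read_events(raw) if "coupon" in e]

print(total_purchases(RAW_LOG))     # 42.5
print(users_with_coupons(RAW_LOG))  # ['u1']
```

A new question only needs a new reading function. The stored data and its layout stay untouched.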
Applications of Big Data Analytics
• Smarter Healthcare
• Multi-channel Sales
• Homeland Security
• Telecom
• Trading Analytics
• Traffic Control
• Search Quality
• Manufacturing
Hurdles and Risks
• Unstructured data (~75% of data in the healthcare environment)
• Data privacy/security
• Inconsistent, incomplete, unavailable, poor-quality or invalid data
• Poor analysis/analytics leading to erroneous correlations/conclusions
Thank You
