Introduction to Big Data
M. Atemkeng (Rhodes)
CONTENT
What is Big Data
What is an example of Big Data
Why is Big Data Important?
Big Data Analytics
Benefits of Big Data Analytics
Types of Big Data
Characteristics of Big Data
Primary Source of Big Data
Big Data Tools and Software
Big Data Mining
Top Trends in Big Data
WHAT IS BIG DATA
Big Data is a massive collection of data that is growing exponentially over time
Big Data is a data set that is so large and complex that traditional data
management tools (including traditional machine learning) cannot store or process
it efficiently
Big Data is a type of data that is extremely large in size
WHAT IS AN EXAMPLE OF BIG DATA?
Combating Cyber Threats Data
Network Traffic Examination
Enhancing Enterprise Protection Data
Cloud Security Monitoring Data
User Behaviour Data
WHY IS BIG DATA IMPORTANT
Companies use big data in their systems to improve operations, provide better
customer service, create personalized marketing campaigns and take other actions
that, ultimately, can increase revenue and profits
Big data is also used by medical researchers to identify disease signs and risk
factors and by doctors to help diagnose illnesses
Other application areas: astronomy, cybersecurity, medicine, ecology, etc.
BIG DATA ANALYTICS
Big data analytics examines large amounts of data to uncover hidden patterns,
correlations and other insights
Big data analytics helps organizations harness their data and use it to identify new
opportunities
That, in turn, leads to smarter business moves, more efficient operations, higher
profits and happier customers
TYPES OF BIG DATA
Structured
Unstructured
Semi-structured
STRUCTURED DATA
Structured data refers to data that is already stored in databases in an
ordered manner
Two sources of structured data: human-generated and machine-generated
All data received from sensors, antennas, web logs and financial systems is
classified as machine-generated data
Human-generated structured data includes all the data that humans input into a computer
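As a minimal sketch of what "stored in databases in an ordered manner" can look like, the example below puts a few machine-generated readings into a relational table and queries them. The table name `readings`, its columns, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table of sensor readings (machine-generated structured data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "sensor_id TEXT NOT NULL, taken_at TEXT NOT NULL, value REAL NOT NULL)"
)

# Every row follows the same fixed schema: (sensor_id, taken_at, value).
rows = [
    ("s1", "2024-01-01T00:00:00", 20.5),
    ("s1", "2024-01-01T00:01:00", 21.0),
    ("s2", "2024-01-01T00:00:00", 18.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)


def average_by_sensor(connection):
    """Return a dict mapping each sensor_id to its average reading."""
    cursor = connection.execute(
        "SELECT sensor_id, AVG(value) FROM readings "
        "GROUP BY sensor_id ORDER BY sensor_id"
    )
    return dict(cursor.fetchall())


averages = average_by_sensor(conn)
print(averages)
```

Because the schema is known in advance, a declarative query is enough to aggregate the data, which is the main practical advantage of structured data.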
UNSTRUCTURED DATA
Unstructured data is defined as any data with an unknown form or structure
Aside from its massive size, unstructured data presents several challenges in terms
of processing and extracting value from it
A heterogeneous data source containing a mix of simple text files, images, videos,
and so on is an example of unstructured data
SEMI-STRUCTURED DATA
Semi-structured data can contain both structured and unstructured information
Semi-structured data appears to be structured, but it is not defined by a fixed
schema in the way that a table in a relational database is
A data representation in an XML file is an example of semi-structured data
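The XML case can be sketched as follows. This is an illustrative example only: the document, the tag names, and the `parse_users` helper are hypothetical. It shows that the records carry their own tags but do not all share the same fields, unlike rows in a relational table.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: both records are <user> elements,
# but they do not have the same set of child fields.
XML_DOC = """
<users>
  <user id="1">
    <name>Alice</name>
    <email>alice@example.com</email>
  </user>
  <user id="2">
    <name>Bob</name>
    <phone>555-0100</phone>
  </user>
</users>
"""


def parse_users(xml_text):
    """Turn each <user> element into a dict of whatever fields it contains."""
    root = ET.fromstring(xml_text.strip())
    users = []
    for user in root.findall("user"):
        record = {"id": user.get("id")}
        for child in user:
            record[child.tag] = child.text
        users.append(record)
    return users


users = parse_users(XML_DOC)
for record in users:
    print(sorted(record))
```

The structure is discovered while reading the data rather than declared up front, which is why such data sits between the structured and unstructured categories.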
CHARACTERISTICS OF BIG DATA
VOLUME
The name Big Data itself refers to a size that is enormous
The size of the data plays a crucial role in determining the value that can be drawn
from it; whether a given data set can be considered Big Data also depends
on its volume
Hence, volume is one characteristic that needs to be considered when dealing
with Big Data solutions
For example: cybersecurity data
VELOCITY
The term “VELOCITY” refers to the speed at which data is generated
How fast the data is generated and processed to meet demand determines the
real potential of the data
Big Data velocity deals with the speed at which data flows in from sources such as
business processes, application logs, networks, social media sites, sensors,
mobile devices, antennas, etc.
The flow of data is massive and continuous.
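One way to make velocity concrete is to measure the arrival rate of events over a sliding time window. The sketch below is an assumption-laden illustration, not a production stream processor: the `RateMeter` class, the 10-second window, and the sample timestamps are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RateMeter:
    """Tracks how many events arrived within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event and discard events that fell out of the window."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()

    def rate(self):
        """Average number of events per second over the current window."""
        return len(self.timestamps) / self.window_seconds


# Hypothetical arrival times (in seconds) of five events.
meter = RateMeter(window_seconds=10.0)
for t in [0.0, 1.0, 2.0, 11.5, 12.0]:
    meter.record(t)

current_rate = meter.rate()
print(current_rate)
```

Only the events still inside the window are kept in memory, so the meter can run over a continuous flow without storing the whole history.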
VERACITY
When we are dealing with a high volume, velocity and variety of data, it is not
possible for all of the data to be 100% correct; there will be dirty data
The quality of the data being captured can vary greatly
The accuracy of an analysis depends on the veracity of the source data
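A simple way to act on veracity is to separate clean records from dirty ones before analysis. The sketch below is illustrative only: the record layout, the `is_clean` rules, and the accepted value range of -50 to 60 are assumptions, not rules taken from these slides.

```python
def is_clean(record):
    """A record is clean if it has a sensor id and a numeric value in range."""
    value = record.get("value")
    return (
        bool(record.get("sensor_id"))
        and isinstance(value, (int, float))
        and -50.0 <= value <= 60.0  # assumed plausible range for this sensor
    )


def split_by_quality(records):
    """Partition records into (clean, dirty) lists."""
    clean, dirty = [], []
    for record in records:
        (clean if is_clean(record) else dirty).append(record)
    return clean, dirty


# Hypothetical captured records, three of which are dirty.
records = [
    {"sensor_id": "s1", "value": 21.5},
    {"sensor_id": "", "value": 19.0},     # missing identifier
    {"sensor_id": "s2", "value": None},   # missing value
    {"sensor_id": "s3", "value": 999.0},  # outside the assumed range
]

clean, dirty = split_by_quality(records)
clean_ratio = len(clean) / len(records)
print(len(clean), len(dirty), clean_ratio)
```

Tracking the share of clean records gives a rough indicator of how far results computed from the data can be trusted.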
VALUE
Value is the most important aspect of big data
The potential value of big data is huge
It is all well and good having access to big data, but unless we can turn it into value
it is useless
VARIETY
Big data is not always structured, and it is not always easy to put big data into a
relational database
This means that the category to which the data belongs is an essential
fact that the data analyst needs to know
Dealing with a variety of structured and unstructured data greatly increases the
complexity of both storing and processing it
90% of data generated is unstructured
A MORE COMPLETE DEFINITION
“Big data is high-volume, high-velocity and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision making.” -- Gartner
PRIMARY SOURCE OF BIG DATA
CHALLENGES OF BIG DATA
HADOOP AS A SOLUTION
Companies using Hadoop: https://bigdataanalyticsnews.com/top-12-hadoop-technology-companies/
HADOOP ECOSYSTEM