Big Data and Data Analytics
What is Big Data?
To understand Big Data, let us first understand small data.
• Small Data: Small data refers to datasets that are small enough for people to
comprehend easily, as they are readily accessible, informative, and actionable. Such
data helps a business derive useful information and make better choices in everyday
tasks. For example, a small store might track daily sales to decide which products to
restock.
• Big Data: Big Data refers to extremely large and complex datasets that regular
computer programs and databases cannot handle. It comes from three main sources:
transactional data (online purchases), machine data (sensor readings), and social data
(social media posts). To analyze and use Big Data effectively, special tools and
techniques are required. For example, companies like Amazon and Netflix use Big
Data to recommend products or shows based on users’ past activities.
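The recommendation idea mentioned above can be sketched with a toy co-occurrence counter (this is only an illustration of the principle, not Amazon's or Netflix's actual algorithm; all items and purchase histories below are invented):

```python
from collections import Counter

# Invented purchase histories: each set is one customer's basket.
purchase_histories = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "keyboard"},
]

def recommend(user_items, histories, top_n=2):
    """Count items that co-occur with the user's items and rank by frequency."""
    counts = Counter()
    for basket in histories:
        if user_items & basket:                 # basket shares an item with the user
            counts.update(basket - user_items)  # count the other items in it
    return [item for item, _ in counts.most_common(top_n)]

# A user who bought a laptop is shown what other laptop buyers also bought.
print(recommend({"laptop"}, purchase_histories))
```

Real recommender systems apply the same "users with similar past activity" intuition at vastly larger scale, with far more sophisticated models.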
Types of Big Data
There are three different types of data:
a. Structured Data
b. Semi Structured Data
c. Unstructured Data
The three types can be compared aspect by aspect:
• Definition: Structured data is quantitative data with a defined structure;
semi-structured data is a mix of quantitative and qualitative properties;
unstructured data has no inherent structure or formal rules.
• Data Model: Structured data follows a dedicated data model; semi-structured
data may lack a specific data model; unstructured data lacks a consistent data
model.
• Organization: Structured data is organized in clearly defined columns;
semi-structured data is less organized than structured data; unstructured data
has no organization and exhibits variability over time.
• Accessibility: Structured data is easily accessible and searchable;
semi-structured data is accessible but may be harder to analyze; for
unstructured data, accessibility depends on the specific data format.
• Examples: Structured — customer information, transaction records, product
directories; Semi-structured — XML files, CSV files, JSON files, HTML files,
semi-structured documents; Unstructured — audio files, images, video files,
emails, PDFs, social media posts.
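The three formats can be illustrated with a short sketch (a toy example; all values are invented) showing how each is read in practice:

```python
import csv
import io
import json

# Structured: CSV with fixed columns, ready to load into a relational table.
structured = "id,name,amount\n1,Asha,250\n2,Ravi,480\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: JSON carries its own (flexible) structure along with the data.
semi_structured = '{"id": 1, "name": "Asha", "tags": ["regular", "online"]}'
record = json.loads(semi_structured)

# Unstructured: free text has no schema; any structure must be inferred.
unstructured = "Asha bought two items online and paid 250 rupees."
word_count = len(unstructured.split())

print(rows[0]["name"], record["tags"], word_count)
```

Note how the CSV rows map directly to columns, the JSON record allows nested and optional fields, and the free text requires extra processing before it yields anything queryable.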
Advantages and Disadvantages of Big Data:
Advantages
• Enhanced Decision Making: Organisations can make data-driven decisions based on
the insights derived from big data.
• Improved Efficiency and Productivity: Big data analysis helps an organisation
improve the efficiency and productivity of its business operations.
• Better Customer Insights: Big data can help the organisation to gain a deeper
understanding of customer behaviour, preferences, and needs.
• Competitive Advantage: Leveraging big data analytics provides organisations with a
competitive edge by enabling them to uncover market trends, identify opportunities,
and stay ahead of competitors.
• Innovation and Growth: Big Data fosters innovation by facilitating the development
of new products, services, and business models based on insights derived from data
analysis, driving business growth and expansion.
Disadvantages
• Privacy and Security Concerns: Collecting, storing, and analysing big data is
difficult. Managing big data also raises security and privacy concerns due to
unauthorized access, data breaches, and misuse of personal information.
• Data Quality Issues: Ensuring the accuracy, reliability, and completeness of data can
be challenging, as Big Data often consists of unstructured and heterogeneous data
sources, leading to potential errors and biases in analysis.
• Technical Complexity: Implementing and managing Big Data infrastructure and
analytics tools require specialized skills and expertise, leading to technical challenges
and resource constraints for organizations.
• Regulatory Compliance: Organizations face challenges in meeting data protection
laws like GDPR (General Data Protection Regulation) and The Digital Personal Data
Protection Act, 2023. These laws require strict handling of personal data, making
compliance essential to avoid legal risks and penalties.
• Cost and Resource Intensiveness: The cost of acquiring, storing, processing, and
analyzing Big Data, along with hiring skilled staff, can be high. This is especially
challenging for smaller organizations with limited budgets and resources.
Characteristics of Big Data
The “characteristics of Big Data” refer to the defining attributes that distinguish large and
complex datasets from traditional data sources. These characteristics are commonly
described using the “3Vs” framework: Volume, Velocity, and Variety. The extended
“6Vs” framework provides a more holistic view of Big Data, adding Veracity,
Variability, and Value to the original three.
• Velocity: Velocity refers to the speed at which data is generated, delivered, and
analyzed. For example: Google alone generates more than 40,000 search queries per
second.
• Volume: Every day a huge volume of data is generated, as the number of people using
online platforms has increased exponentially. Such a huge volume of data is what
characterizes Big Data. Surveys across various organizations estimate that about
328.77 million terabytes of data are created each day.
• Variety: Big data encompasses data in various formats, including structured,
unstructured, semi-structured, or highly complex structured data. These can range
from simple numerical data to complex and diverse forms such as text, images, audio,
videos, and so on.
• Veracity: Veracity is a characteristic in Big Data related to consistency, accuracy,
quality, and trustworthiness. Not all data that undergoes processing holds value.
Therefore, it is essential to clean data effectively before storing or processing it.
• Value: The goal of big data analysis lies in extracting business value from the data.
Hence, the business value derived from big data is perhaps its most critical
characteristic. Without obtaining valuable insights, the other characteristics of big
data hold little significance.
• Variability: Variability refers to whether the structure and meaning of a data
stream remain regular and dependable even under highly unpredictable conditions. It
reflects the need to extract meaningful data across all possible circumstances.
Big Data Analytics
Data analytics involves analysing datasets to uncover insights, trends, and patterns.
Technologies commonly used in data analytics include statistical analysis software, data
visualisation tools, and relational database management systems (RDBMS). Big data
analytics uses advanced analytic techniques against huge, diverse datasets that include
structured, semi-structured, and unstructured data from different sources and in various
sizes from terabytes to zettabytes. Big Data Analytics emerges as a consequence of four
significant global trends:
1. Moore’s Law: Moore’s Law — the observation that computing power roughly doubles
every two years — has enabled the handling and analysis of massive datasets, driving
the evolution of Big Data Analytics.
2. Mobile Computing: Due to smartphones and mobile devices, the vast amount of real-
time data can be collected from anywhere.
3. Social Networking: Social media platforms generate massive datasets through data
sharing, interactions, and user-generated content, providing rich material for
analysis.
4. Cloud Computing: Cloud computing allows organisations to access hardware and
software resources remotely via the Internet, which reduces the investments in
software and hardware.
Working of Big Data Analytics
Big data analytics involves collecting, processing, cleaning, and analyzing enormous datasets
to improve organizational operations. The working process of big data analytics includes the
following steps –
Step 1: Gather data
Each company has a unique approach to data collection. Organizations can now collect data
from various sources, including cloud storage, mobile apps, and IoT sensors.
Step 2: Process Data
Once data is collected and stored, it must be processed properly to obtain accurate
results from analytical queries. Various processing options are available:
• Batch processing, which examines large blocks of data collected over a period of
time.
• Stream processing, which examines small batches of data as they arrive, shortening
the delay between collection and analysis for quicker decision-making.
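The contrast between the two processing modes can be sketched with a running total of invented sales readings (batch waits for all the data; stream answers after every record):

```python
# Invented per-hour sales figures.
readings = [12, 7, 30, 5, 18, 9]

# Batch processing: wait until the whole block of data has arrived, then analyze.
def batch_total(data):
    return sum(data)

# Stream processing: update the result as each record arrives, so an answer
# is available immediately after every event instead of only at the end.
def stream_totals(data):
    total = 0
    for value in data:
        total += value
        yield total  # running answer after each arriving record

print(batch_total(readings))          # one answer, after all data is in
print(list(stream_totals(readings)))  # an answer after every record
```

Both reach the same final number; the difference is *when* an answer is available, which is exactly the latency trade-off described above.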
Step 3: Clean Data
All data should be cleaned to obtain better results. Cleaning data eliminates
duplicate or irrelevant records.
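A minimal cleaning pass might look like the following sketch (the records and the rule that a missing amount makes a row irrelevant are invented for illustration):

```python
# Invented raw records: one exact duplicate, one with a missing value.
raw_records = [
    {"id": 1, "amount": 250},
    {"id": 1, "amount": 250},   # exact duplicate
    {"id": 2, "amount": None},  # missing value -> irrelevant for this analysis
    {"id": 3, "amount": 480},
]

def clean(records):
    """Drop rows with missing amounts and rows already seen (duplicates)."""
    seen, cleaned = set(), []
    for rec in records:
        key = (rec["id"], rec["amount"])
        if rec["amount"] is None or key in seen:
            continue  # skip incomplete rows and duplicates
        seen.add(key)
        cleaned.append(rec)
    return cleaned

print(clean(raw_records))
```

Real pipelines add many more rules (format normalization, outlier handling, validation against reference data), but deduplication and missing-value filtering are the usual starting point.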
Step 4: Analyze Data
Big data takes time to analyse and make usable, but advanced analytics processes
can turn big data into big insights.
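Even a simple aggregation turns raw records into an insight. The sketch below (invented transactions) answers the question "which product category earns the most revenue?":

```python
from collections import defaultdict

# Invented cleaned transactions from the previous steps.
transactions = [
    {"category": "books", "amount": 120},
    {"category": "electronics", "amount": 900},
    {"category": "books", "amount": 80},
    {"category": "electronics", "amount": 450},
    {"category": "grocery", "amount": 60},
]

# Aggregate revenue per category.
revenue = defaultdict(int)
for t in transactions:
    revenue[t["category"]] += t["amount"]

# The insight: the category with the highest total revenue.
top_category = max(revenue, key=revenue.get)
print(top_category, revenue[top_category])
```

At Big Data scale the same group-and-aggregate pattern runs on distributed frameworks rather than a Python loop, but the analytical idea is identical.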
Mining Data Streams
A data stream is a continuous, real-time flow of data generated by sources such as
sensors, satellite imagery, internet activity, and web traffic. Mining data streams refers to
the process of extracting meaningful patterns, trends, and knowledge from a continuous
flow of real-time data. For instance, a sudden spike in searches for “election results” on a
particular day might indicate that elections were recently held in a region or highlight the
level of public interest in the results.
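The spike-detection idea can be sketched with a sliding window (a simplified illustration; the counts and the threshold rule are invented). Only the most recent window is kept in memory, which is what makes the approach suitable for unbounded streams:

```python
from collections import deque

def detect_spikes(stream, window=3, factor=2.0):
    """Flag positions where a value far exceeds the average of the
    previous `window` values -- a sudden spike in the stream."""
    recent = deque(maxlen=window)  # only the last `window` values are stored
    spikes = []
    for i, value in enumerate(stream):
        if len(recent) == window:
            avg = sum(recent) / window
            if value > factor * avg:   # value well above the recent average
                spikes.append(i)
        recent.append(value)
    return spikes

# Invented per-minute counts of searches for "election results":
counts = [10, 12, 11, 50, 12, 10, 11]  # a sudden spike at index 3
print(detect_spikes(counts))
```

Production stream-mining systems use more robust statistics than a plain moving average, but the pattern — bounded memory, one pass, incremental updates — is the defining constraint of mining data streams.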
Future of Big Data Analytics
• Real-Time Analytics: Real-time analytics will allow businesses to process data as
it arrives for decision-making, such as live monitoring of customer behaviour or
tracking supply chain activities.
• Development of Advanced Models in Predictive Analytics: Predictive analytics will
help to integrate more sophisticated machine learning algorithms to enable the
forecasting of trends and behaviours with greater precision.
• Quantum Computing: Quantum computers will be able to solve complex problems
much faster than classical computers.