Introduction to Big Data
Big Data refers to datasets so large and complex that traditional
data tools cannot store or process them efficiently
Characterized by the 3 Vs:
Volume: Massive amounts of data (terabytes to zettabytes)
Velocity: High-speed data generation and processing
Variety: Structured, semi-structured, and unstructured data
Sources of Big Data include:
Social media, e-commerce, mobile apps, IoT, and business systems
Traditional tools such as Excel or a single-server SQL database struggle at this scale
Big Data technologies include Hadoop, Spark, and NoSQL databases
Big Data Analytics refers to advanced techniques for analyzing large
data sets to uncover patterns and trends
Types of Data by Structure
Big Data comes in many forms. Categorizing data
helps determine how to store, process, and analyze
it.
Three major types:
Structured Data
Semi-Structured Data
Unstructured Data
What is Structured Data?
Structured data refers to data that is organized in a predefined
format such as rows and columns. It follows a consistent schema,
making it easy to enter, store, query, and analyze.
Data organized into tables with rows and columns.
Follows a predefined schema.
Easy to store, search, and analyze.
Examples: SQL databases, Excel spreadsheets, Transaction records
Characteristics of Structured Data
Highly organized
Stored in relational databases
Easily queried using SQL
Used in business operations and reporting
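A minimal sketch of why a fixed schema makes structured data easy to query, using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory relational database; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)],
)

# Every row follows the same schema, so an aggregate is one SQL query.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)
```

The same query pattern carries over to larger relational systems; only the storage engine changes.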
What is Semi-Structured Data?
Semi-structured data doesn't follow a strict tabular format but still
contains tags or markers to separate elements and enforce
hierarchies of records and fields.
Does not conform to a strict table format.
Contains tags, markers, or metadata to separate elements.
More flexible than structured data.
Examples: XML, JSON, emails, documents stored in NoSQL databases
Email is semi-structured because it combines structured fields
(subject, sender, receiver) with an unstructured body.
Characteristics of Semi-Structured Data
Flexible schema (structure can change)
Does not fit neatly into the fixed tables of traditional relational databases
Partially organized
Easier to analyze than unstructured data
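A minimal sketch of working with semi-structured data, using Python's built-in json module. The records and field names are made up for illustration; the point is that keys act as the tags or markers, while each record is free to carry different fields.

```python
import json

# Two hypothetical JSON records with different sets of fields.
raw_records = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "tags": ["vip"], "address": {"city": "Paris"}}
]
"""
records = json.loads(raw_records)

# The keys label each element, but no fixed schema is enforced.
field_names = sorted({key for record in records for key in record})

# Missing or nested fields are handled with defaults rather than columns.
cities = [
    record.get("address", {}).get("city", "unknown") for record in records
]

print(field_names)
print(cities)
```

The flexible schema shows up in the `.get(...)` defaults: code that reads the data has to tolerate absent fields, which a relational table would have forced to exist.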
What is Unstructured Data?
No predefined structure or model.
Most difficult to store and analyze.
Requires advanced tools for processing.
Examples: Videos, images, social media posts, raw sensor data, PDFs
Characteristics of Unstructured Data
No predefined schema or organization
Cannot be easily stored in traditional databases
Typically requires big data tools (Hadoop, Spark) and AI/ML techniques for analysis
Commonly estimated to make up 80–90% of all data today
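A small sketch of what "no predefined structure" means in practice: before free text can be analyzed, some structure has to be derived from it. The example posts are invented, and the simple word count below stands in for the far heavier NLP or machine learning pipelines that real systems would use.

```python
import re
from collections import Counter

# Hypothetical free-text posts; there are no fields or columns to query.
posts = [
    "Loved the new phone, battery life is great!",
    "Battery died after a day. Not great.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Derive a structured view (word -> count) from the unstructured text.
word_counts = Counter(word for post in posts for word in tokenize(post))

print(word_counts.most_common(2))
```

Only after this kind of processing step does the data become something that can be counted, filtered, or joined like structured data.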
Importance of Big Data in Management Information Systems (MIS)
1. Enables organizations to make evidence-based decisions
2. Helps identify patterns and trends that are not visible in small
datasets
3. Supports customer behavior analysis for personalized marketing
4. Enhances risk management and fraud detection
5. Improves operational efficiency through process optimization
6. Facilitates real-time decision-making and performance tracking
7. Drives innovation by revealing unmet customer needs
8. Supports predictive and prescriptive analytics for strategy
formulation
Big Data Technologies
Hadoop: Open-source framework for distributed
storage and processing
Spark: Fast in-memory data processing engine
NoSQL Databases: Handle semi-structured and unstructured data
at scale (e.g., MongoDB, Cassandra)
Data Lakes: Store raw, unprocessed data at scale
ETL Tools: Extract, transform, and load data for analysis
Cloud Platforms: AWS, Google Cloud, and Azure
support big data processing and storage
Data Warehouses: Organize structured data for
efficient querying
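A minimal sketch of the extract, transform, load (ETL) idea using only Python's standard library. The CSV content, column names, and cleaning rules are assumptions made for illustration; production ETL tools add scheduling, monitoring, and distributed execution on top of the same three steps.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV export, inlined so the sketch is self-contained.
raw_csv = """order_id,region,amount
1,north, 100.50
2,South,
3,south,40
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop rows with a missing amount, normalize region names,
# and cast text fields to proper types.
clean = [
    (int(r["order_id"]), r["region"].strip().lower(), float(r["amount"]))
    for r in rows
    if r["amount"].strip()
]

# Load: write the cleaned rows into a table that is ready for querying.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

The in-memory SQLite table plays the role of the data warehouse here: once the data is cleaned and loaded, analysis is an ordinary SQL query.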
Applications of Big Data in Business
Customer Analytics: Understand preferences, predict churn, personalize
experiences
Supply Chain Optimization: Monitor and forecast logistics and inventory
Financial Analysis: Detect fraud, manage risk, improve investment
decisions
Marketing Campaigns: Target audiences more accurately using data
insights
Healthcare Analytics: Enhance patient care and predict disease
outbreaks
HR Analytics: Measure employee performance, reduce turnover, recruit
effectively
Retail Intelligence: Manage pricing, product placement, and demand
forecasting
Challenges in Big Data Management
1. Data Quality: Inaccurate or inconsistent data affects
reliability
2. Data Security: Risk of breaches, leaks, and
unauthorized access
3. Integration Issues: Difficulty in consolidating data from
multiple sources
4. Data Storage: High cost and complexity of managing
large volumes
5. Skilled Workforce: Need for data scientists and big data
engineers
6. Regulatory Compliance: GDPR and other laws
demand responsible data handling
Case Study: Netflix and Big Data
Netflix uses Big Data to:
Recommend personalized content to users
Decide which original shows to produce
Analyze viewer behavior for engagement strategies
Optimize streaming quality across devices and
regions
Reduce churn by predicting subscriber
dissatisfaction
To do this, Netflix leverages predictive analytics and machine
learning algorithms
Summary of Big Data and Analytics
Big Data transforms how businesses make decisions
and operate
Unlocks valuable insights from complex, high-volume data
Supports personalized customer experiences and
operational efficiency
Despite challenges, organizations that harness Big
Data gain a competitive edge
Integration with MIS supports a data-driven culture
across departments