UNIT TITLE HOURS
I Introduction to Big Data 9
Introduction to Big Data, Big Data characteristics, Challenges of Conventional Systems, Types of Big Data,
Intelligent data analysis, Traditional vs. Big Data business approach, Case Study of Big Data Solutions,
Hadoop architecture: HDFS, Namenode/Datanode, block replication, Setting up and configuring HDFS in
standalone/pseudo-distributed mode, HDFS commands and data ingestion best practices, Hadoop
ecosystem: YARN, MapReduce framework overview, Data ingestion patterns: Sqoop for RDBMS, Flume
for streaming
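As an illustration of the block replication topic in this unit, the following minimal sketch estimates how many blocks a file occupies in HDFS and how much raw storage its replicas consume. It assumes a 128 MB block size and a replication factor of 3, which are common defaults but are configurable (dfs.blocksize, dfs.replication); the function name hdfs_footprint is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    Assumed defaults: 128 MB blocks, replication factor 3.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across Datanodes.
    replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw usage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)
```

The Namenode tracks metadata for each block, so many small files cost more Namenode memory than a few large ones of the same total size.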
II MapReduce Development & Hive/Pig 9
MapReduce pipeline: Mapper, Reducer, Combiner, Partitioner, Data formats: Writables, SequenceFile, Avro,
Parquet, Hive architecture, HiveQL: table creation, partitions, UDFs, Pig Latin: scripting, data flow
operators, performance considerations
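To make the roles in the MapReduce pipeline concrete, here is a small in-memory word-count sketch in plain Python. It is a teaching simulation only and does not use the Hadoop API: the names mapper, combiner, partitioner, reducer and run_job are hypothetical, and the partitioner is a simple deterministic stand-in for hash partitioning.

```python
from collections import defaultdict

def mapper(line):
    # Map: emit (word, 1) for every word in an input line.
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    # Combine: pre-aggregate one mapper's output before the shuffle.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partitioner(key, num_reducers):
    # Partition: choose the reducer for a key (assumed stand-in hash).
    return sum(ord(c) for c in key) % num_reducers

def reducer(key, values):
    # Reduce: sum all partial counts for one key.
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # One bucket per reducer; each bucket groups values by key (shuffle).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split is handled by one simulated mapper
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            partitions[partitioner(key, num_reducers)][key].append(value)
    results = {}
    for bucket in partitions:
        for key in sorted(bucket):  # keys reach a reducer in sorted order
            word, total = reducer(key, bucket[key])
            results[word] = total
    return results

splits = [["big data big"], ["data hadoop"]]
counts = run_job(splits)
print(counts)
```

The combiner is safe here because addition is associative and commutative, so summing partial counts on the map side does not change the reducer's result.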
III Apache Spark for Batch & Real-Time Processing 9
Spark cluster architecture: driver, executors, master, RDD vs DataFrame vs Dataset abstractions, Spark
SQL and DataFrame transformations & actions, Spark Streaming: micro-batch processing, MLlib
introduction: basic ML pipelines
IV NoSQL, Kafka & Real-Time Analytics 9
NoSQL database models: key-value, document, column-family, graph, Cassandra data modelling and
architecture, MongoDB CRUD operations and indexing strategies, Kafka architecture: producers,
consumers, partitions, Kafka-Spark integration for real-time processing
V Visualization, Optimization & Cloud Deployment 9
Data visualization approaches using Zeppelin, Jupyter, or Grafana, Spark optimization: shuffles, caching,
partitioning strategies, Hadoop & Spark deployment models: standalone, YARN, Mesos, Kubernetes,
Integration with cloud services: AWS EMR, Azure HDInsight, End-to-end workflow orchestration using
Oozie or Airflow
TOTAL HOURS : 45