Big Data Technologies

The document outlines suggested teaching guidelines for a Big Data Technologies course, covering topics such as Hadoop, HDFS, Map Reduce, HBase, Hive, and Spark, along with lab assignments for practical experience. It also includes a section on Data Visualization and Reporting, detailing evaluation methods and recommended textbooks. The course aims to provide a comprehensive understanding of big data technologies and their applications in data analytics and visualization.

Uploaded by

Raghav

ACTS, Pune

Suggested Teaching Guidelines for

Big Data Technologies PG-DBDA August 2024
Introduction to Hadoop
o A Brief History of Hadoop
o Evolution of Hadoop
o Introduction to Hadoop and its components
o Comparison with Other Systems
o Hadoop Releases
o Hadoop Distributions and Vendors

Session: 4 & 5
Hadoop Distributed File System (HDFS)
o Distributed File System
o What is HDFS
o Where does HDFS fit in
o Core components of HDFS
o HDFS Daemons
o Hadoop Server Roles: Name Node, Secondary Name Node, and Data Node
HDFS Architecture
o HDFS Architecture
o Scaling and Rebalancing
o Replication
o Rack Awareness
o Data Pipelining,
o Node Failure Management.
o HDFS High Availability NameNode
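
As a rough illustration of the Replication and Rack Awareness topics above, the sketch below mimics the commonly described default placement for a replication factor of 3: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The cluster map, the node names, and the `place_replicas` function are hypothetical, and the choice is made deterministically for readability; a real NameNode picks nodes randomly and also weighs load and free space.

```python
from typing import Dict, List


def place_replicas(topology: Dict[str, List[str]], writer: str) -> List[str]:
    """Pick three DataNodes for one block, loosely following the default idea."""
    # Find the rack that holds the writer node (replica 1 stays local).
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    # Replicas 2 and 3 share one remote rack, so it needs at least two nodes.
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    remote_nodes = sorted(topology[remote_racks[0]])
    # Deterministic pick for the sketch; HDFS itself chooses randomly.
    return [writer, remote_nodes[0], remote_nodes[1]]


# Hypothetical two-rack cluster.
cluster = {
    "/rack1": ["dn1", "dn2"],
    "/rack2": ["dn3", "dn4"],
}
replicas = place_replicas(cluster, writer="dn1")
print(replicas)
```

Losing all of `/rack1` still leaves two copies of the block on `/rack2`, which is the point of spreading replicas across racks.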

Lab-Assignment:
o Run the HDFS commands, and add a one-line explanation of each command.
o Execute the provided code using HDFS; step through it and understand each step.

Session: 6
Getting Started: Hadoop Installation
o Hadoop Operation modes
o Setting up a Hadoop Cluster
o Cluster specification
o Single and Multi-Node Cluster Setup on Virtual & Physical Machines,
o Remote Login using PuTTY/Mac Terminal/Ubuntu Terminal.
o Hadoop Configuration, Security in Hadoop, Administering Hadoop,
o HDFS – Monitoring & Maintenance, Hadoop benchmarks,
o Hadoop in the cloud.

Session: 7
Hadoop Architecture
o Hadoop Architecture,
o Core components of Hadoop,
o Common Hadoop Shell commands.

Session: 8
HDFS Data Storage Process
o HDFS Data storage process,
o Anatomy of writing and reading a file in HDFS,
o Handling Read/Write failures
o HDFS user and admin commands,
o HDFS Web Interface.

Session: 9
Getting in touch with Map Reduce Framework
o Hadoop Map Reduce paradigm,
o Map and Reduce tasks,
o Map Reduce Execution Framework,
o Map Reduce Daemons
o Anatomy of a Map Reduce Job run
More Map Reduce Concepts
o Partitioners and Combiners,
o Input Formats (Input Splits and Records, Text Input, Binary Input,
Multiple Inputs),
o Output Formats (Text Output, Binary Output, Multiple Output).
o Distributed Cache
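
To make the Partitioners and Combiners topic above concrete, here is a minimal in-memory sketch of a word-count job. It is plain Python, not the Hadoop API: the function names (`map_phase`, `combine`, `partition`, `run_job`) and the sample splits are assumptions made for illustration. The combiner pre-aggregates each map task's output before the shuffle, and the partitioner decides which reducer receives each key.

```python
import zlib
from collections import Counter, defaultdict


def map_phase(line):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts locally so less data crosses the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: stable hash of the key modulo the reducer count."""
    # zlib.crc32 is used because Python's built-in hash() of str
    # changes between interpreter runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    """Run one map task per split, shuffle by partition, then reduce."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(pairs):
            buckets[partition(key, num_reducers)][key].append(value)
    # Reduce: each reducer sums the values it received for each key.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result


# Two hypothetical input splits, each a list of text lines.
splits = [["big data big"], ["data lake", "big"]]
counts = run_job(splits)
print(counts)
```

Because the same key always hashes to the same partition, all partial counts for a word meet at one reducer, so the final totals are correct regardless of how many reducers are configured.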

Session: 10
Basics of Map Reduce Programming
o Hadoop Data Types,
o Java and Map Reduce,
o Map Reduce program structure,
o Map-only program, Reduce-only program,
o Use of combiner and partitioner,
o Counters, Schedulers (Job Scheduling),
o Custom Writables, Compression

Lab-Assignment:
o Execute the train data example.
o Execute the train data example using chained methods.

Session: 11
Map Reduce Streaming
o Complex Map Reduce programming,
o Map Reduce streaming,
o Python and Map Reduce,
o Map Reduce on image dataset
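
The Python and Map Reduce streaming topics above can be sketched as a mapper and a reducer that exchange tab-separated `key<TAB>value` text lines. In an actual Hadoop Streaming job these would typically be two separate scripts reading standard input and writing standard output, with the framework sorting mapper output by key in between. Here both run in one process and `sorted` stands in for that shuffle step; the function names and sample text are assumptions for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    parsed = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical input records.
text = ["spark hadoop", "hadoop hive hadoop"]
shuffled = sorted(mapper(text))   # stands in for the framework's sort
output = list(reducer(shuffled))
print(output)
```

The reducer relies on `groupby`, which only merges adjacent equal keys, which is why the sort between the two phases matters.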

Session: 12
Hadoop ETL
o Hadoop ETL Development,
o ETL Process in Hadoop,
o Discussion of ETL functions,
o Data Extractions,
o Need of ETL tools,
o Advantages of ETL tools.

Lab-Assignment:
o Understand the file formats and read the provided links

Session: 13
Introduction to HBase
o Overview of HBase
o HBase architecture
o Installation

Session: 14 & 15
The HBaseAdmin and HBase Security
o Various Operations on Tables
o HBase general command and shell,
o Java client API for HBase
o Admin API
o CRUD operations
o Client API
o HBase – Scan, Count and Truncate
o HBase Security

Lab-Assignment:
o Run the HBase shell commands
o Run HBase operations using the Java client

Session: 16
The Hive Data Warehouse
o Introduction to Hive,
o Hive architecture and Installation,
o Comparison with Traditional Database,
o Basics of Hive Query Language.

Session: 17
Working with Hive QL
o Datatypes,
o Operators and Functions,
o Hive Tables (Managed Tables and External Tables),
o Partitions and Buckets,
o Storage Formats,
o Importing data,
o Altering and Dropping Tables

Lab-Assignment:
o Create a Hive database and tables (internal and external)
o Load the data into a Hive table (using local inpath and HDFS inpath)

Session: 18
Querying with Hive QL
o Querying Data-Sorting,
o Aggregating,
o Map Reduce Scripts,
o Joins and Sub queries,
o Views,
o Map and Reduce side joins to optimize query.

Lab-Assignment:
o Run all the types of joins in Hive
o Partition the provided data in Hive

Session: 19
More on Hive QL
o Data manipulation with Hive,
o UDFs,
o Appending data into existing Hive table,
o Custom map/reduce in Hive
o Writing HQL scripts

Session: 20, 21 & 22

o Introduction to Data Warehousing and Data Lakes
o Designing Data warehousing for an ETL Data Pipeline
o Designing Data Lakes for an ETL Data Pipeline
o ETL vs ELT
o Fundamentals of Airflow/Informatica
o Work management with Airflow/ Informatica
o Automating an entire Data Pipeline with Airflow/Informatica

Lab-Assignment:
o Create an Airflow DAG / Informatica workflow for Extract -> Transform -> Load

Session: 23, 24 & 25

Apache Spark APIs for large-scale data processing
o Overview, Linking with Spark, Initializing Spark,
o Resilient Distributed Datasets (RDDs), External Datasets
o RDDs vs. DataFrames vs. Datasets
o DataFrame operations
o Spark Structured Streaming
o Passing Functions to Spark, Working with Key-Value Pairs, Shuffle operations,
o RDD Persistence, Removing Data, Shared Variables, Deploying to a Cluster

Lab-Assignment:
o Run the provided Hadoop Streaming program using Python

Session: 26
o Map Reduce with Spark
o Working with Spark with Hadoop
o Working with Spark without Hadoop and their Differences

Lab-Assignment:
o Execute all the provided code, stepping through every line
o Set up the JDBC configuration and run the Spark JDBC Connectivity program
o Run the Spark integrations using the provided code

Session: 27
o Data preprocessing
o EDA

Session: 28 & 29
o Introduction to Kafka
o Working with Kafka using Spark
o Spark streaming Architecture
o Spark Streaming APIs
o Building Stream Processing Application with Spark

Lab Assignment
o Run the Spark Streaming with Kafka example

Session: 30
o Setting up Kafka Producer and Consumer
o Kafka Connect API

Session: 31
o Spark SQL

Lab Assignment
o Run the Spark SQL programs, stepping through every line
o Run all the Spark SQL programs
o Analyse the election data using Spark and report your findings

Session: 32 & 33
o Spark MLlib
o Predictive Analysis

Lab Assignment:
o Deep Learning with Spark
o Connecting databases with Spark
o Accessing and manipulating the databases
o Demo: Capstone Project
o Create a complex workflow using the Airflow Bash operator, and a simple workflow using Python
o Using the Airflow Python operator, read data from your local drive, ingest the data into HDFS, and run a Spark word count (WC)

Suggested Teaching Guidelines for

Data Visualization - Analysis and Reporting
PG-DBDA August 2024

Duration: 26 Classroom hours + 24 Lab hours

Objective: To introduce students to Data Analytics, Visualization and Reporting

Prerequisites: Knowledge of Database Fundamentals and Big Data Technologies.

Evaluation method: Theory exam – 40% weightage
Lab exam – 40% weightage
Internal exam – 20% weightage

List of Books / Other training material

Text Book:
1. Communicating Data with Tableau, Ben Jones, O'Reilly, Shroff Publishers & Distributors, Tableau 8.1.

Reference Book:

1. Mastering Microsoft Power BI: Expert Techniques for Effective Data Analytics and Business Intelligence, by Brett Powell
2. Designing Data Visualizations, by Steele, O'Reilly
3. Tableau Your Data, by Daniel G., Wiley
4. Graphs Cookbook, Hrishi V. Mittal, Packt Publishing
5. Python Data Visualization Cookbook, Igor Milovanović, Packt Publishing
6. Learning Python Data Visualization, Chad Adams, Packt Publishing
7. Data Visualization with D3.js Cookbook, Nick Qi Zhu, Packt Publishing
8. Getting Started with D3, Mike Dewar, O'Reilly
9. Data Visualization with JavaScript
10. Data Visualization for Dummies
11. High Impact Data Visualization with Power View, Power Map, and Power BI
12. The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions
13. Mastering Tableau 2021, by Marleen Meier

Note:
o Tool to be used: Tableau

Session 1 & 2:
o Business Intelligence basics,
o Information gathering,
o Decision making,
o Managing BI,
