Course Code | Course Name | Theory | Practical | Tutorial | Theory | Practical/Oral | Tutorial | Total
ITDO8011 | Big Data Analytics | 03 | -- | -- | 03 | -- | -- | 03
Examination Scheme
Theory Marks: Internal Assessment (Test 1, Test 2, Avg. of 2 Tests) and End Sem. Exam
Course Code | Course Name | Test 1 | Test 2 | Avg. of 2 Tests | End Sem. Exam | Term Work | Practical | Oral | Total
ITDO8011 | Big Data Analytics | 20 | 20 | 20 | 80 | -- | -- | -- | 100
Course Objectives:
Sr. No. | Course Objectives
1 | To provide an overview of the exciting, growing field of Big Data analytics.
2 | To discuss the challenges traditional data mining algorithms face when analyzing Big Data.
3 | To introduce the tools required to manage and analyze Big Data, such as Hadoop, NoSQL, and MapReduce.
4 | To teach the fundamental techniques and principles for achieving Big Data analytics with scalability and streaming capability.
5 | To introduce students to several types of Big Data, such as social media, web graphs, and data streams.
6 | To equip students with skills that will help them solve complex real-world problems in decision support.
Course Outcomes:
On successful completion of the course, the learner/student will be able to:
Sr. No. | Course Outcomes | Cognitive levels of attainment as per Bloom’s Taxonomy
1 | Explain the motivation for Big Data systems and identify the main sources of Big Data in the real world. | L1, L2, L3
2 | Demonstrate an ability to use frameworks like Hadoop and NoSQL to efficiently store, retrieve, and process Big Data for analytics. | L1, L2, L3
3 | Implement several data-intensive tasks using the MapReduce paradigm. | L1, L2, L3
4 | Apply several newer algorithms for clustering, classifying, and finding associations in Big Data. | L1, L2, L3
5 | Design algorithms to analyze Big Data such as streams, web graphs, and social media data. | L6
6 | Design and implement successful recommendation engines for enterprises. | L6
Prerequisite: AI and DS
DETAILED SYLLABUS:
University of Mumbai, B. E. (Information Technology), Rev 2016 286
Sr. No. | Module | Detailed Content | Hours | CO Mapping
0 | Prerequisite | Data Mining, Data Science | 02 | --
I | Introduction to Big Data | Introduction to Big Data, Big Data characteristics, types of Big Data, Traditional vs. Big Data business approach, Big Data Challenges, Examples of Big Data in Real Life, Big Data Applications. Self-learning Topics: Identification of Big Data applications and their solutions. | 03 | CO1
II | Introduction to Big Data Frameworks | What is Hadoop? Core Hadoop Components; Hadoop Ecosystem; Working with Apache Spark. What is NoSQL? NoSQL data architecture patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, MongoDB. Self-learning Topics: HDFS vs. GFS, MongoDB vs. other NoSQL systems, Implementation of Apache Spark. | 06 | CO2
III | MapReduce Paradigm | MapReduce: The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution, Coping With Node Failures. Algorithms Using MapReduce: Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce, Computing Natural Join by MapReduce, Grouping and Aggregation by MapReduce, Matrix Multiplication, Matrix Multiplication with One MapReduce Step. Illustrating the use of MapReduce with real-life databases and applications. Self-learning Topics: Implementation of MapReduce algorithms such as Word Count, Matrix-Vector, and Matrix-Matrix multiplication. | 07 | CO3
IV | Mining Big Data Streams | The Stream Data Model: A Data-Stream-Management System, Examples of Stream Sources, Stream Queries, Issues in Stream Processing. Sampling Data in a Stream: Sampling Techniques. Filtering Streams: The Bloom Filter. Counting Distinct Elements in a Stream: The Count-Distinct Problem, The Flajolet-Martin Algorithm, Combining Estimates, Space Requirements. Counting Ones in a Window: The Cost of Exact Counts, The Datar-Gionis-Indyk-Motwani (DGIM) Algorithm, Query Answering in the DGIM Algorithm. Self-learning Topics: Streaming services such as Apache Kafka, Amazon Kinesis, and Google Cloud Dataflow; the standard Spark Streaming library; integration with IoT devices to capture real-time stream data. | 07 | CO4
V | Big Data Mining Algorithms | Frequent Pattern Mining: Handling Larger Datasets in Main Memory, Basic Algorithm of Park, Chen, and Yu, The SON Algorithm and MapReduce. Clustering Algorithms: CURE Algorithm, Canopy Clustering, Clustering with MapReduce. Classification Algorithms: Overview of SVM classifiers, Parallel SVM, K-Nearest Neighbor classification for Big Data, One Nearest Neighbour. Self-learning Topics: Standard libraries included with Spark, such as GraphX and MLlib. | 07 | CO5
VI | Big Data Analytics Applications | Link Analysis: PageRank Definition, Structure of the Web, Dead Ends, Using PageRank in a Search Engine, Efficient Computation of PageRank: PageRank Iteration Using MapReduce, Topic-Sensitive PageRank, Link Spam, Hubs and Authorities, HITS Algorithm. Mining Social-Network Graphs: Social Networks as Graphs, Types, Clustering of Social-Network Graphs, Direct Discovery of Communities, Counting Triangles Using MapReduce. Recommendation Engines: A Model for Recommendation Systems, Content-Based Recommendations, Collaborative Filtering. Self-learning Topics: Sample applications such as social media feeds, multiplayer game interactions, the retail industry, and financial data analysis. Use cases such as location data, real-time stock trades, and log monitoring. | 07 | CO6
Text Books:
1. Anand Rajaraman and Jeff Ullman, "Mining of Massive Datasets", Cambridge University Press.
2. Alex Holmes, "Hadoop in Practice", Manning Press, Dreamtech Press.
3. Shashank Tiwari, "Professional NoSQL", Dreamtech Press.
4. Rajkumar Buyya, Rodrigo N. Calheiros and Amir Vahid Dastjerdi, "Big Data Principles and Paradigms", Morgan Kaufmann.
Reference Books:
1. Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and its Applications", Wiley Big Data Series.
2. Vignesh Prajapati, "Big Data Analytics with R and Hadoop", Packt Publishing Limited.
3. Tom White, "Hadoop: The Definitive Guide", O'Reilly Publications.
Online References:
1. https://nptel.ac.in/courses/106/104/106104189/
2. https://nptel.ac.in/courses/106106142/
3. https://nptel.ac.in/courses/106105186/
Assessment:
Internal Assessment (IA) for 20 marks:
IA will consist of two compulsory Internal Assessment Tests. Approximately 40% to 50% of the syllabus content must be covered in the first IA Test, and the remaining 40% to 50% of the syllabus content must be covered in the second IA Test.
Question paper format
The question paper will comprise a total of six questions, each carrying 20 marks. Q.1 will be compulsory and should cover the maximum contents of the syllabus.
The remaining questions will be mixed in nature: part (a) and part (b) of each question must be from different modules. For example, if Q.2 has part (a) from Module 3, then part (b) must be from any other module, randomly selected from all the modules.
A total of four questions need to be answered.