Course Code  Course Name         Teaching Scheme (Contact Hours)  Credits Assigned
                                 Theory  Practical  Tutorial      Theory  Practical/Oral  Tutorial  Total
ITDO8011     Big Data Analytics  03      --         --            03      --              --        03
                                               Examination Scheme
Course Code  Course Name         Theory Marks                                Term Work  Practical  Oral  Total
                                 Internal Assessment              End Sem.
                                 Test 1  Test 2  Avg. of 2 Tests  Exam
ITDO8011     Big Data Analytics  20      20      20               80         --         --         --    100
Course Objectives:
 Sr.No                                                      Course Objectives
   1       To provide an overview of the exciting and growing field of Big Data analytics.
   2       To discuss the challenges traditional data mining algorithms face when analyzing Big Data.
   3       To introduce the tools required to manage and analyze Big Data, such as Hadoop, NoSQL and MapReduce.
   4       To teach the fundamental techniques and principles for achieving Big Data analytics with scalability and
           streaming capability.
   5       To introduce students to several types of Big Data, such as social media, web graphs and data streams.
   6       To equip students with skills that will help them solve complex real-world problems in decision support.
Course Outcomes:
Sr. No.                                Course Outcomes                                             Cognitive Levels of Attainment as per Bloom’s Taxonomy
On successful completion of the course, the learner/student will be able to:
 1   Explain the motivation for big data systems and identify the main sources of Big Data         L1,L2,L3
     in the real world.
 2   Demonstrate an ability to use frameworks like Hadoop and NoSQL to efficiently store,          L1,L2,L3
     retrieve and process Big Data for analytics.
 3   Implement several data-intensive tasks using the MapReduce paradigm.                          L1,L2,L3
 4   Apply several newer algorithms for clustering, classifying and finding associations in        L1,L2,L3
     Big Data.
 5   Design algorithms to analyze Big Data such as streams, web graphs and social media            L6
     data.
 6   Design and implement successful Recommendation engines for enterprises.                       L6
Prerequisite: AI and DS
DETAILED SYLLABUS:
 Sr. No.   Module                                  Detailed Content                               Hours         CO Mapping
          University of Mumbai, B. E. (Information Technology), Rev 2016                                           286
0     Prerequisite      Data Mining, Data Science                                        02
I     Introduction to   Introduction to Big Data, Big Data characteristics, types of     03   CO1
      Big Data          Big Data, Traditional vs. Big Data business approach, Big
                        Data Challenges, Examples of Big Data in Real Life, Big
                        Data Applications
                        Self-learning Topics: Identification of Big Data applications
                        and their solutions
II    Introduction to   What is Hadoop? Core Hadoop Components; Hadoop                   06   CO2
      Big Data          Ecosystem; Working with Apache Spark
      Frameworks        What is NoSQL? NoSQL data architecture patterns: Key-
                        value stores, Graph stores, Column family (Bigtable) stores,
                        Document stores, MongoDB
                        Self-learning Topics: HDFS vs GFS, MongoDB vs other
                        NoSQL systems, Implementation of Apache Spark
III   MapReduce         MapReduce: The Map Tasks, Grouping by Key, The Reduce            07   CO3
      Paradigm          Tasks, Combiners, Details of MapReduce Execution, Coping
                        with Node Failures. Algorithms Using MapReduce: Matrix-Vector
                        Multiplication by MapReduce, Relational-Algebra
                        Operations, Computing Selections by MapReduce,
                        Computing Projections by MapReduce, Union, Intersection,
                        and Difference by MapReduce, Computing Natural Join by
                        MapReduce, Grouping and Aggregation by MapReduce,
                        Matrix Multiplication, Matrix Multiplication with One
                        MapReduce Step. Illustrating the use of MapReduce with
                        real-life databases and applications.
                        Self-learning Topics: Implementation of MapReduce
                        algorithms such as Word Count, Matrix-Vector and
                        Matrix-Matrix multiplication
IV    Mining Big Data   The Stream Data Model: A Data-Stream-Management System,          07   CO4
      Streams           Examples of Stream Sources, Stream Queries, Issues in
                        Stream Processing. Sampling Data in a Stream: Sampling
                        Techniques. Filtering Streams: The Bloom Filter. Counting
                        Distinct Elements in a Stream: The Count-Distinct Problem,
                        The Flajolet-Martin Algorithm, Combining Estimates, Space
                        Requirements. Counting Ones in a Window: The Cost of
                        Exact Counts, The Datar-Gionis-Indyk-Motwani (DGIM) Algorithm,
                        Query Answering in the DGIM Algorithm.
                        Self-learning Topics: Streaming services like Apache
                        Kafka/Amazon Kinesis/Google Cloud Dataflow.
                        Standard Spark Streaming library.
                        Integration with IoT devices to capture real-time stream data.
V     Big Data Mining   Frequent Pattern Mining: Handling Larger Datasets in Main        07   CO5
      Algorithms        Memory, Basic Algorithm of Park, Chen, and Yu (PCY). The SON
                        Algorithm and MapReduce. Clustering Algorithms: CURE
                        Algorithm, Canopy Clustering, Clustering with MapReduce.
                        Classification Algorithms: Overview of SVM classifiers,
                        Parallel SVM, K-Nearest Neighbor classification for Big
                        Data, One-Nearest Neighbor.
                        Self-learning Topics: Standard libraries included with Spark,
                        such as GraphX and MLlib
 VI     Big Data            Link Analysis: PageRank Definition, Structure of the Web,          07        CO6
        Analytics           Dead Ends, Using PageRank in a Search Engine, Efficient
        Applications        Computation of PageRank: PageRank Iteration Using
                            MapReduce, Topic-Sensitive PageRank, Link Spam, Hubs and
                            Authorities, HITS Algorithm.
                            Mining Social-Network Graphs: Social Networks as
                            Graphs, Types, Clustering of Social-Network Graphs, Direct
                            Discovery of Communities, Counting Triangles Using
                            MapReduce.
                            Recommendation Engines: A Model for Recommendation
                            Systems, Content-Based Recommendations, Collaborative
                            Filtering
                            Self-learning Topics: Sample applications like social media
                            feeds, multiplayer game interactions, retail industry, financial
                            data analysis. Use cases like location data, real-time stock
                            trades, log monitoring, etc.
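The Module III pipeline (map tasks, grouping by key, reduce tasks and combiners) can be illustrated with the classic word count, simulated in a single Python process. This is a teaching sketch, not Hadoop API code; the function names `map_task`, `combiner`, `group_by_key`, `reduce_task` and `word_count` are hypothetical, and running one map task per input document is an assumption.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable, Iterator


def map_task(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    for word in document.lower().split():
        yield word, 1


def combiner(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner: pre-sum one map task's output locally to shrink the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def group_by_key(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: collect every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum all partial counts for one word."""
    return key, sum(values)


def word_count(documents: list[str], use_combiner: bool = True) -> dict[str, int]:
    # Hypothetical layout: one map task per document.
    map_outputs = []
    for doc in documents:
        pairs = list(map_task(doc))
        map_outputs.extend(combiner(pairs) if use_combiner else pairs)
    grouped = group_by_key(map_outputs)
    return dict(reduce_task(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the cat ran"]))  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Reusing the reducer's logic as a combiner is safe here because summation is associative and commutative; for a reducer that computes an average it would not be.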
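Matrix-vector multiplication by MapReduce, also in Module III, follows the same map/reduce shape. The sketch below assumes the matrix M is stored sparsely as (i, j, m_ij) triples and that the vector v fits in the memory of every map task; `matvec_mapreduce` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def matvec_mapreduce(entries: list[tuple[int, int, float]], v: list[float]) -> dict[int, float]:
    """Compute x = Mv with M given as (i, j, m_ij) triples.

    Rows with no stored entries do not appear in the result (they are implicitly 0).
    """
    # Map: each matrix element m_ij emits the key-value pair (i, m_ij * v_j).
    intermediate = defaultdict(list)
    for i, j, m_ij in entries:
        intermediate[i].append(m_ij * v[j])
    # Reduce: x_i is the sum of all values associated with key i.
    return {i: sum(values) for i, values in intermediate.items()}


# M = [[1, 2], [3, 4]] stored as triples, v = [1, 1]
print(matvec_mapreduce([(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)], [1, 1]))  # {0: 3, 1: 7}
```

If v did not fit in memory, M and v would first be divided into matching vertical stripes so that each map task needs only one stripe of v.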
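A minimal sketch of the Bloom filter from Module IV, assuming salted SHA-256 digests stand in for k independent hash functions; the class and method names are hypothetical. `false_positive_rate` evaluates the usual approximation (1 - e^(-km/n))^k for an array of n bits, m set members and k hash functions.

```python
from __future__ import annotations

import hashlib
import math


class BloomFilter:
    """Array of n bits with k hash functions: false positives possible, false negatives not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> list[int]:
        # Assumption: k hash functions derived by salting SHA-256 with the function index.
        positions = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # A stream element passes only if all k of its bits are set.
        return all(self.bits[pos] for pos in self._positions(key))


def false_positive_rate(num_bits: int, num_hashes: int, num_members: int) -> float:
    """Approximate (1 - e^(-k*m/n))^k for n bits, k hashes and m members."""
    return (1 - math.exp(-num_hashes * num_members / num_bits)) ** num_hashes


allowed = BloomFilter(num_bits=1024, num_hashes=3)
allowed.add("alice@example.com")
print(allowed.might_contain("alice@example.com"))  # True
print(round(false_positive_rate(8_000_000_000, 1, 1_000_000_000), 4))  # 0.1175
```

With 8 billion bits and 1 billion members, a single hash function lets through about 11.75% of non-members; using two hash functions lowers the estimate to about 4.89%.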
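For the count-distinct problem in Module IV, a Flajolet-Martin sketch hashes each element, records the longest tail of 0 bits R seen under each hash function, and takes 2^R as that function's estimate; the estimates are then combined by averaging within small groups and taking the median of the group averages. Assumptions: 64 hash functions obtained by salting BLAKE2b, groups of 8, and the hypothetical names `tail_length` and `flajolet_martin`.

```python
from __future__ import annotations

import hashlib
import statistics
from typing import Iterable


def tail_length(value: int, width: int = 64) -> int:
    """Number of trailing 0 bits in value (width if value is 0)."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1


def flajolet_martin(stream: Iterable[str], num_hashes: int = 64, group_size: int = 8) -> float:
    """Estimate the number of distinct elements seen in a stream."""
    max_tail = [0] * num_hashes  # R for each hash function
    for element in stream:
        for h in range(num_hashes):
            # Assumption: salted BLAKE2b stands in for a family of independent hashes.
            digest = hashlib.blake2b(f"{h}:{element}".encode(), digest_size=8).digest()
            r = tail_length(int.from_bytes(digest, "big"))
            if r > max_tail[h]:
                max_tail[h] = r
    estimates = [2 ** r for r in max_tail]  # each hash function estimates 2^R
    # Combining estimates: average within groups, then take the median of the averages.
    averages = [
        sum(estimates[i:i + group_size]) / len(estimates[i:i + group_size])
        for i in range(0, num_hashes, group_size)
    ]
    return statistics.median(averages)


print(flajolet_martin(f"user{i % 500}" for i in range(5000)))  # estimate of 500 distinct users
```

Repeated elements never change R, so only the number of distinct elements matters, and the memory cost is one small integer per hash function rather than a set of all elements seen.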
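Counting ones in a window with the DGIM algorithm (Module IV) can be sketched with a list of buckets, each holding the timestamp of its most recent 1 and a power-of-2 size, merging the two oldest whenever three buckets share a size. Assumptions: absolute timestamps are used instead of timestamps modulo the window size, and the class name `DGIM` is hypothetical. The query adds the sizes of all buckets except the oldest plus half the size of the oldest; integer halving means a size-1 oldest bucket is counted in full, which is exact.

```python
from __future__ import annotations


class DGIM:
    """Approximate count of 1s among the last `window` bits of a 0/1 stream."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.time = 0
        # Buckets as (timestamp of most recent 1, size), newest first.
        self.buckets: list[tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its timestamp falls out of the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever three buckets share a size, merge the two oldest of them.
        i = 0
        while i + 2 < len(self.buckets) and (
            self.buckets[i][1] == self.buckets[i + 1][1] == self.buckets[i + 2][1]
        ):
            timestamp, size = self.buckets[i + 1]  # keep the more recent timestamp
            self.buckets[i + 1:i + 3] = [(timestamp, 2 * size)]
            i += 1  # the merge may create a third bucket of the doubled size

    def count_ones(self) -> int:
        """All bucket sizes, with only half of the oldest bucket counted."""
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2


counter = DGIM(window=8)
for bit in [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]:
    counter.add(bit)
print(counter.count_ones())  # 5 (the exact count in the last 8 bits is 6)
```

Only O(log N) buckets are kept for a window of N bits, and because every smaller size is always present, the estimate is never off by more than 50% of the true count.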
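The Park-Chen-Yu (PCY) algorithm from Module V can be sketched in two passes over a list of baskets. Pass 1 counts single items and hashes every pair into a bucket counter; between passes, the counters become a set of frequent items and a bitmap of frequent buckets; pass 2 counts only pairs whose items are both frequent and whose bucket is frequent. Assumptions: the baskets fit in a Python list, CRC32 serves as the bucket hash, and the function names are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import Counter
from itertools import combinations


def bucket_of(pair: tuple[str, str], num_buckets: int) -> int:
    # Any hash function works; CRC32 keeps runs deterministic (illustrative choice).
    return zlib.crc32("|".join(pair).encode()) % num_buckets


def pcy_frequent_pairs(
    baskets: list[list[str]], support: int, num_buckets: int
) -> dict[tuple[str, str], int]:
    # Pass 1: count items and hash each pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    # Between passes: frequent items plus a bitmap of frequent buckets.
    frequent_items = {item for item, c in item_counts.items() if c >= support}
    bitmap = [c >= support for c in bucket_counts]
    # Pass 2: count only candidate pairs.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket_of(pair, num_buckets)]:
                pair_counts[pair] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= support}


baskets = [
    ["milk", "bread", "beer"],
    ["milk", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "diapers"],
    ["milk", "bread", "diapers", "beer"],
]
print(pcy_frequent_pairs(baskets, support=3, num_buckets=5))
# {('beer', 'milk'): 3, ('bread', 'milk'): 3}
```

A pair's bucket count is never smaller than the pair's own count, so the bitmap can discard candidates but never a truly frequent pair; a small `num_buckets` only makes the filter less selective.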
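For link analysis in Module VI, the PageRank iteration with taxation, v' = βMv + (1 - β)e/n, can be sketched over an adjacency-list graph; the inner loop mirrors the map step (each page sends shares of its rank) and the accumulation mirrors the reduce step. Assumptions: β = 0.85 by default, a fixed number of iterations instead of a convergence test, the hypothetical name `pagerank`, and dead ends handled by spreading their rank evenly over all pages, which is one common choice and may differ from the dead-end removal approach covered in the text books.

```python
from __future__ import annotations


def pagerank(graph: dict[str, list[str]], beta: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Iterate v' = beta * M v + (1 - beta) / n over an adjacency-list graph."""
    nodes = sorted(set(graph) | {dest for dests in graph.values() for dest in dests})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every page first receives its share of the taxed (teleport) rank.
        new_rank = {u: (1.0 - beta) / n for u in nodes}
        dead_end_mass = 0.0
        for u in nodes:
            out_links = graph.get(u, [])
            if out_links:
                # "Map": u sends beta * rank[u] / out-degree to each successor.
                share = beta * rank[u] / len(out_links)
                for v in out_links:
                    new_rank[v] += share  # "Reduce": sum the incoming shares
            else:
                # Assumption: a dead end's rank is spread evenly over all pages.
                dead_end_mass += beta * rank[u]
        for u in nodes:
            new_rank[u] += dead_end_mass / n
        rank = new_rank
    return rank


web = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
print({p: round(r, 4) for p, r in pagerank(web, beta=1.0).items()})
# {'A': 0.3333, 'B': 0.2222, 'C': 0.2222, 'D': 0.2222}
```

With `beta=1.0` (no taxation) this four-page graph converges to the ranks 3/9, 2/9, 2/9 and 2/9; lowering β pulls every rank toward 1/n.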
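Counting triangles in a social-network graph (Module VI) can be sketched as two simulated MapReduce rounds: the first keys each edge by its lower endpoint and emits every pair of higher neighbors as an open wedge, and the second joins wedges with edges to find the ones that close. This wedge-and-close formulation is an illustrative choice and may differ from the join-based method in the text books; `count_triangles` is a hypothetical name, and nodes are assumed to be mutually comparable (for example, integers).

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def count_triangles(edges: list[tuple[int, int]]) -> int:
    """Count triangles in an undirected graph given as an edge list."""
    # Normalize each edge to (lower, higher); drop self-loops and duplicates.
    edge_set = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    # Round 1, map: key every edge by its lower endpoint.
    higher_neighbors = defaultdict(list)
    for u, v in edge_set:
        higher_neighbors[u].append(v)

    # Round 1, reduce: at node u, each pair v < w of higher neighbors is a wedge v-u-w.
    wedges = defaultdict(list)
    for u, highs in higher_neighbors.items():
        for v, w in combinations(sorted(highs), 2):
            wedges[(v, w)].append(u)

    # Round 2: join wedges with edges on the key (v, w). Each triangle u < v < w
    # is counted exactly once, at its lowest node u.
    return sum(len(wedges[edge]) for edge in edge_set if edge in wedges)


print(count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))  # 4
```

Ordering the nodes ensures no triangle is found twice; in a real cluster the two rounds would be separate jobs whose intermediate wedges are shuffled by key.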
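For the collaborative-filtering part of Module VI, a user-based sketch normalizes ratings by subtracting each user's mean (so that blanks can be treated as 0), measures user similarity with the cosine measure, and predicts a rating from the k most similar users who rated the item. Assumptions: k = 2, only positively similar users contribute, and the function names and the three-user ratings dictionary are hypothetical.

```python
from __future__ import annotations

import math


def center(ratings: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Subtract each user's average rating, so unrated items can be treated as 0."""
    centered = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: r - mean for item, r in items.items()}
    return centered


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing entries count as 0)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def predict(ratings: dict[str, dict[str, float]], user: str, item: str, k: int = 2) -> float | None:
    """Predict user's rating of item from the k most similar users who rated it."""
    centered = center(ratings)
    scored = sorted(
        ((cosine(centered[user], centered[other]), other)
         for other in ratings if other != user and item in ratings[other]),
        reverse=True,
    )
    # Assumption: only positively similar neighbors contribute.
    neighbors = [(sim, other) for sim, other in scored[:k] if sim > 0]
    if not neighbors:
        return None
    mean = sum(ratings[user].values()) / len(ratings[user])
    offset = sum(sim * centered[other][item] for sim, other in neighbors)
    return mean + offset / sum(sim for sim, _ in neighbors)


ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 1},
}
print(predict(ratings, "alice", "m4"))
```

Alice's predicted rating for m4 comes out at about 4.75: her mean of 4 plus Bob's rating of 0.75 above his own mean, since Carol's tastes run opposite to Alice's and she is excluded.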
Text Books:
1. Anand Rajaraman and Jeff Ullman, "Mining of Massive Datasets", Cambridge University Press.
2. Alex Holmes, "Hadoop in Practice", Manning Publications, Dreamtech Press.
3. Shashank Tiwari, "Professional NoSQL" (Paperback), Dreamtech Press.
4. Rajkumar Buyya, Rodrigo N. Calheiros and Amir Vahid Dastjerdi, "Big Data: Principles and Paradigms", Morgan Kaufmann.
Reference Books:
1. Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and its Applications", Wiley Big Data
Series.
2. Vignesh Prajapati, "Big Data Analytics with R and Hadoop" (Paperback), Packt Publishing Limited.
3. Tom White, "Hadoop: The Definitive Guide", O'Reilly Media.
Online References:
1. https://nptel.ac.in/courses/106/104/106104189/
2. https://nptel.ac.in/courses/106106142/
3. https://nptel.ac.in/courses/106105186/
Assessment:
Internal Assessment (IA) for 20 marks:
            IA will consist of two compulsory Internal Assessment tests. Approximately 40% to 50% of the syllabus content
               must be covered in the first IA test, and the remaining 40% to 50% of the syllabus content must be covered in
               the second IA test.
     Question paper format
               The question paper will comprise a total of six questions, each carrying 20 marks. Q.1 will be compulsory and
                should cover the maximum content of the syllabus.
               The remaining questions will be mixed in nature: part (a) and part (b) of each question must be from different
                modules. For example, if Q.2 has part (a) from Module 3, then part (b) must be from any other module randomly
                selected from all the modules.
A total of four questions need to be answered.