Big Data Syllabus

CSR223 is an introductory course on Big Data, covering concepts such as the data life cycle, analytics methodologies, and statistical analysis. Students will learn data preparation, visualization techniques, and the use of various Big Data tools like Hadoop and MongoDB. The course includes practical experiments to reinforce theoretical knowledge and enhance hands-on skills in data manipulation and analysis.

CSR223: INTRODUCTION TO BIG DATA

L:2 T:0 P:2 Credits:3

Course Outcomes: Through this course students should be able to

CO1 :: demonstrate the different concepts of Big Data, including the data life cycle.

CO2 :: apply basic strategies of big data analytics and the types of analytics.

CO3 :: infer the concept of statistical analysis.

CO4 :: evaluate data preparation and modification tasks.

CO5 :: apply data visualization and statistical analysis.

CO6 :: explore the use of appropriate big data tools.

Unit I
Overview of Big Data : introduction to big data, evolution of big data, types and characteristics,
problem with traditional databases, basic architecture, applications, advantages of big data processing,
data life cycle: business understanding, data understanding, data preparation, modelling, evaluation,
deployment
Unit II
Big Data analytics : methodology of big data analytics, introduction to data warehousing and data
mart, ETL and ELT, OLAP and OLTP, traditional analytics vs big data analytics, types of analytics
(prescriptive, predictive, descriptive) with examples, technologies for handling big data
Unit III
Introduction to statistical analysis : data modification: modifying data values, compute, selecting
cases, data and variables and their types, analysis and analytics, statistical analysis: introduction to
statistical analysis, levels of measurement, univariate, bivariate and multivariate analysis, parametric
and non-parametric tests
Unit IV
Data preparation and visualization : identifying duplicates and restructuring data, aggregating
data, merging files, basics of MS Excel software: removing duplicates, filtering, aggregation, pivoting
and visualization, data visualization: creating and editing charts (bar graph, pie chart, histogram, box
plot, scatter plot, line graph), crosstab, pivot table, outliers
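
The Unit IV operations (removing duplicates, aggregation, pivoting) can be illustrated outside Excel with a minimal Python sketch. The sales records and the helper names `remove_duplicates`, `aggregate`, and `pivot` are hypothetical and are not part of the syllabus; this is one possible illustration, not the prescribed method.

```python
from collections import defaultdict

# Hypothetical sales records; the second row is an exact duplicate of the first.
records = [
    {"region": "North", "product": "A", "units": 10},
    {"region": "North", "product": "A", "units": 10},
    {"region": "North", "product": "B", "units": 5},
    {"region": "South", "product": "A", "units": 7},
    {"region": "South", "product": "B", "units": 3},
    {"region": "South", "product": "B", "units": 4},
]


def remove_duplicates(rows):
    """Keep only the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def aggregate(rows, by, value):
    """Sum the `value` field for each distinct value of the `by` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row[value]
    return dict(totals)


def pivot(rows, index, columns, value):
    """Build a pivot table as nested dicts: table[index][column] = summed value."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[value]
    return {r: dict(c) for r, c in table.items()}


clean = remove_duplicates(records)
totals_by_region = aggregate(clean, "region", "units")
pivot_table = pivot(clean, "region", "product", "units")

print(len(clean))
print(totals_by_region)
print(pivot_table)
```

Here a row counts as a duplicate only when every field matches; matching on a subset of fields would be a different design choice.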
Unit V
Introduction to Big Data tools : introduction to big data tools, Hadoop Distributed File System
(HDFS) architecture and operations, the MapReduce framework and its execution workflow, YARN for
resource management, Apache Hive for SQL-like querying, Apache Pig for data flow scripting, Apache
HBase as a column-oriented NoSQL database, integration of Hadoop with traditional databases and
data warehousing tools
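
The MapReduce execution workflow named in Unit V (map, shuffle, reduce) can be sketched in plain Python on a single machine. This is only a conceptual illustration with hypothetical function names and input lines; it does not use Hadoop or its APIs, and a real job would distribute each phase across cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one line of a file stored in HDFS.
lines = ["big data tools", "big data analytics", "data life cycle"]
counts = word_count(lines)
print(counts)
```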
Unit VI
Advanced Big Data tools and databases : brief introduction to Apache Spark, NoSQL databases:
MongoDB, Mongosh, Compass, configuration of Compass and Mongosh, CRUD Operations in MongoDB

List of Practicals / Experiments:

• Installation and configuration of MongoDB and Mongosh

• Creation of database and collection

• Implementation of CRUD operations

• Insert command and uploading multiple documents in MongoDB

• Performing query operations (relational and logical operators) in MongoDB

• Implementation of update and delete operations on documents

• Implementation of aggregation in MongoDB

• Implementation of data preprocessing and modification

• Implementation of data visualization in Python

• Implementation of simple linear regression and multiple regression analysis
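
For the regression practical, a minimal sketch of simple linear regression by ordinary least squares is given below. The function name `simple_linear_regression` and the data points are hypothetical; the course may use a statistics package or library instead, and multiple regression is not covered by this sketch.

```python
def simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = simple_linear_regression(xs, ys)
print(slope, intercept)
```

The slope is the ratio of the covariance of x and y to the variance of x, and the intercept places the fitted line through the point of means.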

References:
1. BIG DATA ANALYTICS by S CHANDRAMOULI ET AL, UNIVERSITIES PRESS PVT. LTD

2. SPSS STATISTICS FOR DATA ANALYSIS AND VISUALIZATION by KEITH MCCORMICK, JESUS SALCEDO, WILEY

3. BIG DATA ANALYTICS by VENKAT ANKAM, PACKT PUBLISHING

Session 2024-25