Data Analytics (BIT 601)
Course Outcome (CO) | Bloom's Knowledge Level (KL)
---------------------------------------------------
At the end of the course, the student will be able to:
CO1: Discuss various concepts of the data analytics pipeline - K1, K2
CO2: Apply classification and regression techniques - K3
CO3: Explain and apply mining techniques on streaming data - K2, K3
CO4: Compare different clustering and frequent pattern mining algorithms - K4
CO5: Describe the concept of R programming and implement analytics on Big Data using R - K2, K3
DETAILED SYLLABUS (3-0-0)
Unit I: Introduction to Data Analytics
Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics
of data, introduction to the Big Data platform, need for data analytics, evolution of analytic scalability,
analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
Data Analytics Lifecycle:
Need, key roles for successful analytic projects, various phases of the data analytics lifecycle - discovery,
data preparation, model planning, model building, communicating results, operationalization.
(08 Lectures)
Unit II: Data Analysis
Regression modeling, multivariate analysis, Bayesian modeling, inference and Bayesian networks, support
vector and kernel methods, analysis of time series: linear systems analysis & nonlinear dynamics, rule
induction, neural networks: learning and generalization, competitive learning, principal component analysis
and neural networks, fuzzy logic: extracting fuzzy models from data, fuzzy decision trees, stochastic search
methods.
(08 Lectures)
Unit III: Mining Data Streams
Introduction to stream concepts, stream data model and architecture, stream computing, sampling data in a
stream, filtering streams, counting distinct elements in a stream, estimating moments, counting ones in a
window, decaying window, Real-time Analytics Platform (RTAP) applications, case studies - real-time
sentiment analysis, stock market predictions.
(08 Lectures)
Unit IV: Frequent Itemsets and Clustering
Mining frequent itemsets, market-basket modelling, Apriori algorithm, handling large data sets in main
memory, limited-pass algorithms, counting frequent itemsets in a stream, clustering techniques: hierarchical,
K-means, clustering high-dimensional data, CLIQUE and PROCLUS, frequent-pattern-based clustering
methods, clustering in non-Euclidean space, clustering for streams and parallelism.
(08 Lectures)
Unit V: Frameworks and Visualization
MapReduce, Hadoop, Pig, Hive, HBase, MapR, Sharding, NoSQL Databases, S3, Hadoop Distributed File
System.
Visualization:
Visual data analysis techniques, interaction techniques, systems and applications.
Introduction to R:
R graphical user interfaces, data import and export, attribute and data types, descriptive statistics, exploratory
data analysis, visualization before analysis, analytics.
(08 Lectures)
Textbooks and References:
1. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets, Cambridge University Press.
2. John Garrett, Data Analytics for IT Networks, Pearson Education.
3. Bill Franks, Taming the Big Data Tidal Wave, Wiley.
4. Michael Minelli et al., Big Data, Big Analytics, Wiley.
5. David Dietrich et al., Data Science and Big Data Analytics, EMC Education.
6. Frank J. Ohlhorst, Big Data Analytics, Wiley.
7. Colleen McCue, Data Mining and Predictive Analysis, Elsevier.
8. Michael Berthold et al., Intelligent Data Analysis, Springer.
9. Paul Zikopoulos et al., Understanding Big Data, McGraw-Hill.
10. Trevor Hastie et al., The Elements of Statistical Learning, Springer.
11. Mark Gardner, Beginning R, Wiley.
12. Pete Warden, Big Data Glossary, O'Reilly.
13. Glenn J. Myatt, Making Sense of Data, Wiley.
14. Peter Bühlmann et al., Handbook of Big Data, CRC Press.