Big Data Syllabus

The document outlines the syllabus for a course on Big Data technologies. It covers topics like introduction to Big Data, distributed file systems, MapReduce frameworks, NoSQL databases, indexing and searching large datasets. Specific modules will discuss Google File System, Hadoop environment, functional programming applied to Big Data, and use cases of technologies like Elasticsearch, HBase and MongoDB. Lectures will explain fundamental concepts, architectures, optimization techniques, and real-world applications of systems and algorithms for large-scale data processing.

Uploaded by

Angel Dahal
Copyright
© All Rights Reserved

• Introduction to Big Data (7 hours)
1. Big Data Overview
2. Background of Data Analytics
3. Role of Distributed Systems in Big Data
4. Role of the Data Scientist
5. Current Trends in Big Data Analytics
• Google File System (7 hours)
1. Architecture
2. Availability
3. Fault tolerance
4. Optimization for large-scale data
• MapReduce Framework (10 hours)
1. Basics of functional programming
2. Fundamentals of functional programming
3. Modeling real-world problems in functional style
4. MapReduce fundamentals
5. Data Flow (Architecture)
6. Real-world problems
7. Scalability goal
8. Fault tolerance
9. Optimization and data locality
10. Parallel efficiency of MapReduce
• NoSQL (6 hours)
1. Structured and Unstructured Data
2. Taxonomy and NoSQL Implementation
3. Discussion of the basic architecture of HBase, Cassandra and MongoDB
• Searching and Indexing Big Data
1. Full-text Indexing and Searching
2. Indexing with Lucene
3. Distributed Searching with Elasticsearch
• Case Study: Hadoop
1. Introduction to Hadoop Environment
2. Data Flow
3. Hadoop I/O
4. Query Languages for Hadoop
5. Hadoop and Amazon Cloud
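
The MapReduce module above pairs functional programming with map and reduce fundamentals and data flow. As a rough illustration of that idea, the following is a minimal single-process sketch of word counting in Python. It is not Hadoop's or Google's implementation: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute each phase across many machines and handle failures.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values of one key into a single total."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(documents):
    """Run the three phases in sequence (illustrative, single process)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big clusters", "data locality matters"]
    print(word_count(docs))
```

Because `map_phase` depends only on its own record and `reduce_phase` only on the values of one key, each call could in principle run on a different machine, which is the property the scalability and parallel-efficiency topics build on.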

Possible questions based on this syllabus:
1. Introduction to Big Data:
• What is Big Data, and why is it important in today's world?
• Explain the background of data analytics and its significance in understanding Big Data.
• Discuss the role of distributed systems in handling Big Data. How do they contribute to managing large volumes of data?
• What are the responsibilities and skills required for a data scientist in the context of Big Data?
• Describe the current trends in Big Data analytics. How are technologies evolving to address emerging challenges?
2. Google File System (GFS):
• What is the architecture of Google File System (GFS)? How does it facilitate the storage and processing of large-scale data?
• Explain the concepts of availability and fault tolerance in the context of GFS.
• How is GFS optimized to handle large-scale data processing?
• Discuss the role of GFS in supporting distributed computing and data-intensive applications.
3. MapReduce Framework:
• What are the basics of functional programming, and how are they relevant to the MapReduce framework?
• Explain the fundamentals of MapReduce and its role in processing large-scale data.
• How can real-world problems be modeled using functional programming paradigms?
• Describe the architecture of MapReduce and its data flow. What are the scalability goals and fault tolerance mechanisms?
• Discuss optimization techniques and data locality considerations in MapReduce.
4. NoSQL:
• Differentiate between structured and unstructured data. Why is NoSQL important for handling such data types?
• Provide an overview of the taxonomy of NoSQL databases and their implementations.
• Discuss the basic architecture of HBase, Cassandra, and MongoDB. How do they differ in terms of data storage and retrieval?
5. Searching and Indexing Big Data:
• Explain the concept of full-text indexing and searching. How is it applied in handling Big Data?
• Discuss the role of Lucene in indexing and searching large volumes of data.
• How does distributed searching with technologies like Elasticsearch contribute to efficient data retrieval in Big Data environments?
6. Case Study: Hadoop:
• Introduce the Hadoop environment and its components. How does it support large-scale data processing?
• Describe the data flow in Hadoop and its I/O operations.
• What query languages are commonly used for Hadoop? Discuss their advantages and limitations.
• How does Hadoop integrate with cloud platforms like Amazon Web Services (AWS)? What are the benefits of deploying Hadoop in the cloud?
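
For the full-text indexing questions in section 5, it may help to see the underlying data structure. The sketch below builds a toy inverted index in Python, mapping each term to the set of documents that contain it. It is an illustration only and does not use Lucene's or Elasticsearch's API; the helper names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Hadoop stores big data",
        2: "Lucene indexes big text",
        3: "Elasticsearch builds on Lucene",
    }
    index = build_index(docs)
    print(search(index, "big"))
    print(search(index, "Lucene big"))
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is why this structure suits large collections. A distributed search engine would additionally split the index into shards held on different nodes and merge the per-shard results.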
