Big Data Technology
Bridge Course in BIG DATA Technologies
A programme to create an up-skilling / re-skilling ecosystem in BIG DATA Technologies to facilitate continuous enhancement of skills as well
as knowledge of IT professionals in line with their aspirations and aptitude.
Instructions for Bridge Course:
Course Overview:
1. Total course duration: 110 hours over 60 days.
Theory: 35 hours
Lab: 45 hours
Project: 15 hours
Offline: 15 hours
2. A total of 15 hours of sessions with faculty will be conducted for discussions on:
• Case studies for each module
• Latest industry trends
• Assignments and clarification of concepts
3. A dedicated 15 hours is allotted for project work, and an online demo of the project will be given by the candidate.
4. The modules in the course will be enabled based on the candidate's progress, i.e., the next module will be accessible only when the previous module is completed.
5. Doubt Clearing Session: 2 Modes
• Online meeting: Monday and Thursday, 4 PM to 5 PM (working days).
• Mail: to be responded to by the CDAC Noida team within 24 hours (working days).
6. Eligibility Criteria to Qualify for Assessment:
• The candidate should complete at least 80% of the course to be eligible for the assessment.
• The candidates should attempt the quiz and assignment after each module, with a 60% passing score.
For contact: Tushar Patnaik (tusharpatnaik@cdac.in)
About the Course:
The training programme in Big Data Analytics is designed to develop multi-disciplinary skill sets in Big Data Technologies, enhancing technical acumen for policy and decision-making roles and steering big projects efficiently with speed, scale and agility across sectors. The course includes working on the advanced Hadoop framework, the MapReduce programming technique, Hive, Predictive Analytics using Python, Visualization, Statistical Techniques and many more, along with practical and real-time case studies on Cloudera. Participants who successfully undergo the training programme will be provided with a certification.
The programme runs in online/blended mode, which is its salient feature, and aims to facilitate the continuous enhancement of skills, create awareness amongst IT professionals about the impact of the evolving technological ecosystem, and train them with relevant skill sets.
Applications:
With most business transactions and customer conversations happening online, a huge amount of data is being generated every day. This data holds the secret to improving business processes and identifying new innovations. Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data of such size and complexity that none of the traditional data management tools can store or process it efficiently. Big Data techniques can help classify, sort, and analyze data to find hidden insights. Completing a course in Big Data will equip you with the knowledge and expertise to analyze Big Data. Big Data has found many applications today in fields such as banking, agriculture, chemistry, data mining, cloud computing, finance, marketing, healthcare, stock exchanges, social media sites, jet engines, etc.
Outcomes:
After completing this course, candidates shall be able to:
• Make data-driven decisions using Data Analytics and Predictive Analytics
• Be industry ready and have an edge over their peers
Studying Big Data will broaden their horizon by helping them surpass market forecasts and uncover new opportunities.
Course USP:
• Hadoop HDFS and YARN
• MapReduce framework and Pig
• Apache Hive
• NoSQL and MongoDB
• Tableau
• Spark and Scala operations
• Python
• Data Science and Data Analytics using Python
What will you get?
Joint certification from MeitY, Govt. of India & NASSCOM. Highly qualified faculty and experts from academia and industry will impart knowledge by sharing their expertise and experiences in Big Data Technologies. A 'skills passport' is introduced for learners, which they build during their re-skilling/up-skilling journey through incentives and badges. A learner who successfully completes the course will be given a badge, which will be added to his or her 'Skills Passport' and will act as a repository for future employability. The badges and certificates earned can also be shared with peer networks such as LinkedIn, resulting in further value for the learners.
Job Profile:
Business Analyst - Provides end-to-end solutions to the client
Solution Architect - Converts business solutions into technical requirements
Data Integrator - Combines data from different sources to present it collectively
Data Architect - Collection, storage, transfer and extraction of data
Data Analyst - Uses software to analyze data and develop insights
Data Scientist - Uses technology to analyze data for making decisions
Career opportunities in Big Data are numerous, and identifying which is most suitable for a given individual depends on interests, career path, skills and abilities. Some well-known Big Data career paths are Data Analyst, Data Scientist, Big Data Engineer, Data Modeler, Solution Architect and many more. With the rising demand that industries are witnessing, it is an ideal time to add Big Data skills to your curriculum vitae and give yourself the wings to fly in the job market with the ample Big Data jobs available today.
Centre Preference for Offline / Online mentoring
Course Introduction
Big Data and Hadoop
Unit 1.1.1 Introduction to Big Data Technologies
Unit 1.1.2 Introduction to Big Data and Big Data Challenges
Unit 1.1.3 Big Data Applications
Unit 1.1.4 Types of Big Data Technologies
Unit 1.2.1 Introduction to Hadoop
Unit 1.2.2 Limitations and Solutions to Big Data Architecture
Unit 1.2.3 What is Hadoop
Unit 1.2.4 Hadoop Ecosystem
Unit 1.2.5 Hadoop Features
Unit 1.2.6 Hadoop Core Components Part-1
Unit 1.2.7 Hadoop Core Components Part-2
Unit 1.2.8 HDFS Architecture
Unit 1.2.9 HDFS Daemons Properties
Unit 1.2.10 Replication and Rack Awareness
Unit 1.2.11 Rack Awareness Example
Unit 1.2.12 HDFS Read Write Operation Part-1
Unit 1.3.1 Hadoop Architecture and HDFS
Unit 1.3.2 Hadoop 2.x Cluster Architecture
Unit 1.3.3 Federation and High Availability Architecture
Unit 1.3.4 Hadoop Cluster Modes
Unit 1.3.5.1 Common Hadoop Shell Commands Part-I
Unit 1.3.5.2 Common Hadoop Shell Commands Part-II
Unit 1.3.6.1 Common Hadoop Shell Commands Part-III
Unit 1.3.6.2 Common Hadoop Shell Commands Part-IV
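The shell-command units above (Parts I-IV) cover the everyday HDFS file operations. As a minimal illustrative sketch, not part of the official course material, the snippet below runs a few of these commands from Python via subprocess; it assumes a working Hadoop installation with hdfs on the PATH, and the file and directory names are hypothetical.

# Illustrative only: common HDFS shell commands invoked from Python.
# Assumes a running Hadoop setup with `hdfs` on the PATH; paths are hypothetical.
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` sub-command and print its output."""
    result = subprocess.run(["hdfs", "dfs", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

hdfs("-mkdir", "-p", "/user/demo/input")           # create a directory in HDFS
hdfs("-put", "localfile.txt", "/user/demo/input")  # copy a local file into HDFS
hdfs("-ls", "/user/demo/input")                    # list the directory
hdfs("-cat", "/user/demo/input/localfile.txt")     # print the file contents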
Unit 1.3.7 Hadoop 2.x Configuration Files
Unit 1.3.10 Basic Hadoop Administration
Unit 1.4.1 Hadoop Map Reduce Framework
Unit 1.4.2 Why Map Reduce
Unit 1.4.3 YARN Components and Architecture
Unit 1.4.4 YARN MapReduce Application Execution Flow
Unit 1.4.5 Anatomy of Map Reduce Program
Unit 1.4.6 Input Splits, Relation between Input Splits and HDFS Blocks
Unit 1.4.7 Map Reduce Combiner and Partitioner
Unit 1.4.8 Demo of Map Reduce
Unit 1.5.1 Advanced Hadoop MapReduce
Unit 1.5.2 Counters
Unit 1.5.3 Distributed Cache
Unit 1.5.4 Reduce Join
Unit 1.5.5 Custom Input Format
Unit 1.5.6 Output Formats
Unit 1.5.7 Sequence Input Format
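The MapReduce units above describe the map, shuffle/sort and reduce phases, along with combiners, partitioners and input formats. The sketch below is a minimal pure-Python illustration of the same word-count pattern, intended only to show how data flows between the phases; it is not Hadoop MapReduce code.

# Conceptual word-count in the MapReduce style (illustration only, not Hadoop code).
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, as a Mapper would.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, as the framework's shuffle/sort does.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for each word, as a Reducer (or Combiner) would.
    return key, sum(values)

lines = ["big data needs big tools", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'big': 3, 'data': 2, 'needs': 1, 'tools': 1, 'hadoop': 1, 'handles': 1}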
Quiz on Hadoop Ecosystem
Unit 1.6.1 Apache PIG Learning Objectives
Unit 1.6.2 Introduction to Apache Pig
Unit 1.6.3 Map Reduce vs Pig
Unit 1.6.4 Pig Components and Execution
Unit 1.6.5.1 Pig Latin Programs Fundamentals Part-I
Unit 1.6.5.2 Pig Latin Programs Fundamentals Part-II
Unit 1.7.1 Pig built in functions Learning Objectives
Unit 1.7.2 Introduction to Pig Built In Functions
Unit 1.7.3 Shell and Utility Commands
Unit 1.7.4.1 Pig LATIN Commands PART -I
Unit 1.7.4.2 PIG LATIN COMMANDS PART -I(Continued)
Unit 1.7.5.1 PIG LATIN COMMANDS PART –II
Unit 1.7.5.2 PIG LATIN COMMANDS PART –II(Continued)
Unit 1.7.6.1 PIG UDF
Unit 1.7.6.2 PIG UDF PRACTICAL
Unit 1.7.7 Analytics using PIG
Unit 1.8 Apache Hive Learning Objectives
Unit 1.8.1 Introduction to Apache Hive
Unit 1.8.2 Hive vs Pig
Unit 1.8.3 Hive Architecture and Components
Unit 1.8.4 Limitations of Hive
Unit 1.8.5 Hive Data Types and Data Models
Unit 1.9.1 Hive Tables (Managed Tables and External Tables)
Unit 1.9.2 Hive Tables Practical Part-I
Unit 1.9.3 Hive Tables Practical Part-II
Unit 1.9.4 Importing Data Querying Data and Managing Outputs
Unit 1.9.5 Importing Data Querying Data and Managing Outputs(Practical)
Unit 1.9.6 Hive Partition
Unit 1.9.7 Hive Partition Practical
Unit 1.9.10 Hive Bucketing
Unit 1.11.1 Overview of RDBMS
Unit 1.11.2 Limitations of RDBMS
Unit 1.11.3 Introduction to NoSQL
Unit 1.11.4 NoSQL in Brief Part 1
Unit 1.11.5 NoSQL in Brief Part 2
Unit 1.11.6 SQL and NoSQL A Comparative Study
Unit 1.11.7 Categories of NoSQL Databases
Unit 1.12.1 Introduction to MongoDB
Unit 1.12.2 Installation Guidelines of MongoDB
Unit 1.12.3 Components of MongoDB
Unit 1.12.4 The MongoDB Data Model
Unit 1.12.5 JSON and MongoDB
Unit 1.13.1. Define Database in MongoDB
Unit 1.13.2. Define a Collection in MongoDB
Unit 1.13.3. CRUD Operation in MongoDB- (Create)-Part-1
Unit 1.13.4. CRUD Operation in MongoDB- (Create)-Part-2
Unit 1.13.5. CRUD Operation in MongoDB- (Read)
Unit 1.13.6. CRUD Operation in MongoDB- (Update)
Unit 1.13.7. CRUD Operation in MongoDB- (Update)-Part-2
Unit 1.13.8. CRUD Operation in MongoDB- (Delete)
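Units 1.13.x walk through CRUD operations in MongoDB. The sketch below shows the equivalent create/read/update/delete calls from Python using the pymongo driver; it assumes a MongoDB server on localhost, and the database, collection and field names are hypothetical examples.

# Illustrative CRUD calls with pymongo; assumes MongoDB is running on localhost:27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
students = client["demo_db"]["students"]

students.insert_one({"name": "Asha", "score": 82})               # Create
doc = students.find_one({"name": "Asha"})                        # Read
students.update_one({"name": "Asha"}, {"$set": {"score": 90}})   # Update
students.delete_one({"name": "Asha"})                            # Delete
print(doc)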
Unit 1.14.1 The find() in MongoDB
Unit 1.14.2 Key projection in find()
Unit 1.14.3 Querying in find()
Unit 1.14.4 Querying in find() part 2
Unit 1.14.5 Querying in find() Part 3
Unit 1.14.6 Find method with limit and skip
Unit 1.14.7 Mongoimport in MongoDB
Unit 1.14.8 Mongoexport
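Units 1.14.x cover find(), key projection, query operators, limit and skip, and mongoimport/mongoexport. The sketch below illustrates find() with a projection, a comparison operator and limit/skip using pymongo; the collection and field names are hypothetical.

# Illustrative find() queries with pymongo (server, collection and fields are hypothetical).
from pymongo import MongoClient

students = MongoClient("mongodb://localhost:27017")["demo_db"]["students"]

# Project only name and score (hide _id) for students scoring above 75.
cursor = students.find({"score": {"$gt": 75}}, {"name": 1, "score": 1, "_id": 0})

# Skip the first 5 matches and return at most 10, sorted by score descending.
for doc in cursor.sort("score", -1).skip(5).limit(10):
    print(doc)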
Unit 1.15.1 Aggregation in MongoDB
Unit 1.15.2 Project stage operator in aggregation part 1
Unit 1.15.3 Project stage operator in aggregation part 2
Unit 1.15.4 Group stage operator in MongoDB part 1
Unit 1.15.5 Group stage operator in MongoDB part 2
Unit 1.15.6 Match stage operator in MongoDB
Unit 1.15.7 Indexing in MongoDB
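Units 1.15.x introduce the aggregation framework with the $project, $group and $match stage operators, plus indexing. A minimal pymongo sketch of such a pipeline follows; the orders collection and its fields are hypothetical.

# Illustrative aggregation pipeline with pymongo (collection and fields are hypothetical).
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["demo_db"]["orders"]

pipeline = [
    {"$match": {"status": "delivered"}},                          # filter documents
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}},   # group and sum per city
    {"$project": {"city": "$_id", "total": 1, "_id": 0}},         # reshape the output
]
for row in orders.aggregate(pipeline):
    print(row)

orders.create_index("city")  # simple single-field index, as covered under indexing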
Contents on Big Data and Hadoop
Quiz on Hadoop
Restricted: available only after achieving the required score in the Quiz on Hadoop Ecosystem
Click here to join the virtual lab
Working with Spark
Unit 2.1.0 Learning Objective of Introduction to Apache Spark
Unit 2.1.1 Introduction to Spark
Unit 2.1.2 Introduction to Apache Spark
Unit 2.1.3 Spark Components and Architecture
Unit 2.1.4 Spark deployment Modes
Unit 2.1.5- Learning Objective of Introduction to Spark
Unit 2.1.6 Introduction to Apache Spark animated
Unit 2.1.7 Spark Components and its Architecture animated
Unit 2.1.8 Spark Deployment Modes animated
Unit 2.1.9 Introduction to Spark
Unit 2.1.10 Hands on Spark 1
Unit 2.1.11 Hands on Spark 2
Unit 2.1.12 Hands on Spark 3
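The Spark units above cover Spark components, architecture and deployment modes. The sketch below is a minimal PySpark example, assuming pyspark is installed and run in local mode; the sample data is hypothetical.

# Minimal PySpark sketch (assumes pyspark is installed; data is hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").master("local[*]").getOrCreate()

# Create a small DataFrame and run a simple transformation and action.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)], ["name", "age"])
df.filter(df.age > 30).select("name").show()

spark.stop()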
Data Science
Unit 3.1.1 Introduction to Numpy
Unit 3.1.2 Installation of Numpy
Unit 3.1.3 Import Numpy
Unit 3.1.4 Creation of array (Part 1)
Unit 3.1.5 Creation of Array ( Part 2)
Unit 3.1.6. Installation Guideline of Anaconda
Unit 3.1.7 Hands on Numpy
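Units 3.1.x cover installing and importing NumPy and creating arrays. A few representative array-creation calls are sketched below; the shapes and values are arbitrary examples, not course data.

# Illustrative NumPy array creation and element-wise operations.
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # array from a nested list
b = np.arange(6).reshape(2, 3)         # 0..5 reshaped to 2 rows x 3 columns
zeros = np.zeros((2, 3))               # array of zeros

print(a + b)                           # element-wise addition
print(a * 2)                           # broadcasting a scalar
print(a.shape, a.dtype, a.mean())      # basic attributes and a reduction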
Unit 3.2.1 Introduction to Pandas
Unit 3.2.2 More on Pandas
Unit 3.2.3 Hands on Pandas
Unit 3.2.4 Hands on Handling Missing Data
Unit 3.2.5 Hands on Data Preprocessing
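Units 3.2.x introduce Pandas, missing-data handling and basic preprocessing. The sketch below shows a small DataFrame with missing values and two common ways of dealing with them; the column names and values are arbitrary examples.

# Illustrative Pandas missing-data handling (column names and values are arbitrary).
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 31], "salary": [50000, 60000, np.nan]})

print(df.isna().sum())                          # count missing values per column
filled = df.fillna(df.mean(numeric_only=True))  # impute with the column mean
dropped = df.dropna()                           # or drop rows with any missing value
print(filled)
print(dropped)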
Unit 3.3.1 Introduction to Data Visualization in Python
Unit 3.3.2 Hands on Data Visualization through Matplotlib
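Units 3.3.x cover data visualization in Python with Matplotlib. A minimal line-plot sketch follows; the data points are arbitrary.

# Minimal Matplotlib line plot (data values are arbitrary).
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 9, 16, 25]

plt.plot(x, y, marker="o", label="sample series")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple line plot")
plt.legend()
plt.show()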
Unit 3.4.1 Introduction to Machine learning
Unit 3.4.2.1 Types of Learning
Unit 3.4.2.2 Types of Machine learning Algorithm
Unit 3.4.3 Uses of Machine Learning Algorithms
Unit 3.4.4 Introduction to data Science
Unit 3.4.5 Predictive Analytics
Unit 3.4.6 Data preprocessing
Unit 3.5.1 Unsupervised Learning
Unit 3.5.2 Clustering
Unit 3.5.3 K-means Clustering Algorithm
Unit 3.5.4 Working Example of K-Means
Unit 3.5.5 Practical Implementation of K-Means
Unit 3.5.6 Implementation of K-means Clustering Algorithm
Unit 3.5.7 Determine Optimal Value of K
Unit 3.5.8 Elbow Method Implementation
Unit 3.5.9.1 Hierarchical Clustering
Unit 3.5.9.2 Hierarchical Clustering Example
Unit 3.5.10 Linkages in Hierarchical Clustering
Unit 3.5.11 Implementing Agglomerative Clustering
Unit 3.5.12 Measuring Quality of Clusters
Unit 3.5.13 Dimensionality Reduction
Unit 3.5.14 Principal Component Analysis
Unit 3.5.15 Steps Used in Calculating PCA
Unit 3.5.16 Implementation of PCA
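Units 3.5.x cover K-means, the elbow method, hierarchical clustering and PCA. The sketch below uses scikit-learn on synthetic data to show K-means with inertia values (the basis of the elbow method) and a two-component PCA; the dataset and parameter choices are illustrative only.

# Illustrative K-means (with inertia for the elbow method) and PCA using scikit-learn.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)

# Elbow method: inertia (within-cluster sum of squares) for several values of k.
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(km.inertia_, 1))

# PCA: project the 5-dimensional data onto its first two principal components.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (300, 2)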
Unit 3.7.1 Linear Regression
Unit 3.7.2 Implementation of Linear Regression
Unit 3.7.3 Classification
Unit 3.7.4.1 Decision Trees
Unit 3.7.4.2 Decision Tree Classifier
Unit 3.7.5 Specify Test Condition in Decision Tree
Unit 3.7.6.1 Determine Best Split Part 1
Unit 3.7.6.2 Determine the Best Split Part 2
Unit 3.7.7 Stopping Condition
Unit 3.7.8 Implementing DT Algorithm
Unit 3.7.9 Overfitting Underfitting
Unit 3.7.10 Controlling Overfitting
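Units 3.7.x cover linear regression, decision trees, and over/underfitting. The sketch below fits both models with scikit-learn on built-in toy datasets and limits the tree depth as one simple way of controlling overfitting; the parameter values are illustrative.

# Illustrative linear regression and depth-limited decision tree with scikit-learn.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Linear regression on a toy regression dataset.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lr = LinearRegression().fit(X_tr, y_tr)
print("R^2:", round(lr.score(X_te, y_te), 3))

# Decision tree classifier; max_depth limits tree growth to reduce overfitting.
Xc, yc = load_iris(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xc_tr, yc_tr)
print("Accuracy:", round(tree.score(Xc_te, yc_te), 3))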
Unit 3.8.1 Naive Bayes Classifier
Unit 3.8.2 Naive Bayes Example
Unit 3.8.3 Types of Naive Bayes Classifier
Unit 3.8.4 Implementation of Naive Bayes Classifier
Unit 3.8.5 K-Nearest Neighbour Classifier
Unit 3.8.6 KNN Implementation
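Units 3.8.x cover the Naive Bayes and K-Nearest Neighbour classifiers. A short scikit-learn comparison of the two on the Iris dataset is sketched below; the choice of k and the dataset are illustrative, not prescribed by the course.

# Illustrative Gaussian Naive Bayes and KNN classifiers with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

nb = GaussianNB().fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

print("Naive Bayes accuracy:", round(nb.score(X_te, y_te), 3))
print("KNN (k=5) accuracy:", round(knn.score(X_te, y_te), 3))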
10. Clustering: the Unsupervised Learning
11. Hands on K-means
12. Hands on Agglomerative
13. Naive Bayes Algorithm
14. Hands on Naive Bayes Algorithm
15. Regression Analysis
16. Hands on Linear Regression
17. Hands on Multiple Linear Regression
Quiz on Data Science
Restricted: available only after achieving the required score in the Quiz on Hadoop and the Quiz on Hadoop Ecosystem
Quiz on Big Data
Restricted: available only after achieving the required score in the Quiz on Data Science, the Quiz on Hadoop Ecosystem and the Quiz on Hadoop