Training Curriculum for DBMS, Data Modeling,
and Data Engineering (ILT Mode)
Phase 1: Foundation Building
1. DBMS & Data Model
• Mode: Instructor-Led Training (ILT)
• Duration: 8 hours
• Objective: Understand database management systems, relational models, and data modeling concepts.
• Topics:
– Introduction to DBMS
– Relational Data Model
– ER Diagrams
– Normalization (1NF, 2NF, 3NF, BCNF)
– Advanced Data Modeling Concepts
2. ANSI-SQL
• Mode: Instructor-Led Training (ILT) with Hands-on
• Objective: Gain proficiency in SQL for querying and managing relational databases.
• Topics:
– SQL Basics (SELECT, INSERT, UPDATE, DELETE)
– Joins (INNER, LEFT, RIGHT, FULL)
– Aggregations (GROUP BY, HAVING)
– Subqueries and Nested Queries
– Window Functions
– Constraints and Indexes
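The SQL topics above can be tried without a database server. The following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The customers/orders schema and its rows are hypothetical, invented only for illustration, and SQLite's dialect is close to but not identical to ANSI SQL (window functions need SQLite 3.25 or later).

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL CHECK (amount >= 0)
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# LEFT JOIN keeps customers that have no orders (their count is 0).
order_counts = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# GROUP BY with HAVING filters on the aggregated value, not on single rows.
big_spenders = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.amount) > 100
""").fetchall()

# Window function: running total per customer, without collapsing rows.
running_totals = conn.execute("""
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY id) AS running
    FROM orders
    ORDER BY id
""").fetchall()

print(order_counts)
print(big_spenders)
print(running_totals)
```

The LEFT JOIN query lists every customer, while the INNER JOIN with HAVING returns only customers whose total exceeds the threshold; comparing the two result sets is a quick way to see the difference between the join types.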
3. ANSI SQL Assessment
• Mode: Instructor-Led Assessment
• Objective: Evaluate SQL knowledge and skills through practical exercises and quizzes.
Phase 2: Data Warehousing and ETL Concepts
4. DW Basics (Data Warehousing Basics)
• Mode: Instructor-Led Training (ILT)
• Objective: Learn the fundamentals of data warehousing.
• Topics:
– Data Warehouse Architecture
– OLAP vs OLTP
– Dimensional Modeling (Star Schema, Snowflake Schema)
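To make the star schema topic concrete, here is a small sketch, again using Python's built-in sqlite3 module rather than a real warehouse. One fact table references two dimension tables, and an OLAP-style query aggregates the fact measure by dimension attributes. All table names, column names, and rows are assumptions made up for this example.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        revenue     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date    VALUES (20230101, 2023), (20240101, 2024);
    INSERT INTO fact_sales  VALUES
        (1, 20230101, 10.0), (1, 20240101, 15.0), (2, 20240101, 30.0);
""")

# Typical analytical query: join the fact to its dimensions, then aggregate.
revenue_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

print(revenue_by_category_year)
```

In a snowflake schema, a dimension such as dim_product would itself be normalized further (for example, category moved into its own table), at the cost of one more join per query.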
5. ETL Concepts
• Mode: Instructor-Led Training (ILT)
• Objective: Understand Extract, Transform, Load (ETL) processes and tools.
• Topics:
– ETL Pipeline Design
– Data Extraction Techniques
– Data Transformation and Cleansing
– Loading Strategies (Full Load, Incremental Load)
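The difference between a full load and an incremental load can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a production pattern: the source is a list of dictionaries, the target is an in-memory dictionary keyed by id, and changes are detected with an updated_at timestamp used as a watermark. The function names full_load and incremental_load are hypothetical.

```python
from datetime import datetime


def full_load(source_rows):
    """Rebuild the target from scratch as a complete copy of the source."""
    return {row["id"]: row for row in source_rows}


def incremental_load(target, source_rows, watermark):
    """Upsert only rows changed after the watermark; return the new watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = row  # insert a new row or overwrite an old one
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Initial state of a hypothetical source system.
source = [
    {"id": 1, "value": "a", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "updated_at": datetime(2024, 1, 2)},
]

# First run: full load, then remember the latest timestamp seen.
target = full_load(source)
watermark = max(row["updated_at"] for row in source)

# Later: row 2 is modified and row 3 is added in the source.
source[1] = {"id": 2, "value": "b2", "updated_at": datetime(2024, 1, 3)}
source.append({"id": 3, "value": "c", "updated_at": datetime(2024, 1, 4)})

# Second run: only the two changed rows are written to the target.
watermark = incremental_load(target, source, watermark)
print(sorted(target), watermark)
```

A watermark of this kind does not detect rows deleted from the source; handling deletes usually needs a separate mechanism such as change data capture or periodic full reloads.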
Phase 3: Programming and Advanced Data Engineering
6. Python
• Mode: Instructor-Led Training (ILT) with Hands-on
• Objective: Develop programming skills in Python for data engineering tasks.
• Topics:
– Python Basics (Syntax, Data Types, Loops)
– Data Structures (Lists, Dictionaries, Tuples)
– File Handling (CSV, JSON, XML)
– Libraries for Data Engineering (Pandas, NumPy, PySpark)
– Automation Scripts
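As a small illustration of the file handling and data structure topics, the sketch below parses CSV text into a list of dictionaries and serializes it as JSON, using only the standard library. The sample data and the function name csv_to_records are assumptions for this example; an in-memory string stands in for a real file so the snippet runs anywhere.

```python
import csv
import io
import json

# Hypothetical CSV content; in practice this would come from open("file.csv").
csv_text = "id,name,score\n1,Asha,91\n2,Ben,78\n"


def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(row["id"]), "name": row["name"], "score": int(row["score"])}
        for row in reader
    ]


records = csv_to_records(csv_text)

# Serialize to JSON and read it back to confirm nothing was lost.
json_text = json.dumps(records, indent=2)
round_tripped = json.loads(json_text)

print(json_text)
```

csv.DictReader returns every field as a string, so the explicit int conversions are what make the JSON output carry numbers rather than quoted text.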
7. Python Assessment
• Mode: Instructor-Led Assessment
• Objective: Evaluate Python programming skills through coding exercises.
8. Big Data Hadoop
• Mode: Instructor-Led Training (ILT)
• Duration: 50 hours
• Objective: Learn the fundamentals of Big Data and the Hadoop ecosystem.
• Topics:
– Hadoop Architecture (HDFS, MapReduce)
– Hive, Pig, and HBase
– YARN and Resource Management
– Data Processing with Spark
9. AWS Data Engineer (Enhanced)
• Mode: Instructor-Led Training (ILT) with Hands-on
• Objective: Gain expertise in the AWS Cloud Platform for data engineering, aligned with industry standards and certification requirements.
AWS Overview
Introduction to AWS, Usage, core concepts such as elasticity, scalability, virtualization
IaaS, PaaS, SaaS
VPC, Subnet, Security Group, NACL, NAT Instance
Overview of DW and Data Lake related services, such as EMR, Glue, Redshift, and Athena
Region, Availability Zone, Edge Locations
IAM users, roles, policies
S3
Core concepts of object store
S3 storage classes, lifecycle, replication
S3 file handling using Boto3
EC2
EC2 instance types, storage, etc.
Streaming
Kinesis Streaming and Kinesis Firehose
DynamoDB
Detailed architecture, NoSQL concepts
DynamoDB data structures, data modeling/design, organization, keys, etc.
CRUD, Query, and Scan operations in detail
Case Study on a sample schema
Lambda
Handling S3 events
Handling DynamoDB events
Handling SQS, SNS events along with DynamoDB, S3 events
Parsing S3 source files into different formats
Redshift
Redshift Architecture, Building blocks
Redshift data structures, keys, compression
Connecting, Querying and Consuming data using Python
Glue
Introduction to Glue
Creating an ETL Workflow in Glue
Writing Custom Scripts in Glue
Glue as a Metadata Catalog for Hive
Snowflake
Introduction to Snowflake
Key Concepts & Architecture
Supported Cloud Platforms
Supported Cloud Regions
Snowflake Editions
Snowflake Releases
Overview of Key Features
Overview of the Data Lifecycle
Continuous Data Protection
Connecting to Snowflake
Snowflake Ecosystem
Snowflake Partner Connect
General Configuration (All Clients)
SnowSQL (CLI Client)
Connectors & Drivers
Loading Data into Snowflake
Overview of Data Loading
Summary of Data Loading Features
Data Loading Considerations
Preparing to Load Data
Bulk Loading Using COPY
Loading Continuously Using Snowpipe
Loading Using the Web Interface (Limited)
Querying Data in Staged Files
Querying Metadata for Staged Files
Transforming Data During a Load
Unloading Data from Snowflake
Overview of Data Unloading
Summary of Data Unloading Features
Data Unloading Considerations
Preparing to Unload Data
Unloading into a Snowflake Stage
Unloading into Amazon S3
Unloading into Google Cloud Storage
Unloading into Microsoft Azure
Using Snowflake
Classic Web Interface
New Web Interface
Virtual Warehouses
Databases, Tables & Views
Queries
Binary Data
Date & Time Data
Semi-structured Data
Snowflake Time Travel & Fail-safe
Continuous Data Pipelines
Database Replication and Failover/Failback
Sharing Data Securely in Snowflake
Introduction to Secure Data Sharing
Overview of the Product Offerings for Secure Data Sharing
Granting Privileges to Other Roles
Working with Shared Data
Secure Direct Data Share
Snowflake Data Marketplace
Data Exchange
Managing Your Snowflake Organization
Introduction to Organizations
Getting Started with Organizations
Managing Accounts in Your Organization
Understanding Organization and Account Names
Managing Your Snowflake Account
Account Identifiers
Trial Accounts
System Usage & Billing
Parameter Management
User Management
Behavior Change Release Management
Managing Security in Snowflake
Summary of Security Features
Authentication
Networking & Private Connectivity
Administration & Authorization
Managing Governance in Snowflake
Summary of Governance Features
Column-level Security
Row-level Security
Access History
Developing Applications in Snowflake
Download and Install the SnowSQL CLI Client
Create a Virtual Warehouse, Sample Database, and Table Using SnowSQL
Load and Query Data Using SnowSQL
Use Python with Snowflake
Managing Your Snowflake Account
System Usage & Billing
Understanding Snowflake Credit and Storage Usage
Understanding Snowflake Data Transfer Billing
Monitoring Account-level Credit and Storage Usage
Working with Resource Monitors
Parameter Management
User Management
Monitoring and Status Page for Snowflake
Managing Multiple Snowflake Accounts
10. AWS Cloud Platform - Knowledge-Based Assessment [101-BASICS]
• Mode: Instructor-Led Assessment
• Duration: 2 hours
• Objective: Assess foundational knowledge of AWS services.
Phase 4: Project Work and Evaluation
11. Project Case Study
• Mode: Instructor-Led Project
• Duration: 96 hours
• Objective: Apply learned concepts to a real-world data engineering project.
• Deliverables:
– Design and implement a data pipeline
– Build a data warehouse using Redshift
– Create reports and dashboards using QuickSight
– Integrate machine learning models using AI Platform
12. Interim Evaluation
• Mode: Instructor-Led Evaluation
• Objective: Mid-term evaluation of project progress and conceptual understanding.
13. Final Evaluation
• Mode: Instructor-Led Evaluation
• Objective: Comprehensive evaluation of the project and overall learning.
Phase 5: Behavioral and Emerging Technologies
14. Fundamentals of GenAI
• Mode: Instructor-Led Training (ILT)
• Objective: Introduction to Generative AI and its applications in data engineering.
15. Qualifier
• Mode: Instructor-Led Assessment
• Duration: 16 hours
• Objective: Final assessment to certify skills in data engineering and related technologies.
Total Duration
• Instructor-Led Training (ILT): 328 hours
• Assessments: 23 hours
• Project Work: 96 hours
• Evaluations: 40 hours
• Total: 487 hours
Key Features of ILT Mode
• Instructor Guidance: All modules are taught by experienced instructors.
• Hands-on Practice: Labs and exercises are conducted under instructor supervision.
• Real-time Doubt Clarification: Immediate resolution of queries during sessions.
• Collaborative Learning: Group activities and discussions to enhance understanding.