
AWS Data Eng

The document outlines a comprehensive training curriculum for Database Management Systems (DBMS), Data Modeling, and Data Engineering, structured into five phases. Each phase includes various instructor-led training sessions, assessments, and project work, covering topics such as ANSI-SQL, Data Warehousing, ETL concepts, Python programming, Big Data, AWS, and Snowflake. The total duration of the program is 487 hours, emphasizing hands-on practice and collaborative learning.

Uploaded by babjeeponnam
Copyright © All Rights Reserved

Training Curriculum for DBMS, Data Modeling, and Data Engineering (ILT Mode)

Phase 1: Foundation Building
1. DBMS & Data Model
• Mode: Instructor-Led Training (ILT)

• Duration: 8 hours

• Objective: Understand database management systems, relational models, and data modeling concepts.

• Topics:

– Introduction to DBMS
– Relational Data Model
– ER Diagrams
– Normalization (1NF, 2NF, 3NF, BCNF)
– Advanced Data Modeling Concepts
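To illustrate normalization, here is a minimal sketch. It starts from a hypothetical denormalized orders table in which customer_name depends on customer_id rather than on the key order_id. That transitive dependency violates 3NF, so the sketch decomposes the table into two relations and checks that joining them back is lossless. All table, column, and sample values are illustrative assumptions.

```python
# Hypothetical denormalized rows: (order_id, customer_id, customer_name, amount).
# customer_name depends on customer_id, not on the key order_id -> violates 3NF.
denormalized = [
    (1, "C1", "Asha", 120.0),
    (2, "C2", "Ravi", 75.5),
    (3, "C1", "Asha", 30.0),
]


def decompose_to_3nf(rows):
    """Split rows into two relations.

    orders:    order_id -> (customer_id, amount)
    customers: customer_id -> customer_name
    """
    orders = {}
    customers = {}
    for order_id, customer_id, customer_name, amount in rows:
        orders[order_id] = (customer_id, amount)
        # A different name for the same customer_id would mean the functional
        # dependency customer_id -> customer_name does not actually hold.
        existing = customers.setdefault(customer_id, customer_name)
        if existing != customer_name:
            raise ValueError(f"inconsistent name for {customer_id}")
    return orders, customers


def rejoin(orders, customers):
    """Join the relations back on customer_id (lossless if it reproduces the input)."""
    return sorted(
        (oid, cid, customers[cid], amount) for oid, (cid, amount) in orders.items()
    )


orders, customers = decompose_to_3nf(denormalized)
print(customers)  # {'C1': 'Asha', 'C2': 'Ravi'}
assert rejoin(orders, customers) == sorted(denormalized)
```

After the split, each customer name is stored once. A rename therefore touches a single row instead of every order for that customer (the update anomaly of the original table).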

2. ANSI-SQL
• Mode: Instructor-Led Training (ILT) with Hands-on

• Objective: Gain proficiency in SQL for querying and managing relational databases.

• Topics:

– SQL Basics (SELECT, INSERT, UPDATE, DELETE)
– Joins (INNER, LEFT, RIGHT, FULL)
– Aggregations (GROUP BY, HAVING)
– Subqueries and Nested Queries
– Window Functions
– Constraints and Indexes
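This sketch exercises several of the SQL topics above. Python's built-in sqlite3 module stands in for a full RDBMS here, which is an assumption, since the course may use a different database. The dept/emp tables and data are hypothetical. Window functions require SQLite 3.25 or later, which recent Python builds typically bundle.

```python
import sqlite3

# In-memory database with hypothetical dept/emp tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES dept(dept_id),
        salary  INTEGER NOT NULL CHECK (salary > 0)
    );
    CREATE INDEX idx_emp_dept ON emp(dept_id);
    INSERT INTO dept VALUES (1, 'Data'), (2, 'Finance'), (3, 'HR');
    INSERT INTO emp VALUES
        (1, 'Asha', 1, 90), (2, 'Ravi', 1, 80),
        (3, 'Meena', 2, 70), (4, 'John', 1, 85);
    """
)

# LEFT JOIN keeps HR (no employees); HAVING filters groups after aggregation.
small_depts = conn.execute(
    """
    SELECT d.name, COUNT(e.emp_id) AS headcount
    FROM dept AS d
    LEFT JOIN emp AS e ON e.dept_id = d.dept_id
    GROUP BY d.name
    HAVING COUNT(e.emp_id) < 3
    ORDER BY d.name
    """
).fetchall()

# Window function: rank salaries within each department, keeping every row.
ranked = conn.execute(
    """
    SELECT name, dept_id, salary,
           RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
    FROM emp
    ORDER BY dept_id, rnk
    """
).fetchall()

# Subquery: employees paid above the overall average (81.25 here).
above_avg = conn.execute(
    "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name"
).fetchall()

# The CHECK constraint rejects a non-positive salary.
try:
    conn.execute("INSERT INTO emp VALUES (5, 'Zed', 2, 0)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

print(small_depts)  # [('Finance', 1), ('HR', 0)]
print(above_avg)    # [('Asha',), ('John',)]
```

Unlike GROUP BY, the window query returns one row per employee. That is what lets it compute "rank within department" without a self-join.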

3. ANSI SQL Assessment
• Mode: Instructor-Led Assessment

• Objective: Evaluate SQL knowledge and skills through practical exercises and quizzes.

Phase 2: Data Warehousing and ETL Concepts
4. DW Basics (Data Warehousing Basics)
• Mode: Instructor-Led Training (ILT)

• Objective: Learn the fundamentals of data warehousing.

• Topics:

– Data Warehouse Architecture
– OLAP vs OLTP
– Dimensional Modeling (Star Schema, Snowflake Schema)
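This sketch shows a small star schema and a typical OLAP-style star join, again using sqlite3 as a stand-in warehouse. The fact and dimension tables, surrogate keys, and figures are hypothetical. A snowflake schema would further normalize a dimension, for example by moving category into its own table referenced from dim_product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes keyed by surrogate keys.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        revenue     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture'), (3, 'Phone', 'Electronics');
    INSERT INTO dim_date VALUES
        (20240105, '2024-01-05', '2024-01'), (20240210, '2024-02-10', '2024-02');
    INSERT INTO fact_sales VALUES
        (1, 20240105, 2, 2000.0), (3, 20240105, 1, 500.0),
        (2, 20240210, 1, 300.0), (1, 20240210, 1, 1000.0);
    """
)

# Star join: the fact table joins each dimension once, then groups by dimension attributes.
rows = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
print(rows)
# [('2024-01', 'Electronics', 2500.0), ('2024-02', 'Electronics', 1000.0), ('2024-02', 'Furniture', 300.0)]
```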

5. ETL Concepts
• Mode: Instructor-Led Training (ILT)

• Objective: Understand Extract, Transform, Load (ETL) processes and tools.

• Topics:

– ETL Pipeline Design
– Data Extraction Techniques
– Data Transformation and Cleansing
– Loading Strategies (Full Load, Incremental Load)
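The two loading strategies can be contrasted with a minimal sketch. It assumes a hypothetical source where every row has a primary key (id) and a last-modified timestamp (updated_at), and it uses an in-memory dict as the target. The incremental load follows the common high-watermark pattern.

```python
from datetime import datetime

# Hypothetical source extract: each row has a key and a last-modified timestamp.
source = [
    {"id": 1, "name": "Asha", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Ravi", "updated_at": datetime(2024, 1, 2)},
]


def full_load(source_rows):
    """Full load: rebuild the whole target from the source on every run."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_load(source_rows, target, watermark):
    """Incremental load: upsert only rows modified after the stored watermark.

    Returns (new_watermark, rows_applied).
    """
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    for row in changed:
        target[row["id"]] = dict(row)  # upsert: insert new keys, overwrite existing ones
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return new_watermark, len(changed)


target = full_load(source)
watermark = max(r["updated_at"] for r in source)

# Later run: one row was updated and one row was added in the source.
source[1] = {"id": 2, "name": "Ravi K", "updated_at": datetime(2024, 1, 5)}
source.append({"id": 3, "name": "Meena", "updated_at": datetime(2024, 1, 6)})

watermark, applied = incremental_load(source, target, watermark)
print(applied, watermark)  # 2 2024-01-06 00:00:00
```

The incremental run touches only the two changed rows. The tradeoff is that this pattern does not detect rows deleted in the source, which a periodic full load (or change data capture) would handle.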

Phase 3: Programming and Advanced Data Engineering
6. Python
• Mode: Instructor-Led Training (ILT) with Hands-on

• Objective: Develop programming skills in Python for data engineering tasks.

• Topics:

– Python Basics (Syntax, Data Types, Loops)
– Data Structures (Lists, Dictionaries, Tuples)
– File Handling (CSV, JSON, XML)
– Libraries for Data Engineering (Pandas, NumPy, PySpark)
– Automation Scripts
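The file-handling topic can be illustrated with a short sketch that uses only the standard library (csv, json, xml.etree.ElementTree). It parses CSV and re-serializes the records as JSON and XML. The orders data and the XML element names are hypothetical, and io.StringIO stands in for opening a real file.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical CSV content; io.StringIO stands in for open("orders.csv", newline="").
csv_text = "order_id,customer,amount\n1,Asha,120.50\n2,Ravi,75.00\n"


def csv_to_records(text):
    """Parse CSV text into dicts, casting the numeric columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"order_id": int(r["order_id"]), "customer": r["customer"], "amount": float(r["amount"])}
        for r in reader
    ]


def records_to_json(records):
    """Serialize the records as a pretty-printed JSON array."""
    return json.dumps(records, indent=2)


def records_to_xml(records):
    """Serialize the records as <orders><order id="..">...</order></orders>."""
    root = ET.Element("orders")
    for r in records:
        order = ET.SubElement(root, "order", id=str(r["order_id"]))
        ET.SubElement(order, "customer").text = r["customer"]
        ET.SubElement(order, "amount").text = f"{r['amount']:.2f}"
    return ET.tostring(root, encoding="unicode")


records = csv_to_records(csv_text)
print(records_to_json(records))
print(records_to_xml(records))
```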

7. Python Assessment
• Mode: Instructor-Led Assessment

• Objective: Evaluate Python programming skills through coding exercises.

8. Big Data Hadoop
• Mode: Instructor-Led Training (ILT)

• Duration: 50 hours

• Objective: Learn the fundamentals of Big Data and the Hadoop ecosystem.

• Topics:

– Hadoop Architecture (HDFS, MapReduce)
– Hive, Pig, and HBase
– YARN and Resource Management
– Data Processing with Spark
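The MapReduce model can be sketched as the classic word count, simulated in a single Python process. This is not Hadoop's API. It only mirrors the map, shuffle/sort, and reduce phases that Hadoop distributes across a cluster, and the input lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


lines = ["Hadoop stores data in HDFS", "Spark and Hadoop process data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["hadoop"], counts["data"])  # 2 2
```

In Spark, the same pipeline is usually written with the RDD operations flatMap, map, and reduceByKey, and Spark handles the shuffle itself.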

9. AWS Data Engineer (Enhanced)
• Mode: Instructor-Led Training (ILT) with Hands-on
• Objective: Gain expertise in the AWS Cloud Platform for data engineering, aligned with industry standards and certification requirements.

AWS Overview

Introduction to AWS and its usage; core concepts such as elasticity, scalability, and virtualization


IaaS, PaaS, SaaS
VPC, Subnet, Security Group, NACL, NAT Instance
Overview of DW and data lake services such as EMR, Glue, Redshift, and Athena
Region, Availability Zone, Edge Locations
IAM users, roles, and policies

S3
Core concepts of object store
S3 storage classes, lifecycle, replication
S3 file handling using Boto3
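The S3 lifecycle and Boto3 file-handling topics can be sketched together. The first function builds a lifecycle configuration in the dictionary shape accepted by the Boto3 S3 client method put_bucket_lifecycle_configuration, using the STANDARD_IA and GLACIER storage classes. The bucket name, prefix, and day counts are hypothetical, and the ordering check is a local sanity check rather than the full set of S3 rules. The Boto3 calls (put_bucket_lifecycle_configuration, upload_file, list_objects_v2) require AWS credentials, so they sit in functions that are not executed here.

```python
def build_lifecycle_configuration(prefix, ia_days=30, glacier_days=90, expire_days=365):
    """Return a LifecycleConfiguration dict for objects under `prefix`.

    Objects move to STANDARD_IA, then GLACIER, and are finally deleted.
    The increasing-days check below is our own sanity check, not S3's full rule set.
    """
    if not (ia_days < glacier_days < expire_days):
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so the builder above works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


def upload_and_list(bucket, local_path, key):
    """Upload one file, then list the keys under the same prefix (needs credentials)."""
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    prefix = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Hypothetical prefix for raw log files.
config = build_lifecycle_configuration("raw/logs/")
```

A single list_objects_v2 call returns at most 1,000 keys. For larger prefixes, the client's get_paginator("list_objects_v2") iterates over all pages.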
EC2
EC2 instance types, storage, etc.

Streaming

Kinesis Streaming and Kinesis Firehose
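A common streaming pattern is a Lambda function consuming a Kinesis data stream. The sketch below assumes that pattern. Lambda receives the stream records with each payload base64-encoded in Records[].kinesis.data. The JSON payload and its amount field are hypothetical, and the test event contains only the fields this handler reads.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode Kinesis records delivered to Lambda and total a hypothetical 'amount' field."""
    total = 0.0
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])  # base64 -> raw bytes
        item = json.loads(payload)  # assumes producers send JSON
        decoded.append(item)
        total += float(item.get("amount", 0))
    return {"records": len(decoded), "total_amount": total}


def make_test_event(items):
    """Build a minimal Kinesis-style event for local testing."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "partitionKey": str(i),
                    "data": base64.b64encode(json.dumps(item).encode()).decode(),
                },
            }
            for i, item in enumerate(items)
        ]
    }


print(lambda_handler(make_test_event([{"amount": 10}, {"amount": 2.5}]), None))
# {'records': 2, 'total_amount': 12.5}
```

Kinesis Firehose usually needs no custom consumer like this, because it delivers records directly to destinations such as S3 or Redshift.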

DynamoDB
Detailed architecture, NoSQL concepts
DynamoDB data structures, data modeling/design, organization, keys, etc.
CRUD, Query, and Scan in detail
Case Study on a sample schema

Lambda

Handling S3 events
Handling DynamoDB events
Handling SQS and SNS events alongside DynamoDB and S3 events
Parsing S3 source files into different formats
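This sketch shows a Lambda function that handles S3 events and converts a file to another format. It reads the standard S3 event notification fields (Records, eventName, s3.bucket.name, s3.object.key). Object keys arrive URL-encoded in these events, so they are decoded with unquote_plus. The CSV input, the JSON Lines output, and the processed/ output prefix are illustrative assumptions. The handler itself needs boto3 and IAM access, so only the pure helper functions are meant to be tested offline.

```python
import csv
import io
import json
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key) pairs for object-created records in an S3 event."""
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
        if r.get("eventName", "").startswith("ObjectCreated")
    ]


def csv_to_json_lines(csv_text):
    """Convert CSV text to newline-delimited JSON, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


def output_key(key, prefix="processed/"):
    """Map e.g. 'incoming/a.csv' to 'processed/incoming/a.jsonl' (hypothetical layout)."""
    return prefix + key.rsplit(".", 1)[0] + ".jsonl"


def lambda_handler(event, context):
    """Read each new CSV object and write a JSON Lines copy back to S3."""
    import boto3  # lazy import so the helpers above can be tested without boto3

    s3 = boto3.client("s3")
    written = []
    for bucket, key in parse_s3_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # Restrict the S3 trigger to the input prefix (e.g. incoming/) so these
        # writes under processed/ do not re-invoke the function recursively.
        new_key = output_key(key)
        s3.put_object(Bucket=bucket, Key=new_key, Body=csv_to_json_lines(body).encode("utf-8"))
        written.append(new_key)
    return {"written": written}
```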

Redshift
Redshift Architecture, Building blocks
Redshift data structures, keys, compression
Connecting, Querying and Consuming data using Python

Glue
Introduction to Glue
Create an ETL Workflow in Glue
Writing Custom Scripts in Glue
Glue Data Catalog as the Metastore for Hive

Snowflake

Introduction to Snowflake
Key Concepts & Architecture
Supported Cloud Platforms
Supported Cloud Regions
Snowflake Editions
Snowflake Releases
Overview of Key Features
Overview of the Data Lifecycle
Continuous Data Protection

Connecting to Snowflake
Snowflake Ecosystem
Snowflake Partner Connect
General Configuration (All Clients)
SnowSQL (CLI Client)
Connectors & Drivers

Loading Data into Snowflake


Overview of Data Loading
Summary of Data Loading Features
Data Loading Considerations
Preparing to Load Data
Bulk Loading Using COPY
Loading Continuously Using Snowpipe
Loading Using the Web Interface (Limited)
Querying Data in Staged Files
Querying Metadata for Staged Files
Transforming Data During a Load

Unloading Data from Snowflake


Overview of Data Unloading
Summary of Data Unloading Features
Data Unloading Considerations
Preparing to Unload Data
Unloading into a Snowflake Stage
Unloading into Amazon S3
Unloading into Google Cloud Storage
Unloading into Microsoft Azure

Using Snowflake
Classic Web Interface
New Web Interface
Virtual Warehouses
Databases, Tables & Views
Queries
Binary Data
Date & Time Data
Semi-structured Data
Snowflake Time Travel & Fail-safe
Continuous Data Pipelines
Database Replication and Failover/Failback

Sharing Data Securely in Snowflake


Introduction to Secure Data Sharing
Overview of the Product Offerings for Secure Data Sharing
Granting Privileges to Other Roles
Working with Shared Data
Secure Direct Data Share
Snowflake Data Marketplace
Data Exchange

Managing Your Snowflake Organization
Introduction to Organizations
Getting Started with Organizations
Managing Accounts in Your Organization
Understanding Organization and Account Names

Managing Your Snowflake Account


Account Identifiers
Trial Accounts
System Usage & Billing
Parameter Management
User Management
Behavior Change Release Management

Managing Security in Snowflake


Summary of Security Features
Authentication
Networking & Private Connectivity
Administration & Authorization

Managing Governance in Snowflake


Summary of Governance Features
Column-level Security
Row-level Security
Access History

Developing Applications in Snowflake


Download and Install the SnowSQL CLI Client
Create a Virtual Warehouse, Sample Database, and Table Using SnowSQL
Load and Query Data Using SnowSQL
Use Python with Snowflake

Managing Your Snowflake Account


System Usage & Billing
Understanding Snowflake Credit and Storage Usage
Understanding Snowflake Data Transfer Billing
Monitoring Account-level Credit and Storage Usage
Working with Resource Monitors
Parameter Management
User Management
Monitoring and Status Page for Snowflake
Managing Multiple Snowflake Accounts

10. AWS Cloud Platform - Knowledge-Based Assessment [101-BASICS]
• Mode: Instructor-Led Assessment
• Duration: 2 hours
• Objective: Assess foundational knowledge of AWS services.

Phase 4: Project Work and Evaluation
11. Project Case Study
• Mode: Instructor-Led Project
• Duration: 96 hours
• Objective: Apply learned concepts to a real-world data engineering project.
• Deliverables:
– Design and implement a data pipeline
– Build a data warehouse using Redshift
– Create reports and dashboards using QuickSight
– Integrate machine learning models using AI Platform

12. Interim Evaluation
• Mode: Instructor-Led Evaluation
• Objective: Mid-term evaluation of project progress and conceptual understanding.

13. Final Evaluation
• Mode: Instructor-Led Evaluation
• Objective: Comprehensive evaluation of the project and overall learning.

Phase 5: Behavioral and Emerging Technologies
14. Fundamentals of GenAI
• Mode: Instructor-Led Training (ILT)
• Objective: Introduction to Generative AI and its applications in data engineering.

15. Qualifier
• Mode: Instructor-Led Assessment

• Duration: 16 hours

• Objective: Final assessment to certify skills in data engineering and related technologies.

Total Duration
• Instructor-Led Training (ILT): 328 hours

• Assessments: 23 hours

• Project Work: 96 hours

• Evaluations: 40 hours

• Total: 487 hours

Key Features of ILT Mode
• Instructor Guidance: All modules are taught by experienced instructors.

• Hands-on Practice: Labs and exercises are conducted under instructor supervision.

• Real-time Doubt Clarification: Immediate resolution of queries during sessions.

• Collaborative Learning: Group activities and discussions to enhance understanding.
