Mastering Databricks Data Engineering-AWS-Azure

Uploaded by

sivasanni03

Mastering Databricks Data Engineering using AWS & Azure

Introduction to Big Data and Hadoop

• What is Big Data?
• What is Hadoop?
• What is Spark?
• What are NoSQL Databases?
• Difference Between Hadoop and Spark
• Common Big Data Problems
• Hadoop Ecosystem

AWS Introduction (40 Hours)

EC2
• Create Windows/Mac/Linux Servers
• Create a Sample Website
• Autoscaling
• Create and Use AMIs

Athena
• What is Serverless Computing?
• Process JSON and CSV Data with Athena
• Recommended Approaches

Sreyobhilashi IT | WhatsApp me at +91-9247159150

S3
• Store Data in S3
• Submit Commands in Client Mode
• Get Data from Various Sources and Store in S3
• S3 Bucket Policies

RDS
• Create Different Databases
• Create Sample Tables and Process Data
• Best Practices for Cost Optimization
• Practice Oracle and MySQL Using RDS

EMR
• Practice PySpark and Hive
• Create EMR Clusters and Process Data
• EMR vs EC2
• Hive Internals and Sample Programs
• Import Data from RDS to S3 Using Sqoop

Lambda & Boto3

• Access AWS Resources Using Boto3 from PyCharm
• Use Boto3 in Lambda Functions
• Integrate Lambda with Glue and Redshift
• Connect Boto3 with Services Like EC2, EMR, Glue, Redshift

CloudWatch
• How to Monitor Resources
• Debugging Application Failures
• Autoscaling Based on CloudWatch Metrics
• Usage Across AWS Services (EC2, RDS, Glue)

IAM (Identity and Access Management)

• Users, Groups, and Roles
• Custom Policies
• Importance of IAM Keys in Snowflake, Databricks, and PyCharm Use Cases

Redshift
• Load and Process Data from S3
• SortKey and DistKey Optimization
• Redshift Architecture
• Compare Snowflake vs Redshift

Glue
• Process CSV and JSON Data Using Glue
• Retrieve Data from Athena Using Glue
• Use Crawlers and Execute PySpark/Scala Jobs
• Glue Architecture and Best Practices

Introduction to Spark

Spark Core
• Why Use Spark Instead of Hadoop?
• Importance of HDFS/YARN in Spark
• Spark Architecture
• Types of APIs: RDD, DataFrame, Dataset
• Use Cases for Spark
• Why Spark is Faster Than MapReduce
• In-Memory Processing in Spark

RDD Internals
• Properties of RDD: Immutability, Laziness, Fault Tolerance
• SparkContext, SQLContext, SparkSession Internals
• Create RDDs in Different Ways
• Transformations and Actions
• Debugging Transformations
• Spark Web UI

RDD Hands-On
• Map, FlatMap, Filter, Distinct
• ReduceByKey vs GroupByKey
• Spark-submit Examples
• 20 RDD Use Case Programs

Spark SQL
• Convert RDD to DataFrame
• Python DataFrame vs Spark DataFrame
• DataFrame Reader
• Processing Data in Different Formats: CSV, JSON, XML, Avro, ORC, Text, Parquet
• Database Integration: Oracle, MySQL, Sqoop vs Spark
• NoSQL Integration: HBase, Cassandra, MongoDB

PySpark Advanced Concepts

• Dataset API Importance
• Spark Memory Management
• Resource Optimization
• Spark Debugging with Client Mode and Web UI
• Automate Spark with Oozie and Airflow
• Spark-Snowflake Integration

Spark Streaming

Introduction to Spark Streaming

• Micro-Batch vs Stream Processing
• DStream API Internals
• Live Data Processing

Structured Streaming
• Real-World Examples
• Integration with Kafka
• Log Analysis
• Export to Databases
• Snowflake Integration

Apache Kafka
• Kafka Architecture
• Producer and Consumer APIs
• Integration with Spark
• End-to-End Workflow with AWS, Azure, Databricks, and Cloudera

Apache NiFi
• NiFi Internals
• Data Flow Examples (Local to S3, API to S3)
• Integration with Kafka and Spark
• Templates and Most Frequently Used Processors

Apache Airflow
• Airflow Installation on EC2
• Data Pipeline Creation
• DAG Management
• Airflow-Spark-Snowflake Integration

Introduction to Databricks
• Databricks vs Spark vs Snowflake
• Databricks Architecture
• Working in Databricks Workspace
• Using Databricks Notebooks

Databricks File System (DBFS)

• What is DBFS?
• DBFS Commands (mkdirs, cp, mv, head, put, rm, rmdir)
• Magic Commands (sh, fs, scala, python)

Databricks Utilities
• Credentials Utility
• FileSystem Utility
• Notebook Utility
• Secrets Utility
• Widgets Utility

Databricks Cluster Management

• Creating and Configuring Clusters
• Managing Clusters
• Starting, Terminating, and Deleting Clusters
• Cluster Information and Logs
• Types of Clusters: All-Purpose, Job Clusters
• Cluster Modes: Standard, High Concurrency, Autoscaling

Azure Overview
• Azure Databricks
• Azure VM & HDInsight vs EMR
• Azure Data Lake Storage (ADLS)
• Azure Blob Storage vs S3
• Azure SQL Database vs RDS
• Azure Active Directory vs IAM
• Azure Data Explorer
• Azure Stream Analytics vs Snowpipe
• Event Hub vs Kafka
• Azure Data Factory for Data Integration
• Azure Synapse vs Snowflake

Databricks Integration
• Integration with Azure Services: Blob Storage, Data Lake Storage Gen2, SQL Database, Synapse, Key Vault
• Triggers

Databricks Streaming API

• Introduction to Streaming
• Handling Bad Records and Regular Expressions
• Streaming Data into ADLS Gen2 and Tables

Databricks Lakehouse (Delta Lake)

• Data Lake vs Delta Lake
• Delta Lake Best Practices
• Delete, Update, Alter Tables
• Optimization Steps
• Handling SCD (Type 1 & Type 2)
• Deduplication and Streaming Data Handling

Databricks Unity Catalog

• Create Schema and Table Using Unity Catalog
• Access Controls, User Management, and Metastore
• Row-Level Access Control
• Masking Columns
• Roles, Users, and Groups
• Managing External Tables
• Lakehouse Federation

Databricks Workflows
• Introduction to Workflows
• Creating, Running, and Managing Jobs
• Scheduling and Monitoring Jobs
• Create Dependency Between Multiple Jobs

Delta Live Tables

• Introduction to Delta Live Tables
• Creating and Configuring Delta Pipelines
• Real-Time Streaming with Delta Live Tables
• Error Handling and Recovery in Delta Live Tables
• Delta Live Tables Best Practices
