AWS Cloud & Python Essentials
AWS IAM
Unique email address for this account: check if you can use a dynamic alias with an existing email address
• Existing address: john@gmail.com
• Aliases: john+ACCOUNT-ALIAS-1@gmail.com, john+ACCOUNT-ALIAS-2@gmail.com
AWS account name / alias
Cloud-based IDEs
• Accessible from any device with internet
• Scales with project needs (CPU, memory, storage)
• Collaborative features for real-time editing and sharing
• Pre-configured with essential tools
• Integrated with AWS services for seamless deployment
AWS Cloud9
API requests (e.g., RunInstances, CreateUser) sent through the CLI or the API are checked by IAM against identity-based and resource-based policies. AWS determines whether to authorize the request (allow/deny).
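The allow/deny decision can be illustrated with a heavily simplified model of IAM's documented evaluation logic: a request is denied by default, an explicit Allow grants access, and any explicit Deny overrides every Allow. This sketch is an assumption-laden toy. It matches only actions and resources (with `*` wildcards via `fnmatchcase`, which is case-sensitive unlike IAM action matching), treats identity-based and resource-based policies from the same account as one pool, and ignores principals, conditions, permission boundaries, SCPs, and session policies. The trimmed statement format is hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value):
    # IAM accepts either a single string or a list of strings for Action/Resource
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(policies, action, resource):
    """Return 'Allow' or 'Deny' for one request under simplified IAM rules."""
    allowed = False
    for policy in policies:  # identity-based and resource-based policies together
        for stmt in policy["Statement"]:
            if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
                if stmt["Effect"] == "Deny":
                    return "Deny"  # an explicit deny always wins
                allowed = True
    return "Allow" if allowed else "Deny"  # implicit deny when nothing allows


# Example: full EC2 access, but RunInstances explicitly denied
ec2_admin = {"Statement": [{"Effect": "Allow", "Action": "ec2:*", "Resource": "*"}]}
no_launch = {"Statement": [{"Effect": "Deny", "Action": "ec2:RunInstances", "Resource": "*"}]}
```

With only `ec2_admin`, `ec2:RunInstances` is allowed and `iam:CreateUser` falls through to the implicit deny; adding `no_launch` flips `ec2:RunInstances` to Deny.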
© Digital Cloud Training | https://digitalcloud.training
Set Up Individual User Account
Customized AMI
Benefits of Amazon EC2
• Elastic computing – easily launch hundreds to thousands of
EC2 instances within minutes
• Complete control – you control the EC2 instances with full
root/administrative access
• Flexible – Choice of instance types, operating systems, and
software packages
• Reliable – EC2 offers very high levels of availability and
instances can be rapidly commissioned and replaced
• Secure – Fully integrated with Amazon VPC and security
features
• Inexpensive – Low cost, pay for what you use
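To make "easily launch EC2 instances within minutes" concrete, here is a minimal sketch using the Boto3 EC2 client's `run_instances` call. The helper name `launch_instances`, the default `t2.micro` instance type, and the placeholder AMI ID are assumptions for illustration; the client is passed in so the logic can be exercised without an AWS account.

```python
def launch_instances(ec2_client, ami_id, instance_type="t2.micro", count=1):
    """Launch `count` EC2 instances from `ami_id` and return their instance IDs."""
    response = ec2_client.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=count,  # the call fails unless at least MinCount instances can start
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in response["Instances"]]


# Usage (requires boto3 and AWS credentials; the AMI ID below is a placeholder):
# import boto3
# ids = launch_instances(boto3.client("ec2"), "ami-0123456789abcdef0", count=2)
```

Setting `MinCount` equal to `MaxCount` makes the request all-or-nothing; a lower `MinCount` would let EC2 launch fewer instances when capacity is limited.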
[Figure: an admin uses the AWS Management Console; an EC2 instance with an EBS volume runs in a public subnet inside a security group and is reached through an Internet Gateway]
• Data is stored on an EBS volume (virtual hard drive)
• A Security Group controls inbound and outbound traffic
• Admin connects to EC2 Instance over the Internet
Ways to connect to an EC2 instance:
• SSH: all operating systems; port 22 must be open; anyone with the key pair can access the instance
• RDP: Windows only; port 3389 must be open; need a user name and password to log in
• EC2 Instance Connect: all operating systems; port 22 must be open; IAM access control via policy
• Session Manager: all operating systems; no ports need to be open; IAM access control via policy
[Figure: an EC2 instance in a public subnet with the AWS CLI configured with access keys, compared with an EC2 instance in a private subnet that uses an IAM Role and policy to access an S3 bucket]
The policy determines the access permissions.
[Figure: a VPC spanning Availability Zones, with a public subnet and a private subnet each containing an EC2 instance, the VPC router, and an Internet Gateway]
• The VPC router takes care of routing within the VPC and outside of the VPC
• Subnets are created within AZs
• The route table is used to configure the VPC router
• You can launch EC2 instances into your VPC subnets
• An Internet Gateway is attached to a VPC and used to connect to the Internet
Main Route Table
Destination     Target
10.0.0.0/16     Local
0.0.0.0/0       igw-id
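The route table above has two entries that both match a VPC-internal address (10.0.0.0/16 and 0.0.0.0/0). The VPC router resolves this by choosing the most specific route, i.e. the longest prefix match. A small sketch with Python's standard `ipaddress` module; `igw-id` is kept as the placeholder target from the table, and the function name is hypothetical.

```python
import ipaddress


def select_target(route_table, destination_ip):
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic is not forwarded
    # Longest prefix (largest prefixlen) is the most specific route and wins
    return max(matches, key=lambda match: match[0].prefixlen)[1]


main_route_table = [("10.0.0.0/16", "Local"), ("0.0.0.0/0", "igw-id")]
```

For example, traffic to `10.0.2.15` stays `Local` (a /16 beats a /0), while traffic to `8.8.8.8` only matches `0.0.0.0/0` and goes to the Internet Gateway.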
[Figure: two VPCs, CIDR 10.0.0.0/16 and CIDR 10.1.0.0/16]
of isolated resources
Internet Gateway / Egress-only Internet Gateway – The Amazon VPC side of a connection to the public Internet for IPv4/IPv6
Router – Routers interconnect subnets and direct traffic between Internet gateways, virtual private gateways, NAT gateways, and subnets
Peering Connection – Direct connection between two VPCs
VPC Endpoints – Private connection to public AWS services
NAT Instance – Enables Internet access for EC2 instances in private subnets (managed by you)
NAT Gateway – Enables Internet access for EC2 instances in private subnets (managed by AWS)
Virtual Private Gateway – The Amazon VPC side of a Virtual Private Network (VPN) connection
Customer Gateway – Customer side of a VPN connection
AWS Direct Connect – High-speed, high-bandwidth, private network connection from the customer to AWS
Security Group – Instance-level firewall
Network ACL – Subnet-level firewall
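One practical rule behind the Peering Connection entry: AWS does not allow a VPC peering connection between VPCs whose IPv4 CIDR blocks match or overlap. This is why example VPCs use distinct ranges such as 10.0.0.0/16 and 10.1.0.0/16. A minimal pre-check sketch with the standard `ipaddress` module; it covers only the CIDR rule, not the other peering requirements, and the function name is hypothetical.

```python
import ipaddress


def cidrs_can_peer(cidr_a, cidr_b):
    """Return True if two VPC CIDR blocks do not overlap (a peering prerequisite)."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

`cidrs_can_peer("10.0.0.0/16", "10.1.0.0/16")` is True, while `10.0.0.0/16` and `10.0.128.0/17` overlap and cannot be peered.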
• You can launch your AWS resources, such as Amazon EC2 instances,
into your VPC
[Figure: a VPC in a Region across Availability Zones, with a public subnet (10.0.2.0/24) using the main route table, private subnets (e.g., 10.0.4.0/24) using private route tables, and an Internet gateway]
Route table entry: Destination 10.0.0.0/16, Target Local
• An object is a file you upload
• The object name is the Key; the data is the Value
Static website endpoint: https://mybucket.s3-website-us-west-1.amazonaws.com
Example objects in bucket mybucket:
• Orders.xlsx – https://mybucket.s3.us-west-1.amazonaws.com/Orders.xlsx
• Sales Proposal.docx – https://mybucket.s3.us-west-1.amazonaws.com/Project+Alpha/Sales+Proposal.docx
• Project Brief.docx – https://mybucket.s3.us-west-1.amazonaws.com/Project+Alpha/Project+Brief.docx
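The object URLs above follow the virtual-hosted style `https://<bucket>.s3.<region>.amazonaws.com/<key>`. The key is the full name, including any "folder" prefix such as `Project Alpha/`, and spaces appear as `+`. A sketch that reproduces the URLs in that format; the helper name is hypothetical, and `urllib.parse.quote` (which encodes spaces as `%20`) is a common alternative.

```python
from urllib.parse import quote_plus


def object_url(bucket, region, key):
    """Build a virtual-hosted-style S3 object URL in the format shown above."""
    # Keep "/" unescaped so prefixes stay readable; quote_plus turns spaces into "+"
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote_plus(key, safe='/')}"
```

`object_url("mybucket", "us-west-1", "Project Alpha/Sales Proposal.docx")` yields the Sales Proposal URL listed above.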
[Figure: two ways to access Amazon S3]
• User browses with a standard web browser
• Application on an EC2 instance in a VPC public subnet uses the REST API programmatically, over a private connection through an S3 Gateway Endpoint
Python Fundamentals
Comment for your future self
Data types: Integer, float, bool, String
• Integer: represents whole numbers
• String: a sequence of character data
• Assignment: operators used to assign values to variables
• Arithmetic: perform mathematical operations; + can also be used to concatenate strings
• Comparison: compare values and return whether the comparison is true or false
• Logical: can be used to combine conditional statements
• Membership: can be used to test whether a value is a member of a sequence, like strings, lists, or tuples
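The data types and operator groups above, in one short snippet. The variable names and values are illustrative only.

```python
# Comment for your future self: explain *why*, not just *what*
quantity = 3            # int: a whole number
price = 4.5             # float: a number with a decimal point
in_stock = True         # bool: True or False
product = "coffee"      # str: a sequence of characters

total = quantity * price                 # arithmetic → 13.5
label = product + " x" + str(quantity)   # + also concatenates strings → "coffee x3"
is_large_order = total > 10              # comparison → True
can_ship = in_stock and is_large_order   # logical → True
has_double_f = "ff" in product           # membership → True
total += 1.5                             # augmented assignment → 15.0
```

Note the `str(quantity)` call: `+` does not mix strings and integers, so the number is converted first.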
What is a Function?
• A block of organized and reusable code that
performs a specific task
• Functions help break a program into smaller,
more manageable pieces
• Code reuse and modularity
• Essential for code that is:
• Efficient
• Maintainable
• Scalable
• Built-in & user-defined functions
• Think of functions as factories
[Figure: Inputs → Function → Output]
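The "factory" idea in code: inputs go in as parameters, the function does one specific task, and a value comes out. `order_total` is a hypothetical user-defined function that also reuses the built-in `sum()` and `round()`.

```python
def order_total(prices, tax_rate=0.0):
    """Inputs: a list of item prices and an optional tax rate. Output: the total."""
    subtotal = sum(prices)                      # built-in function
    return round(subtotal * (1 + tax_rate), 2)  # round to cents


print(order_total([10.0, 5.5]))             # 15.5
print(order_total([10.0], tax_rate=0.1))    # 11.0
```

Because the logic lives in one place, every part of a program that needs a total calls the same function instead of repeating the arithmetic.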
Lists
• Dictionary methods
• keys()
• values()
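A short example tying lists, the dictionary methods `keys()` and `values()`, and a `for` loop together. The region names are real AWS Region codes; the instance counts are made up.

```python
regions = ["us-east-1", "us-west-1"]        # list: ordered and mutable
regions.append("eu-west-1")                  # add an item to the end

instance_counts = {"us-east-1": 4, "us-west-1": 2}  # dict: key/value pairs
print(list(instance_counts.keys()))    # ['us-east-1', 'us-west-1']
print(list(instance_counts.values()))  # [4, 2]

total = 0
for region in regions:                       # for loop over a list
    total += instance_counts.get(region, 0)  # missing key → default of 0
print(total)                                 # 6
```

`dict.get()` with a default avoids a `KeyError` for `eu-west-1`, which is in the list but not in the dictionary.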
Loops
Example Boto3 methods: create_fleet(), filter_log_events(), upload_file(), put_item()
Resource
• Higher-level, object-oriented API
• Represents AWS service objects (e.g., S3 bucket, EC2 instance)
• Allows actions directly on objects
• Use when you prefer a more Pythonic and concise interface
Client
• Lower-level, service-oriented API
• Provides access to all AWS service actions
• Use when you need fine-grained control and access to all available operations
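The same task, listing S3 bucket names, written both ways makes the difference concrete. The client returns raw response dictionaries; the resource yields `Bucket` objects with attributes. Both functions are hypothetical helpers that take the Boto3 object as an argument.

```python
def bucket_names_with_client(s3_client):
    # Client: call the API operation and read the raw response dictionary
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response["Buckets"]]


def bucket_names_with_resource(s3_resource):
    # Resource: iterate over Bucket objects and read their attributes
    return [bucket.name for bucket in s3_resource.buckets.all()]


# Usage (requires boto3 and AWS credentials):
# import boto3
# bucket_names_with_client(boto3.client("s3"))
# bucket_names_with_resource(boto3.resource("s3"))
```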
• Previous examples had their own client or resource
• VPC is created with EC2 client or resource
• You can create multiple VPCs
• Helpful for managing separate environments
• Creating and tagging a VPC
• Methods used with VPCs include:
  • create_vpc(CidrBlock='10.0.0.0/16')
  • create_tags(**kwargs)
[Figure: using the EC2 resource/client in the AWS Cloud to create a VPC, subnet, Internet gateway, route table, and route for MY VPC PROD (10.0.0.0/16)]
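The steps in the figure map onto EC2 client calls. This is a minimal sketch: the 10.0.1.0/24 subnet CIDR and the helper name are assumptions, and making the subnet truly public would also require public IP assignment, which is not covered here.

```python
def create_prod_vpc(ec2, vpc_cidr="10.0.0.0/16", subnet_cidr="10.0.1.0/24"):
    """Create and tag a VPC, then give one subnet a route to the Internet."""
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]
    ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": "MY VPC PROD"}])

    subnet_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock=subnet_cidr)["Subnet"]["SubnetId"]

    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    # Default route: send all non-local traffic to the Internet Gateway
    ec2.create_route(RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=subnet_id)

    return {"VpcId": vpc_id, "SubnetId": subnet_id,
            "InternetGatewayId": igw_id, "RouteTableId": rtb_id}


# Usage (requires boto3 and AWS credentials):
# import boto3
# print(create_prod_vpc(boto3.client("ec2")))
```

The order matters: the gateway must be attached to the VPC before a route can point at it, and the route table only affects the subnet once it is associated.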
Managed Service
Amazon RDS takes care of the tedious database administration tasks such as:
• Backups
• Software Patching
• Monitoring
• Scaling
Use Cases
• E-commerce platforms
• Customer relationship management apps
Aurora Serverless V2
• Supports all features of Aurora
• Suitable for production environments
• Instant scaling to hundreds of thousands of transactions per second
• Scales in increments of 0.5 ACUs
• Cannot cut capacity to zero
• Does not support the query editor
• About $0.12 per ACU per hour
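To see how the 0.5 ACU increments and the roughly $0.12 per ACU-hour figure combine, here is a rough compute-only estimator. It is a simplification under stated assumptions: one average capacity value per hour, capacity rounded up to the next 0.5 ACU, an assumed 0.5 ACU floor (consistent with "cannot cut capacity to zero"), and the price above. Real billing is finer-grained and Region-dependent, and storage and I/O are ignored.

```python
import math

ACU_HOURLY_PRICE = 0.12  # approximate price from the slide; varies by Region
MIN_ACUS = 0.5           # assumed minimum capacity (capacity cannot go to zero)


def billed_acus(required_acus):
    """Round required capacity up to the next 0.5 ACU increment."""
    return max(math.ceil(required_acus * 2) / 2, MIN_ACUS)


def compute_cost(hourly_acus, price=ACU_HOURLY_PRICE):
    """Estimate compute cost from a list of average ACUs, one entry per hour."""
    return round(sum(billed_acus(a) for a in hourly_acus) * price, 2)
```

For three hours averaging 1.2, 2.0, and 0 ACUs, the billed capacity is 1.5 + 2.0 + 0.5 = 4 ACU-hours, or about $0.48.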
SECTION 5
[Figure: AWS Lambda use cases and benefits]
• Use cases: real-time file processing, resize images, parse/filter
• Scales in response to each trigger
• Highly available infrastructure
• Only pay for the compute time and number of requests that you consume
Lambda & Python
Food Delivery App Example
Why Lambda is important
• More efficient code execution
• Lambda's serverless nature lets developers focus more on their code and less on infrastructure management
• Quicker deployment times
Why Python?
• Readability
• Extensive libraries (e.g., Pandas, Pillow)
• Python simplicity + power of Lambda
• Robust, scalable serverless applications
Example of services used with Lambda in this course
Pricing
• Compute and request charges
• Free Tier usage includes:
  • 400,000 GB-seconds
  • 1 million requests per month
• x86 based processor price: $0.0000166667 USD per GB-second
• Requests: $0.20 USD per 1M
Lambda example:
• 3 million requests per month
• 120 milliseconds per request
• Configured with 1.5 GB of memory, on an x86 based processor
Compute charges:
3 million requests × 120 milliseconds = 360,000 seconds
360,000 seconds × 1.5 GB = 540,000 GB-s
540,000 GB-s - 400,000 GB-s (Free Tier) = 140,000 GB-s
140,000 GB-s × $0.0000166667 = $2.33
Request charges:
3M requests - 1M (free tier) = 2M
2M × $0.20/M = $0.40
Total cost to run Lambda: $2.73 USD per month
Cost-Effectiveness
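The pricing arithmetic above can be packaged as a small calculator. The defaults are the x86 prices and Free Tier amounts quoted on the slide; the function name is hypothetical, and real bills can include other charges (for example, data transfer) that this ignores.

```python
def lambda_monthly_cost(requests, duration_ms, memory_gb,
                        price_per_gb_s=0.0000166667,
                        price_per_million_requests=0.20,
                        free_gb_s=400_000, free_requests=1_000_000):
    """Return (compute_cost, request_cost, total) in USD, rounded to cents."""
    compute_seconds = requests * duration_ms / 1000      # total execution time
    gb_seconds = compute_seconds * memory_gb             # scale by memory size
    compute_cost = max(gb_seconds - free_gb_s, 0) * price_per_gb_s
    billable_requests = max(requests - free_requests, 0)
    request_cost = billable_requests / 1_000_000 * price_per_million_requests
    return (round(compute_cost, 2), round(request_cost, 2),
            round(compute_cost + request_cost, 2))


print(lambda_monthly_cost(3_000_000, 120, 1.5))  # (2.33, 0.4, 2.73)
```

The call reproduces the worked example: 540,000 GB-s minus the 400,000 GB-s Free Tier leaves 140,000 GB-s ($2.33), plus 2M billable requests ($0.40).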
Automating EC2 Backups with Lambda
Scenario
Your EC2 instance contains crucial data that would be
very costly to lose.
Solution
Best practice: Create backups at regular intervals to
prevent any data loss in case of instance failures or
errors.
Task:
Create a Lambda function that creates a snapshot of our EC2 instance at regular intervals.
EventBridge Scheduler
Prerequisites:
• Create ‘LambdaEC2DailySnapshot’ Lambda function
• Download the Lambda function into your Cloud9 environment
• Create or have an EC2 instance ready to use
[Figure: LambdaEC2DailySnapshot function with IAM permissions]
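A minimal sketch of what the snapshot function could contain, assuming one snapshot per attached EBS volume via the EC2 client's `describe_instances` and `create_snapshot` calls. The `INSTANCE_ID` environment variable and the description format are hypothetical. The execution role would need at least `ec2:DescribeInstances` and `ec2:CreateSnapshot`, and EventBridge Scheduler would invoke `lambda_handler` on the chosen schedule.

```python
import os
from datetime import datetime, timezone


def snapshot_instance_volumes(ec2, instance_id):
    """Create one EBS snapshot per volume attached to the instance."""
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    snapshot_ids = []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" not in mapping:
                    continue  # skip anything that is not an EBS volume
                volume_id = mapping["Ebs"]["VolumeId"]
                snapshot = ec2.create_snapshot(
                    VolumeId=volume_id,
                    Description=f"Daily snapshot of {volume_id} ({instance_id}) {stamp}",
                )
                snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids


def lambda_handler(event, context):
    import boto3  # included in the AWS Lambda Python runtime
    # INSTANCE_ID is a hypothetical environment variable configured on the function
    ids = snapshot_instance_volumes(boto3.client("ec2"), os.environ["INSTANCE_ID"])
    return {"statusCode": 200, "snapshots": ids}
```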
Efficiency
S3 event types that can trigger a Lambda function include:
• Put
• Post
• Copy
• Delete
• DeleteMarkerCreated
Automating S3 with Lambda: Real-time Data Validation HOL
Prerequisites:
1. Boilerplate (default) Lambda function ‘BillingBucketParser’ downloaded in the Cloud9 environment.
2. Two S3 buckets with default settings:
   a) ‘dct-billing-x’ for uploading CSV files.
   b) ‘dct-billing-errors-x’ for error files.
3. Download the provided CSV test files.
[Figure: BillingBucketParser with the S3 buckets dct-billing-x and dct-billing-errors-x]
Tasks:
1. Add necessary permissions for the Lambda role.
2. Write the Lambda Python code.
3. Set up the S3 trigger.
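A sketch of the pieces such a function typically needs: reading the bucket and key from the S3 event (keys arrive URL-encoded, so `unquote_plus` restores spaces), validating the CSV, and "moving" a bad file. S3 has no move operation, so the file is copied and then deleted. The validation rule here (every row has all columns filled in) is a hypothetical stand-in, since the lab's actual rules are not described. The role needs `s3:GetObject` and `s3:DeleteObject` on the source bucket and `s3:PutObject` on the error bucket.

```python
import csv
import io
from urllib.parse import unquote_plus

ERROR_BUCKET = "dct-billing-errors-x"  # error bucket from the prerequisites


def object_from_event(event):
    """Return (bucket, key) for the first record of an S3 event notification."""
    s3_info = event["Records"][0]["s3"]
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def find_invalid_rows(csv_text):
    """Hypothetical rule: a row is invalid if it has missing or empty fields."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    invalid = []
    for row_no, row in enumerate(rows, start=2):  # row 1 is the header
        if len(row) != len(header) or any(not field.strip() for field in row):
            invalid.append(row_no)
    return invalid


def move_object(s3, src_bucket, key, dest_bucket):
    """Copy the object to dest_bucket, then delete the original."""
    s3.copy_object(Bucket=dest_bucket, Key=key,
                   CopySource={"Bucket": src_bucket, "Key": key})
    s3.delete_object(Bucket=src_bucket, Key=key)


def lambda_handler(event, context):
    import boto3
    s3 = boto3.client("s3")
    bucket, key = object_from_event(event)
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    bad_rows = find_invalid_rows(body)
    if bad_rows:
        move_object(s3, bucket, key, ERROR_BUCKET)
    return {"statusCode": 200, "invalid_rows": bad_rows}
```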
Overview:
Configure a Lambda function that’s triggered whenever a new billing CSV file is uploaded to an S3 bucket. The function will read the file, convert any billing amount not in USD into USD, and insert the records into an Aurora Serverless V1 database.
Prerequisites:
1. Create an Aurora Serverless V1 database cluster
   a) Create the DB cluster we generated in the “RDS Hands-on Lab” from the “AWS Services using Python” section of this course
2. Boilerplate (default) Python Lambda function ‘BillingDataConversionAndIngestion’ downloaded into the Cloud9 environment.
3. An S3 bucket with default settings for uploading CSV files: ‘dct-billing-processed-x’.
4. Download the provided CSV test files.
Tasks:
1. Write and test the Lambda Python code in Cloud9 IDE.
2. Add necessary permissions to the Lambda execution role:
   1. Access to S3 to read the input CSV file.
   2. Access to RDS to write data to Aurora Serverless.
   3. Access to Secrets Manager to retrieve database credentials.
3. Set up the S3 bucket trigger to invoke the Lambda function whenever a new CSV file is uploaded.
4. Test the Lambda function by uploading a CSV file to the ‘dct-billing-processed-x’ S3 bucket and verify the data insertion in the Aurora Serverless database.
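A sketch of the conversion and insert steps, under explicit assumptions: the exchange rates are fixed, made-up values (a real function would look them up); the `billing` table and its columns are hypothetical; and the insert goes through the RDS Data API (`rds-data` client, `execute_statement`), which Aurora Serverless V1 supports when the Data API is enabled and which uses a Secrets Manager secret for credentials. The source does not say which connection method the lab uses. `Decimal` avoids floating-point rounding errors in money amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fixed exchange rates, for illustration only
RATES_TO_USD = {"USD": Decimal("1"), "CAD": Decimal("0.75"), "MXN": Decimal("0.055")}


def to_usd(amount, currency):
    """Convert an amount string in `currency` to USD, rounded to cents."""
    usd = Decimal(amount) * RATES_TO_USD[currency.upper()]
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def insert_billing_record(rds_data, cluster_arn, secret_arn, database, customer, amount_usd):
    """Insert one row through the RDS Data API (table and columns are hypothetical)."""
    return rds_data.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,  # Secrets Manager secret holding the DB credentials
        database=database,
        sql="INSERT INTO billing (customer, amount_usd) VALUES (:customer, :amount)",
        parameters=[
            {"name": "customer", "value": {"stringValue": customer}},
            {"name": "amount", "value": {"stringValue": str(amount_usd)}, "typeHint": "DECIMAL"},
        ],
    )


# Usage (requires boto3, AWS credentials, and a Data API-enabled cluster):
# import boto3
# insert_billing_record(boto3.client("rds-data"), CLUSTER_ARN, SECRET_ARN,
#                       "billing_db", "Acme", to_usd("100", "CAD"))
```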
[Figure: BillingDataConversionAndIngestion in the AWS Cloud with its permissions]
Example Scenario:
• You have several Elastic IPs that are not currently associated with an instance and result in unnecessary costs
• A Lambda function:
  • checks for any unassociated Elastic IPs
  • releases them to save costs
  • is triggered by EventBridge every day
• This demonstrates how we can automate and streamline VPC operations and tasks using Lambda
Scenario:
Your company regularly uses Elastic IPs for EC2 instances, but at times some of these IPs are left unassociated and result in unnecessary costs. Your task is to automate the process of checking for these unassociated Elastic IPs and releasing them.
Overview:
We’ll create a Lambda function that is triggered daily by EventBridge. The Lambda function will check for any unassociated Elastic IP addresses in our VPC and release them. This way, we ensure we don't incur unnecessary costs from unused Elastic IPs.
[Figure: EventBridge triggers the ManageEIPs function, which uses its permissions to manage EIP 1, EIP 2, and EIP 3 in the AWS Cloud]
Prerequisites:
1. Boilerplate (default) Python Lambda function ‘ManageEIPs’ downloaded into the Cloud9 environment.
Tasks:
1. Create three EIPs and one EC2 instance. Associate one of the EIPs with the EC2 instance.
2. Write and test the Lambda Python code in Cloud9 IDE.
3. Upload the local Cloud9 ‘ManageEIPs’ version of our function to Lambda.
4. Add the necessary EC2 permissions to the Lambda execution role.
5. Set up the EventBridge trigger and test it.
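The core of ManageEIPs could look like the sketch below. `describe_addresses` lists the account's Elastic IPs in the Region, and an address with no `AssociationId` is not attached to an instance or network interface, so it is passed to `release_address`. The execution role would need `ec2:DescribeAddresses` and `ec2:ReleaseAddress`. The `dry_run` flag is a local convenience for testing in Cloud9, not EC2's own `DryRun` parameter.

```python
def release_unassociated_eips(ec2, dry_run=False):
    """Release every Elastic IP with no association; return the released IPs."""
    released = []
    for address in ec2.describe_addresses()["Addresses"]:
        if "AssociationId" in address:
            continue  # still attached to an instance or network interface
        if not dry_run:
            ec2.release_address(AllocationId=address["AllocationId"])
        released.append(address["PublicIp"])
    return released


def lambda_handler(event, context):
    import boto3  # included in the AWS Lambda Python runtime
    return {"released": release_unassociated_eips(boto3.client("ec2"))}
```

With the lab setup (three EIPs, one associated), the function should release the two unassociated addresses and leave the third alone.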
What is SQS?
• A fully managed message queuing service that enables you to decouple and scale:
  • Microservices
  • Distributed systems
  • Serverless applications
• Eliminates the complexity and overhead associated with managing and operating message-oriented middleware
• Stores messages on multiple servers for redundancy and to ensure message durability
[Figure: example architecture in the AWS Cloud with an Order Tracking System, the DispatchItem topic, OrderProcessingQueue, a Warehouse Mgmt. System, and OrderConfirmationProcessor]
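Decoupling with SQS comes down to a producer that sends messages and a consumer that receives, processes, and then deletes them. A minimal sketch with the Boto3 SQS client; the queue URL would be the one for a queue such as OrderProcessingQueue, and the order payload and helper names are hypothetical. A message that is received but not deleted becomes visible again after the visibility timeout, which is what makes retries automatic.

```python
import json


def send_order(sqs, queue_url, order):
    """Producer: put one order message on the queue and return its MessageId."""
    return sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))["MessageId"]


def process_orders(sqs, queue_url, handle):
    """Consumer: receive up to 10 messages, handle each, then delete it."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling reduces empty responses
    )
    processed = 0
    for message in response.get("Messages", []):  # key is absent when nothing is received
        handle(json.loads(message["Body"]))
        # Delete only after successful handling so failures are redelivered
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```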
Scenario:
Your company has a system for validating CSV billing files stored in an S3 bucket for North American customers. In order to get real-time international tax
data, you need to make a 3rd party API call. Occasionally, errors might occur during calls to a 3rd party service, which will prevent the billing file from being
processed correctly. Your task is to automate the process of handling these errors, notifying the relevant employees, and ensuring error-related
information is stored for further investigation.
Overview:
We’ll modify the finalized version of the “Automating S3 with Lambda: Real-time Data Validation HOL” by creating an additional Lambda function, ‘RetryBillingParser’, an SNS topic, and an SQS queue. If there is an error in the ‘BillingBucketParser’ function’s mock API call to a 3rd party service, ‘BillingBucketParser’ publishes to the SNS topic. An email is sent out, and a subscribed SQS queue triggers the ‘RetryBillingParser’ function. This function re-attempts the data validation.
Prerequisites:
• Completed Automating S3 with Lambda: Real-time Data Validation HOL
• Boilerplate Python Lambda function ‘RetryBillingParser’
• S3 bucket ‘dct-billing-processed’
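A sketch of the two hand-off points. On the publishing side, BillingBucketParser calls SNS `publish` with enough context to retry. On the receiving side, RetryBillingParser gets an SQS event whose records carry a `body`. Unless raw message delivery is enabled on the subscription, that body is the SNS notification envelope and the original text is in its `Message` field. The message fields (`bucket`, `key`, `error`), subject line, and helper names are assumptions.

```python
import json


def publish_api_error(sns, topic_arn, bucket, key, error):
    """In BillingBucketParser: report a failed 3rd party API call to SNS."""
    sns.publish(
        TopicArn=topic_arn,
        Subject="Billing file processing error",
        Message=json.dumps({"bucket": bucket, "key": key, "error": str(error)}),
    )


def files_to_retry(sqs_event):
    """In RetryBillingParser: extract the original payloads from an SQS event."""
    payloads = []
    for record in sqs_event["Records"]:
        body = json.loads(record["body"])
        # SNS envelope (default) → unwrap "Message"; raw delivery → body is the payload
        payloads.append(json.loads(body["Message"]) if "Message" in body else body)
    return payloads
```

Email subscribers to the topic receive the same JSON message, which gives employees the bucket, key, and error for investigation.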
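[Figure: original flow. A file in dct-billing triggers BillingBucketParser, which parses the billing data. If there are errors (Yes), the file is moved to the error bucket dct-billing-errors; if not (No), nothing is done.]
[Figure: updated flow in the AWS Cloud. BillingBucketParser makes a mock 3rd party API call; on failure it publishes to an SNS topic, whose subscribed SQS queue triggers RetryBillingParser. Files end up in either the error bucket (dct-billing-errors) or the processed bucket (dct-billing-processed).]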
• Glue Crawlers connect to your source data, extract metadata, and create table definitions in the Data Catalog
• Securing access to your data is crucial. Lake Formation simplifies the process of setting up, securing, and managing data
• Lake Formation ensures that only those users and services (such as Amazon EMR) with granted permissions can access the data in your Data Catalog
[Figure: a Crawler populates the Glue Data Catalog with Table x, Table y, and Table z; Lake Formation controls User access]
Scenario:
• Consider a company that holds vast amounts of data across several S3 buckets. By using Glue Crawlers, they can automate the process of extracting, transforming, and loading this data for analytics
• With Lake Formation, they can secure their Data Catalog, making sure only authorized users have access to specific data
Scenario:
Your company processes a multitude of transaction files daily which detail billing, products sold, and their quantities. This data is stored in different CSV files within S3. There's a need to consolidate this data, analyze it, and derive insights like the gross profit. You are tasked with automating the data processing and analytics pipeline using AWS Glue and EMR.
Prerequisites:
Download the provided files:
• emr_software_settings.json
• billing_data_dairy_07_2023.csv
• units_sold_07_2023.csv
• production_costs_07_2023.csv
S3 buckets:
• dct-billing-processed
• dct-billing-data-lake-x
  • units-sold/
  • production-costs/
  • reports/
  • pyspark_script/
Overview:
1. Store the raw CSV data in S3 buckets
2. Use Glue Crawlers to catalog this data
3. Spin up an EMR cluster, which we’ll use to run a PySpark script
   a) This script will read the data from the Glue catalog and determine the gross profit for each product sold
   b) Store gross profit results back into S3 for reporting
[Figure labels: units_sold_07_2023.csv, production_cost_07_2023.csv, billing_data_dairy_07_2023.csv]
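The core calculation the PySpark script performs is gross profit = revenue minus cost of goods sold, per product. Below is a plain-Python stand-in that shows only that arithmetic, not the EMR/Glue plumbing. The input shapes are hypothetical: units-sold rows as (product, units) pairs, per-unit production costs, and per-unit selling prices (assumed to come from the billing data). In PySpark, the same logic would typically be a join of the catalog tables followed by a group-by and aggregation.

```python
from collections import defaultdict


def gross_profit_by_product(units_sold, production_costs, prices):
    """
    units_sold:       iterable of (product, units) rows, possibly from several files
    production_costs: dict product -> cost to produce one unit
    prices:           dict product -> selling price per unit
    Returns dict product -> gross profit (revenue - cost of goods sold).
    """
    units = defaultdict(int)
    for product, qty in units_sold:
        units[product] += qty  # consolidate rows for the same product
    return {
        product: round(qty * prices[product] - qty * production_costs[product], 2)
        for product, qty in units.items()
    }


rows = [("milk", 100), ("cheese", 20), ("milk", 50)]
print(gross_profit_by_product(rows, {"milk": 0.8, "cheese": 3.0}, {"milk": 1.5, "cheese": 5.0}))
# {'milk': 105.0, 'cheese': 40.0}
```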
[Figure: in the AWS Cloud, Glue Crawlers parse the CSV data in dct-billing-processed and in dct-billing-data-lake (units-sold/, production-costs/) and populate the Glue Data Catalog (Billing, Units Sold, Production Costs); an EMR cluster runs the script in pyspark-script/ to process the data and generate a gross profit report in reports/; IAM and Lake Formation control access]
Python, coupled with AWS Lambda, provides a powerful platform for automating security tasks. For example, if Security Hub detects an unauthorized access attempt, a Lambda function could automatically strengthen security group rules, revoke IAM credentials, or isolate affected resources.
Automated Security Audit System
Performs automatic audits of your AWS resources for potential security issues, such as Security Group rules that allow traffic from any IP address (0.0.0.0/0).
[Figure: in the AWS Cloud, EventBridge triggers the audit, which checks "Do any SGs allow traffic from any IP address?" (Yes branch shown)]
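The audit question "Do any SGs allow traffic from any IP address?" can be answered from the output of the EC2 client's `describe_security_groups`. This sketch flags inbound rules open to `0.0.0.0/0` or its IPv6 equivalent `::/0`. The function name and finding format are hypothetical. For accounts with many groups, the paginator from `get_paginator("describe_security_groups")` would be needed to see every page.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # any IPv4 / any IPv6 address


def find_open_ingress(security_groups):
    """
    Return (group_id, from_port, to_port) for each inbound rule open to any IP.
    `security_groups` is the "SecurityGroups" list from describe_security_groups().
    """
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                # Ports may be absent, e.g. for rules that allow all protocols
                findings.append((group["GroupId"], perm.get("FromPort"), perm.get("ToPort")))
    return findings


# Usage (requires boto3 and AWS credentials):
# import boto3
# groups = boto3.client("ec2").describe_security_groups()["SecurityGroups"]
# print(find_open_ingress(groups))
```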
Task:
Create a local Python script with the help of Copilot that processes ‘Units Sold’ files and pushes them to S3
Prerequisites:
1. GitHub account
6. ‘dct-billing-data-lake-x’ bucket
6. ‘units-sold’ subfolder
[Figure: Script processes files]
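One possible shape for the Copilot-assisted script: find the local Units Sold CSV files and upload each one under the `units-sold/` prefix of the data lake bucket with the S3 client's `upload_file(Filename, Bucket, Key)`. The `units_sold*.csv` filename pattern follows the provided test file name, and the helper name is hypothetical. Any processing the lab performs before upload is not described, so this sketch only selects and uploads files.

```python
from pathlib import Path


def upload_units_sold(s3, folder, bucket="dct-billing-data-lake-x", prefix="units-sold/"):
    """Upload every units_sold*.csv file in `folder` under the units-sold/ prefix."""
    uploaded = []
    for path in sorted(Path(folder).glob("units_sold*.csv")):
        key = f"{prefix}{path.name}"
        s3.upload_file(str(path), bucket, key)  # Filename, Bucket, Key
        uploaded.append(key)
    return uploaded


# Usage (requires boto3 and AWS credentials):
# import boto3
# print(upload_units_sold(boto3.client("s3"), "./data"))
```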