0% found this document useful (0 votes)

118 views35 pages

AWS Cloud Interview Prep Guide

Uploaded by

aonoodp

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

118 views35 pages

AWS Cloud Interview Prep Guide

Uploaded by

aonoodp

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

AWS Cloud Interview Guide

Cloud With Raj

www.cloudwithraj.com

Table of Contents Raj's Bio

- AWS Services for Interviews - Principal SA at (5.5+ Years)
- Microservice with ALB - 20+ Years of IT experience
- Event Driven Architecture (Basic) - Designed and implemented multiple world-scale
- Microservice Vs Event Driven projects with official AWS blogs on them
- Event Driven Architectures (Advanced) - Trained students get SA jobs via SA Bootcamp
- Three-Tier Architecture - Bestselling author (60,000+ paid students)
- Availability Zone & Data Center - Presented highly-rated talks at major events
- Lambda Cheaper Than EC2? - LinkedIn Top Systems Design Voice
- RPO Vs RTO
- IP Address Vs URL
- DevOps Phases Lighthouse projects designed : Dr. B. Covid Vaccine
- CI Vs CD Vs CD Registration, Collins Flight Systems used by 100+
- Kubernetes Tools Landscape airlines, Freddie Mac Datacenter to AWS Migration, and
- Traditional CICD Vs GitOps more
- Platform Vs Developer Team
- Scaling EC2 Vs Lambda Vs EKS Please follow me on socials:
- Kubernetes (EKS) Scaling
- Gen AI Layers
- Prompt Engineering Vs RAG Vs Fine Tuning
- RAG with Bedrock
- EKS Upgrade With Karpenter
- Container Lifecyle - Local to Cloud
- Gen AI Multi Model Invocation
- Git Workflow
- DevSecOps Workflow
- What Happens When You Type an URL
- AWS Well Architected Framework
- AWS Migration Tools
- Multi-site Active Active DR
- Kubernetes Node Pod Container Relationship
- Running Batch Workloads on AWS
- STAR Interview Format
- Hybrid Cloud Architecture
- How Karpenter Saves Money

More To Be Added...
Important AWS Services for Interviews
Cloud With Raj
www.cloudwithraj.com

Compute

EC2 Lambda Elastic Kubernetes Service ECS Auto Scaling

Storage

S3 EBS RDS DynamoDB ElastiCache

Network

VPC Load Balancer API Gateway AWS Global Infrastructure

(Region, AZ)

Security

KMS IAM WAF Shield GuardDuty Secrets Config

Manager

Gen AI

BedRock Q Partyrock SageMaker

Migration

DMS Migration Hub Application Migration Application Discovery

Service Service

Event Driven

SNS SQS EventBridge Step Functions

Observability

CloudWatch CloudTrail X-Ray

Cost Optimization

Compute CloudWatch Cost Explorer Budget Spot Instance Reserve Savings Plan
Optimizer Insights Instance
Reporting

Analytics

Glue EMR Athena QuickSight Kinesis

DevOps

CloudFormation
Microservices with ALB
Cloud With Raj
www.cloudwithraj.com

Domain: cloudwithraj.com ALB

(url: cloudwithraj.com)
Path Based Routing

/browse /buy /* (Catch all)

Target Group 1 Target Group 2 Target Group 3
Auto Scaling Group

EC2 EC2 EKS

Lambda
(Running Code) (Scaled Up)
(Handles traffic of anything
(Handles traffic of (Handles traffic of
else)
cloudwithraj.com/browse) cloudwithraj.com/buy)

Database Database Database

Amazon Aurora Amazon DynamoDB Amazon Aurora

Microservice 1 Microservice 2 Microservice 3

Event Driven Architecture (Basic)
Cloud With Raj
www.cloudwithraj.com

An event-driven architecture decouples the producer and processor. In this

example producer (human) invokes an API, and send information in JSON
payload. API Gateway puts it into an event store (SQS), and the processor
(Lambda) picks it up and processes it. Note that, the API gateway and Lambda
can scale (and managed/deployed) independently

Benefits of an event-driven architecture

1. Scale and fail independently - By decoupling your services, they are only
aware of the event router, not each other. This means that your services are
interoperable, but if one service has a failure, the rest will keep running. The
event router acts as an elastic buffer that will accommodate surges in
workloads.
API Gateway
2. Develop with agility - You no longer need to write custom code to poll, filter,
and route events; the event router will automatically filter and push events to
consumers. The router also removes the need for heavy coordination between
Event Store producer and consumer services, speeding up your development process.

SQS
3. Audit with ease - An event router acts as a centralized location to audit your
application and define policies. These policies can restrict who can publish and
subscribe to a router and control which users and resources have permission to
access your data. You can also encrypt your events both in transit and at rest.

Lambda 4. Cut costs - Event-driven architectures are push-based, so everything

happens on-demand as the event presents itself in the router. This way, you’re
not paying for continuous polling to check for an event. This means less network
bandwidth consumption, less CPU utilization, less idle fleet capacity, and less
SSL/TLS handshakes.
Microservice Vs Event Driven
Cloud With Raj
www.cloudwithraj.com

Domain: cloudwithraj.com API Gateway Domain: cloudwithraj.com

API Gateway
Websocket /buy (POST)
/buy (POST)
API

Event Store

SQS

Lambda
Lambda
(Handles traffic of
(processes messages from SQS for
cloudwithraj.com/buy (POST))
cloudwithraj.com/buy (POST)) )

Database Database
Amazon DynamoDB Amazon DynamoDB

Microservice Microservice with Event Driven

The main differences are:

1. Traditional microservice is synchronous i.e. the request and response happens
with the same invocation. Whereas with Event Driven, User gets a confirmation
that message is inserted into SQS. But he doesn't get the response from the
actual message processing by the Lambda in the same invocation. Instead the
backend Lambda needs to send response out, in this case, using websocket APIs,
to the user. Or the user can query the status afterwards

2. With EDA, API Gateway, and Lambda/Database can scale independently.

Lambda can consume messages at a rate not to overwhelm the database

3. With EDA retries are built in. With microservices, if Lambda fails, user need
to send the request again. With EDA, once the message is in SQS, even if
Lambda fails, SQS will automatically retry
Event Driven Architecture (Advanced)
Cloud With Raj
www.cloudwithraj.com

Application

SNS
API Gateway

Event Store + Router

EventBridge SQS 1 SQS 2 SQS 3

Destination Destination Destination

Rule 1 Rule 3 Filter 1 Filter 2 Filter 3
Rule 2

Lambda 1 Step Function Lambda 2 Lambda 1 EKS Lambda 2

Based on values in the message, EventBridge Based on values in the message, SNS can fire
can fire different targets different targets

SNS Vs SQS Vs EventBridge Detailed Video:

3 Tier Architecture
Cloud With Raj
www.cloudwithraj.com

External Facing ALB

PRESENTATION LAYER

Auto Scaling Group Availability Zone 1 Availability Zone 2

EC2 EC2
Webserver Webserver
APPLICATION LAYER

Internal ALB
Availability Zone 1 Availability Zone 2
Auto Scaling Group

EC2 EC2
Appserver Appserver
DATABASE

Database
Amazon Aurora

1. First layer is presentation layer. Customers consume the application using this layer.
Generally, this is where the front end runs. For example - amazon.com website. This is
implemented using an external facing load balancer distributing traffic to VMs (EC2s) running
webserver.

2. Second layer is application layer. This is where the business logic resides. Going with the
previous example - you browsed your products on amazon.com, and now you found the product
you like and then click "add to cart". The flow comes to the application layer, validates the
availability, and then creates a cart. This layer is implemented with internal facing load
balancer and VMs running applications.

3. The last layer is the database layer. This is where information is stored. All the product
information, your shopping cart, order history etc. Application layer interacts with this layer
for CRUD (Create, Read, Update, Delete) operations. This could be implemented using one or
mix of databases - SQL (e.g. Amazon Aurora), and/or NoSQL (DynamoDB)

Lastly - why is this so popular in interviews? This architecture comprised of many critical
patterns - microservices, load balancing, scaling, performance optimization, high availability,
and more. Based on your answers, interviewer can dig deep and check your understanding of
the core concepts.
How Many Data Centers in One Availability
Zone?
Cloud With Raj
Incorrect Answer: www.cloudwithraj.com
One Availability Zone means one data center

1 Availability Zone

Correct Answer:
An AWS availability zone (AZ) can contain multiple data centers. Each zone is usually backed
by one or more physical data centers, with the largest backed by as many as five.
Is AWS Lambda Cheaper than Amazon EC2?
Cloud With Raj
www.cloudwithraj.com

Incorrect Answer:
Yes, AWS Lambda is cheaper than Amazon EC2
s

Lambda Cost Factors: EC2 Main Cost Factors:

Architecture (x86 vs. Graviton) Instance family
Number of requests Attached EBS
Duration of each request Duration of EC2 runtime
Amount of memory allocated (NOT used)
Amount of ephemeral storage allocated

Correct Answer:
It depends on the application. Both Lambda and EC2 have different cost factors (see above).
It is possible that, depending on the application, AWS Lambda can have a higher charge than
EC2, and vice versa. it is important to consider not just the compute cost but the TCO (Total
Cost of Ownership). With AWS Lambda there is no AMI to maintain, patch, and rehydrate,
reducing management overhead and hence overall TCO.
RPO Vs RTO
Cloud With Raj
www.cloudwithraj.com

Most Recent Backup

Disaster

Recovery Time (RTO)

Recovery Point (RPO)

Time < DATA LOSS > < DOWNTIME >

Candidates are sometimes confused by RPO thinking it's measured in unit of data, e.g.
gigabyte, petabyte etc.

Correct Answer:
Both RPO and RTO are measured in time. RTO stands for Recovery Time Objective and is
a measure of how quickly after an outage an application must be available again. RPO, or
Recovery Point Objective, refers to how much data loss your application can tolerate.
Another way to think about RPO is how old can the data be when this application is
recovered i.e. the time between the last backup and the disaster. With both RTO and
RPO, the targets are measured in hours, minutes, or seconds, with lower numbers
representing less downtime or less data loss. RPO and RTO can be and often have
different values for an application.
IP Address Vs. URL
Cloud With Raj
www.cloudwithraj.com

Virtual
Machine
(E.g. EC2)

IPAddress1
192.50.20.12

Virtual
Machine
Load Balancer
(E.g. EC2)

Access URL IPAddress2

212.60.20.12

Virtual
DNS Machine
(Domain Name (E.g. EC2)
System)
IPAddress1
Assigns URL to Load Balancer
250.80.10.12
(Uniform Resource Locator)
(Went Down!!)

Bad Answer:
URL is a link assigned to an IP address

Correct Answer:
IP Address is a unique number that identifies a device connected to the internet, such as a
Virtual Machine running your application. However, accessing a resource using this unique
number is cumbersome; moreover, let's say when a VM comes down (the bottom one in the
diagram), a new VM comes up to replace it with a different IP address. Hence, in reality,
application running inside the VM is accessed using URL or Uniform Resource Locator.

One URL does generally NOT map to one IP address; rather, the URL (e.g., www.amazon.com) is
mapped to a Load Balancer, and that Load Balancer distributes traffic to multiple VMs with
different IP addresses. Even if one VM goes down and another comes up, this Load Balancer
using a URL always works because the Load Balancer appropriately distributes traffic across
healthy instances. This way, you (the user) do not need to worry about the underlying IP
addresses.
DevOps CICD Phases with Tools
Cloud With Raj
www.cloudwithraj.com

Amazon AWS X-Ray

VS Code GitHub AWS CodeCommit Jenkins AWS CodeBuild Jenkins AWS CodeBuild Jenkins AWS CodeDeploy CloudWatch

Author Source Build Test Deploy Monitor

Write code Check-in Compile code Integration testing Deploy artifacts Logs, metrics, and
source code Create artifacts Load testing traces
Unit testing UI testing
Penetration testing

Continuous Integration (CI)

Continuous Deployment (CD)

DevOps CICD Phases
Cloud With Raj
www.cloudwithraj.com

Amazon AWS X-Ray

VS Code GitHub AWS CodeCommit Jenkins AWS CodeBuild Jenkins AWS CodeBuild Jenkins AWS CodeDeploy CloudWatch

Author Source Build Test Deploy Monitor

Write code Check-in Compile code Integration testing Deploy artifacts Logs, metrics, and
source code Create artifacts Load testing traces
Unit testing UI testing
Penetration testing

Continuous Integration (CI)

Manual Approval

Continuous Delivery (CD)

Continuous Deployment (CD)

Kubernetes Tools Ecosystem with AWS
Cloud With Raj
www.cloudwithraj.com

Cloud Implementation

Amazon EKS

Observability

Prometheus Grafana Fluentbit Jaeger ADOT CloudWatch X-Ray

Scaling
Karpenter AutoScaling

Delivery/Automation

Argo Terraform Jenkins Github Actions Gitlab CICD

Security

Gatekeeper Trivvy ECR Scan GuardDuty Kube Bench Secrets Istio

Manager

Cost Optimization
CloudWatch Cost and Usage Kubecost
Container Report (New
Insights feature - Split
Cost Allocation)
Traditional CICD Vs. GitOps
Cloud With Raj
Traditional CICD www.cloudwithraj.com

Container Image
CD Tool
CI Tool CD Tool Pushes files
Code &
Dockerfile 3
Git Repo Amazon ECR 2
Amazon EKS
1 Manifests
updated with
Manifests container image tag

C
GitOps Checks for GitOps Tool Installed
difference in Cluster
Container Image between cluster
and Git
CI Tool CD Tool
Code &
Dockerfile
Git Repo Amazon ECR Pulls in
Amazon EKS
changed files
A B Manifests
updated with
Manifests container image tag

Traditional DevOps

Step 1: Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins)
kick off, build the container image and save the image in a container registry such as Amazon ECR.

Step 2: CD tools (e.g. Jenkins) update the deployment manifest files with the tag of the container image.

Step 3: CD tools (e.g. Jenkins) execute the command to deploy the manifest files into the cluster, which, in terms,
deploys the newly built container in the Amazon EKS cluster.

Conclusion - Traditional CICD is a push based model. If a sneaky SRE changes the YAML file directly in the cluster (e.g.
changes number of replica, or even the container image itself!), the resources running in the cluster will deviate from
what's defined in the YAML in the Git. Worse case, this change can break something, and DevOps team need to rerun
part of the CICD process to push the intended YAMLs to the cluster

GitOps

Step A: Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins)
kick off, build the container image and save the image in a container registry such as Amazon ECR.

Step B: CD tools (e.g. Jenkins) update the deployment manifest files with the tag of the container image.

Step C: With GitOps, Git becomes the single source of truth. You need to install a GitOps tool like Argo inside the cluster
and point to a Git repo. Git keeps checking if there is a new file, or if the files in the cluster drifts from the ones in Git.
As soon as YAML is updated with new container image, there is a drift between what's running in the cluster vs what's in
Git. ArgoCD pulls in this updated YAML file and deploys new container.

Conclusion - GitOps does NOT replace DevOps. As you can see GitOps only replaces part of the CD process. If we think
about the previous scenario where the sneaky SRE directly changes the YAML in cluster, ArgoCD will detect the mismatch
between the changed file vs the one in Git. Since there is a difference, it will pull in the file from Git and bring
Kubernetes resources to it's intended state. And don't worry, Argo can also send a message to the sneaky SRE's manager
;).
Platform Team and Developer Team
Cloud With Raj
www.cloudwithraj.com
Container Image

CI Tool CD Tool
Code &
CD
6
Dockerfile
Git Repo Amazon ECR Co To
nt ol
ain
4 5 Manifests
er
de
pl
updated with oy
Manifests ed
container image tag

Amazon EKS
Requests
Infrastructure
Developer 3

1
2
Ticketing Infra as Code (IaC)
Platform Team
System (Terraform, CDK etc.)

Recently, the term "platform team" has been floating around plenty. But what do platform team do? How are they
different from the developer team? Let's understand with the diagram below:

Step 1: The developer team requests the Platform team to provision appropriate AWS resources. In this example, we are
using Amazon EKS for the application, but this concept can be extended to any other AWS service. This request for AWS
resources is typically done via the ticketing system.

Step 2: The platform team receives the request.

Step 3: The platform team uses Infrastructure as Code (IaC), such as Terraform, CDK, etc., to provision the requested
AWS resources, and share the credentials with the Developer team.

Step 4: The developer team kicks off the CICD process. We are using a container process to understand the flow.
Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins, GitHub
actions) kick off, build the container image and save the image in a container registry such as Amazon ECR.

Step 5: CD tools (e.g. Jenkins, Spinnaker) update the deployment manifest files with the tag of the container image.

Step 6: CD tools execute the command to deploy the manifest files into the cluster, which, in terms, deploys the newly
built container in the Amazon EKS cluster.

Conclusion - The platform team takes care of the infrastructure (often with the guardrails) appropriate for the
organization, and the developer team uses that infrastructure to deploy their application. The platform team does the
upgrade and maintenance of the infrastructure to reduce the burden on the developer team.
Scaling Difference Between Lambda, EC2, EKS
Cloud With Raj
www.cloudwithraj.com
Auto Scaling Group

EC2 Scaling: You need to use a Auto Scaling Group (ASG) and define on
what EC2 metric you want it to scale e.g. CPU utilization. You can use
ASG's "minimum number of instances" to run certain number of instances
on all time. Recently ASG also supports scheduled scaling, and warm pool.
EC2 EC2 EC2

Lambda Scaling: No ASG needed. For each incoming connection, Lambda

automatically scales. Consider increasing the concurrency setting for
Lambda as needed. Implement Provisioned Concurrency to keep certain
number of Lambda pre-warmed. This can be done either with schedule or
based on Provisioned Concurrency utilization.
Lambda Lambda Lambda

EKS
Auto Scaling Group

EKS Scaling: EKS scaling is the most complex scaling. You may or may
NOT use a Auto Scaling Group but it does NOT work like regular EC2
scaling. Please refer to the next page to learn about EKS
(Kubernetes/K8s) scaling in detail
EC2 EC2 EC2
How Does Kubernetes Worker Nodes Scale?

Cloud With Raj

www.cloudwithraj.com

Worker VM Worker VM
1 2

Node Autoscaler
Pending
Unschedulable

Worker VM Worker VM Worker VM

3 4

Incorrect Answer:
Set Auto Scaling Groups to scale at a certain VM metric utilization like scaling regular VMs.

Correct Answer:
Step 1: You configure HPA (Horizontal Pod Autoscaler) to increase the replica of your pods at a
certain CPU/Memory/Custom Metrics threshold.

Step 2: As traffic increases and the pod metric utilization crosses the threshold, HPA
increases the number of pods. If there is capacity in the existing worker VMs, then the
Kubernetes kube-scheduler binds that pod into the running VMs.

Step 3: Traffic keeps increasing, and HPA increases the number of replicas of the pod. But
now, there is no capacity left in the running VMs, so the kube-scheduler can't schedule the
pod(yet!). That pod goes into pending, unschedulable state

Step 4: As soon as pod(s) go to pending unschedulable state, Kubernetes node scalers (such as
Cluster Autoscaler, Karpenter etc.) provisions a new node. Cluster Autoscaler requires an Auto
Scaling Group where it increases the desired VM count, whereas Karpenter doesn't require an
Auto Scaling Group or Node Group. Once the new VM comes up, kube-scheduler puts that
pending pod into the new node.
Gen AI 4 Layers
Cloud With Raj
www.cloudwithraj.com

EASY
Applications
(E.g. Adobe Firefly, LLM

Opportunity For New Market Players

Chatbots)

Most amount of jobs

Learning Curve Infrastructure Providers to (MLOps, LLM with
Host/Train LLMs Kubernetes/Serverless,
(E.g. AWS, Azure, GCP) Cloud LLM services etc.)

LLM Models
(E.g. Open AI, Anthropic)

Silicon Chips
HARD (E.g. AMD, NVIDIA)

Gen AI hype is at an all-time high, and so is the confusion. What do you study, how do you think about it,
and where are the most jobs? These are the burning questions in our minds. Gen AI can be broken down into
the following four layers.

1. The bottom layer is the hardware layer, i.e., the silicon chips that can train the models. Example - AMD,
NVIDIA

2. Then comes the LLM models that get trained and run on the chips. Examples are Open AI, Anthropic etc.

3. Then comes infrastructure providers who provide an easier way to consume, host, train, inference the
models. Example is AWS. This layer consists of managed services such as Amazon Bedrock, which hosts pre-
trained models, or provision VMs (Amazon EC2) where you can train your own LLM

4. Finally, we have the application layer which uses those LLMs. Some examples are Adobe Firefly, LLM
chatbots, LLM travel agents etc.

Now, the important part - as you go from the bottom to the top, the learning curve gets easier, and so does
the opportunity for new market players to enter. Building new chips requires billions of dollars of investments,
and hence, it's harder for new players to enter the market. The most opportunities are in the top two
layers. If you already know the cloud, then integrating Gen AI with your existing knowledge will increase your
value immensely. If you are working in DevOps, learn MLOps; if you know K8s/Serverless, learn how you can
integrate Gen AI with those; if you work in an application; integrate with managed LLM services to enhance
functionality, you got the idea!
Prompt Engineering Vs RAG Vs Fine Tuning
Cloud With Raj
www.cloudwithraj.com

Prompt Engineering
1
Prompt

Subpar Response

2
Enhanced Prompt
Amazon Bedrock
Better Response (Hosts LLM)

1. You send a prompt to the LLM (hosted in Amazon Bedrock in this case), and
get a response which you are not satisfied with
2. You enhance the prompt, and finally come up with a prompt that gives desired
better response

RAG (Retrieval Augmented Generation)

Prompt that can be Augment original prompt

2 4
enhanced by company data with retrieved info

Generated answer 5 Generated answer

Amazon Bedrock
Code/Jupyter Notebook/App
(Hosts LLM)

Search Vector DB with Retrieve relevant info

3
the prompt related to prompt

Embeddings

1
Company data
Vector Database

1. RAG (Retrieval Augmented Generation) is used where the response can be

made better by using company specific data that the LLM does NOT have. You
store relevant company data into a vector database. This is done by a process
called embeddings where data is transformed into numeric vectors
2. User gives a prompt which can be made better by adding company specific
info
3. A process (code/jupyter notebook/application) converts the prompt into vector
and then search the vector database. Relevant info from the vector database is
RETRIEVED (First Part of RAG) and returned
4. The original prompt is AUGMENTED (Second part of RAG) with this company
specific info and sent to LLM
5. LLM GENERATES (Last part of RAG) the response and sends back to the
user

Fine Tuning

Prompt for Task-specific

organization use case training dataset

Response
Fine Tuned LLM Base LLM

1. If you need a LLM which is very specific to your company/organization's use

case that RAG can't solve, you train the base LLM with large training dataset
for the tasks. The output is a fine tuned LLM.
2. User asks question related to the use case and gets answer
RAG (Retrieval Augmented Generation) with Amazon BedRock
Cloud With Raj
www.cloudwithraj.com

Prompt that can be Augment original prompt

2 4
enhanced by company data with retrieved info

Generated answer 5 Generated answer

Amazon Bedrock LLMs
Code/Jupyter Notebook/App

Search Vector DB with Retrieve relevant info

3
the prompt related to prompt

Embeddings
1

OpenSearch Bedrock LLM to S3 Company data

Serverless convert data to
(Vector Store) embedding
OR
(BYO Vector Store)

Bedrock Knowledge Base

RAG (Retrieval Augmented Generation) is used where the response can be made better by using
company specific data that the LLM does NOT have. Amazon BedRock makes it very easy to do
RAG. Below are the steps:
1. You store relevant company data into a S3 bucket. Then from BedRock Knowledge bases, you
select an emedding LLM (Amazon Titan Embed or Cohere Embed) which converts the S3 data into
embeddings (vector). Knowledge Base can also create a serverless vector store for you to save
those embeddings. Alternatively you can also bring your own Vector Database (OpenSearch,
Aurora, Pinecone, Redis)
2. User gives a prompt which can be made better by adding company specific info
3. A process (code/jupyter notebook/application) converts the prompt into vector and then search
the vector database. Relevant info from the vector database is RETRIEVED (First Part of RAG)
and returned
4. The original prompt is AUGMENTED (Second part of RAG) with this company specific info and
sent to another Bedrock LLM
5. BedRock LLM GENERATES (Last part of RAG) the response and sends back to the user
EKS Upgrade Simplified Using Karpenter
Cloud With Raj
www.cloudwithraj.com

EKS Version 1.27

EKS Control Plane

Provisions EC2s with latest
AMI for v1.27 Gets latest AMI for v1.27
EKS Data Plane

EC2 Karpenter Systems Manager

EC2
(EKS-Optimized AMI (Running in Data Plane) Parameter Store
(EKS-Optimized AMI
for v1.27) for v1.27) (EKS-Optimized AMI list
for all EKS versions)

New EKS Version Released

Updates Control Plane to

v1.28
EKS Version 1.28
1
EKS Control Plane
Recycles Worker Nodes
with v1.28 AMI in Rolling
Deployment Fashion Gets latest AMI for v1.28
EKS Data Plane
3 2
EC2 Karpenter Systems Manager
EC2
(EKS-Optimized AMI (Running in Data Plane) Parameter Store
(EKS-Optimized AMI
for v1.28) for v1.28) (EKS-Optimized AMI list
for all EKS versions)

EKS upgrade can be tedious. But Karpenter can automatically upgrade your Data Plane worker nodes reducing your burden. Here's
how:

a. EKS Optimized AMI IDs are listed in AWS Systems Manager parameter store. CNCF project Karpenter, the next gen cluster
autoscaler, periodically checks this, and reconciles with the running worker nodes to see if they are running with the latest EKS-
Optimized AMI for the particular EKS version. In this case, let's assume EKS is running with v1.27.

b. At a certain point, EKS releases next version 1.28. And the below workflow takes place:
1. Admin upgrades the EKS control plane to v1.28.
2. Following the previous logic, Karpenter retrieves the latest AMI for v1.28 and check if worker nodes are running with those.
They are NOT, so a Karpenter Drift is triggered.
3. To fix this Drift, Karpenter automatically updates the worker nodes to v1.28 AMIs. And it does it using rolling deployment
(i.e. a new node comes up, existing node cordoned and drained, then terminated). It also respects the Kubernetes eviction API
parameters, such as maintaining PDB.

If you want to know the process in detail, including with custom AMIs, check out the Karpenter drift blog -
https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/
Container Lifecycle - Local to Cloud
Cloud With Raj
www.cloudwithraj.com

Container Image
docker build docker run docker push kubectl apply

Developer Code & (In Local Machine) Test Container in Local

Amazon ECR
Dockerfile Amazon EKS
Machine
Manifest
with image url

The fundamental container workflow from local machine to cloud is below:

1. Developer writes code, and associated Dockerfile to containerize the code in her local machine

2. She uses "Docker build" command to create the container image, in her local machine. At this point
container image is saved in the local machine

3. Developer uses "Docker run" command to run the container image, and test out the code running from the
container. Developer can repeat Steps 1-3, till the testing goes as per the requirements

4. Next, developer runs "Docker push" command to push the container image from the local machine to a
container registry. Some examples are DockerHub, or Amazon ECR.

5. Finally, using "Kubectl apply" command, an YAML manifest which has the URL of the container image from
the Amazon ECR, is deployed into the running Kubernetes cluster.

Note that, this is to understand the lifecycle of the container. In real-world, after testing is done in local
machine, the following steps are automated. Refer to the "Traditional CICD vs GitOps" page for that workflow
Gen AI Multi Model Invocation
Cloud With Raj
www.cloudwithraj.com

API Gateway

Event Store + Router

EventBridge
Based on values in
the message,
EventBridge can fire
different targets
Rule 1 Rule 3
Rule 2

Lambda 1 Step Function ECS Fargate

Bedrock Sagemaker Bedrock

Invokes Jumpstart Invokes
LLM A Invokes LLM C
LLM B
Git Workflow
Cloud With Raj
www.cloudwithraj.com

file1 file1
file1 file1

git add git commit git push

Local IDE Index/Staging

Local Repository
(E.g. Visual Studio Code) Area Remote Repository
(Works like a
(E.g. GitHub, GitLab)
database)

Local Machine

Git and GitHub for Beginners Crash Course (Click on the YouTube icon):
DevSecOps Workflow
Cloud With Raj
www.cloudwithraj.com

Amazon AWS X-Ray

VS Code GitHub AWS CodeCommit Jenkins AWS CodeBuild Jenkins AWS CodeBuild Jenkins AWS CodeDeploy CloudWatch

Author Source Build Test Deploy Monitor

Use AWS Secrets Manager Static code analysis

Penetration testing Dynamic testing and analysis
How is the app exposed? Sample tools -
DDoS Testing Sample tools - Astra, Invicti
Use private subnet, AWS WAF, SonarQube, graudit
Fault injection Monitor host
AWS Shield, AuthN/Z Lint Infra as Code
simulation

Security Embedded Throughout The Pipeline

What Happens When You Type An URL
Cloud With Raj
www.cloudwithraj.com

3
2 Get IP DNS root name server
address (LB
Checks local
of URL
machine If NOT cached frontend)
caches for
IP Address
of the URL
Name server for .com TLD

DNS Resolver

1 Types www.amazon.com
4
Amazon Route 53
Obtained IP address

EC2 Hosting Amazon.com

6 Front end pages

Page rendered and displayed to 5

user
Load Balancer (LB)

EC2 Hosting Amazon.com

Free video explaining steps in detail: Click

Well Architected Framework
Cloud With Raj
www.cloudwithraj.com

The AWS Well-Architected Framework describes key concepts, design principles, and architectural best practices
for designing and running workloads in the cloud. By answering a few foundational questions, learn how well your
architecture aligns with cloud best practices and gain guidance for making improvements. AWS Well-Architected
Framework assessment questions can be answered inside your AWS account, and then a scorecard can be obtained
that evaluates your application in respect to the below six pillars

Operational Excellent Pillar Security Pillar Reliability Pillar

The operational excellence pillar focuses The security pillar focuses on protecting The reliability pillar focuses on
on running and monitoring systems, and information and systems. Key topics workloads performing their intended
continually improving processes and include confidentiality and integrity of functions and how to recover quickly
procedures. Key topics include data, managing user permissions, and from failure to meet demands. Key
automating changes, responding to establishing controls to detect security topics include distributed system
events, and defining standards to events. design, recovery planning, and adapting
manage daily operations. to changing requirements.

Performance Efficiency Pillar Cost Optimization Pillar Sustainability Pillar

The performance efficiency pillar The cost optimization pillar focuses on The sustainability pillar focuses on
focuses on structured and streamlined avoiding unnecessary costs. Key topics minimizing the environmental impacts of
allocation of IT and computing include understanding spending over running cloud workloads. Key topics
resources. Key topics include selecting time and controlling fund allocation, include a shared responsibility model for
resource types and sizes optimized for selecting resources of the right type sustainability, understanding impact, and
workload requirements, monitoring and quantity, and scaling to meet maximizing utilization to minimize
performance, and maintaining efficiency business needs without overspending. required resources and reduce
as business needs evolve. downstream impacts.
AWS Migration Tools
Cloud With Raj
www.cloudwithraj.com

AWS Direct Connect AWS Migration Hub AWS Application Discovery

(or VPN or Internet) Service
(Discovers on premises
applications)

Oracle AWS Database Migration Amazon Aurora

Service (DMS)

AWS Application Migration Amazon EC2

Servers
Service

Amazon S3

Shared File System AWS DataSync

Amazon EFS
Multi-site Active Active
Cloud With Raj
www.cloudwithraj.com

AWS Cloud
Geolocation/Latency Routing

Route 53

Region A Region B
Availability Zone Availability Zone Availability Zone Availability Zone
VPC VPC

Elastic Load Balancer Elastic Load Balancer

Front end Front end Front end Front end

server server server server

Auto Scaling Group Auto Scaling Group

App App App App

server server server server
Auto Scaling Group Auto Scaling Group

DynamoDB global
table replication

DynamoDB DynamoDB DynamoDB DynamoDB

continuous backup continuous backup
Kubernetes Node Pod Container Relationship
Cloud With Raj
www.cloudwithraj.com

Amazon EKS
Container Image
(Control Plane)

Developer Code & (In Local Machine) Amazon ECR

Dockerfile Container
Stored Manifest
with image url

EKS Worker Node EKS Worker Node

1. Developer creates the container image which gets deployed on Amazon EKS.

2. The container image runs inside Kubernetes Pod. Pod is the minimum deployable unit in Kubernetes.
A container can NOT run without being inside a pod

3. Pod runs inside EC2 worker nodes. There can be multiple pods running inside one EC2.

4. Easy way to remember this is "NPC" (like in video games) = Node - Pod - Container in that order
i.e. Node hosts one or many pods. One pod hosts one or many containers
Batch Workloads on AWS
Cloud With Raj
www.cloudwithraj.com

Amazon ECR
Amazon ECS

Fargate EC2
Amazon EventBridge AWS Batch
Scheduler
Amazon EKS

1. Use EventBridge scheduler to schedule jobs. You can also trigger jobs based on EventBridge rules
which give you great flexibility and power!

2. Then the batch job is submitted via AWS Batch. Obviously you are thinking, I see container image
and then ECS/EKS, why can't I submit a container job directly from EventBridge without AWS
Batch. it's because AWS Batch maintains job queues, restart, resource management etc. that is not
available if you skip it.

3. The actual steps of the job needs to be in a container

4. The job running in a container gets submitted in either ECS or EKS. AWS Batch supports both EC2
and Fargate
STAR Interview
Cloud With Raj
www.cloudwithraj.com

S T A R

SITUATION ACTION RESULT

TASK
background of the goals that you need steps that you took what you have achieved
project. to achieve

One of the biggest mistake people make in behavioral interviews is, they keep on saying "we" - "we did this task", "we came up
with a plan", "we completed step X". This may sound weird - in Amazon we expect you to be a team player, but in interview we
want to know precisely what YOU did. Think of this as a player selection in your superbowl team. Yes, we want the receiver to
play well with the team, but while selecting, we only care about his stats and his abilities. So make sure to clarify what part
YOU played in your answers.

Next biggest mistake people do is they talk in hypotheticals. When a question starts with "Tell me a time when you..", you
must give example from your past projects. If you just answer in hypotheticals such as "I will do this, that..", you will fail the
interview.

Okay, now let's look at a sample question and answer.

Q: Tell me about one difficult project that you delivered. What was the difficulty, and how did you determine the course of
action? What was the result?

A: I migrated our project to AWS Serverless after considering K8s and EC2. We coded the lambda, tested it, implemented
DevOps then deployed into prod and it was a huge success.
Is the above answer good or bad? It's quite bad, why?
• Situation and Task not described
• “We” - what actions did YOU perform?
• “Huge Success” - No Data, Very Subjective

A good answer may look like this:

Situation - We have 20 Microservices running on-prem on PCF. PCF license needed to be renewed in 6 months, leadership
wanted the project to migrate to AWS before that to save cost and increase agility.

Task - As a lead architect/developer/techlead, I was tasked to find out suitable AWS solution, within the timeframe given.

Action - I researched possible ways to run Microservices on AWS. I narrowed it down to three options - run each
microservice on vanilla EC2, or run on K8s using EKS, or Serverless. I took one of the microservices and did POC on Vanilla
EC2, EKS and Lambda-API Gateway. While they all did the job, I found that with EC2 I have to take care of making it HA by
spinning multiple EC2 in multiple AZs, and there is overhead of AMI rehydration. EKS seems to be a valid solution. However,
given the traffic patterns, we have to pay more than necessary. There is also an overhead of training the team on K8s.
Lambda-API Gateway is inherently HA, scalable, and pay what we use and no server to manage at all. This simplifies our day 2
operational overhead and let us focus on delivering business value.

Result - Based on all the POC data of performance, cost and time to deploy, I selected Serverless solution. We converted
rest of the microservices to Lambda and implemented in production within 3 months. It resulted in over 90% cost savings
over EC2 and K8s. I shared my project learnings with other teams and showed them how to code Lambda so they can utilize it
as well. I got recognized by CIO for this effort.

Why is this answer good?

• Situation, Task, Action, Results are clearly defined
• Gives details on what I did
• Result has data, and not just "huge success"

Note that you will get follow-up questions on the answer to understand your depth and to make sure you are not just copying
and pasting a generic answer from the Internet 😅.
Hybrid Cloud Architecture
Cloud With Raj
www.cloudwithraj.com

Hybrid means AWS and Data Center working together for the application

User

ALB Route 53

Site to Site
VPN

Amazon EC2 On-Prem Databases

(Application code moved to (Will stay on-premises till all
AWS) apps move to AWS, or perhaps
due to regulatory
requirements)
Karpenter Consolidation Saving Money
Cloud With Raj
www.cloudwithraj.com

Worker VM Worker VM Worker VM Worker VM

Underutilized Nodes

kind: NodePool
spec:
2
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized

Enable Consolidation at Karpenter NodePool YAML

Worker VM Worker VM Worker VM Worker VM

Pods Consolidated (Binpacked)

4 Unused nodes terminated == Significant savings

Worker VM Worker VM

To Know More: Get Highest Rated Karpenter

Masterclass Course in Udemy (Click the image)

OpenShift Novedades y Roadmap
No ratings yet
OpenShift Novedades y Roadmap
64 pages
Microservices for Developers
No ratings yet
Microservices for Developers
62 pages
OpenShift - Container - Platform 4.18 Architecture en US
No ratings yet
OpenShift - Container - Platform 4.18 Architecture en US
73 pages
Oracle SOA Suite Developer S Guide 1st Edition Matt Wright Full Digital Chapters
No ratings yet
Oracle SOA Suite Developer S Guide 1st Edition Matt Wright Full Digital Chapters
164 pages
OpenShift CLI Essential Commands and Examples
No ratings yet
OpenShift CLI Essential Commands and Examples
10 pages
1 - Oracle Cloud Overview
No ratings yet
1 - Oracle Cloud Overview
34 pages
Devops SRE
No ratings yet
Devops SRE
7 pages
Architecting and Operating OpenShift Clusters: OpenShift For Infrastructure and Operations Teams 1st Edition William Caban Instant Download
100% (1)
Architecting and Operating OpenShift Clusters: OpenShift For Infrastructure and Operations Teams 1st Edition William Caban Instant Download
47 pages
Oracle SOA Suite 12c Handbook 1st Edition Jellema Official Test Bank
No ratings yet
Oracle SOA Suite 12c Handbook 1st Edition Jellema Official Test Bank
310 pages
Openshift - Container - Platform 4.15 Cli - Tools en Us
No ratings yet
Openshift - Container - Platform 4.15 Cli - Tools en Us
143 pages
Naukri SriramNalla (13y 0m)
No ratings yet
Naukri SriramNalla (13y 0m)
4 pages
Cicd Microservice Pipline
No ratings yet
Cicd Microservice Pipline
21 pages
50 Years of Artificial Intelligence 2007th Edition Max Lungarella Full Chapters Instanly
100% (1)
50 Years of Artificial Intelligence 2007th Edition Max Lungarella Full Chapters Instanly
68 pages
Oop Java
No ratings yet
Oop Java
375 pages
Microservice Architecture
No ratings yet
Microservice Architecture
172 pages
Artificial Intelligence Programming With Python From Zero To Hero 1st Edition Perry Xiao Download
50% (2)
Artificial Intelligence Programming With Python From Zero To Hero 1st Edition Perry Xiao Download
77 pages
Java
No ratings yet
Java
129 pages
Continuous Delivery and GitOps On OpenShift
No ratings yet
Continuous Delivery and GitOps On OpenShift
32 pages
Oracle SOA Consultant Resume
No ratings yet
Oracle SOA Consultant Resume
6 pages
OpenShift Container Platform 4.17 Installing On A Single Node
No ratings yet
OpenShift Container Platform 4.17 Installing On A Single Node
37 pages
CIS Red Hat OpenShift Container Platform Benchmark V1.6.0 PDF
No ratings yet
CIS Red Hat OpenShift Container Platform Benchmark V1.6.0 PDF
354 pages
An Architect's Guide to SRE
No ratings yet
An Architect's Guide to SRE
375 pages
Microservice Architecture
No ratings yet
Microservice Architecture
22 pages
Cloud Giants: AWS vs Azure vs GCP
No ratings yet
Cloud Giants: AWS vs Azure vs GCP
2 pages
Openshift - Container - Platform 4.10 Distributed - Tracing en Us
No ratings yet
Openshift - Container - Platform 4.10 Distributed - Tracing en Us
50 pages
Google Cloud Platform Forensics
No ratings yet
Google Cloud Platform Forensics
1 page
Java
No ratings yet
Java
68 pages
Cloud Devops Interview Questions Part-2
No ratings yet
Cloud Devops Interview Questions Part-2
139 pages
Devops Shack Interview Quide
No ratings yet
Devops Shack Interview Quide
34 pages
Aws Security Best Practices Cheat Sheet
No ratings yet
Aws Security Best Practices Cheat Sheet
5 pages
13 - Guided Lab 1 - Breaking A Monolithic Node - Js Application Into Microservices
No ratings yet
13 - Guided Lab 1 - Breaking A Monolithic Node - Js Application Into Microservices
25 pages
AWS Certified Solutions Architect Associate Jon Bonso & Adrian Formaran All Chapters Available
No ratings yet
AWS Certified Solutions Architect Associate Jon Bonso & Adrian Formaran All Chapters Available
116 pages
Openshift Container Platform 4.6: Jaeger
No ratings yet
Openshift Container Platform 4.6: Jaeger
49 pages
AWS Notion
No ratings yet
AWS Notion
297 pages
Raj Saha Cloud With Raj: Instructor Bio
No ratings yet
Raj Saha Cloud With Raj: Instructor Bio
331 pages
Ann Afamefuna
No ratings yet
Ann Afamefuna
42 pages
AWS Interview Question
No ratings yet
AWS Interview Question
210 pages
Certified Blockchain Developer CBDH Complete Video Course and Practice
No ratings yet
Certified Blockchain Developer CBDH Complete Video Course and Practice
4 pages
Cybersecurity Roadmap (Offensive + Defensive)
No ratings yet
Cybersecurity Roadmap (Offensive + Defensive)
15 pages
Guide To Orchestrating and Deploying Containers
100% (1)
Guide To Orchestrating and Deploying Containers
49 pages
Mod 1 Architecting Fundamentals
No ratings yet
Mod 1 Architecting Fundamentals
39 pages
Manisha Telugu
No ratings yet
Manisha Telugu
22 pages
Containers in The Cloud
No ratings yet
Containers in The Cloud
56 pages
Docker Error Troubleshooting Guide
No ratings yet
Docker Error Troubleshooting Guide
40 pages
Cheat Sheet AWS Solutions Architect Professional
No ratings yet
Cheat Sheet AWS Solutions Architect Professional
177 pages
AWS Microservices & CI/CD Guide
No ratings yet
AWS Microservices & CI/CD Guide
58 pages
DevOps Culture and Practice With Openshift Section4
No ratings yet
DevOps Culture and Practice With Openshift Section4
82 pages
OpenShift Container Platform 4.17 Installing An On-Premise Cluster With The Agent-Based Installer
No ratings yet
OpenShift Container Platform 4.17 Installing An On-Premise Cluster With The Agent-Based Installer
94 pages
Bedtime Kubernetes Stories For DevOps Architects PDF
No ratings yet
Bedtime Kubernetes Stories For DevOps Architects PDF
15 pages
Istio Service Mesh Summary1 8th April 2023
No ratings yet
Istio Service Mesh Summary1 8th April 2023
4 pages
Kubernetes Basic Blog
No ratings yet
Kubernetes Basic Blog
25 pages
How I Cracked The AWS Solution Architect Cloud Quest. - DEV Community
No ratings yet
How I Cracked The AWS Solution Architect Cloud Quest. - DEV Community
8 pages
Aws Short Notes
No ratings yet
Aws Short Notes
28 pages
01 Migration Policy and Process
No ratings yet
01 Migration Policy and Process
14 pages
CFN Ug
No ratings yet
CFN Ug
4,069 pages
OpenShift Container Platform 4.17 Installing On Any Platform
No ratings yet
OpenShift Container Platform 4.17 Installing On Any Platform
73 pages
Artificial Intelligence With Python 1st Edition Prateek Joshi Newest Edition 2025
100% (1)
Artificial Intelligence With Python 1st Edition Prateek Joshi Newest Edition 2025
158 pages
Cloud Interview Guide 2024
No ratings yet
Cloud Interview Guide 2024
29 pages
Cloud Interview Guide V 7
No ratings yet
Cloud Interview Guide V 7
55 pages
Being Well-Architected in The Cloud
No ratings yet
Being Well-Architected in The Cloud
69 pages
React JS
No ratings yet
React JS
4 pages
Cybersecurity in The Digital Age Protecting Data in A Connected World
No ratings yet
Cybersecurity in The Digital Age Protecting Data in A Connected World
10 pages
Mad 2.2
No ratings yet
Mad 2.2
6 pages
Module 3 Private Blockchain System
No ratings yet
Module 3 Private Blockchain System
34 pages
M05 - Identify and Resolve Network Problems
No ratings yet
M05 - Identify and Resolve Network Problems
38 pages
Using React Hooks, UseEffect To Make Reoccurring API Calls - by Johee Chung - Medium
No ratings yet
Using React Hooks, UseEffect To Make Reoccurring API Calls - by Johee Chung - Medium
11 pages
Review2 PPT - Meghana
No ratings yet
Review2 PPT - Meghana
32 pages
LI 6500WX: Making Life Easier and Safer
No ratings yet
LI 6500WX: Making Life Easier and Safer
17 pages
Numsol Project 1
No ratings yet
Numsol Project 1
3 pages
8 Modularization Techniques
100% (2)
8 Modularization Techniques
34 pages
(Ebook PDF) Adaptive Health Management Information Systems: Concepts, Cases, and Practical Applications 4th Edition Instant Download
0% (1)
(Ebook PDF) Adaptive Health Management Information Systems: Concepts, Cases, and Practical Applications 4th Edition Instant Download
56 pages
Unit Converter: CM To Inches Converter - Rapidtables
No ratings yet
Unit Converter: CM To Inches Converter - Rapidtables
4 pages
Vandana Resume 12122023
No ratings yet
Vandana Resume 12122023
3 pages
MySynovus BillPay QuickTips
No ratings yet
MySynovus BillPay QuickTips
3 pages
Matrikon OPC UA Explorer: Datasheet
No ratings yet
Matrikon OPC UA Explorer: Datasheet
3 pages
CPU Instruction Set Basics
No ratings yet
CPU Instruction Set Basics
13 pages
وبا کے دنوں میں محبت - گیبریل گارشیا مارکیز PDF
No ratings yet
وبا کے دنوں میں محبت - گیبریل گارشیا مارکیز PDF
410 pages
LLMNR Attack
No ratings yet
LLMNR Attack
4 pages
Performance Task 1 - Attempt Review Prog 114
100% (1)
Performance Task 1 - Attempt Review Prog 114
4 pages
IStorage Datashur User Guide - V2.1
No ratings yet
IStorage Datashur User Guide - V2.1
14 pages
UNIT 2 Notes CSS
No ratings yet
UNIT 2 Notes CSS
12 pages
An Analysis On Measuring Graph Patterns in Social Networks
No ratings yet
An Analysis On Measuring Graph Patterns in Social Networks
6 pages
DBMS Module 5
No ratings yet
DBMS Module 5
10 pages
AES and DES Performance Comparison
No ratings yet
AES and DES Performance Comparison
9 pages
Spring Data JPA
No ratings yet
Spring Data JPA
13 pages
Supplychainobject
No ratings yet
Supplychainobject
10 pages
DPCM
No ratings yet
DPCM
9 pages
Chapter 9 - Parallel Computation Problems
No ratings yet
Chapter 9 - Parallel Computation Problems
43 pages
DDM Question Bank @
100% (1)
DDM Question Bank @
20 pages
Repair and Maintenance of Mobile Cell Phones
No ratings yet
Repair and Maintenance of Mobile Cell Phones
6 pages