AWS Cloud Interview Prep Guide
More To Be Added...
                    Important AWS Services for Interviews
                                              Cloud With Raj
                                              www.cloudwithraj.com
Compute
Storage
Network
Security
Gen AI
Migration
Event Driven
Observability
Cost Optimization
Compute Optimizer, CloudWatch Insights, Cost Explorer, Budget, Spot Instance, Reserved Instance Reporting, Savings Plan
Analytics
DevOps
                     CloudFormation
                                                   Microservices with ALB
              SQS
                                3. Audit with ease - An event router acts as a centralized location to audit your
                                application and define policies. These policies can restrict who can publish and
                                subscribe to a router and control which users and resources have permission to
                                access your data. You can also encrypt your events both in transit and at rest.
[Diagram: a Lambda (handles traffic of cloudwithraj.com/buy (POST)) writing to Amazon DynamoDB, compared with SQS (event store) feeding a Lambda (processes messages from SQS for cloudwithraj.com/buy (POST)) that writes to Amazon DynamoDB]
3. With EDA, retries are built in. With microservices, if Lambda fails, the user needs to send the request again. With EDA, once the message is in SQS, even if Lambda fails, the message stays in the queue and is automatically retried.
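The retry behaviour in point 3 can be sketched with an in-memory stand-in for a queue. This is a toy model, not the SQS API: `ToyQueue`, `process_with_retries`, and `make_flaky_handler` are hypothetical names, and real SQS redelivers a message only after its visibility timeout expires rather than immediately.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a queue: a message stays queued until deleted."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive(self):
        # Return the oldest message without removing it.
        return self._messages[0] if self._messages else None

    def delete(self):
        # Called only after the consumer processed the message successfully.
        self._messages.popleft()


def process_with_retries(queue, handler, max_attempts=5):
    """Deliver the oldest message until the handler succeeds.

    Returns the number of delivery attempts that were used.
    """
    for attempt in range(1, max_attempts + 1):
        message = queue.receive()
        if message is None:
            return attempt - 1
        try:
            handler(message)
        except RuntimeError:
            continue  # not deleted, so the same message is delivered again
        queue.delete()
        return attempt
    return max_attempts


def make_flaky_handler(failures_before_success, store):
    """Build a handler that fails a fixed number of times, then saves the message."""
    state = {"calls": 0}

    def handler(message):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated Lambda failure")
        store.append(message)

    return handler


orders = []  # stands in for the database table
queue = ToyQueue()
queue.send({"path": "/buy", "item": "book"})
attempts = process_with_retries(queue, make_flaky_handler(2, orders))
print(attempts, orders)
```

The user sends the request once; the two simulated failures are absorbed by redelivery, and the order is stored on the third attempt.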
                                       Event Driven Architecture (Advanced)
[Diagram: Application, API Gateway, EventBridge, SNS]
Based on values in the message, EventBridge can fire different targets.
Based on values in the message, SNS can fire different targets.
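Routing on message values, as noted above, can be sketched as a small rule matcher. This is a simplified illustration rather than the EventBridge or SNS API; the rule layout, the field names (`type`, `priority`), and the target names are all hypothetical.

```python
def matches(pattern, message):
    """A rule matches when every field in its pattern holds an allowed value."""
    return all(message.get(field) in allowed for field, allowed in pattern.items())


def targets_for(message, rules):
    """Return the targets of every rule whose pattern matches the message."""
    return [rule["target"] for rule in rules if matches(rule["pattern"], message)]


# Hypothetical rules: each pattern maps a field name to its allowed values.
rules = [
    {"pattern": {"type": ["order"]}, "target": "order-processor"},
    {"pattern": {"type": ["order"], "priority": ["high"]}, "target": "alerting"},
    {"pattern": {"type": ["refund"]}, "target": "refund-processor"},
]

fired = targets_for({"type": "order", "priority": "high"}, rules)
print(fired)
```

One message can fire several targets when more than one rule matches, and none when no rule matches.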
[Diagram: three-tier architecture across Availability Zone 1 and Availability Zone 2. EC2 webservers; APPLICATION LAYER: Internal ALB in front of an Auto Scaling Group of EC2 appservers; DATABASE: Amazon Aurora]
1. The first layer is the presentation layer. Customers consume the application using this layer.
Generally, this is where the front end runs, for example the amazon.com website. This is
implemented using an external-facing load balancer distributing traffic to VMs (EC2s) running
web servers.
2. The second layer is the application layer. This is where the business logic resides. Going with the
previous example - you browsed products on amazon.com, found the product
you like, and clicked "add to cart". The flow comes to the application layer, which validates the
availability and then creates a cart. This layer is implemented with an internal-facing load
balancer and VMs running applications.
3. The last layer is the database layer. This is where information is stored: all the product
information, your shopping cart, order history, etc. The application layer interacts with this layer
for CRUD (Create, Read, Update, Delete) operations. This could be implemented using one or
a mix of databases - SQL (e.g. Amazon Aurora) and/or NoSQL (DynamoDB).
Lastly - why is this so popular in interviews? This architecture comprises many critical
patterns - microservices, load balancing, scaling, performance optimization, high availability,
and more. Based on your answers, the interviewer can dig deep and check your understanding of
the core concepts.
How Many Data Centers in One Availability Zone?
Incorrect Answer:
One Availability Zone means one data center
  Correct Answer:
  An AWS availability zone (AZ) can contain multiple data centers. Each zone is usually backed
  by one or more physical data centers, with the largest backed by as many as five.
   Is AWS Lambda Cheaper than Amazon EC2?
   Incorrect Answer:
   Yes, AWS Lambda is cheaper than Amazon EC2
Correct Answer:
It depends on the application. Lambda and EC2 have different cost factors (see above).
Depending on the application, AWS Lambda can cost more than EC2, and vice versa.
It is important to consider not just the compute cost but the TCO (Total
Cost of Ownership). With AWS Lambda there is no AMI to maintain, patch, and rehydrate,
which reduces management overhead and hence overall TCO.
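A rough sketch of why the answer is "it depends": Lambda is billed per request and per GB-second of execution, while an EC2 instance is billed for the time it runs, whether busy or idle. The prices and workload numbers below are illustrative assumptions only, not a quote of current AWS pricing, and the sketch ignores free tiers, storage, data transfer, and the TCO factors mentioned above.

```python
def lambda_monthly_cost(requests, avg_duration_s, memory_gb,
                        price_per_million_requests, price_per_gb_second):
    """Lambda compute cost: a per-request charge plus a GB-second charge."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def ec2_monthly_cost(instance_count, price_per_hour, hours=730):
    """EC2 compute cost: instances are billed for every hour they run."""
    return instance_count * price_per_hour * hours


# Assumed prices, for illustration only.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000167
EC2_PRICE_PER_HOUR = 0.05

ec2 = ec2_monthly_cost(2, EC2_PRICE_PER_HOUR)

# Same function (200 ms at 0.5 GB) at two very different traffic levels.
low_traffic = lambda_monthly_cost(
    1_000_000, 0.2, 0.5, PRICE_PER_MILLION_REQUESTS, PRICE_PER_GB_SECOND)
high_traffic = lambda_monthly_cost(
    200_000_000, 0.2, 0.5, PRICE_PER_MILLION_REQUESTS, PRICE_PER_GB_SECOND)

print(f"EC2: {ec2:.2f}  Lambda low: {low_traffic:.2f}  Lambda high: {high_traffic:.2f}")
```

Under these assumptions Lambda is far cheaper at low traffic and more expensive at sustained high traffic, which is the crossover an interviewer usually wants you to reason about.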
                                   RPO Vs RTO
Candidates sometimes confuse RPO with a quantity of data, thinking it is measured in
gigabytes, petabytes, etc.
Correct Answer:
Both RPO and RTO are measured in time. RTO stands for Recovery Time Objective and is
a measure of how quickly after an outage an application must be available again. RPO, or
Recovery Point Objective, refers to how much data loss your application can tolerate.
Another way to think about RPO is how old the data can be when the application is
recovered, i.e. the time between the last backup and the disaster. With both RTO and
RPO, the targets are measured in hours, minutes, or seconds, with lower numbers
representing less downtime or less data loss. RPO and RTO can have, and often do have,
different values for an application.
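A small worked example of the two measurements, using made-up timestamps and targets (all values here are assumptions for illustration). Both results come out as durations, not data sizes.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup, disaster):
    """Data-loss window: time between the last backup and the disaster."""
    return disaster - last_backup


def achieved_rto(disaster, service_restored):
    """Downtime: time between the disaster and the application being available again."""
    return service_restored - disaster


# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 5, 30)
service_restored = datetime(2024, 1, 1, 6, 15)

rpo = achieved_rpo(last_backup, disaster)
rto = achieved_rto(disaster, service_restored)

# Hypothetical targets; note that RPO and RTO targets differ.
rpo_target = timedelta(hours=4)
rto_target = timedelta(hours=1)
meets_targets = rpo <= rpo_target and rto <= rto_target

print(rpo, rto, meets_targets)
```

Here 3.5 hours of data is lost and the application is down for 45 minutes, so both assumed targets are met.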
                                   IP Address Vs. URL
[Diagram: DNS (Domain Name System) assigns a URL (Uniform Resource Locator) to the Load Balancer. The Load Balancer distributes traffic to Virtual Machines (e.g. EC2), one with IP address 192.50.20.12; the bottom VM, with IP address 250.80.10.12, went down]
Bad Answer:
URL is a link assigned to an IP address
Correct Answer:
An IP address is a unique number that identifies a device connected to the internet, such as a
Virtual Machine running your application. However, accessing a resource using this unique
number is cumbersome. Moreover, when a VM goes down (the bottom one in the
diagram), a new VM comes up to replace it with a different IP address. Hence, in reality,
an application running inside the VM is accessed using a URL (Uniform Resource Locator).
One URL generally does NOT map to one IP address; rather, the URL (e.g., www.amazon.com) is
mapped to a Load Balancer, and that Load Balancer distributes traffic to multiple VMs with
different IP addresses. Even if one VM goes down and another comes up, the URL keeps
working because the Load Balancer distributes traffic only across
healthy instances. This way, you (the user) do not need to worry about the underlying IP
addresses.
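The idea can be sketched with a toy load balancer that keeps a stable name in front of a changing set of IP addresses. `ToyLoadBalancer` is a hypothetical class, `www.example.com` is a placeholder name, the replacement address `192.50.20.99` is made up, and round-robin over healthy targets is only one of several possible distribution strategies.

```python
import itertools


class ToyLoadBalancer:
    """Round-robin over healthy targets; the caller never sees which IP served it."""

    def __init__(self, targets):
        self.healthy = dict.fromkeys(targets, True)
        self._counter = itertools.count()

    def mark_down(self, ip):
        self.healthy[ip] = False

    def register(self, ip):
        self.healthy[ip] = True

    def route(self):
        live = [ip for ip, ok in self.healthy.items() if ok]
        if not live:
            raise RuntimeError("no healthy targets")
        return live[next(self._counter) % len(live)]


# The URL maps to the load balancer, not to any single VM.
dns = {"www.example.com": ToyLoadBalancer(["192.50.20.12", "250.80.10.12"])}
lb = dns["www.example.com"]

lb.mark_down("250.80.10.12")   # the bottom VM in the diagram goes down
lb.register("192.50.20.99")    # its replacement comes up with a new IP

served = {lb.route() for _ in range(4)}
print(served)
```

The URL never changed, yet the failed address received no traffic and the replacement was picked up automatically.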
                                DevOps CICD Phases with Tools
[Diagram: CICD phases, left to right]
1. Write code
2. Check in source code
3. Compile code, create artifacts, unit testing
4. Integration testing, load testing, UI testing, penetration testing
5. Deploy artifacts
6. Logs, metrics, and traces
Cloud Implementation
Amazon EKS
Observability
 Scaling
Karpenter, AutoScaling
Delivery/Automation
Security
 Cost Optimization
CloudWatch Container Insights, Cost and Usage Report (new feature - Split Cost Allocation), Kubecost
                               Traditional CICD Vs. GitOps
[Diagram: Traditional CICD - (1) Code & Dockerfile and Manifests go to the Git Repo, and the CI Tool builds the Container Image into Amazon ECR; (2) the CD Tool updates the Manifests with the container image tag; (3) the CD Tool pushes files to Amazon EKS]
[Diagram: GitOps - (A) Code & Dockerfile and Manifests go to the Git Repo, and the CI Tool builds the Container Image into Amazon ECR; (B) the CD Tool updates the Manifests with the container image tag; (C) the GitOps Tool installed in the cluster checks for differences between the cluster and Git and pulls in changed files to Amazon EKS]
Traditional DevOps
Step 1: Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins)
kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step 2: CD tools (e.g. Jenkins) update the deployment manifest files with the tag of the container image.
Step 3: CD tools (e.g. Jenkins) execute the command to deploy the manifest files into the cluster, which, in turn,
deploys the newly built container in the Amazon EKS cluster.
Conclusion - Traditional CICD is a push-based model. If a sneaky SRE changes the YAML file directly in the cluster (e.g.
changes the number of replicas, or even the container image itself!), the resources running in the cluster will deviate from
what's defined in the YAML in Git. Worst case, this change can break something, and the DevOps team needs to rerun
part of the CICD process to push the intended YAMLs to the cluster.
GitOps
Step A: Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins)
kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step B: CD tools (e.g. Jenkins) update the deployment manifest files with the tag of the container image.
Step C: With GitOps, Git becomes the single source of truth. You need to install a GitOps tool like Argo CD inside the cluster
and point it to a Git repo. The GitOps tool keeps checking whether there is a new file, or whether the files in the cluster drift from the ones in Git.
As soon as the YAML is updated with a new container image, there is a drift between what's running in the cluster and what's in
Git. Argo CD pulls in this updated YAML file and deploys the new container.
Conclusion - GitOps does NOT replace DevOps. As you can see, GitOps only replaces part of the CD process. If we think
about the previous scenario where the sneaky SRE directly changes the YAML in the cluster, Argo CD will detect the mismatch
between the changed file and the one in Git. Since there is a difference, it will pull in the file from Git and bring the
Kubernetes resources back to their intended state. And don't worry, Argo CD can also send a message to the sneaky SRE's manager
;).
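The pull-based reconciliation described in Step C can be sketched as a loop that compares desired state (Git) with actual state (cluster). This is a toy model with plain dictionaries, not how Argo CD is implemented; `reconcile` and the field names are hypothetical, and the sketch ignores resources that exist in the cluster but not in Git.

```python
def reconcile(git_state, cluster_state):
    """Make the cluster match Git and return the fields that had drifted."""
    drift = {}
    for key, desired in git_state.items():
        if cluster_state.get(key) != desired:
            drift[key] = desired
    cluster_state.update(drift)  # pull the Git values into the cluster
    return drift


git = {"image": "app:v2", "replicas": 3}      # manifests updated in Step B
cluster = {"image": "app:v1", "replicas": 3}  # what is currently running

first = reconcile(git, cluster)   # new image tag in Git is rolled out

cluster["replicas"] = 1           # the sneaky SRE edits the cluster directly
second = reconcile(git, cluster)  # drift is detected and reverted

third = reconcile(git, cluster)   # nothing left to do

print(first, second, third)
```

In a push-based pipeline the manual change would persist until someone reran the deployment; here the next reconciliation pass restores the value from Git.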
                            Platform Team and Developer Team
[Diagram: (1) the Developer requests infrastructure through the Ticketing System; (2) the request reaches the Platform Team; (3) the Platform Team provisions Amazon EKS using Infra as Code (IaC) (Terraform, CDK etc.); (4) Code & Dockerfile and Manifests go to the Git Repo, and the CI Tool builds the Container Image into Amazon ECR; (5) the CD Tool updates the Manifests with the container image tag; (6) the CD Tool deploys the container to Amazon EKS]
Recently, the term "platform team" has been floating around plenty. But what does a platform team do? How is it
different from the developer team? Let's understand with the diagram:
Step 1: The developer team requests the platform team to provision appropriate AWS resources. In this example, we are
using Amazon EKS for the application, but this concept can be extended to any other AWS service. This request for AWS
resources is typically made via the ticketing system.
Step 3: The platform team uses Infrastructure as Code (IaC), such as Terraform, CDK, etc., to provision the requested
AWS resources, and shares the credentials with the developer team.
Step 4: The developer team kicks off the CICD process. We are using a container process to understand the flow.
Developers check in Code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins, GitHub
Actions) kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step 5: CD tools (e.g. Jenkins, Spinnaker) update the deployment manifest files with the tag of the container image.
Step 6: CD tools execute the command to deploy the manifest files into the cluster, which, in turn, deploys the newly
built container in the Amazon EKS cluster.
Conclusion - The platform team takes care of the infrastructure (often with guardrails) appropriate for the
organization, and the developer team uses that infrastructure to deploy its application. The platform team handles the
upgrade and maintenance of the infrastructure to reduce the burden on the developer team.
                            Scaling Difference Between Lambda, EC2, EKS
EC2 Scaling: You need to use an Auto Scaling Group (ASG) and define the EC2 metric on which it should scale, e.g. CPU utilization. You can use the ASG's "minimum number of instances" setting to keep a certain number of instances running at all times. ASG also supports scheduled scaling and warm pools.
[Diagram: Auto Scaling Group containing three EC2 instances]
EKS Scaling: EKS scaling is the most complex. You may or may NOT use an Auto Scaling Group, but it does NOT work like regular EC2 scaling. Please refer to the next page to learn about EKS (Kubernetes/K8s) scaling in detail.
[Diagram: EKS with an Auto Scaling Group containing three EC2 instances]
How Do Kubernetes Worker Nodes Scale?
[Diagram: Worker VM 1 and Worker VM 2 (steps 1 and 2); a pod in Pending, Unschedulable state triggers the Node Autoscaler (steps 3 and 4)]
Incorrect Answer:
Set Auto Scaling Groups to scale at a certain VM metric utilization, like scaling regular VMs.
Correct Answer:
Step 1: You configure HPA (Horizontal Pod Autoscaler) to increase the number of replicas of your pods at a
certain CPU/Memory/Custom Metrics threshold.
Step 2: As traffic increases and the pod metric utilization crosses the threshold, HPA
increases the number of pods. If there is capacity in the existing worker VMs, then the
Kubernetes kube-scheduler binds the new pods to the running VMs.
Step 3: Traffic keeps increasing, and HPA increases the number of replicas of the pod. But
now, there is no capacity left in the running VMs, so the kube-scheduler can't schedule the
pod (yet!). That pod goes into the pending, unschedulable state.
Step 4: As soon as pod(s) go into the pending, unschedulable state, Kubernetes node scalers (such as
Cluster Autoscaler, Karpenter, etc.) provision a new node. Cluster Autoscaler requires an Auto
Scaling Group, in which it increases the desired VM count, whereas Karpenter doesn't require an
Auto Scaling Group or Node Group. Once the new VM comes up, kube-scheduler places the
pending pod on the new node.
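The four steps can be traced with a small calculation. The replica formula is the one documented for HPA, desired = ceil(current replicas x current metric / target metric); everything else is a simplifying assumption: a fixed `pods_per_node` capacity stands in for real resource requests, and HPA tolerance and min/max replica bounds are ignored.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """HPA scaling rule: scale replicas in proportion to metric pressure."""
    return math.ceil(current_replicas * current_metric / target_metric)


def schedule(replicas, nodes, pods_per_node):
    """Return (running, pending) pods given a fixed per-node pod capacity."""
    capacity = nodes * pods_per_node
    running = min(replicas, capacity)
    return running, replicas - running


def nodes_to_add(pending, pods_per_node):
    """Nodes a node autoscaler would need to fit the pending pods."""
    return math.ceil(pending / pods_per_node)


# Steps 1-2: 4 pods at 90% CPU with a 50% target -> HPA asks for more pods.
replicas = hpa_desired_replicas(4, 90, 50)

# Step 3: two worker VMs with room for 3 pods each cannot fit them all.
running, pending = schedule(replicas, nodes=2, pods_per_node=3)

# Step 4: pending pods trigger the node autoscaler, then everything fits.
extra_nodes = nodes_to_add(pending, pods_per_node=3)
running_after, pending_after = schedule(replicas, nodes=2 + extra_nodes, pods_per_node=3)

print(replicas, (running, pending), extra_nodes, (running_after, pending_after))
```

Note that node scaling is driven by unschedulable pods, not by a VM-level CPU threshold, which is why the "incorrect answer" above misses the mechanism.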
                                                      Gen AI 4 Layers
[Diagram: Gen AI layers stacked from EASY (top) to HARD (bottom): Applications (e.g. Adobe Firefly), LLM Models (e.g. OpenAI, Anthropic), Silicon Chips (e.g. AMD, NVIDIA)]
Gen AI hype is at an all-time high, and so is the confusion. What do you study, how do you think about it,
and where are the most jobs? These are the burning questions in our minds. Gen AI can be broken down into
the following four layers.
1. The bottom layer is the hardware layer, i.e., the silicon chips that can train the models. Examples - AMD,
NVIDIA.
2. Then come the LLMs that get trained and run on the chips. Examples are OpenAI, Anthropic, etc.
3. Then come infrastructure providers, who provide an easier way to consume, host, train, and run inference on the
models. An example is AWS. This layer consists of managed services such as Amazon Bedrock, which hosts pre-trained
models, or provisioned VMs (Amazon EC2) where you can train your own LLM.
4. Finally, we have the application layer, which uses those LLMs. Some examples are Adobe Firefly, LLM
chatbots, LLM travel agents, etc.
Now, the important part - as you go from the bottom to the top, the learning curve gets easier, and it also gets easier
for new market players to enter. Building new chips requires billions of dollars of investment,
and hence, it's harder for new players to enter that market. The most opportunities are in the top two
layers. If you already know the cloud, then integrating Gen AI with your existing knowledge will increase your
value immensely. If you are working in DevOps, learn MLOps; if you know K8s/Serverless, learn how you can
integrate Gen AI with those; if you work on an application, integrate with managed LLM services to enhance
functionality. You get the idea!
Prompt Engineering Vs RAG Vs Fine Tuning
Prompt Engineering
[Diagram: (1) Prompt sent to Amazon Bedrock (hosts LLM) returns a subpar response; (2) Enhanced prompt sent to Amazon Bedrock returns a better response]
1. You send a prompt to the LLM (hosted in Amazon Bedrock in this case) and
get a response that you are not satisfied with.
2. You enhance the prompt, iterating until you arrive at a prompt that gives the desired,
better response.
[Diagram: RAG - (1) company data is converted into embeddings and stored in a Vector Database]
[Diagram: Fine Tuning - a Base LLM becomes a Fine Tuned LLM, which returns the response]
RAG (Retrieval Augmented Generation) is used where the response can be made better by using
company-specific data that the LLM does NOT have. Amazon Bedrock makes it very easy to do
RAG. Below are the steps:
1. You store relevant company data in an S3 bucket. Then, from Bedrock Knowledge Bases, you
select an embedding model (Amazon Titan Embed or Cohere Embed), which converts the S3 data into
embeddings (vectors). Knowledge Bases can also create a serverless vector store for you to save
those embeddings. Alternatively, you can bring your own vector database (OpenSearch,
Aurora, Pinecone, Redis).
2. The user gives a prompt that can be made better by adding company-specific info.
3. A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches
the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG)
and returned.
4. The original prompt is AUGMENTED (second part of RAG) with this company-specific info and
sent to another Bedrock LLM.
5. The Bedrock LLM GENERATES (last part of RAG) the response and sends it back to the user.
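The retrieve and augment steps can be sketched without any cloud service. This is a toy illustration, not the Bedrock API: word counts stand in for a real embedding model, a Python list stands in for the vector database, and the documents and prompt are invented. The final generate step would send `augmented_prompt` to an LLM and is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1: company data is embedded and stored (hypothetical documents).
documents = [
    "refund policy: refunds are issued within 30 days",
    "shipping policy: orders ship within 2 business days",
]
vector_store = [(doc, embed(doc)) for doc in documents]


def retrieve(prompt, store, top_k=1):
    """Step 3: embed the prompt and return the most similar documents."""
    query = embed(prompt)
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def augment(prompt, context):
    """Step 4: prepend the retrieved company data to the original prompt."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + prompt


prompt = "how many days for refunds"          # Step 2: the user's prompt
context = retrieve(prompt, vector_store)
augmented_prompt = augment(prompt, context)
print(augmented_prompt)
```

The LLM never needs to be retrained; it simply receives the relevant company data alongside the question at request time.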
                                       EKS Upgrade Simplified Using Karpenter
EKS upgrades can be tedious, but Karpenter can automatically upgrade your data plane worker nodes, reducing your burden. Here's
how:
a. EKS-Optimized AMI IDs are listed in AWS Systems Manager Parameter Store. The CNCF project Karpenter, the next-gen cluster
autoscaler, periodically checks this and reconciles it with the running worker nodes to see if they are running the latest
EKS-Optimized AMI for the particular EKS version. In this case, let's assume EKS is running v1.27.
b. At a certain point, EKS releases the next version, v1.28, and the workflow below takes place:
 1. The admin upgrades the EKS control plane to v1.28.
 2. Following the previous logic, Karpenter retrieves the latest AMI for v1.28 and checks whether the worker nodes are running it.
They are NOT, so a Karpenter Drift is triggered.
 3. To fix this Drift, Karpenter automatically updates the worker nodes to v1.28 AMIs. It does this using a rolling deployment
(i.e. a new node comes up, the existing node is cordoned and drained, then terminated). It also respects the Kubernetes eviction API
parameters, such as maintaining PDBs.
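The drift check and rolling replacement can be pictured with a toy model. This is not Karpenter code: the node dictionaries, the AMI identifiers `ami-v127` and `ami-v128`, and the function names are hypothetical, and cordon/drain/PDB handling is reduced to a single removal step.

```python
def drifted(nodes, latest_ami):
    """Nodes whose AMI differs from the latest one for the cluster version."""
    return [node for node in nodes if node["ami"] != latest_ami]


def rolling_replace(nodes, latest_ami):
    """Replace drifted nodes one at a time: bring up the new node, then remove the old.

    Returns the final node list and the node count observed after each new
    node came up (capacity rises by one before the old node is removed).
    """
    nodes = list(nodes)
    counts_during_rollout = []
    for old in drifted(nodes, latest_ami):
        nodes.append({"name": old["name"] + "-new", "ami": latest_ami})
        counts_during_rollout.append(len(nodes))
        nodes.remove(old)  # cordon, drain, terminate in a real cluster
    return nodes, counts_during_rollout


workers = [
    {"name": "node-a", "ami": "ami-v127"},
    {"name": "node-b", "ami": "ami-v127"},
]

# Control plane was upgraded, so the latest AMI now points at the new version.
upgraded, counts = rolling_replace(workers, latest_ami="ami-v128")
print(upgraded, counts)
```

Because a replacement comes up before each old node is removed, the cluster never drops below its original node count during the rollout.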
If you want to know the process in detail, including with custom AMIs, check out the Karpenter drift blog -
https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/
                              Container Lifecycle - Local to Cloud
[Diagram: Container Image lifecycle: docker build -> docker run -> docker push -> kubectl apply]
1. The developer writes code and an associated Dockerfile to containerize the code on her local machine.
2. She uses the "docker build" command to create the container image on her local machine. At this point, the
container image is saved on the local machine.
3. The developer uses the "docker run" command to run the container image and test the code running from the
container. The developer can repeat Steps 1-3 until the testing meets the requirements.
4. Next, the developer runs the "docker push" command to push the container image from the local machine to a
container registry. Some examples are DockerHub and Amazon ECR.
5. Finally, using the "kubectl apply" command, a YAML manifest that has the URL of the container image in
Amazon ECR is deployed into the running Kubernetes cluster.
Note that this is to understand the lifecycle of the container. In the real world, after testing is done on the local
machine, the following steps are automated. Refer to the "Traditional CICD Vs. GitOps" page for that workflow.
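The four commands from the steps above can be laid out as a small helper that only builds the command lines, without running them. The image URI and manifest file name are hypothetical placeholders, and registry authentication (required before pushing to Amazon ECR) is deliberately left out of this sketch.

```python
def lifecycle_commands(image_uri, manifest_path, build_context="."):
    """Return the local-to-cloud command sequence as argument lists."""
    return [
        ["docker", "build", "-t", image_uri, build_context],  # Step 2: create image
        ["docker", "run", "--rm", image_uri],                 # Step 3: test locally
        ["docker", "push", image_uri],                        # Step 4: push to registry
        ["kubectl", "apply", "-f", manifest_path],            # Step 5: deploy manifest
    ]


# Hypothetical image URI and manifest file.
commands = lifecycle_commands(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-app:v1",
    "deployment.yaml",
)

for command in commands:
    print(" ".join(command))
```

Each list could be handed to `subprocess.run` on a machine with Docker and kubectl installed; in a CICD pipeline the same sequence is what the CI and CD tools automate.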
             Gen AI Multi Model Invocation
[Diagram: API Gateway sends messages to EventBridge. Based on values in the message, EventBridge can fire different targets through Rule 1, Rule 2, and Rule 3]
[Diagram: file1 shown at four stages, starting from the Local Machine]
Git and GitHub for Beginners Crash Course (Click on the YouTube icon):
                                         DevSecOps Workflow
Diagram (DNS resolution for www.amazon.com):
1. User types www.amazon.com.
2. Checks local machine caches for the IP address of the URL.
3. If NOT cached, the DNS Resolver gets the IP address (of the LB frontend) of the URL via the DNS root name server,
the name server for the .com TLD, and Amazon Route 53.
4. Obtained IP address.
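The lookup flow in this diagram (cache check first, then the root name server, the .com TLD name server, and Amazon Route 53) can be sketched with plain dictionaries. All zone data here is hypothetical, and 192.0.2.10 is a documentation-only address, not a real Amazon IP.

```python
# Hypothetical zone data standing in for the three levels of name servers.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"amazon.com": "route53"}}
AUTHORITATIVE = {"route53": {"www.amazon.com": "192.0.2.10"}}

cache: dict[str, str] = {}


def resolve(hostname: str) -> tuple[str, str]:
    """Return (ip, source), where source is 'cache' or 'lookup'."""
    if hostname in cache:                       # check the local cache first
        return cache[hostname], "cache"
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]               # root points to the .com TLD server
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server][domain]       # TLD points to the authoritative server
    ip = AUTHORITATIVE[auth_server][hostname]   # authoritative server returns the IP
    cache[hostname] = ip                        # remember the obtained IP address
    return ip, "lookup"


first = resolve("www.amazon.com")
second = resolve("www.amazon.com")
print(first, second)
```

The second call is answered from the cache, which is why repeat visits skip the name server round trips.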
The AWS Well-Architected Framework describes key concepts, design principles, and architectural best practices
for designing and running workloads in the cloud. By answering a few foundational questions, learn how well your
architecture aligns with cloud best practices and gain guidance for making improvements. AWS Well-Architected
Framework assessment questions can be answered inside your AWS account, and you then get a scorecard
that evaluates your application against the six pillars below.
The operational excellence pillar focuses on running and monitoring systems, and continually improving processes and
procedures. Key topics include automating changes, responding to events, and defining standards to manage daily operations.

The security pillar focuses on protecting information and systems. Key topics include confidentiality and integrity of
data, managing user permissions, and establishing controls to detect security events.

The reliability pillar focuses on workloads performing their intended functions and how to recover quickly from failure
to meet demands. Key topics include distributed system design, recovery planning, and adapting to changing requirements.

The performance efficiency pillar focuses on structured and streamlined allocation of IT and computing resources. Key
topics include selecting resource types and sizes optimized for workload requirements, monitoring performance, and
maintaining efficiency as business needs evolve.

The cost optimization pillar focuses on avoiding unnecessary costs. Key topics include understanding spending over time
and controlling fund allocation, selecting resources of the right type and quantity, and scaling to meet business needs
without overspending.

The sustainability pillar focuses on minimizing the environmental impacts of running cloud workloads. Key topics include
a shared responsibility model for sustainability, understanding impact, and maximizing utilization to minimize required
resources and reduce downstream impacts.
                     AWS Migration Tools
Amazon S3
AWS Cloud
Diagram (multi-Region architecture): Route 53 with Geolocation/Latency Routing in front of Region A and Region B. Each
Region has a VPC spanning two Availability Zones, with DynamoDB global table replication between the Regions.
Diagram: Container Image, Amazon EKS (Control Plane)
1. The developer creates the container image, which gets deployed on Amazon EKS.
2. The container image runs inside a Kubernetes Pod. A Pod is the smallest deployable unit in Kubernetes.
A container can NOT run without being inside a Pod.
3. Pods run on EC2 worker nodes. Multiple Pods can run on one EC2 instance.
4. An easy way to remember this is "NPC" (like in video games) = Node - Pod - Container, in that order,
i.e. a Node hosts one or many Pods, and a Pod hosts one or many containers.
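The "NPC" hierarchy can be pictured as nested data. The sketch below is only a mental model in plain Python, not a Kubernetes client: the class names mirror the concepts, and the node, pod, and image names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    image: str


@dataclass
class Pod:
    name: str
    containers: list[Container] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A pod hosts one or many containers, so an empty pod is rejected here.
        if not self.containers:
            raise ValueError("a pod needs at least one container")


@dataclass
class Node:
    name: str
    pods: list[Pod] = field(default_factory=list)  # a node hosts one or many pods


# Hypothetical worker node with two pods; the first pod has a sidecar container.
node = Node("ec2-worker-1", pods=[
    Pod("web", [Container("web:1.0"), Container("log-sidecar:1.0")]),
    Pod("api", [Container("api:1.0")]),
])
container_count = sum(len(pod.containers) for pod in node.pods)
print(len(node.pods), container_count)
```

Note that containers only appear inside a Pod in this model, which mirrors the rule that a container cannot run outside one.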
                            Batch Workloads on AWS
Diagram: Amazon EventBridge Scheduler -> AWS Batch -> container image from Amazon ECR -> Amazon ECS or Amazon EKS (Fargate or EC2)
1. Use EventBridge Scheduler to schedule jobs. You can also trigger jobs based on EventBridge rules,
which gives you great flexibility and power!
2. The batch job is then submitted via AWS Batch. You may be thinking: I see a container image
and then ECS/EKS, so why can't I submit a container job directly from EventBridge without AWS
Batch? It's because AWS Batch maintains job queues, restarts, resource management, etc., which are not
available if you skip it.
4. The job running in a container gets submitted to either ECS or EKS. AWS Batch supports both EC2
and Fargate.
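To make the value of a job queue with restarts concrete, here is a toy sketch of the kind of bookkeeping you would otherwise write yourself. It is not the AWS Batch API: the function name, the retry limit, and the sample jobs are all hypothetical, and real resource management is left out entirely.

```python
from collections import deque
from typing import Callable


def run_queue(jobs: dict[str, Callable[[], None]], max_attempts: int = 3) -> dict[str, str]:
    """Run jobs in FIFO order, re-queueing a failed job until max_attempts is reached."""
    queue = deque((name, 1) for name in jobs)
    status: dict[str, str] = {}
    while queue:
        name, attempt = queue.popleft()
        try:
            jobs[name]()
            status[name] = f"SUCCEEDED after {attempt} attempt(s)"
        except Exception:
            if attempt < max_attempts:
                queue.append((name, attempt + 1))  # restart: back of the queue
            else:
                status[name] = "FAILED"
    return status


calls = {"flaky": 0}


def flaky_job() -> None:
    # Fails on the first call only, to simulate a transient error.
    calls["flaky"] += 1
    if calls["flaky"] < 2:
        raise RuntimeError("transient error")


def broken_job() -> None:
    raise RuntimeError("always fails")


result = run_queue({"report": lambda: None, "flaky": flaky_job, "broken": broken_job})
print(result)
```

Submitting containers straight from a scheduler would leave this queueing and retry logic for you to build and operate.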
                                                  STAR Interview
S T A R
One of the biggest mistakes people make in behavioral interviews is that they keep saying "we" - "we did this task", "we came up
with a plan", "we completed step X". This may sound weird - at Amazon we expect you to be a team player, but in an interview we
want to know precisely what YOU did. Think of this as selecting a player for your Super Bowl team. Yes, we want the receiver to
play well with the team, but while selecting, we only care about his stats and his abilities. So make sure to clarify what part
YOU played in your answers.
The next biggest mistake people make is talking in hypotheticals. When a question starts with "Tell me about a time when you...", you
must give an example from your past projects. If you just answer in hypotheticals such as "I will do this, that...", you will fail the
interview.
A: I migrated our project to AWS Serverless after considering K8s and EC2. We coded the lambda, tested it, implemented
DevOps then deployed into prod and it was a huge success.
Is the above answer good or bad? It's quite bad. Why?
• Situation and Task not described
• “We” - what actions did YOU perform?
• “Huge Success” - No Data, Very Subjective
Task - As a lead architect/developer/tech lead, I was tasked with finding a suitable AWS solution within the given timeframe.
Action - I researched possible ways to run microservices on AWS. I narrowed it down to three options - run each
microservice on vanilla EC2, run on K8s using EKS, or go Serverless. I took one of the microservices and did a POC on vanilla
EC2, EKS, and Lambda-API Gateway. While they all did the job, I found that with EC2 I would have to take care of making it HA by
spinning up multiple EC2 instances in multiple AZs, and there is the overhead of AMI rehydration. EKS seemed to be a valid
solution. However, given the traffic patterns, we would have to pay more than necessary. There is also the overhead of training
the team on K8s. Lambda-API Gateway is inherently HA and scalable, we pay only for what we use, and there are no servers to
manage at all. This reduces our day 2 operational overhead and lets us focus on delivering business value.
Result - Based on all the POC data on performance, cost, and time to deploy, I selected the Serverless solution. We converted
the rest of the microservices to Lambda and implemented them in production within 3 months. It resulted in over 90% cost savings
over EC2 and K8s. I shared my project learnings with other teams and showed them how to code Lambda so they can utilize it
as well. I got recognized by the CIO for this effort.
Note that you will get follow-up questions on the answer to understand your depth and to make sure you are not just copying
and pasting a generic answer from the Internet 😅.
                               Hybrid Cloud Architecture
Hybrid means AWS and Data Center working together for the application
Diagram: User, Route 53, ALB, Site-to-Site VPN
Underutilized Nodes
2
    kind: NodePool
    spec:
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized
Worker VM Worker VM