resume
Saturday, July 12, 2025 10:56 PM
AWS S3 (Amazon Simple Storage Service)
What is it?
Amazon S3 is a cloud-based object storage service.
You can store and retrieve any amount of data, at any time, from anywhere on the web.
Think of it as an infinite USB drive in the cloud.
✅ Key Concepts:
Term Meaning
Bucket Container to store your files (like a folder)
Object Any file (image, video, CSV, etc.) stored in a bucket
Key Unique name (path) for each object in the bucket
Region Geographical location of your bucket
✅ Features:
• Unlimited Storage
• Versioning – Keep track of file versions
• Secure – Supports IAM, bucket policies, encryption
• Public or Private access
• Lifecycle Rules – Auto-delete or move files (e.g., to Glacier)
✅ Common Use Cases:
• Store data for ML/DL projects
• Host static websites
• Store logs, backups, datasets
• Trigger events with Lambda when a file is uploaded
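For example, a minimal boto3 sketch of upload/download (assumes AWS credentials are already configured; the bucket and key names here are hypothetical):
import boto3
s3 = boto3.client("s3")
# Upload a local file into a bucket under a given key (hypothetical names)
s3.upload_file("data/train.csv", "my-ml-bucket", "datasets/train.csv")
# Download the same object back to a local file
s3.download_file("my-ml-bucket", "datasets/train.csv", "train_copy.csv")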
AWS Elastic Beanstalk
What is it?
Elastic Beanstalk is a Platform-as-a-Service (PaaS) offered by AWS.
PaaS is a cloud-based platform that gives developers everything they need to build, deploy,
and manage applications without worrying about servers, storage, or infrastructure.
Beanstalk lets you deploy and manage web apps while AWS handles the infrastructure (servers,
load balancers, scaling, etc.).
✅ Key Features:
Feature Explanation
Managed Service AWS manages EC2, load balancers, autoscaling, etc.
Supports many languages Python, Java, Node.js, PHP, .NET, etc.
Fast Deployment Just upload your code — Beanstalk handles everything
Monitoring Integrated with CloudWatch for health and logs
Environment Config Customize EC2 type, scaling rules, environment variables
Example Use Case (Python Web App):
1. You write a Flask/Django app.
2. Zip the project folder and upload to Elastic Beanstalk.
3. AWS:
○ Launches an EC2 instance
○ Deploys your app
○ Sets up a load balancer
○ Creates logs and health checks
Deployment Workflow:
1. Code →
2. Zip →
3. Upload to Beanstalk →
4. Deployed with environment →
5. Scalable Web App running ✅
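For the Python platform, Beanstalk looks for a WSGI callable named application (by default in application.py). A minimal sketch of such an app:
# application.py - Beanstalk's Python platform expects a callable named `application`
from flask import Flask
application = Flask(__name__)
@application.route("/")
def index():
    return "Hello from Elastic Beanstalk!"
if __name__ == "__main__":
    application.run()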
Amazon EC2 (Elastic Compute Cloud)
What is EC2?
EC2 = Virtual machine (server) in the AWS cloud.
You can launch, configure, and manage servers on demand.
Think of EC2 as a computer in the cloud that you can start, stop, and run any app on (like a
personal VPS).
✅ Why Use EC2?
Use Case Example
Host a web app Django, Flask, Node.js on a VM
Run ML models/scripts Train or infer models on GPU machines
Set up custom environments Use your own OS, tools, configs
Batch jobs Run data processing or ETL scripts
✅ Key EC2 Concepts
Concept Meaning
AMI Amazon Machine Image = OS template (like Ubuntu, Amazon Linux)
Instance Type Defines power (CPU, RAM, GPU). Eg: t2.micro, g4dn.xlarge
EBS Volume Storage attached to your instance (like a hard disk)
Key Pair SSH key to securely access your EC2 via terminal
Security Group Firewall rules (like opening port 22 for SSH or 80 for web app)
Elastic IP Static public IP for your instance
Example: Deploy a Python Flask App
1. Launch EC2 (Ubuntu, t2.micro, free tier)
2. SSH into EC2:
ssh -i "key.pem" ubuntu@your-ec2-ip
3. Install Python, Flask, Git, etc.
4. Run your app:
python app.py
5. Open port 5000 in Security Group to view app in browser
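For step 5 to work, the app must listen on all interfaces rather than just localhost. A minimal sketch of such an app.py:
# app.py - bind to 0.0.0.0 so the app is reachable from outside the instance
# (port 5000 must also be open in the Security Group)
from flask import Flask
app = Flask(__name__)
@app.route("/")
def home():
    return "Running on EC2!"
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)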
Basic EC2 Lifecycle
Launch → Configure (AMI, instance type, SG) → Connect (SSH) → Run app → Stop/Terminate
Billing Note:
You are charged by the second/minute/hour based on:
• Instance type
• Storage
• Data transfer
Free Tier: t2.micro 750 hrs/month for 1 year
Interview Highlights (say these!):
• “I used EC2 to deploy and test Python/ML applications in a custom environment.”
• “I configured security groups, key pairs, and used SSH to manage the server.”
• “EC2 gives me full flexibility over OS, packages, and environment.”
Amazon Redshift
What is it?
Amazon Redshift is a fully managed, cloud-based Data Warehouse
used to store and analyze huge volumes of data using SQL.
Think of it as a super-fast cloud database for analytics.
✅ What Makes Redshift Special?
Feature Meaning
Columnar Storage Stores data column-wise, not row-wise → faster for analytical queries
Massively Parallel Processing (MPP) Splits queries across multiple nodes for speed
SQL-Compatible Works with standard SQL + supports PostgreSQL syntax
Scalable Can store petabytes of data, scale as needed
✅ Key Use Cases
Use Case Example
Business Intelligence (BI) Dashboards, PowerBI/Tableau
Analytics Query millions of rows of sales, traffic, etc.
Data Warehousing Store processed data from ETL pipelines
Machine Learning Input Aggregate/query features from large datasets
Basic Architecture
Data Sources (S3, RDS, etc.)
↓
AWS Glue / ETL Tools
↓
Redshift Cluster
↓
Analytics / BI Tools
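Since Redshift speaks the PostgreSQL wire protocol, you can query it from Python with psycopg2. A sketch with a hypothetical cluster endpoint, credentials, and table:
import psycopg2
# Hypothetical endpoint/credentials; 5439 is Redshift's default port
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="awsuser",
    password="...",
)
cur = conn.cursor()
cur.execute("SELECT region, SUM(sales) FROM orders GROUP BY region;")
for row in cur.fetchall():
    print(row)
conn.close()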
Amazon ECR (Elastic Container Registry)
What is ECR?
Amazon ECR is a fully managed Docker container registry by AWS.
You can store, manage, and pull Docker images securely in the cloud.
Think of ECR like GitHub for Docker images.
✅ Why Use ECR?
Feature Benefit
Stores Docker images So you can reuse them across EC2, ECS, EKS, etc.
Secure Integrated with IAM roles and policies
Versioned Each image has tags (e.g., :latest, :v1)
⚡ Fast Optimized for AWS services
✅ Typical Workflow
1. Build Docker Image locally
2. Push to Amazon ECR
3. Pull from ECR in EC2/ECS/EKS
# 1. Authenticate Docker with ECR
aws ecr get-login-password | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com
# 2. Build the image locally
docker build -t my-app .
# 3. Tag it for your ECR repository
docker tag my-app:latest <account_id>.dkr.ecr.<region>.amazonaws.com/my-app
# 4. Push it to ECR
docker push <account_id>.dkr.ecr.<region>.amazonaws.com/my-app
✅ Common Use Cases
Use Case Example
Deploy ML models Store inference container images
Host web apps Upload web server image to ECR
Use with ECS / Fargate Auto-pull image from ECR for containers
CI/CD pipelines Integrate ECR with CodePipeline / Jenkins
Docker
What is Docker?
Docker is a containerization platform that lets you package your app + dependencies into a single unit
called a container.
Think: A Docker container is like a sealed box that runs your code the same way anywhere —
on your laptop, server, or cloud.
✅ Why Use Docker?
Feature Benefit
Consistency Works the same on any machine
Lightweight Containers share OS kernel → less overhead
Portable Easy to move between systems or cloud
Reproducible Keeps your environment and versions locked
Fast Deployment Faster than virtual machines (VMs)
Key Docker Concepts
Term Meaning
Image Template for container (e.g., Ubuntu + Python + Flask)
Container Running instance of an image
Dockerfile Script to build an image
Docker Hub Online registry to push/pull Docker images
Volume Storage for persisting data outside the container
Port Mapping Exposes container port to your local machine
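To make the Dockerfile concept above concrete, a minimal sketch for a Flask app (file names and versions are assumptions):
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]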
Basic Docker Commands
Task Command
Build image docker build -t myapp .
Run container docker run -p 5000:5000 myapp
List containers docker ps
Stop container docker stop <container_id>
See image list docker images
Remove container/image docker rm / docker rmi
✅ Docker in ML Projects
Usage Example
Package ML model + dependencies scikit-learn, numpy, flask
Deploy in EC2 / ECR / ECS Use same image everywhere
Reproducible experiments Lock Python, CUDA, packages
CI/CD pipelines Docker used in testing and deploy step
Docker vs VM
Feature Docker Virtual Machine
OS Shares host OS Has full guest OS
Size Lightweight (MBs) Heavy (GBs)
Boot time Seconds Minutes
Portability High Medium
GitHub
What is GitHub?
GitHub is a cloud-based platform that hosts Git repositories — where you can store, manage, and share
your code.
Think of GitHub as a social network + cloud storage for code, powered by Git.
✅ Why Use GitHub?
Feature Purpose
Version Control Track changes in your code over time
Collaboration Work with teams, manage branches, pull requests
Portfolio Show your projects to recruiters
CI/CD Integration Automate testing, deployment (e.g., with GitHub Actions)
Open Source Explore or contribute to open-source projects
Common Git Workflow (Git + GitHub)
1. git init # Initialize local repo
2. git add . # Stage changes
3. git commit -m "msg" # Commit changes
4. git remote add origin <repo-URL>
5. git push -u origin main # Push to GitHub
Example Git Commands (Must-Know)
Action Command
Clone repo git clone <repo-url>
Add changes git add . or git add filename.py
Commit git commit -m "message"
Push changes git push
Create branch git checkout -b new-branch
Merge branch git merge branch-name
Pull latest changes git pull origin main
✅ Key GitHub Features
Feature Use Case
README.md Explain your project (what, why, how)
Issues Track bugs/tasks
Pull Requests Collaborate and suggest changes
Actions Automate CI/CD (test, deploy, etc.)
Fork/Star Explore and support open-source projects
LangChain
What is LangChain?
LangChain is a framework that helps developers build applications using LLMs (like OpenAI's GPT) — by
combining language models with external tools, data, and memory.
Think of LangChain as a smart “pipeline” for LLM-powered apps.
✅ Why Use LangChain?
Feature Benefit
Modular Helps chain together multiple components (LLM, tools, DB)
Memory Keeps context across conversations
Document QA Lets LLMs answer questions over files or PDFs
Tools Integration Use search engines, APIs, or even calculators
Agents LLMs that can reason and decide which tools to use
Core Components of LangChain
Component Role
LLMs Language models (like GPT, Claude, LLaMA, etc.)
Prompt Templates Structure the prompts to LLMs
Chains Sequence of steps (like prompt → LLM → output → DB)
Memory Store past interactions (like conversation history)
Agents LLMs that use tools based on reasoning
Tools External APIs (search, math, file lookup, etc.)
Retrievers Search through your custom documents
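A minimal sketch tying a few of these together (prompt template → LLM), assuming a local model served by Ollama as in the Ollama section below:
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
llm = Ollama(model="llama2")
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")
# Chain: fill the template, then send the result to the LLM
chain = prompt | llm
print(chain.invoke({"topic": "vector databases"}))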
✅ Popular Use Cases
Use Case Description
Chatbots Context-aware virtual assistants
PDF Q&A Ask questions on your uploaded documents
Code Assistants Chain LLM with code runners or compilers
Search-augmented QA Combine GPT with Google or private DBs
Autonomous Agents LLM that decides what action to take next
✅ Extra Tip (Real-world use)
• Combine LangChain + Streamlit to make interactive LLM apps
• Combine LangChain + ChromaDB for fast vector DB search
• Use LangChain Agents to build LLMs that can Google, calculate, and think
Ollama
What is it?
Ollama is a tool that allows you to run LLMs (Large Language Models) locally — like LLaMA, Mistral,
Gemma, or Code Llama — without needing the cloud.
Think of Ollama as your personal offline ChatGPT using open-source models.
✅ Key Features
Feature Benefit
⚡ Local Inference Run models directly on your CPU/GPU
Pre-built Models Pull models like llama2, mistral, gemma
Simple CLI/API Use via terminal or integrate with apps easily
Private No data leaves your machine
Custom Models You can create and run your own model file
How It Works
# 1. Install
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull a model
ollama pull llama2
# 3. Chat with it
ollama run llama2
It works like Docker but for LLMs: pull → run → chat
✅ Example Use Cases
Use Case Example
Offline Chatbot Run ChatGPT-like model without internet
Private Q&A Ask personal data questions (safely)
Local Agent Build LangChain agent using local LLM
Code Assistant Use CodeLlama for programming help locally
Works Well With
• LangChain: Use Ollama as your llm=Ollama() in LangChain agents
• Local vector DBs: Combine with ChromaDB, FAISS, LlamaIndex
• Streamlit/Gradio: Build frontends for your private assistant
• FastAPI: Turn into a local REST API
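Ollama also exposes a local HTTP API (port 11434 by default), so any app can call it; a quick Python sketch:
import requests
# Non-streaming generation request against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Say hi in five words.", "stream": False},
)
print(resp.json()["response"])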
Example: LangChain + Ollama (Python)
from langchain.llms import Ollama
llm = Ollama(model="llama2") # or "mistral", "gemma"
response = llm.invoke("Explain PCA in simple terms.")
print(response)
Popular Models in Ollama
Model Purpose
llama2 General chatbot (Meta)
mistral Light, fast, good quality
gemma Google's small LLM
codellama Code generation, completion
Interview Highlights (say these!):
• “Ollama lets me experiment with LLMs like LLaMA locally without GPU clusters.”
• “I’ve integrated Ollama with LangChain for building privacy-first LLM pipelines.”
• “It’s useful for offline LLM inference and personal chatbots.”
Memory Trick
Term Think of...
Ollama Offline-Llama (runs LLMs on your laptop)
Pull Download a model (like pulling a Docker image)
Run Launch a chatbot from terminal
What is a Vector Database?
A Vector Database is a special type of database designed to store, index, and search vectors
(embeddings) efficiently.
Think of it as a search engine for meanings, not just keywords.
Why "Vector"?
In AI/NLP/ML, we convert:
• Text
• Images
• Audio
...into vectors (arrays of numbers) using models like BERT, OpenAI embeddings, or CLIP.
These vectors capture semantic meaning, like:
Text Embedding (vector)
"cat" [0.12, 0.98, -0.44, ...]
"kitten" Close vector → similar meaning
"car" Very different vector → different meaning
✅ Why Do We Need a Vector Database?
Traditional DB Vector DB
Exact match (WHERE name = 'cat') Similar match (Which vectors are closest to "kitten"?)
Fast on text/numbers Fast on high-dimensional vectors
Can’t do similarity search Built for similarity search
What Vector DBs Do:
1. Store embeddings (vectors)
2. Search similar vectors given a query vector
3. Return top-k results (semantic search)
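A toy NumPy sketch of that top-k idea (3-D vectors made up for illustration; real embeddings have hundreds of dimensions):
import numpy as np
# Made-up "embeddings" - similar meanings get similar vectors
vectors = {
    "cat":    np.array([0.90, 0.80, 0.10]),
    "kitten": np.array([0.85, 0.75, 0.20]),
    "car":    np.array([0.10, 0.20, 0.95]),
}
query = np.array([0.88, 0.77, 0.15])  # pretend this embeds "feline"
def cosine(a, b):
    # Cosine similarity: 1.0 = same direction (same meaning)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# Top-k search: rank stored vectors by similarity to the query
ranked = sorted(vectors, key=lambda w: cosine(vectors[w], query), reverse=True)
print(ranked[:2])  # ['cat', 'kitten']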
✅ Real-World Use Cases
Use Case Example
Semantic Search Search docs/images by meaning
RAG (LLM) Retrieve relevant text chunks for GPT
Recommendation Suggest products based on user behavior
Document Q&A Ask questions over PDFs using embeddings
Image Similarity Find visually similar images using vectors
Popular Vector Databases
Name Type Use Case
FAISS Local Fast, open-source, no persistence
ChromaDB Local Easy RAG pipelines, used with LangChain
Pinecone Cloud Fully managed, scalable vector DB
Weaviate Cloud/Local With metadata + search filters
Qdrant Cloud/Local Fast and open-source
Typical Workflow in a Vector DB App
1. Split documents (PDF, website, etc.)
2. Generate embeddings (OpenAI/BERT)
3. Store vectors in vector DB
4. On user query:
→ Convert query to vector
→ Search DB for top-k similar vectors
→ Return matching docs
Interview Highlights (say these!):
• “Vector DBs store and search embeddings instead of plain text.”
• “They are essential for building fast and meaningful search in RAG systems.”
• “I’ve used FAISS and ChromaDB with OpenAI embeddings for document retrieval.”
Memory Trick
Term Think of...
Vector DB Search based on meaning, not exact text
Embedding A numerical fingerprint of content
Top-k search “Give me 3 most similar documents”
FAISS (Facebook AI Similarity Search)
What is it?
FAISS is a vector database and library developed by Facebook AI.
It helps you store and search high-dimensional vectors quickly and efficiently.
Think of FAISS as a search engine for embeddings (like those from OpenAI, BERT, etc.).
✅ Why Use FAISS?
Feature Benefit
⚡ Fast similarity search Finds similar vectors in milliseconds
Memory efficient Can handle millions of vectors
Real-time use Perfect for RAG or chatbot retrieval
Open-source Free and widely used in production
Where FAISS is used?
Use Case Example
RAG with LLMs Retrieve relevant chunks before GPT answers
Image similarity search Find similar images based on embeddings
Document search Vectorize PDF text → retrieve top-k paragraphs
Recommendation systems Similar products/items based on embeddings
How It Works:
1. Convert text to vector (embedding) using a model (e.g., OpenAI, HuggingFace)
2. Store all vectors in FAISS index
3. Query with a vector → FAISS returns similar vectors (nearest neighbors)
✅ Code Example (Text Search)
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
# Step 1: Text → Embedding
texts = ["apple", "banana", "mango", "car", "bike"]
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(texts)
# Step 2: Create FAISS index (FAISS expects float32 vectors)
d = embeddings.shape[1] # vector dimension
index = faiss.IndexFlatL2(d)
index.add(np.array(embeddings, dtype="float32")) # Add vectors to index
# Step 3: Query (encode already returns a (1, d) array for a one-item list)
query = model.encode(["fruit"])
D, I = index.search(np.array(query, dtype="float32"), 2) # top-2 neighbors
# Output result
print("Most similar:", [texts[i] for i in I[0]])
Common FAISS Index Types
Index Type Description
IndexFlatL2 Exact nearest neighbor (slow but accurate)
IndexIVFFlat Fast, approximate nearest neighbor
IndexHNSW Graph-based (very fast for large data)
IndexPQ Compressed, memory-efficient
✅ FAISS vs Other Vector DBs
Feature FAISS ChromaDB / Pinecone
Host Local or server Cloud (Chroma, Pinecone)
Speed Extremely fast Fast + scalable
Use Case Fast search Production & distributed use
Persistence Not by default Yes (in Chroma, etc.)
Interview Highlights (say these!):
• “FAISS lets me do fast similarity search on embeddings for NLP and LLM tasks.”
• “I used FAISS in a RAG pipeline to retrieve top-k chunks from documents.”
• “It supports many indexing strategies like IndexFlatL2, IVF, and PQ for speed-memory tradeoffs.”
Memory Trick
Concept Think of...
FAISS Facebook AI Similarity Search
Embedding Numeric representation of meaning
IndexFlatL2 Raw, accurate search
IVF Approximate but fast for large datasets
Hugging Face is a very popular platform in the AI world, especially for NLP, CV, LLMs, and
deploying models. It comes up often in interviews for AI, MLOps, and GenAI roles.
What is Hugging Face?
Hugging Face is an AI platform that provides:
• Pretrained models
• Datasets
• APIs and tools for NLP, CV, speech, and generative AI
• Hosting and sharing ML models
Hugging Face is to AI models what GitHub is to code.
✅ What Can You Do with Hugging Face?
Use Case Tool/Library
Use pretrained transformers transformers library
Upload/share models huggingface.co/models
Work with datasets datasets library
Deploy models with API Hugging Face Inference Endpoints
Chat with models Spaces / Transformers Chat
Fine-tune LLMs or CV models Trainer / AutoClasses
Popular Hugging Face Libraries
Library Purpose
transformers Access to thousands of pretrained NLP/vision models
datasets Load, share, and process datasets easily
accelerate Train on multiple GPUs/TPUs
diffusers Generative models (like Stable Diffusion)
peft Parameter-efficient fine-tuning (LoRA, QLoRA) for LLMs
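A small sketch with the transformers AutoClasses (downloads the checkpoint from the Hub on first run; requires PyTorch):
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hugging Face is like GitHub for models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)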
Hugging Face Spaces
• A free platform to deploy models with:
○ Gradio
○ Streamlit
○ HTML/JS
• Just push your app code to the Space's Git repo (or sync it from GitHub) and it goes live!
Think of Spaces like "mini web apps for AI demos".
✅ Interview Highlights (say these!):
• “Hugging Face provides a hub of pretrained models and tools to make deploying and using AI very
easy.”
• “I’ve used the transformers library to work with models like BERT, ViT, and CLIP.”
• “Spaces let me deploy and demo models using Gradio or Streamlit without heavy infrastructure.”
Memory Trick
Word Think of...
Hugging Face A friendly open-source platform for AI models
Transformers BERT, GPT, ViT, CLIP — pretrained & ready-to-use
Spaces Deploy & show your models like a portfolio
Key ML Libraries
1. TensorFlow
• Developed by Google.
• Used for deep learning (CNNs, RNNs, LLMs).
• Offers low-level control + high-level APIs (like Keras).
• Widely used in production and mobile/edge AI (TF Lite).
✅ Example:
import tensorflow as tf
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])  # minimal one-layer model
2. Keras
• High-level API built on TensorFlow.
• Lets you build deep learning models easily.
• Good for beginners + prototyping.
✅ Interview line “Keras is the frontend, TensorFlow is the engine.”
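A minimal Keras sketch (layer sizes are arbitrary, just to show the high-level API):
import tensorflow as tf
# A small MLP built and compiled in a few lines
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()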
3. PyTorch
• Developed by Facebook.
• More Pythonic and flexible than TensorFlow.
• Preferred in research and academia.
• Uses dynamic computation graph (eager execution).
✅ Example:
import torch.nn as nn
model = nn.Sequential(nn.Linear(10, 1))
4. Scikit-learn
• Classic machine learning (not deep learning).
• Used for algorithms like SVM, Decision Trees, KNN, PCA, etc.
• Also has preprocessing, pipelines, and model selection tools.
✅ Example:
from sklearn.ensemble import RandomForestClassifier
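Extending that import into the usual fit/predict pattern (toy synthetic data, a sketch):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Toy data just to show the workflow
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data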
5. Hugging Face
• Platform + libraries for pretrained models (NLP, CV, speech).
• Use models like BERT, ViT, CLIP, Whisper, etc.
• Includes Transformers, Datasets, Diffusers, Spaces.
✅ Example:
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
6. NumPy
• Fundamental library for numerical computing in Python.
• Used for arrays, matrices, linear algebra, etc.
• Backbone of TensorFlow, PyTorch, scikit-learn.
• Backbone of TensorFlow, PyTorch, scikit-learn.
✅ Example:
import numpy as np
a = np.array([1, 2, 3])
7. Pandas
• Used for data analysis and manipulation.
• Provides DataFrame structure (like Excel or SQL table).
• Excellent for cleaning and preprocessing tabular data.
✅ Example:
import pandas as pd
df = pd.read_csv("data.csv")
✅ Summary Table:
Library Main Use Example Use Case
TensorFlow Deep learning (production) CNN, RNN, TF Lite, deployment
Keras Easy DL modeling (high-level) Quick CNN or MLP modeling
PyTorch Deep learning (research) ResNet, ViT, LSTM in research
Scikit-learn Traditional ML SVM, KNN, Random Forest
Hugging Face Pretrained models BERT, CLIP, ViT, generative AI
NumPy Numerical computation Matrix ops, vectorization, algebra
Pandas Data analysis Cleaning CSVs, EDA, grouping