Chapter 6
--------------------------------------------------------------------------------------------------------------------
6.1 Containerization
Containerization is a software deployment technology that packages an application's code together with all the
files, libraries, and dependencies it needs into standardized units called containers, which can run on any
infrastructure. This approach ensures that the application runs consistently across different environments, whether
on a developer's machine, a testing server, or in production. Containers are lightweight, isolated, and share the
host operating system's kernel, which makes them more efficient than traditional virtual machines.
Docker is an open-source platform that enables developers to automate the deployment, scaling, and management of
applications using containers. It allows applications to be packaged with all their dependencies (such as libraries,
configurations, and binaries) into a container that can run consistently across different computing environments.
Docker simplifies the process of managing and deploying applications, making it easier to ensure that they run the
same way on a developer's local machine, in staging environments, and in production.
1. Containers:
o Containers are lightweight, isolated environments that bundle an application and its dependencies
together.
o They are portable, meaning the same container can run on any platform (e.g., Linux, Windows,
cloud servers).
o Unlike virtual machines, containers share the host OS kernel but are isolated from each other and
the host system.
2. Docker Image:
o A Docker image is a read-only template that contains the instructions for creating a Docker
container.
o It includes everything needed to run an application: code, libraries, environment variables, and
configuration files.
o Docker images can be pulled from a Docker registry like Docker Hub or created locally using a
Dockerfile.
3. Dockerfile:
o A Dockerfile is a text file that contains the instructions to build a Docker image. It specifies the
base image, the software to install, environment variables, and commands to run within the
container.
o Example:
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
4. Docker Engine:
o The Docker Engine is the runtime that runs and manages Docker containers. It can be installed on
different platforms and provides the necessary tools to build, run, and manage containers.
5. Docker Compose:
o Docker Compose is a tool used for defining and running multi-container Docker applications. It
allows you to configure multiple containers and services in a single file (usually docker-
compose.yml) and launch them with a single command.
o Example of a docker-compose.yml:
version: '3'
services:
  web:
    image: my_web_app
    ports:
      - "5000:5000"
  db:
    image: postgres:latest
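With this file in place, the entire stack can be started with a single command (docker compose up) and stopped
again with docker compose down.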
Common Docker Commands
1. Image Creation:
o Command: docker build -t <image_name> .
o Description: This command creates a container image from a Dockerfile, which contains
instructions on how to build the image (e.g., base image, dependencies, configuration).
2. Image Listing:
o Command: docker images
o Description: This command lists all the images available on the local machine.
3. Container Creation:
o Command: docker create <image_name>
o Description: This command creates a new container from an image but does not start it
immediately.
4. Container Starting:
o Command: docker run (also implicitly starts a container created with docker create)
o Description: This command creates and starts a container in one step. It can also include options to
allocate resources, set environment variables, or bind ports.
5. Container Management:
o Commands: docker start, docker stop, docker restart
o Description: These commands manage the running state of a container, allowing you to start, stop,
or restart it as needed.
6. Container Monitoring:
o Command: docker logs <container>
o Description: This command retrieves logs from a running or stopped container, helping in
monitoring and debugging.
7. Container Interaction:
o Command: docker exec -it <container> <command>
o Description: This command allows you to run commands inside a running container interactively.
8. Container Stopping:
o Command: docker stop <container>
o Description: This command gracefully stops a running container by sending it a termination signal.
9. Container Removal:
o Commands: docker rm <container>, docker system prune
o Description: docker rm removes a specific stopped container, while docker system prune removes
all stopped containers, unused networks, dangling images, and build cache, helping to free up disk
space.
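The same lifecycle can also be driven programmatically. Below is a minimal sketch using the official docker
Python SDK (pip install docker); the image tag myapp and the port mapping are illustrative assumptions, not values
from this chapter's examples.

import docker

# Connect to the local Docker daemon (same default target as the docker CLI).
client = docker.from_env()

# Image creation: build an image from the Dockerfile in the current directory (docker build).
image, build_logs = client.images.build(path=".", tag="myapp")

# Container starting: create and start a container in one step (docker run).
container = client.containers.run(
    "myapp",
    detach=True,               # run in the background
    ports={"5000/tcp": 5000},  # bind container port 5000 to host port 5000
)

# Container monitoring: retrieve the container's logs (docker logs).
print(container.logs().decode())

# Container stopping and removal (docker stop, docker rm).
container.stop()
container.remove()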
Docker Workflow
1. Build:
o Using a Dockerfile, developers create an image. This image contains everything required to run an
application.
2. Run:
o After the image is built, a container can be launched from the image using the docker run
command.
o The container is an instance of the image and includes the application's code and environment.
3. Ship:
o Docker images can be shared across environments using a registry. Developers can push images to
public or private registries (e.g., Docker Hub, AWS ECR, Google Container Registry), and pull
them to deploy the application in different environments.
4. Scale:
o Containers are lightweight and can be replicated easily to scale an application. Orchestration tools
like Kubernetes or Docker Swarm can be used to manage large-scale container deployments and
provide auto-scaling, load balancing, and service discovery.
Advantages of Docker
1. Portability:
o Containers abstract away the underlying system, ensuring that applications run consistently across
any environment (development, staging, production).
2. Isolation:
o Each container runs in its own isolated environment, so the dependencies and processes of one
application do not interfere with those of another.
3. Resource Efficiency:
o Docker containers share the host OS kernel, making them more lightweight and faster than virtual
machines. They use fewer system resources and are more efficient to run.
4. Simplified Deployment:
o Docker allows you to package everything your application needs (dependencies, libraries,
configurations) into a single container, simplifying deployment and avoiding the “works on my
machine” problem.
5. Version Control:
o Docker images can be versioned, allowing teams to keep track of changes and easily roll back to
previous versions of an application.
6. Scalability:
o Containers are ideal for microservices architecture. Multiple instances of containers can be created
and managed to handle increased traffic, and they can be orchestrated using tools like Kubernetes.
Use Cases of Docker
1. Microservices:
o Docker is a natural fit for microservices architectures: each service runs in its own container and
can be developed, deployed, and scaled independently.
2. Testing:
o Developers use Docker to create isolated environments for testing applications. This prevents
dependency conflicts and makes it easier to test across different environments.
3. Cloud Deployments:
o Docker containers are widely used in cloud-based environments (e.g., AWS, Azure, Google
Cloud) for running applications in a scalable, efficient manner.
Containers vs. Virtual Machines
Aspect | Containers | Virtual Machines
Resource Efficiency | Lightweight, shares host OS kernel | Heavyweight, includes its own OS kernel
Resource Usage | Lower overhead, more efficient | Higher overhead due to running full OS
Management Complexity | Simple, easy to scale and orchestrate | More complex due to OS management
Container Orchestration
Container orchestration refers to the automated management, coordination, and deployment of containerized
applications across a cluster of machines. It involves managing the lifecycle of containers, scaling them based on
demand, balancing loads, ensuring high availability, and handling failures. In short, it helps automate many of the
manual tasks involved in managing containers, particularly when running large-scale containerized applications.
In an environment where multiple containers are running, orchestration tools ensure the containers work together
efficiently, without manual intervention. This is especially important for applications that require high availability,
fault tolerance, and scalability.
1. Automated Deployment:
o Orchestrators automate the deployment of containers across multiple machines or nodes in a
cluster.
o Containers are deployed based on a defined configuration (such as Docker Compose files or
Kubernetes YAML manifests).
2. Scaling:
o Orchestrators enable horizontal scaling, where more containers (instances) of an application are
started or stopped based on load or resource utilization.
o If traffic increases, new containers can be spun up automatically, and if the traffic decreases,
unnecessary containers are stopped.
3. Load Balancing:
o Orchestrators distribute traffic across containers to ensure no single container is overloaded.
o Load balancing is vital for ensuring efficient resource usage and maintaining application
performance.
4. Self-Healing:
o Orchestration tools monitor container health and restart any failed containers automatically to
ensure minimal downtime.
o If a container crashes, the orchestrator ensures that a new one is created to replace it.
5. Service Discovery:
o Orchestrators provide automatic service discovery, enabling containers to discover and
communicate with each other even if they are constantly being scaled up or down.
o This is typically achieved through internal DNS systems.
6. Networking:
o Orchestrators manage container networking, ensuring that containers can securely and reliably
communicate with each other, both within the same host and across different hosts in a cluster.
7. Storage Management:
o Orchestrators provide persistent storage solutions, enabling containers to store and retrieve data
even when they are destroyed or rescheduled on different nodes.
8. Configuration Management:
o Orchestrators manage configurations and secrets (e.g., environment variables, database
credentials) for containers to access securely.
9. Rolling Updates:
o Orchestration tools allow for rolling updates, meaning containers can be updated incrementally
without downtime. Old containers are replaced by new ones gradually to ensure that the
application remains available during the update process.
10. Monitoring and Logging:
o Orchestration tools often provide logging and monitoring features to track the performance and
health of the containers, helping identify potential issues early.
Popular Container Orchestration Tools
1. Kubernetes:
o The most widely used container orchestration tool, developed by Google and maintained by the
Cloud Native Computing Foundation (CNCF).
o Kubernetes automates deployment, scaling, load balancing, and management of containerized
applications across clusters.
o It has a rich ecosystem and is supported by most cloud providers (e.g., AWS EKS, Azure AKS,
Google GKE).
o Kubernetes uses pods, which are groups of containers deployed together.
6.2 Kubernetes
Kubernetes (often abbreviated as K8s) is an open-source container orchestration platform designed to automate the
deployment, scaling, and management of containerized applications. It was originally developed by Google and is
now maintained by the Cloud Native Computing Foundation (CNCF).
Kubernetes simplifies the complex tasks associated with managing containers in a microservices architecture, such
as scaling, load balancing, service discovery, self-healing, and rolling updates. It is one of the most popular tools for
managing large-scale applications in the cloud or on-premise environments.
Key Features of Kubernetes
1. Container Orchestration:
o Kubernetes enables you to manage a fleet of containers running on multiple machines, handling
their deployment, scaling, and networking automatically.
2. Automated Deployment:
o Kubernetes supports automatic deployment and rollback of applications, ensuring that the desired
state of the application is always maintained.
3. Self-Healing:
o Kubernetes automatically restarts failed containers, replaces containers, and reschedules them on
healthy nodes if necessary.
4. Scaling:
o Kubernetes can automatically scale applications up or down based on traffic or resource usage.
This scaling can be done at the container level (horizontal scaling).
5. Load Balancing:
o Kubernetes provides internal load balancing to distribute traffic across containers, ensuring that no
single container is overloaded.
6. Service Discovery:
o Kubernetes manages internal service discovery, allowing containers to find and communicate with
each other without requiring hardcoded IP addresses. It provides a DNS-based mechanism for
service discovery.
7. Storage Orchestration:
o Kubernetes can mount storage volumes from a variety of sources (local storage, network-attached
storage, cloud-based storage, etc.) and manage persistent data for applications.
8. Automated Rollouts and Rollbacks:
o Kubernetes ensures that applications can be updated or rolled back without downtime, using
rolling updates. It ensures that the system is always in a stable state during updates.
9. Declarative Configuration:
o Kubernetes operates on a declarative configuration model. You specify the desired state of your
system (e.g., the number of replicas of a service, CPU/memory limits), and Kubernetes works to
ensure that the system matches that desired state.
10. Extensibility:
o Kubernetes supports extensions like custom resource definitions (CRDs), enabling users to create
their own resources and automate tasks in Kubernetes clusters.
What Kubernetes Can Do
1. Container Management: Kubernetes can manage containers, ensuring they are running, healthy, and
properly configured.
2. Scaling and Load Balancing: It can automatically scale applications up or down based on demand and
distribute network traffic to maintain performance.
3. Self-Healing: Kubernetes automatically replaces or restarts containers that fail or become unresponsive.
4. Service Discovery and Load Balancing: It can expose a container to the internet or other containers using
a single DNS name or IP address.
5. Automated Rollouts and Rollbacks: Kubernetes can manage the deployment of new versions of
applications, allowing for smooth updates and the ability to revert if issues arise.
6. Configuration Management: It allows for the management of configuration settings, secrets, and
environment variables separately from the application code.
Architecture of Kubernetes
Kubernetes follows a master-worker architecture.
Master Node: The master node is the control plane of Kubernetes. It makes global decisions about the
cluster (like scheduling), and it detects and responds to cluster events (like starting up a new pod when a
deployment's replicas field is unsatisfied).
Worker Nodes: Worker nodes are the machines where your applications run. Each worker node runs at
least:
- Kubelet is a process responsible for communication between the Kubernetes Master and the node; it
manages the pods and the containers running on a machine.
- A container runtime (like Docker, rkt), is responsible for pulling the container image from a registry,
unpacking the container, and running the application.
The master node communicates with worker nodes and schedules pods to run on specific nodes.
Key Kubernetes Objects
1. Pods: A Pod is the smallest and simplest unit in the Kubernetes object model that you create or deploy. A
Pod represents a running process on your cluster and can contain one or more containers.
2. Services: A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to
access them - sometimes called a micro-service.
3. Volumes: A Volume is essentially a directory accessible to all containers running in a pod. It can be used
to store data and the state of applications.
4. Namespaces: Namespaces are a way to divide cluster resources between multiple users. They provide a
scope for names, so resources in different namespaces can share the same name without conflict.
5. Deployments: A Deployment controller provides declarative updates for Pods and ReplicaSets. You
describe a desired state in a Deployment, and the Deployment controller changes the actual state to the
desired state at a controlled rate.
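To make the declarative model concrete, the sketch below creates a small Deployment through the API server
using the official kubernetes Python client (pip install kubernetes). It assumes a reachable cluster with a valid
kubeconfig; the image name my_web_app:latest and the replica count are illustrative.

from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes access to a running cluster).
config.load_kube_config()

# Declarative spec: describe the desired state (3 replicas of one container) and
# let the Deployment controller converge the cluster toward it.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [
                    {"name": "web", "image": "my_web_app:latest",
                     "ports": [{"containerPort": 5000}]}
                ]
            },
        },
    },
}

# Submit the desired state to the API server.
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)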
A) Master Components
In Kubernetes, the master components make global decisions about the cluster, and they detect and respond to
cluster events. Let’s discuss each of these components in detail.
API Server
The API Server is the front end of the Kubernetes control plane. It exposes the Kubernetes API, which is used by
external users to perform operations on the cluster. The API Server processes REST operations, validates them, and
updates the corresponding objects in etcd.
etcd
etcd is a consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. It's a
database that stores the configuration information of the Kubernetes cluster, representing the state of the cluster at
any given point of time. If any part of the cluster changes, etcd gets updated with the new state.
Scheduler
The Scheduler is a component of the Kubernetes master that is responsible for selecting the best node for the pod to
run on. When a pod is created, the scheduler decides which node to run it on based on resource availability,
constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.
Controller Manager
The Controller Manager is a daemon that embeds the core control loops shipped with Kubernetes. In other words, it
regulates the state of the cluster and performs routine tasks to maintain the desired state. For example, if a pod goes
down, the Controller Manager will notice this and start a new pod to maintain the desired number of pods.
B) Node Components
Kubernetes worker nodes host the pods that are the components of the application workload. The key components of
a worker node include the Kubelet, the main Kubernetes agent on the node, the Kube-proxy, the network proxy, and
the container runtime, which runs the containers. Let’s discuss them in detail.
Kubelet
Kubelet is the primary "node agent" that runs on each node. Its main job is to ensure that containers are running in a
Pod. It watches for instructions from the Kubernetes Control Plane (the master components) and ensures the
containers described in those instructions are running and healthy.
The Kubelet takes a set of PodSpecs (which are YAML or JSON files describing a pod) and ensures that the
containers described in those PodSpecs are running and healthy.
Kube-proxy
Kube-proxy is a network proxy that runs on each node in the cluster, implementing part of the Kubernetes Service
concept. It maintains network rules that allow network communication to your Pods from network sessions inside or
outside of your cluster.
Kube-proxy ensures that the networking environment (routing and forwarding) is predictable and accessible, but
isolated where necessary.
Container Runtime
Container runtime is the software responsible for running containers. Kubernetes supports several container
runtimes, including Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime
Interface). Each runtime offers different features, but all must be able to run containers according to a specification
provided by Kubernetes.
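As a small illustration of how external clients talk to the API Server, the sketch below lists the cluster's nodes
and pods with the same kubernetes Python client, again assuming a valid kubeconfig.

from kubernetes import client, config

# Authenticate against the API Server using the local kubeconfig.
config.load_kube_config()
core = client.CoreV1Api()

# List the nodes registered with the cluster.
for node in core.list_node().items:
    print("node:", node.metadata.name)

# List the pods the Scheduler has placed, across all namespaces.
for pod in core.list_pod_for_all_namespaces().items:
    print("pod:", pod.metadata.namespace, pod.metadata.name, pod.status.phase)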
6.3 Data Distribution Shift
Data distribution shift refers to the phenomenon where the statistical properties (distribution) of the data used for
training a machine learning model differ from the data encountered during deployment or inference. These shifts can
lead to a model's performance degradation, as the model was trained on data that no longer reflects the current real-
world scenario. Understanding and addressing data distribution shifts is crucial for maintaining the accuracy and
reliability of machine learning models over time.
Types of Data Distribution Shifts
1. Covariate Shift:
o Definition: The distribution of the input features (independent variables) changes between the
training and test data, but the conditional distribution of the target variable given the input
remains the same.
o Example: In a model predicting house prices, if the distribution of features like square footage,
number of rooms, etc., shifts between training and deployment (e.g., due to changes in the real
estate market), but the relationship between these features and house price stays the same, it is
a covariate shift.
2. Prior Probability Shift:
o Definition: The distribution of the target variable (dependent variable) changes, but the
distribution of the input features stays the same. This can affect the model's predictions if the
model was trained on data with a certain distribution of classes.
o Example: In a binary classification problem where the proportion of positive and negative
samples changes in the test data compared to the training data, but the feature distribution
remains the same, it is a prior probability shift.
3. Concept Shift (Concept Drift):
o Definition: The relationship between the input features and the target variable changes over
time. This is the most challenging type of shift, as the very concept the model is learning has
evolved.
o Example: In fraud detection, the tactics used by fraudsters may evolve over time, causing the
relationship between transaction features and the likelihood of fraud to change.
4. Label Shift:
o Definition: This is a form of prior probability shift where the distribution of the labels (the target
variable) changes, but the conditional distribution of the input features given the labels remains
unchanged.
o Example: In a medical diagnosis task, if the frequency of certain diseases changes over time, but
the relationship between patient characteristics (input features) and diseases (labels) stays the
same, this is a label shift.
Impacts of Data Distribution Shifts
1. Model Degradation:
o If the distribution of the input features or target labels changes, the model may no longer make
accurate predictions, leading to poor performance in real-world settings.
2. Bias:
o Shifts can introduce bias if the model is trained on data that does not represent the current
population or environment, leading to unfair or inaccurate outcomes.
3. Adaptation:
o Models need to adapt to changes in the data distribution to remain accurate. This requires
retraining, fine-tuning, or continual learning strategies.
Strategies to Handle Data Distribution Shifts
1. Monitoring:
o Continuously monitor model performance in production and compare the data distribution of
incoming data with the training data to detect shifts (see the sketch after this list).
2. Retraining:
o Periodically retrain the model with updated data that reflects the new distribution, ensuring that
the model stays aligned with the current data.
3. Domain Adaptation:
o Techniques like transfer learning and domain adaptation can help models generalize to new
domains when there is a shift in data distribution.
4. Data Augmentation:
o Use techniques like data augmentation to artificially create examples that account for the
expected shifts in the data, making the model more robust to changes.
5. Ensemble Methods:
o Implement ensemble models that combine predictions from multiple models trained on
different distributions, reducing the impact of shifts.
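Here is the monitoring sketch referenced above: a minimal covariate-shift check that compares a feature's
training distribution against its live distribution with a two-sample Kolmogorov-Smirnov test from scipy. The
significance level of 0.05 is an illustrative choice, not a universal rule.

import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(train_col, live_col, alpha=0.05):
    """Flag a feature whose live distribution differs from its training distribution."""
    statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha  # True -> distributions likely differ (possible shift)

# Toy example: a feature (square footage) whose mean has drifted in production.
rng = np.random.default_rng(0)
train_sqft = rng.normal(1500, 300, size=5000)  # distribution at training time
live_sqft = rng.normal(1800, 300, size=5000)   # distribution seen in production

print(detect_covariate_shift(train_sqft, live_sqft))  # True: shift detected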
6.4 Model Drift
Model drift refers to the deterioration in the performance of a machine learning model over time due to changes in
the underlying data or environment. This is a result of shifts in the relationships between input data and the target
predictions, causing the model's assumptions to no longer hold. Model drift can manifest in different forms, leading
to reduced accuracy and reliability of the model.
Model drift is often caused by factors like changes in user behavior, market conditions, or external events, which
alter the data in ways that the original model wasn't trained to handle.
Causes of Model Drift
1. Changing Data:
o Data characteristics evolve over time due to seasonality, user behavior, market changes, or other
external factors. This causes a shift in the data the model encounters, leading to degraded
performance.
2. Environmental Changes:
o New business conditions, regulatory changes, or technological advancements can alter how data
is generated or how decisions are made, which may cause drift.
3. Conceptual Shifts:
o Changes in the underlying relationships in the data—such as new features becoming more
important or old features becoming less predictive—can lead to concept drift.
4. Feedback Loops:
o In some applications, the predictions of a model can directly influence the data the model later
encounters, creating a feedback loop. For example, if a recommendation system encourages
certain user behaviors, the data it receives will change based on those behaviors, affecting future
model performance.
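One common way to quantify such drift is the Population Stability Index (PSI), which compares the binned
distribution of a feature or score between a baseline window and a recent window. Below is a straightforward
sketch; the rule of thumb that PSI above roughly 0.2 signals significant drift is an industry heuristic, not a hard
law.

import numpy as np

def population_stability_index(baseline, recent, bins=10):
    """PSI between a baseline sample and a recent sample of the same variable."""
    # Bin edges from baseline quantiles keep the bins evenly populated.
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the baseline range

    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(recent, bins=edges)[0] / len(recent)

    # Floor the proportions to avoid division by zero and log(0).
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)

    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Toy example: a model score whose distribution has shifted between two windows.
rng = np.random.default_rng(1)
psi = population_stability_index(rng.normal(0, 1, 10_000), rng.normal(0.5, 1, 10_000))
print(f"PSI = {psi:.3f}")  # values above ~0.2 are commonly read as significant drift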
6.5 Feature Stores
A Feature Store is a centralized repository for storing, managing, and sharing machine learning (ML) features
across different models and teams within an organization. It is designed to standardize the feature engineering
process, making it easier to reuse, track, and serve features in both the training and production environments. A
Feature Store helps streamline workflows, ensuring that the same features are consistently used in both training and
inference, improving model quality, reproducibility, and scalability.
Key Benefits of a Feature Store
1. Consistency: Ensure the same features are used for both training and serving (inference), preventing
discrepancies in model predictions.
2. Reusability: Allow data scientists to reuse the same features across multiple models and projects,
improving efficiency and reducing redundant work.
3. Collaboration: Facilitate collaboration between data engineers and data scientists by making features
easily accessible and shareable across teams.
4. Data Lineage and Tracking: Track the origin and transformation of features, providing transparency and
reproducibility in the ML lifecycle.
5. Serving Features in Real-Time: Enable features to be served in real-time or batch mode, supporting both
online and offline ML workloads.
Core Components of a Feature Store
1. Feature Repository:
o The central storage where all features are defined, stored, and versioned. This is typically a
database or a managed service designed to hold both raw and processed features.
2. Feature Engineering Pipeline:
o The system or processes used to create and transform raw data into usable features for ML
models. It can include data cleaning, normalization, aggregation, and encoding processes.
3. Feature Serving Layer:
o This layer is responsible for serving features to models in real-time (for online inference) or batch
(for training). It ensures that the correct, up-to-date features are provided to the model at
inference time.
4. Metadata Management:
o Stores information about the features, such as their schema, transformation rules, and lineage. It
helps ensure features are used correctly and consistently.
5. Feature Versioning:
o Tracks and manages changes in the features over time. Versioning is important to maintain
consistency between training and production and to avoid issues arising from feature drift.
Examples of Feature Stores
AWS SageMaker Feature Store: A fully managed feature store that provides centralized management,
monitoring, and real-time feature serving for ML workflows.
Google Cloud Vertex AI Feature Store: A feature store that integrates with Google Cloud's ML offerings
and provides capabilities for managing and serving features.
Feast (Open-Source): A popular open-source feature store developed by Gojek and Tecton, which
supports both batch and real-time feature serving.
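To give a flavor of what consuming features looks like in code, here is a minimal sketch using the open-source
Feast SDK mentioned above (pip install feast). The feature view driver_stats, the feature avg_daily_trips, and the
entity key driver_id are hypothetical names that would be defined in the store's feature repository.

from feast import FeatureStore

# Point the SDK at a feature repository (feature definitions plus store config).
store = FeatureStore(repo_path=".")

# Online path: fetch low-latency features for one entity at inference time.
features = store.get_online_features(
    features=["driver_stats:avg_daily_trips"],  # "<feature_view>:<feature>"
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features["avg_daily_trips"])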
Feature Store Architecture
Feature stores are data platforms designed for managing and serving machine learning features; they store
preprocessed, engineered features for use in ML models. A typical feature store architecture shows how data flows
across several layers and components involved in serving, processing, and managing features. Here is a breakdown
of those layers:
1. Data Layer
Stream Data Sources (Kafka, etc.): This section handles real-time data ingestion from various sources
like Kafka, databases, and other streaming services.
Batch Data Sources (PostgreSQL, MongoDB, CSV, etc.): This involves handling large-scale, historical
data batches that are stored in data lakes or databases.
Transformations:
o Stream Transformations (Spark): This involves real-time data processing using Spark (or
similar frameworks) to perform transformations on the incoming data.
o Batch Transformations (Spark): Batch data transformations for feature engineering,
aggregation, or preparation for ML models.
Online Store (Redis): Real-time feature store used for serving features in production for low-latency
predictions.
Offline Store (S3, Delta/Iceberg): Stores historical data or preprocessed features that are often used for
training models in batch processes. This data may include features like Delta Lake or Iceberg for better
version control and consistency.
2. Serving Layer
Feature Serving API: This layer provides APIs to query features required for inference in real-time
models, enabling the ML system to request up-to-date features for prediction.
Feature Lookup: The process of querying and retrieving features (either from the real-time store like
Redis or offline store like S3) for the real-time model.
3. SDK Layer
Provides tools for data scientists and engineers to interact with the feature store and query historical or real-
time features directly, through custom scripts or Jupyter notebooks.
4. Application Layer
Job Orchestrator (Airflow): Airflow is used to manage and orchestrate the scheduling of data pipelines,
ensuring the correct processing and flow of features from source to storage and serving.
Feature Registry: Manages the metadata for the features, ensuring consistency and traceability across
different versions of features. It helps in registering new features or modifications.
SQL Metadata Store: Stores metadata about the feature engineering process, transformations, and data
lineage.
5. Governance Layer
Compliance, Access Control, Logging, and Monitoring: This section ensures governance by managing
access control policies, compliance checks, and continuous monitoring of feature usage. It helps ensure that
features are properly logged and tracked.
6. Control Panel & UI
Control Panel: Provides an interface for managing features, setting access controls, and monitoring feature
usage.
Logging & Monitoring: Logs feature usage, performance, and potential issues to maintain operational
integrity and ensure transparency in ML pipelines.
Feature Store Workflow
1. Data Ingestion:
• Collect raw data from various sources, like databases and streaming platforms.
2. Feature Engineering:
• Transform the raw data into model-ready features through cleaning, aggregation, and encoding.
3. Feature Repository:
• Store and version the engineered features in the central repository.
4. Metadata Management:
• Record each feature's schema, transformation rules, and lineage.
5. Model Integration:
• Serve the stored features consistently to models for both training and inference.
Feature Stores simplify feature management, enhance model development, and facilitate collaboration in MLOps.
6.6 Cloud Computing
Cloud computing is the delivery of computing services (such as storage, databases, networking, software, and
analytics) over the internet ("the cloud") to offer faster innovation, flexible resources, and cost savings. It eliminates
the need for organizations to own and maintain physical hardware, since resources are accessed over the internet
rather than from local servers or personal computers. Cloud computing offers flexibility, scalability, and remote
access, allowing businesses and individuals to access computing resources on demand, without the need for
physical infrastructure.
1. Infrastructure as a Service (IaaS): Provides virtualized computing resources over the internet (e.g., AWS
EC2, Google Compute Engine).
2. Platform as a Service (PaaS): Offers hardware and software tools for app development without the
complexity of maintaining the infrastructure (e.g., Google App Engine, Microsoft Azure).
3. Software as a Service (SaaS): Delivers software applications over the internet (e.g., Google Workspace,
Salesforce).
Cloud computing allows businesses to scale up or down as needed, ensuring cost-effectiveness and flexibility.
Cloud Computing in MLOps
Scalability: Cloud platforms like AWS, Azure, and Google Cloud provide scalable resources to
accommodate the computational demands of MLOps, such as model training, serving, and monitoring.
Elasticity: Cloud resources can be easily scaled up or down to adapt to varying workloads, allowing for
efficient use of resources and cost savings.
Managed Services: Cloud providers offer managed services for MLOps components like data storage,
container orchestration, and CI/CD pipelines, reducing the operational overhead.
Collaboration: Cloud-based collaboration tools and storage enable cross-team cooperation, data sharing,
and project management in MLOps workflows.
Flexibility: Cloud platforms offer a diverse set of services that can be integrated into MLOps pipelines,
including AI/ML tools and data analytics services.
Global Reach: Cloud providers have data centers worldwide, supporting global deployment and
accessibility of machine learning models and applications.
The cloud's capabilities are integral to implementing MLOps by providing the necessary infrastructure, tools, and
flexibility for efficient model development, deployment, and management.
6.7 Microservices
Microservices is an architectural style that structures an application as a collection of small, loosely coupled,
independently deployable services, each responsible for a specific business capability.
Key Characteristics of Microservices:
1. Independent Deployability: Each microservice can be developed, tested, and deployed independently.
This enables faster release cycles, easier updates, and quicker bug fixes without disrupting the entire
system.
2. Loose Coupling: Microservices are loosely coupled, meaning they are independent of each other. This
reduces the interdependencies, making the system more resilient and flexible.
3. Domain-Driven Design: Microservices align with business domains. Each service encapsulates a specific
business function or domain, such as payment processing, user authentication, or order management.
4. Scalability: Because microservices are independently deployable, they can be scaled horizontally based on
demand. If one service experiences high traffic, it can be scaled independently of others, improving overall
performance.
5. Technology Agnostic: Each microservice can use its own technology stack, allowing teams to choose the
best tools for each specific service. For example, one service might use Python, while another could be
implemented in Java or Go.
6. Fault Isolation: Since services are isolated from each other, failures in one service do not necessarily affect
the others, improving the overall system’s resilience.
7. Continuous Delivery and DevOps: Microservices are often deployed using continuous delivery (CD)
practices and DevOps pipelines. This facilitates automation in testing, integration, and deployment, making
it easier to deliver updates and maintain consistency across services.
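To make the single-responsibility idea concrete, below is a toy sketch of one microservice written with only the
Python standard library; the /health and /orders endpoints and the in-memory data are illustrative. In practice,
each such service would be packaged in its own container (Section 6.1) and deployed independently.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# A toy "order management" microservice: it owns a single business capability
# and exposes it over HTTP, independently of any other service.
ORDERS = {"1": {"item": "book", "status": "shipped"}}

class OrderService(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":  # liveness endpoint for orchestrators
            self._send(200, {"status": "ok"})
        elif self.path.startswith("/orders/"):
            order = ORDERS.get(self.path.rsplit("/", 1)[-1])
            if order:
                self._send(200, order)
            else:
                self._send(404, {"error": "not found"})
        else:
            self._send(404, {"error": "unknown route"})

    def _send(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), OrderService).serve_forever()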
Advantages of Microservices:
Flexibility and Agility: Microservices enable faster development cycles, as teams can focus on smaller, isolated
functionalities.
Resilience: The failure of one service does not affect the entire application, providing better fault tolerance.
Scalability: Microservices can be scaled independently, allowing efficient use of resources based on demand.
Technology Diversity: Teams can choose the most appropriate technology stack for each microservice.
Easier Maintenance: Since each service is small and isolated, maintaining and upgrading microservices is easier.
Disadvantages of Microservices:
Complexity in Management: Managing multiple services can become complex, especially as the number of
microservices grows.
Network Latency: Communication between microservices typically involves network calls, which may introduce
latency compared to monolithic applications.
Data Consistency: Achieving data consistency across services is more challenging in microservices, especially when
they maintain their own databases.
Deployment Overhead: While microservices can be deployed independently, the orchestration of multiple services
can introduce complexity, often requiring tools like Kubernetes or Docker.
Use Cases of Microservices:
E-commerce Platforms: Microservices allow different aspects of an e-commerce platform (such as inventory
management, user profiles, payments, and recommendations) to be developed, deployed, and scaled independently.
Streaming Services: Microservices can be used to manage user data, content delivery, and recommendation
algorithms independently.
Banking and Finance: Microservices can handle different banking functions like account management, transactions,
and fraud detection separately, improving flexibility and scalability.
Orchestration
Orchestration is the automated coordination and management of the various components or services in a system,
and it is commonly used in cloud computing and in container orchestration platforms like Kubernetes to ensure
optimal resource utilization and scalability. In IT and software development, orchestration refers to the automated
configuration, management, and coordination of complex systems, processes, or workflows. The goal of
orchestration is to streamline and simplify the execution of multiple tasks, making the system more efficient and
scalable by handling dependencies and integrating various services or components.
Key Aspects of Orchestration:
1. Automation of Tasks: Orchestration automates the management of services, workflows, and operations
across different systems. It ensures that tasks are executed in the correct order, reducing manual
intervention.
2. Managing Dependencies: It handles dependencies between various tasks or services, ensuring that each
service is started, completed, or escalated in the right sequence.
3. Centralized Control: Orchestration provides a centralized control mechanism to manage the entire
workflow, simplifying the administration of distributed systems. This ensures all components work
together cohesively.
Common Use Cases of Orchestration:
1. Container Orchestration: In cloud computing, Kubernetes is a popular tool for container orchestration,
where it automatically handles the deployment, scaling, and operation of containers across clusters. This
helps manage microservices architectures, which can be distributed across various systems.
2. CI/CD Pipelines: In software development, orchestration tools are used to automate the entire process of
code integration, testing, and deployment. Tools like Jenkins or GitLab CI manage tasks like building
code, running tests, and deploying to production in a seamless flow.
3. Data Pipeline Orchestration: In data engineering, orchestration tools like Apache Airflow automate
workflows for ETL (Extract, Transform, Load) processes, ensuring that data is collected, processed, and
transferred efficiently across systems (see the sketch below).
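Here is the sketch referenced above: a minimal ETL DAG for Apache Airflow 2.x (pip install apache-airflow);
the task bodies are placeholders standing in for real extract, transform, and load logic.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():    # placeholder: pull raw data from a source system
    print("extracting")

def transform():  # placeholder: clean and aggregate the extracted data
    print("transforming")

def load():       # placeholder: write the results to the target store
    print("loading")

with DAG(
    dag_id="etl_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Orchestration proper: Airflow enforces this dependency order.
    t_extract >> t_transform >> t_load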