
MLOps

How to operate an ML system productively?

About me

➢ Working profile
○ 2016-2017: Freelance Machine Learning Engineer
○ 2017-2019: Machine Learning Researcher @ University of Aizu
○ 2019-2020: Machine Learning Engineer @ Heligate
○ 2020-2022: Senior Machine Learning Engineer @ One Mount
○ 2022-present:
■ Expert Machine Learning Engineer @ MSB
■ Admin of MLOps VN

➢ Contact: https://www.linkedin.com/in/quan-dang/

Ordinary ML workflow

Manual process

Any problems?
❖ Manual executions
❖ Disconnection between DS & Ops engineers
❖ Infrequent release iterations
❖ No CI/CD
❖ No monitoring

(Diagram: manual ML process)
A new design with new components
➢ Source control
➢ CI/CD tool
➢ Feature Store
➢ Data/ML Pipelines
➢ Model Registry
➢ ML Metadata Store
➢ Performance Monitoring

(Diagram: MLOps automation, from https://ml-ops.org/content/mlops-principles)
Source control

● Git workflow
● Code version control
● Data version control
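As an illustration of data version control, a minimal sketch using DVC's Python API (DVC is named later in the deck; the repository URL, file path, and tag here are hypothetical):

```python
import dvc.api

# Read one specific, reproducible revision of a DVC-tracked dataset.
# The repo URL, path, and tag are hypothetical placeholders.
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example-org/example-repo",
    rev="v1.0",  # any Git ref: tag, branch, or commit hash
) as f:
    train_csv = f.read()
```

Pinning datasets to Git refs this way gives training runs the same reproducibility guarantees that code already gets from source control.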
CI/CD tool

(Diagram: CI/CD user workflow, from https://docs.gitlab.com/ee/ci/introduction/)
Feature store

● Why?
  ○ Feature reuse
  ○ Single source for both training and serving (consistency)
  ○ Monitor for drift and quality issues

(https://www.tecton.ai/blog/what-is-a-feature-store)
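To make the "single source for training and serving" point concrete, a minimal sketch using Feast, one open-source feature store (the feature view, feature name, and entity below are assumptions for illustration, not from the deck):

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path to a Feast feature repository

# Offline: point-in-time-correct features for assembling a training set.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2022-05-01", "2022-05-01"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_stats:avg_daily_trips"],
).to_df()

# Online: the same feature definition, served at low latency for inference,
# which is what keeps training and serving consistent.
online_features = store.get_online_features(
    features=["driver_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```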
Data pipelines

(Diagram: data pipeline)
ML pipelines

(Diagram: ML pipeline)
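The two slides above show pipeline diagrams; as a stand-in, a minimal sketch of a two-step training pipeline with the Kubeflow Pipelines SDK (a kfp v1-style API, since Kubeflow Pipelines appears later in the deck; the component bodies, base image, and storage paths are illustrative):

```python
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

# Component bodies, image, and paths below are illustrative.
def prepare_data() -> str:
    # In a real pipeline: pull raw data, validate it, write features.
    return "s3://bucket/prepared"

def train_model(data_path: str) -> str:
    # In a real pipeline: fit a model and push it to the registry.
    return "s3://bucket/model"

prepare_op = create_component_from_func(prepare_data, base_image="python:3.9")
train_op = create_component_from_func(train_model, base_image="python:3.9")

@dsl.pipeline(name="demo-training-pipeline")
def training_pipeline():
    prepared = prepare_op()
    train_op(prepared.output)  # dependency: train runs after prepare

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
```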
ML metadata store

● Run information
  ○ Description
  ○ Start/end time, duration
  ○ Executor
  ○ Parameters
● Artifacts in each step
  ○ Input/output data
  ○ Figures
  ○ HTML files
● Metrics
  ○ ML-related: MAE, R², etc.
  ○ Data-related: completeness, KS test, etc.

(Screenshots: MLflow experiment tracking; Kubeflow Pipeline metadata, from Trevor Grant et al., Kubeflow for Machine Learning)
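The slide shows MLflow experiment tracking; a minimal sketch of logging the run information, metrics, and artifacts listed above (experiment, parameter, and file names are illustrative):

```python
import mlflow

mlflow.set_experiment("churn-model")  # experiment name is illustrative

with mlflow.start_run(run_name="baseline"):
    # Run information: parameters of this execution (start/end time,
    # duration, and the executing user are recorded automatically).
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # ML-related and data-related metrics.
    mlflow.log_metric("mae", 0.42)
    mlflow.log_metric("r2", 0.87)
    mlflow.log_metric("data_completeness", 0.996)

    # Artifacts of this step: figures, HTML reports, data samples.
    mlflow.log_artifact("reports/feature_drift.html")  # hypothetical file
```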
Model registry

UI and a set of APIs to manage models:

● Model lineage (which experiment/run produced the model)
● Model version
● Model state transitions, e.g. from staging to production
● Model description/documentation
● Model validation results/metrics

(Screenshots: model lineage and description; model version and state)
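A minimal sketch of these registry operations, using MLflow's Model Registry as one concrete implementation (the run ID and model name are hypothetical):

```python
import mlflow
from mlflow.tracking import MlflowClient

run_id = "abc123"  # hypothetical ID of the run that produced the model
model_name = "churn-model"  # illustrative registered-model name

# Model lineage: register the model artifact from a specific run.
version = mlflow.register_model(f"runs:/{run_id}/model", model_name)

client = MlflowClient()

# Model description/documentation.
client.update_model_version(
    name=model_name,
    version=version.version,
    description="Gradient-boosted churn classifier, trained on May data.",
)

# Model state transition, e.g. from staging to production.
client.transition_model_version_stage(
    name=model_name,
    version=version.version,
    stage="Production",
)
```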
Performance monitoring (1)

Performance level:

● Data integrity: inspect the volume, variety, veracity, and velocity of incoming & outgoing data to detect outliers and anomalies
● Model drift:

    Entities                     | Drift
    X: inputs (features)         | Feature drift
    y: outputs (labels)          | Target drift
    Relationship between X and y | Concept drift

● Business metrics and ROI, e.g. Click-Through Rate (CTR) and engagement metrics in a social network company

System health:

● Number of requests (throughput)
● Average response time (latency)
● Number of failed requests
● I/O, memory, and CPU usage
● Disk utilization
● System uptime

API outgoing data:

● Model metadata: name, version, Docker image
● Input data (features)
● Predictions
● System actions
● Explanations
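For the data-related KS test mentioned earlier, a minimal sketch of feature-drift detection comparing a training-time reference sample against recent serving traffic (the synthetic data and significance threshold are assumptions):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature values
live = rng.normal(loc=0.3, scale=1.0, size=5_000)       # recent serving-time values

# Two-sample KS test: were the two samples drawn from the same distribution?
statistic, p_value = ks_2samp(reference, live)

ALPHA = 0.01  # significance threshold, an assumption for this sketch
if p_value < ALPHA:
    print(f"Feature drift detected (KS={statistic:.3f}, p={p_value:.2e})")
```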
Performance monitoring (2)

(Screenshot: Seldon Core API dashboard, from https://github.com/SeldonIO/seldon-core/blob/master/README.md)
Performance monitoring (3)

(Screenshot: payload logging with the ELK stack, from https://github.com/SeldonIO/seldon-core/blob/master/README.md)
Additional: Experimentation platform

A custom platform / on-premises:

● EDA: JupyterHub, Jupyter Notebook
● IDE: code-server
● Pre-built Docker images with common libraries, shared among team members
● Try to build your own libraries, e.g. a custom AutoML library with data in and a model out
● Git for code versioning
● Data versioning (e.g. DVC)
● Experiment tracking (e.g. MLflow)

On cloud:

● AWS SageMaker Studio
● Google Vertex AI
● Azure Machine Learning Studio
Additional: Model serving frameworks (1)

(Diagram: serving frameworks grouped into general-purpose, model-agnostic, and framework-specific)
Additional: Model serving frameworks (2)

A comparison of model-agnostic serving frameworks: KServe (0.9.0), Seldon Core (1.14.0), BentoML (1.0.0), Triton Server (2.23.0), and Ray Serve (1.13.0), across these features:

● GPU support
● Micro (adaptive) batching
● Offline batch serving
● Autoscaling
● Scale to 0
● Canary deployments
● A/B tests / MAB deployments
● Native Kafka integration
● gRPC
● Tracing
● Prometheus metrics
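For flavor, a minimal sketch of a custom predictor on one of the model-agnostic options, KServe's Python model server (the class name, model name, and stub prediction are illustrative; the payload shape follows KServe's V1 inference protocol):

```python
from typing import Dict

from kserve import Model, ModelServer

class DemoModel(Model):  # illustrative custom predictor
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = False

    def load(self):
        # Load weights from the model registry/storage here.
        self.ready = True

    def predict(self, payload: Dict) -> Dict:
        instances = payload["instances"]  # V1 inference protocol
        # Replace with a real model call; echo the input sizes as a stub.
        return {"predictions": [len(x) for x in instances]}

if __name__ == "__main__":
    model = DemoModel("demo-model")
    model.load()
    ModelServer().start([model])
```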
Skill set

(Diagram: roles and their intersections contributing to the MLOps paradigm, from https://arxiv.org/pdf/2205.02302.pdf)
Study materials

● Books
● Courses
  ○ https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops
  ○ https://stanford-cs329s.github.io
  ○ https://fullstackdeeplearning.com
  ○ https://github.com/DataTalksClub/mlops-zoomcamp
  ○ https://github.com/alexeygrigorev/mlbookcamp-code
● Blogs
  ○ https://madewithml.com
  ○ https://mlops.community/blog
To sum up
