
Notes for Introduction to Machine Learning in Production (MLOps1) on Coursera/Deeplearning.ai

2021-06-07

Week 1: Steps of an ML project


The ML project lifecycle
MLOps (Machine Learning Operations) comprises a set of tools and principles to
support progress through the ML project lifecycle.

• Decide to work on speech recognition for voice search
• Decide on key metrics:
◦ Accuracy, latency, throughput
• Estimate resources and timeline

• Is the data labeled consistently?
• How much silence before/after each clip?
• How to perform volume normalization?


Data drift: the input data has changed. The distribution of the variables is
meaningfully different. As a result, the trained model is no longer relevant for
this new data.

Concept drift occurs when the patterns the model learned no longer hold. In
contrast to data drift, the distributions (such as user demographics, frequency
of words, etc.) might even remain the same. Instead, the relationships between
the model inputs and outputs change.
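
Monitoring for data drift can start with a simple distribution test. A minimal sketch, assuming numeric features and using a two-sample Kolmogorov-Smirnov test (the window size and p-value threshold are illustrative choices, not from the course):

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, production: np.ndarray,
            p_threshold: float = 0.01) -> bool:
    """Flag drift if the two samples are unlikely to share a distribution."""
    _statistic, p_value = ks_2samp(reference, production)
    return p_value < p_threshold

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time data
prod_feature = rng.normal(loc=0.5, scale=1.0, size=1_000)   # shifted production data
print(drifted(train_feature, prod_feature))  # True -> investigate / retrain
```

Concept drift is harder to catch this way, since the input distribution may be unchanged; it typically shows up later, in label-based metrics.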

• Real-time vs. Batch
• Cloud vs. Edge/Browser
• Compute resources (CPU/GPU/memory)
• Latency, throughput (QPS)
• Logging
• Security and privacy

1. New product/capability
2. Automate/assist with manual task
3. Replace previous ML system

Key ideas:

• Gradual ramp up with monitoring


• Rollback

• ML system shadows the human and runs in parallel.


• ML system’s output is not used for any decisions during this phase.

• Roll out to small fraction (say 5%) of traffic initially (see the routing
sketch below).
• Monitor system and ramp up traffic gradually.
• Easy way to enable rollback
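
A minimal sketch of the traffic-splitting part of such a canary rollout, assuming `old_model` and `new_model` are callables (both are stand-ins here). Hashing on the user ID keeps each user on a consistent variant, and rollback is just setting the fraction back to zero:

```python
import hashlib

def old_model(x): return "old prediction"   # stand-in
def new_model(x): return "new prediction"   # stand-in

CANARY_FRACTION = 0.05  # start at ~5% and ramp up while monitoring

def pick_model(user_id: str):
    # Stable bucket in [0, 1) derived from the user ID.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    return new_model if bucket < CANARY_FRACTION else old_model

model = pick_model("user-42")
print(model("some input"))
```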


• Brainstorm the things that could go wrong.
• Brainstorm a few statistics/metrics that will detect the problem.
• It is ok to use many metrics initially and gradually remove the ones you find
not useful.

• Software metrics: memory, compute, latency, throughput, server load
• Input metrics: avg input length, avg input volume, num missing values, avg
image brightness
• Output metrics: # times return “” (null), # times user redoes the search

• Set thresholds for alarms
• Adapt metrics and thresholds over time (a minimal alarm-check sketch follows)
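
A minimal sketch of threshold-based alarms over such metrics; the metric names and limits are illustrative, not from the course:

```python
from statistics import mean

# Illustrative (lo, hi) bounds per metric; tune these over time.
THRESHOLDS = {
    "avg_input_length_sec": (1.0, 30.0),
    "fraction_null_output": (0.0, 0.05),
}

def check_alarms(metrics: dict) -> list:
    alarms = []
    for name, value in metrics.items():
        lo, hi = THRESHOLDS[name]
        if not lo <= value <= hi:
            alarms.append(f"ALARM: {name}={value:.3f} outside [{lo}, {hi}]")
    return alarms

outputs = ["hello", "", "turn left", ""]  # two null transcripts
metrics = {
    "avg_input_length_sec": mean([3.2, 2.8, 4.1, 3.0]),
    "fraction_null_output": sum(o == "" for o in outputs) / len(outputs),
}
for alarm in check_alarms(metrics):
    print(alarm)  # fraction_null_output=0.500 outside [0.0, 0.05]
```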

• Manual retraining
• Automatic retraining


• Monitor
◦ Software metrics
◦ Input metrics
◦ Output metrics
• How quickly do they change?
◦ User data generally has slower drift.
◦ Enterprise data (B2B applications) can shift fast.

Reading Material Week 1:

• Machine Learning in Production: Why You Should Care About Data and
Concept Drift
• Monitoring Machine Learning Models in Production
• A Chat with Andrew on MLOps: From Model-centric to Data-centric AI

Week 2: Select and train model


Selecting and Training a Model

1. Doing well on training set (usually measured by average training error)


2. Doing well on dev/test sets.
3. Doing well on business metrics/project goals.

Web search example

• Informational and transactional queries


• Navigational queries

• Example: ML for loan approval: make sure not to discriminate by ethnicity,
gender, location, language, or other protected attributes.
• Example: Product recommendations from retailers: Be careful to treat
fairly all major user, retailer, and product categories.

• Skewed data distribution


• Accuracy in rare classes

• I did well on the test set


• But this doesn’t work for my application


• Unstructured data: Image, Audio, Text (HLP is important)


• Structured data: a data frame

• Human-level performance (HLP)


• Literature search for state-of-the-art/open source
• Quick-and-dirty implementation
• Performance of older system

A baseline helps indicate what might be possible. In some cases (such as HLP),
it also gives a sense of the irreducible error/Bayes error.

• Literature search to see what’s possible (courses, blogs, open-source
projects).
• Find open-source implementations if available.
• A reasonable algorithm with good data will often outperform a great
algorithm with not-so-good data.

Should you take into account deployment constraints when picking a model?

• Yes, if baseline is already established and goal is to build and deploy.


• No (or not necessarily), if purpose is to establish a baseline and determine
what is possible and might be worth pursuing.

• Try to overfit a small training dataset before training on a large one (a
quick sanity check is sketched below).
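
A minimal sketch of that sanity check, using scikit-learn stand-ins (the model and data here are illustrative): if the model cannot drive training error to near zero on ~20 examples, suspect a bug in the pipeline or the labels before scaling up.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, random_state=0)
X_small, y_small = X[:20], y[:20]  # deliberately tiny subset

model = LogisticRegression(max_iter=1_000).fit(X_small, y_small)
train_acc = model.score(X_small, y_small)  # accuracy on the same 20 examples
assert train_acc > 0.95, f"cannot overfit 20 examples (acc={train_acc:.2f})"
```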

Error analysis and performance auditing


• What fraction of errors has that tag?
• Of all data with that tag, what fraction is misclassified? (see the sketch
below)
• What fraction of all the data has that tag?
• How much room for improvement is there on data with that tag?
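
A minimal sketch of computing the first two of these fractions with pandas, assuming one boolean column per tag (the tags and values here are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "correct":       [True, False, False, True, False, True],
    "car_noise":     [False, True, True, False, False, False],
    "low_bandwidth": [False, False, True, True, True, False],
})

for tag in ["car_noise", "low_bandwidth"]:
    errors = ~df["correct"]
    frac_of_errors = (errors & df[tag]).sum() / errors.sum()   # errors with tag
    frac_tag_wrong = (errors & df[tag]).sum() / df[tag].sum()  # tag misclassified
    print(f"{tag}: {frac_of_errors:.0%} of all errors; "
          f"{frac_tag_wrong:.0%} of tagged examples misclassified")
```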

Prioritizing what to work on

Decide on most important categories to work on based on:

• How much room for improvement there is


• How frequently that category appears
• How easy it is to improve accuracy in that category
• How important it is to improve in that category

For categories you want to prioritize:

• Collect more data


• Use data augmentation to get more data
• Improve label accuracy/data quality

Skewed datasets
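
The slides summarized under this heading contrast raw accuracy with precision/recall/F1: on a skewed dataset, a model that always predicts the majority class scores high accuracy while being useless. A minimal sketch with scikit-learn:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0] * 95 + [1] * 5   # 5% positive rate (e.g., defects)
y_pred = [0] * 100            # model that always predicts "no defect"

print(accuracy_score(y_true, y_pred))                     # 0.95 -- looks great
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0
print(recall_score(y_true, y_pred, zero_division=0))      # 0.0
print(f1_score(y_true, y_pred, zero_division=0))          # 0.0 -- reveals the problem
```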


Performance auditing

Check for accuracy, fairness/bias, and other problems.


1. Brainstorm the ways the system might go wrong.


◦ Performance on subsets of data (e.g., ethnicity, gender).
◦ How common are certain errors (e.g., FP, FN).
◦ Performance on rare classes.
2. Establish metrics to assess performance against these issues on
appropriate slices of data.
3. Get business/product owner buy-in.

1. Brainstorm the ways the system might go wrong.


◦ Accuracy on different genders and ethnicities.
◦ Accuracy on different devices.
◦ Prevalence of rude mis-transcriptions.
2. Establish metrics to assess performance against these issues on
appropriate slices of data.
◦ Mean accuracy for different genders and major accents.
◦ Mean accuracy on different devices.
◦ Check for prevalence of offensive words in the output. (A sliced-metrics
sketch follows below.)
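
A minimal sketch of computing such sliced metrics with pandas; the slice columns and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "correct": [1, 0, 1, 1, 0, 1, 1, 0],
    "accent":  ["US", "US", "UK", "UK", "IN", "IN", "US", "UK"],
    "device":  ["ios", "android", "ios", "android",
                "ios", "android", "ios", "ios"],
})

# Mean accuracy and sample count per slice; treat small counts with skepticism.
for col in ["accent", "device"]:
    print(df.groupby(col)["correct"].agg(["mean", "count"]), "\n")
```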

Data iteration

• Model-centric view: take the data you have, and develop a model that does
as well as possible on it.
◦ Hold the data fixed and iteratively improve the code/model.
• Data-centric view: the quality of the data is paramount. Use tools to
improve the data quality; this will allow multiple models to do well.
◦ Hold the code fixed and iteratively improve the data.

• Goal:
◦ Create realistic examples that (i) the algorithm does poorly on, but (ii)
humans (or other baselines) do well on.
• Checklist (a small audio-augmentation sketch follows below):
◦ Does it sound realistic?
◦ Is the x → y mapping clear? (e.g., can humans recognize the speech?)
◦ Is the algorithm currently doing poorly on it?
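
A minimal sketch of one such augmentation for speech data: mixing background noise into a clean clip at a controlled signal-to-noise ratio. The arrays stand in for real recordings and the SNR value is an illustrative choice:

```python
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray,
              snr_db: float = 10.0) -> np.ndarray:
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so the mixture has the requested SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
clean = rng.normal(size=16_000)       # 1 s of stand-in audio at 16 kHz
cafe_noise = rng.normal(size=16_000)  # stand-in background noise
augmented = add_noise(clean, cafe_noise, snr_db=10.0)
```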


For unstructured data problems, if:

• The model is large (low bias).


• The mapping x→y is clear (e.g., given only the input x, humans can make
accurate predictions).

Then, adding data rarely hurts accuracy.

• Restaurant recommendation example:


◦ Vegans are frequently recommended restaurants with only meat
options.
◦ Possible features to add?
▪ Is person vegan (based on past orders)?
▪ Does restaurant have vegan options (based on menu)?
• Other food delivery examples
◦ Only tea/coffee and only pizza
◦ What are the added features that can help make a decision?
◦ Product recommendation:

Collaborative filtering → Content-based filtering (cold-start)

• Error analysis can be harder if there is no good baseline (such as HLP) to
compare to.
• Error analysis, user feedback and benchmarking to competitors can all
provide inspiration for features to add.

1. What to track?
◦ Algorithm/code versioning
◦ Dataset used
◦ Hyperparameters
◦ Results
2. Tracking tools (a minimal file-based sketch follows this list)
◦ Text files
◦ Spreadsheets
◦ Experiment tracking systems
3. Desirable features
◦ Information needed to replicate results
◦ Experiment results, ideally with summary metrics/analysis
◦ Perhaps also: resource monitoring, visualization, model error analysis
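
A minimal sketch of the text-file end of this spectrum: append one JSON record per run with everything needed to replicate it. The fields shown are illustrative; dedicated tracking systems add search and visualization on top.

```python
import json
import subprocess
import time
from pathlib import Path

def log_experiment(dataset: str, hyperparams: dict, metrics: dict,
                   logfile: str = "experiments.jsonl") -> None:
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        # Code version: current git commit of the working tree.
        "git_commit": subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
        "dataset": dataset,
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    with Path(logfile).open("a") as f:
        f.write(json.dumps(record) + "\n")

log_experiment("speech_v3", {"lr": 3e-4, "epochs": 10}, {"dev_wer": 0.182})
```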

Try to ensure consistently high-quality data in all phases of the ML project
lifecycle.

Good data:

• Covers important cases (good coverage of inputs x)
• Is defined consistently (definition of labels y is unambiguous)
• Has timely feedback from production data (distribution covers data drift
and concept drift)
• Is sized appropriately

Reading Material Week 2:

• Establishing a baseline
• Error analysis

• Experiment tracking

Week 3: Data Definition and Baseline

Define Data and Establish Baseline

• What is the input x?
◦ Lighting? Contrast? Resolution?
◦ What features need to be included?
• What is the target label y?
◦ How can we ensure labelers give consistent labels?

• Unstructured data
◦ May or may not have a huge collection of unlabeled examples x.
◦ Humans can label more data.
◦ Data augmentation is more likely to be helpful.
• Structured data
◦ May be more difficult to obtain more data.
◦ Human labeling may not be possible (with some exceptions).

• Small data
◦ Clean labels are critical.
◦ Can manually look through the dataset and fix labels.
◦ Can get all the labelers to talk to each other.
• Big data
◦ Emphasis on data processes.


Problems with a large dataset but where there’s a long tail or rare events in the
input will have small data challenges too.

• Web search
• Self-driving cars
• Product recommendation systems

• Have multiple labelers label the same example.
• When there is disagreement, have MLEs, subject matter experts (SMEs), and/
or labelers discuss the definition of y to reach agreement.
• If labelers believe that x doesn’t contain enough information, consider
changing x.
• Iterate until it is hard to significantly increase agreement.


• Small data
◦ Usually a small number of labelers.
◦ Can ask labelers to discuss specific labels.
• Big data
◦ Get to a consistent definition with a small group.
◦ Then send labeling instructions to labelers.
◦ Can consider having multiple labelers label every example and using
voting or consensus labels to increase accuracy (see the sketch below).
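
A minimal sketch of such consensus labeling: take the majority vote, and escalate examples where agreement is too low for the labelers to discuss (the agreement threshold is an illustrative choice):

```python
from collections import Counter

def consensus(labels: list, min_agreement: float = 2 / 3):
    # Most common label and its vote count.
    (label, votes), = Counter(labels).most_common(1)
    agreement = votes / len(labels)
    return label if agreement >= min_agreement else None  # None -> escalate

print(consensus(["scratch", "scratch", "dent"]))  # 'scratch' (2/3 agree)
print(consensus(["scratch", "dent", "none"]))     # None -> discuss definition of y
```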

Estimate Bayes error / irreducible error to help with error analysis and
prioritization.

• In academia, establish and beat a respectable benchmark to support


publication.
• Business or product owner asks for 99% accuracy. HLP helps establish a
more reasonable target.
• “Prove” the ML system is superior to humans doing the job and thus the
business or product owner should adopt it. (Use with caution)


When the ground truth label is externally defined, HLP gives an estimate of
Bayes error / irreducible error.

But often the ground truth is just another human label.

• When the label y comes from a human label, HLP << 100% may indicate
ambiguous labeling instructions.
• Improving label consistency will raise HLP (a simple agreement-based
estimate is sketched below).
• This makes it harder for ML to beat HLP. But the more consistent labels
will raise ML performance, which is ultimately likely to benefit the actual
application performance.
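
A minimal sketch of estimating HLP when ground truth is itself a human label: measure how often independent labelers agree (the labels here are made up):

```python
def estimate_hlp(labeler_a: list, labeler_b: list) -> float:
    # Fraction of examples on which the two labelers agree.
    matches = sum(a == b for a, b in zip(labeler_a, labeler_b))
    return matches / len(labeler_a)

a = ["defect", "ok", "defect", "ok", "defect"]
b = ["defect", "ok", "ok",     "ok", "defect"]
print(f"HLP estimate: {estimate_hlp(a, b):.0%}")  # 80% -> check label consistency
```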

Structured data problems are less likely to involve human labelers, thus HLP is
less frequently used.

Some exceptions:

• User ID merging: same person?
• Based on network traffic, is the computer hacked?
• Is the transaction fraudulent?
• Spam account? Bot?
• From GPS, what is the mode of transportation - on foot, bike, car, bus?

Label and Organize Data

• Get into this iteration loop as quickly as possible.
• Instead of asking “How long would it take to obtain m examples?”, ask “How
much data can we obtain in k days?”
• Exception: if you have worked on the problem before and from experience
you know you need m examples.

Brainstorm list of data sources


Other factors: data quality, privacy, regulatory constraints

• Options: in-house vs. outsourced vs. crowdsourced
• Having MLEs label data is expensive. But doing this for just a few days is
usually fine.
• Who is qualified to label?
◦ Speech recognition - any reasonably fluent speaker
◦ Factory inspection, medical image diagnosis - SME (subject matter
expert)
◦ Recommender systems - maybe impossible to label well
• Don’t increase data by more than 10x at a time.

• POC (proof of concept) phase:
◦ Goal is to decide if the application is workable and worth deploying.
◦ Focus on getting the prototype to work.
◦ It’s ok if data pre-processing is manual. But take extensive notes/
comments.
• Production phase:
◦ After project utility is established, use more sophisticated tools to make
sure the data pipeline is replicable.
◦ E.g., TensorFlow Transform, Apache Beam, Airflow, …


• Examples:
◦ Manufacturing visual inspection: time, factory, line #, camera
settings, phone model, inspector ID, …
◦ Speech recognition: device type, labeler ID, VAD (voice activity
detection) model ID, …
• Useful for:
◦ Error analysis. Spotting unexpected effects.
◦ Keeping track of data provenance.

Visual inspection example: 100 examples, 30 positive (defective)

• Train/dev/test: 60%/20%/20%
• Random split: positive examples: 21/2/7 (35%/10%/35%) → dev set is not
representative
• Want: 18/6/6 (30%/30%/30%) → balanced split (see the stratified-split
sketch below)
• No need to worry about this with large datasets - a random split will be
representative
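
A minimal sketch of such a balanced split using scikit-learn’s stratified splitting; sizes mirror the 60%/20%/20% example above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 30 + [0] * 70)  # 30 positive (defective), 70 negative

# First carve off 60% for train, then split the rest evenly into dev/test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(y_train.mean(), y_dev.mean(), y_test.mean())  # ~0.30 in every split
```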

Scoping


• Use external benchmark (literature, other company, competitor)

People are very good on unstructured data tasks

Criteria: can a human, given the same data, perform the task?

• Given past purchases, predict future purchases ✗
• Given weather, predict shopping mall foot traffic ✗
• Given DNA info, predict heart disease ✗
• Given social media chatter, predict demand for a clothing style ✗
• Given history of a stock’s price, predict future price of that stock ✗

Humans, given only the same data, cannot perform these structured-data tasks
well, so HLP is not a useful baseline here.


• Is this project creating net positive societal value?


• Is this project reasonably fair and free from bias?
• Have any ethical concerns been openly aired and debated?

Key speci�cations:

• ML metrics (accuracy, precision/recall, etc.)


• Software metrics (latency, throughput, etc. given compute resources)
• Business metrics (revenue, etc.)
• Resources needed (data, personnel, help from other teams)
• Timeline

If unsure, consider benchmarking to other projects, or building a POC (proof of
concept) first.

Reading Material Week 3:

• Label ambiguity: https://arxiv.org/pdf/1706.06969.pdf
• Data pipelines
• Data lineage
• MLOps

Overall resources:

Katsiapis, K., Karmarkar, A., Altay, A., Zaks, A., Polyzotis, N., … Li, Z.
(2020). Towards ML Engineering: A brief history of TensorFlow Extended (TFX).
http://arxiv.org/abs/2010.02013

Paleyes, A., Urma, R.-G., & Lawrence, N. D. (2020). Challenges in deploying
machine learning: A survey of case studies. http://arxiv.org/abs/2011.09926
