Platform for Complete Machine
Learning Lifecycle
Jules S. Damji
@2twitme
San Francisco | May 13, 2020: Part 2 of 3 Series
Outline – Introduction to MLflow: Understanding MLflow Projects and Models – Part 2
§ Review & Recap Part 1: MLflow Tracking
▪ https://youtu.be/x3cxvsUFVZA
§ MLflow Components
▪ MLflow Projects & Models
▪ Concepts and Motivations
▪ MLflow on Databricks Community Edition (DCE)
▪ Explore MLflow UI
▪ Tutorials
§ Q&A
https://dbricks.co/mlflow-part-2
https://github.com/dmatrix/mlflow-workshop-project-expamle-1
Machine Learning
Development is Complex
Traditional Software vs. Machine Learning

Traditional Software
§ Goal: Meet a functional specification
§ Quality depends only on code
§ Typically pick one software stack w/ fewer libraries and tools

Machine Learning
§ Goal: Optimize a metric (e.g., accuracy); constantly experiment to improve it
§ Quality depends on input data and tuning parameters
§ Compare + combine many libraries and models
Machine Learning Lifecycle

[Diagram: Delta / Raw Data → Data Prep → Training (tuning μ, λ, θ) → Model Exchange → Deploy, with Scale and Governance concerns at every stage]
MLflow Components

§ Tracking: Record and query experiments: code, data, config, and results
§ Projects: Package data science code in a format that enables reproducible runs on any platform
§ Models: Deploy machine learning models in diverse serving environments
§ Model Registry (new): Store, annotate and manage models in a central repository

mlflow.org | github.com/mlflow/mlflow | twitter.com/MLflow | databricks.com
Model Development with MLflow is Simple!

data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

with mlflow.start_run() as run:
    mlflow.log_param("data_file", file)
    mlflow.log_param("n", n)
    mlflow.log_param("learn_rate", lr)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(model, "model")

Track parameters, metrics, output files & code version.
Search using the UI or API:
$ mlflow ui
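Conceptually, each logging call above appends a key-value record to the active run, which the UI and API can later query. A minimal stand-in using only the standard library (the `ToyRun` class and the values below are illustrative, not part of MLflow) shows the shape of what the tracking API records:

```python
# Toy stand-in for MLflow's tracking API: a run collects params
# (immutable inputs) and metrics (measured outputs) for later querying.
class ToyRun:
    def __init__(self):
        self.params = {}    # e.g., data file, n, learning rate
        self.metrics = {}   # e.g., score, accuracy

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # a real tracking server would flush the run here

with ToyRun() as run:
    run.log_param("data_file", "reviews.txt")
    run.log_param("n", 3)
    run.log_metric("score", 0.91)
```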
MLflow Tracking

[Diagram: Notebooks, Local Apps, and Cloud Jobs log via the Python, Java, R, or REST API to a Tracking Server, which stores metadata (parameters, metrics) in a data source and artifacts (models, output files); runs are searchable from the UI or API]

$ export MLFLOW_TRACKING_URI=<URI>
mlflow.set_tracking_uri(URI)
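By default the client writes runs to a local ./mlruns file store; the environment variable or the set_tracking_uri() call points it at a remote server instead. A small stdlib sketch of that resolution order (the function is illustrative; only the default ./mlruns path and the precedence mirror MLflow's documented behavior):

```python
import os

def resolve_tracking_uri(explicit_uri=None):
    """Mimic the precedence MLflow uses: an explicit URI from
    set_tracking_uri() wins, then MLFLOW_TRACKING_URI, then the
    local ./mlruns file store."""
    if explicit_uri:
        return explicit_uri
    return os.environ.get("MLFLOW_TRACKING_URI", "file:./mlruns")

# Clear the env var so the default is visible in this sketch.
os.environ.pop("MLFLOW_TRACKING_URI", None)
print(resolve_tracking_uri())                        # local default
print(resolve_tracking_uri("http://tracking:5000"))  # explicit wins
```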
MLflow Projects Motivation

Challenge: with a diverse set of tools and a diverse set of environments, ML results are difficult to reproduce.

Projects: package data science code in a format that enables reproducible runs on any platform.
MLflow Projects

[Diagram: Project Spec (code, config, dependencies, data) → Local Execution or Remote Execution]
1. Example MLflow Project File

my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

MLproject:

conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python main.py {training_data} {lambda}

Run it locally or from GitHub:

$ mlflow run . -e main -P lambda=0.2
$ mlflow run git://<my_project>.git -P lambda=0.2

mlflow.run("git://<my_project>", parameters={...})
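The entry point's command is a template: mlflow run fills the {training_data} and {lambda} placeholders from -P arguments and the defaults declared in MLproject. A stdlib sketch of that substitution step (function and variable names are illustrative, not MLflow internals):

```python
# Build the shell command for an entry point by merging user-supplied
# -P parameters with the defaults declared in the MLproject file.
def render_command(template, declared, supplied):
    params = {name: spec.get("default") for name, spec in declared.items()}
    params.update(supplied)
    missing = [k for k, v in params.items() if v is None]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return template.format(**params)

declared = {
    "training_data": {"type": "path"},            # required: no default
    "lambda": {"type": "float", "default": 0.1},  # optional
}
cmd = render_command("python main.py {training_data} {lambda}",
                     declared,
                     {"training_data": "data.csv", "lambda": 0.2})
print(cmd)  # python main.py data.csv 0.2
```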
2. Example conda.yaml

name: mlflow-env
channels:
  - defaults
dependencies:
  - python=3.7.3
  - scikit-learn=0.20.3
  - pip:
    - mlflow
    - cloudpickle==0.8.0
MLflow Projects

Packaging format for reproducible ML runs
• Any code folder or GitHub repository
• MLproject file with project configuration

Defines dependencies for reproducibility
• Conda (+ R, Docker, …) dependencies can be specified in MLproject
• Reproducible in (almost) any environment

Execution API for running projects
§ CLI / Python / R / Java
§ Supports local and remote execution; the project URI is a local directory path or Git URI containing the MLproject file
▪ mlflow run --help (CLI)
▪ mlflow run https://github.com/dmatrix/jsd-mlflow-examples.git#keras/imdbclassifier (CLI)
▪ mlflow.run(<project_uri>, parameters={}) or mlflow.projects.run(<project_uri>, parameters={}) (API)
Anatomy of MLflow Project Execution

1. $ mlflow run https://github.com/mlflow-project-example-1
2. Fetch the GitHub project into a /var/folders/xxx directory
3. Create the conda env (mlflow-<run_id>) and activate it
4. Install packages & dependencies from conda.yaml
5. In the activated conda environment, execute the entry point:
   python train.py args, …, args
How to Build an MLflow Project

1. Create an MLproject file
   • Populate with entry points and default parameters and types
2. Create a conda.yaml file
   • Populate with dependencies, or copy from your MLflow UI: artifacts -> Model -> conda.yaml
3. Create a GitHub repository
   • Populate or upload MLproject, conda.yaml, data, src files, etc.
4. Test it
   • mlflow run git://URI -P arg .. -P args
   • mlflow.run(URI, parameters={})
   • Share it …
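Steps 1 and 2 amount to writing two small text files next to your source. A stdlib sketch that scaffolds them into a temporary directory (the contents mirror the earlier examples; the scaffold_project helper is illustrative, not an MLflow API):

```python
import pathlib
import tempfile
import textwrap

def scaffold_project(root):
    """Write the two files an MLflow project needs: MLproject and conda.yaml."""
    root = pathlib.Path(root)
    (root / "MLproject").write_text(textwrap.dedent("""\
        conda_env: conda.yaml
        entry_points:
          main:
            parameters:
              training_data: path
              lambda: {type: float, default: 0.1}
            command: python main.py {training_data} {lambda}
        """))
    (root / "conda.yaml").write_text(textwrap.dedent("""\
        name: mlflow-env
        channels:
          - defaults
        dependencies:
          - python=3.7.3
          - pip:
            - mlflow
        """))
    return sorted(p.name for p in root.iterdir())

with tempfile.TemporaryDirectory() as tmp:
    files = scaffold_project(tmp)
    print(files)  # ['MLproject', 'conda.yaml']
```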
MLflow Project: Create Multi-Step Workflow
https://github.com/mlflow/mlflow/tree/master/examples/multistep_workflow
MLflow Models Motivation

Inference code must bridge ML frameworks to serving tools, including batch & stream scoring. Without a standard model format, supporting N frameworks across M serving tools requires an N×M combination of custom integrations.
MLflow Models

A standard format for ML models: a model is saved once with multiple "flavors" (Flavor 1, Flavor 2, …), so inference code and serving tools (batch & stream scoring included) only need to understand the one format, instead of N×M framework/serving-tool combinations.
Example MLflow Model

mlflow.tensorflow.log_model(...)

my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/
        ...

MLmodel:

run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow

The tensorflow flavor is usable by tools that understand the TensorFlow model format; the python_function flavor is usable by any tool that can run Python (Docker, Spark, etc.!).
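A tool consuming this model reads the flavors mapping in the MLmodel file and picks a flavor it understands. A stdlib sketch of that lookup (the dict mirrors the MLmodel above; the pick_flavor helper is illustrative, and actual loading is elided):

```python
# The MLmodel file from the slide, parsed into a dict (YAML parsing elided).
mlmodel = {
    "run_id": "769915006efd4c4bbd662461",
    "flavors": {
        "tensorflow": {"saved_model_dir": "estimator",
                       "signature_def_key": "predict"},
        "python_function": {"loader_module": "mlflow.tensorflow"},
    },
}

def pick_flavor(mlmodel, preferences):
    """Return the first flavor the consuming tool understands."""
    for name in preferences:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise ValueError("no supported flavor found")

# A TF-aware tool uses the tensorflow flavor; a generic tool would
# fall back to python_function.
name, cfg = pick_flavor(mlmodel, ["tensorflow", "python_function"])
print(name, cfg["saved_model_dir"])  # tensorflow estimator
```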
Model Keras Flavor Example

Train a model, then log it:
mlflow.keras.log_model(...)

Flavor 1: pyfunc
model = mlflow.pyfunc.load_model(...)
model.predict(input_dataframe)  # a pandas DataFrame

Flavor 2: Keras
model = mlflow.keras.load_model(...)
model.predict(keras.Input(...))
Model Flavors Example

model = mlflow.pyfunc.load_model(model_uri)
model.predict(input_dataframe)  # a pandas DataFrame
MLflow Models

Packaging format for ML models
• Any directory with an MLmodel file

Defines dependencies for reproducibility
• Conda environment can be specified in the MLmodel configuration

Model creation and loading utilities
• mlflow.<model_flavor>.save_model(...) or log_model(...)
• mlflow.<model_flavor>.load_model(...)

Deployment APIs
• CLI / Python / R / Java
• mlflow models [OPTIONS] COMMAND [ARGS]...
• mlflow models serve [OPTIONS] [ARGS]...
• mlflow models predict [OPTIONS] [ARGS]...
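Once `mlflow models serve` is running, it scores JSON payloads over HTTP on its /invocations endpoint; the pyfunc server accepts pandas "split"-orient JSON. A stdlib sketch that builds such a payload (the column names, values, host, and port are illustrative):

```python
import json

def build_payload(columns, rows):
    """Build a pandas split-orient JSON payload for the scoring server."""
    return json.dumps({"columns": columns, "data": rows})

payload = build_payload(["alcohol", "pH"], [[12.8, 3.2], [11.0, 3.5]])
print(payload)

# Sent with, e.g.:
#   curl -d "$payload" \
#        -H 'Content-Type: application/json; format=pandas-split' \
#        http://127.0.0.1:5000/invocations
```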
MLflow Project & Models
Tutorials
Tutorials: https://github.com/dmatrix/mlflow-workshop-part-2
MLflow Project Keras Example:
https://github.com/dmatrix/mlflow-workshop-project-expamle-1
Learning More About MLflow
§ pip install mlflow to get started
§ Find docs & examples at mlflow.org
§ Peruse code at MLflow Github
§ Join the Slack channel
§ More MLflow tutorials
Thank you!
Q&A
jules@databricks.com
@2twitme
https://www.linkedin.com/in/dmatrix/