Comparison: Kubeflow vs. TFX
www.springml.com
© 2019
Introduction
In this report, we compare two technologies that have come out of Google for managing machine learning
pipelines.
The first is Kubeflow, which has been in development since 2018. It originated as a way of bringing the ideas of
TFX (at the time used only internally at Google) to the public via open source tools, and it continues to change as
open source tools come and go.
The second is TensorFlow Extended (TFX) itself. Google announced that it would be making TFX available to the
public at the end of 2018.
Kubeflow
The main mission of Kubeflow is to make scaling machine learning (ML) models and deploying them to
production as simple as possible across diverse environments, by letting Kubernetes do what it’s great at:
• Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster
<-> production cluster)
• Deploying and managing loosely-coupled microservices
• Scaling based on demand
Pipeline web server: The Pipeline web server gathers data from various services to display relevant views: the list of pipelines
currently running, the history of pipeline executions, the list of data artifacts, debugging information about individual pipeline runs,
and the execution status of individual pipeline runs.
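For reference, this is roughly how a pipeline gets into that web server in the first place: it is defined and compiled with the Kubeflow Pipelines SDK (kfp) and then uploaded through the UI or submitted with the client API. The pipeline name, container images, and arguments below are placeholders, so treat this as a minimal sketch rather than a reference implementation.

```python
# Minimal sketch of defining and compiling a Kubeflow Pipeline with the kfp SDK.
# The images, paths, and pipeline name are placeholders for illustration only.
import kfp
from kfp import dsl


@dsl.pipeline(
    name='example-training-pipeline',
    description='Toy two-step pipeline: preprocess data, then train a model.'
)
def example_pipeline():
    # Each step runs as its own container in the cluster; wiring an output of
    # one step into the next lets the Pipelines UI track artifacts and lineage.
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',   # placeholder image
        arguments=['--input', 'gs://my-bucket/data.csv',
                   '--output', '/tmp/clean.csv'],
        file_outputs={'clean_data': '/tmp/clean.csv'},
    )
    dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train:latest',        # placeholder image
        arguments=['--data', preprocess.outputs['clean_data']],
    )


if __name__ == '__main__':
    # Produces an archive that can be uploaded through the Pipelines web UI,
    # or submitted programmatically with kfp.Client().
    kfp.compiler.Compiler().compile(example_pipeline, 'example_pipeline.tar.gz')
```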
Deploying Kubeflow Pipelines
The steps to deploy Kubeflow Pipelines are enumerated in our “Deploying Kubeflow Pipelines” document.
What are some pros and cons of Kubeflow?
Pros:
• Kubeflow makes it very easy for data scientists to get distributed training and serving on a cluster without
worrying about infrastructure.
• A variety of open-source tools have been combined to bring together several good ideas.
• Kubeflow introduced the concept of Fairing, which makes it easier for data scientists to launch jobs directly
from Jupyter notebooks alongside their model implementations (see the sketch after this list).
• Kubeflow supports multiple frameworks such as TensorFlow, PyTorch, MXNet, etc.

Cons:
• Ksonnet, the prime component for configuration in Kubeflow, will be discontinued in future versions and the
community will need to replace it. This will cause some major changes in the Kubeflow code.
• There are several issues with the sample pipelines and notebooks on the Kubeflow website; these issues are
still being fixed.
• The variety of open-source tool options means the user must be aware of the tools available and make
informed decisions about which components to use in any given deployment, which leads to a less clearly
defined, universal pipeline.
• The variety of tools also means the user needs to learn multiple technologies with different languages to
utilize all of their desired tools within Kubeflow.
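To illustrate the Fairing point above: the general idea is that a training function written in a notebook can be packaged into a container image and submitted to the cluster as a job without leaving the notebook. The snippet below is only a rough sketch from memory of the early Kubeflow Fairing API; the registry, base image, and exact configuration calls are assumptions and may differ from the current release.

```python
# Rough sketch of the Kubeflow Fairing idea: wrap a local training function so
# it runs as a job on the cluster. API names and arguments here are assumptions
# based on early Fairing releases and may not match the current package exactly.
from kubeflow import fairing


def train():
    # Ordinary training code a data scientist would write in a notebook cell.
    print('training the model...')


# Build an image containing this code, push it to a registry (placeholder),
# and configure the job deployer so the function runs on Kubernetes.
fairing.config.set_builder('append',
                           registry='gcr.io/my-project',                    # placeholder
                           base_image='tensorflow/tensorflow:1.13.1-py3')   # placeholder
fairing.config.set_deployer('job')

# Wrap the local function; calling the wrapper submits it to the cluster.
remote_train = fairing.config.fn(train)
remote_train()
```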
What runs the pipeline?
The pipeline can be run by Airflow or Kubeflow Pipelines. Airflow seems to be more prominently
positioned in Google's messaging, but that may be because of current adoption in the field.
[Pipeline diagram: the TFX pipeline as it appears within an Airflow job, with the components CsvExampleGen,
StatisticsGen, SchemaGen, ExampleValidator, Transform, Trainer, Evaluator, ModelValidator, and Pusher.]
CsvExampleGen: ingests data from raw sources like CSVs, BigQuery, or TFRecords and splits that data into
training and evaluation sets.

StatisticsGen: uses TensorFlow Data Validation to provide a window into the raw dataset, calculating
descriptive statistics to identify missing data, values outside expected ranges, etc. This view also allows a data
scientist to examine the features and check for other data quality problems like post-treatment variables.

SchemaGen: is one of the simpler components of the pipeline. It examines the statistics created by
StatisticsGen and creates a data schema that represents the data type of each feature column and whether it is
a required field. It creates the schema automatically, but developers are expected to review and modify the
schema to confirm its accuracy.

ExampleValidator: looks for anomalies and missing values in the dataset. For example, it can detect
training-serving skew and data drift as it compares training data to evaluation data or new data coming in
during serving.

Transform: performs consistent feature engineering on all subsets of the dataset (i.e. train, dev, test). Here we
engineer our features, vocabularies, embeddings, etc., and can define features as categorical, dense, and
bucketized for use in different kinds of complex models. The same Transform module applies to both training
data and incoming data in production, which ensures that the transformations are consistent across both
datasets.

Trainer: trains the model using TensorFlow Estimators.

Evaluator: performs deep analysis of the training results. In addition to standard model evaluation statistics, it
can show statistics on user-defined slices of the data. This can help test whether your model is performing
poorly on a particular subgroup of input data, as well as provide insight into what additional data you need to
gather or additional features you need to engineer in order to improve. Evaluator also helps track performance
over time to see how model iteration is improving performance. This module is based on TF Model Analysis.

ModelValidator: is like an automated gatekeeper that ensures new versions of the model are "good enough" to
be pushed to production, especially when new models are trained and served automatically. It takes two
models: the last good evaluated model and the current model coming from training. It "blesses" the new model
if performance is the same or better and fails it if it is not better than the last one. It also performs this
evaluation on new data, which helps make sure the model is ready for production data rather than simply
overfitting the training data.

Pusher: deploys the model to serving infrastructure such as TensorFlow Serving, TensorFlow.js, and TF Hub.
Example code to deploy a TFX pipeline to Airflow can be found in this taxi_pipeline.py example. That pipeline is
supported by this taxi_utils.py library. Those files come from Google’s official TFX Developer Tutorial.
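For orientation, the overall shape of that example is roughly as follows. This is a condensed sketch based on the structure of the 0.13-era taxi_pipeline.py; the paths and step counts are placeholders, and some argument names may differ in newer TFX releases.

```python
# Condensed sketch of a TFX pipeline in the style of Google's taxi_pipeline.py
# example (TFX ~0.13). Paths are placeholders; argument names may differ in
# newer TFX releases.
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer,
                            Evaluator, ModelValidator, Pusher)
from tfx.orchestration import pipeline
from tfx.proto import pusher_pb2, trainer_pb2
from tfx.utils.dsl_utils import csv_input


def _create_pipeline():
    examples = csv_input('/path/to/csv/data')                      # placeholder path
    example_gen = CsvExampleGen(input_base=examples)
    statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
    infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
    validate_stats = ExampleValidator(stats=statistics_gen.outputs.output,
                                      schema=infer_schema.outputs.output)
    transform = Transform(input_data=example_gen.outputs.examples,
                          schema=infer_schema.outputs.output,
                          module_file='taxi_utils.py')             # preprocessing_fn lives here
    trainer = Trainer(module_file='taxi_utils.py',
                      transformed_examples=transform.outputs.transformed_examples,
                      schema=infer_schema.outputs.output,
                      transform_output=transform.outputs.transform_output,
                      train_args=trainer_pb2.TrainArgs(num_steps=10000),
                      eval_args=trainer_pb2.EvalArgs(num_steps=5000))
    model_analyzer = Evaluator(examples=example_gen.outputs.examples,
                               model_exports=trainer.outputs.output)
    model_validator = ModelValidator(examples=example_gen.outputs.examples,
                                     model=trainer.outputs.output)
    pusher = Pusher(model_export=trainer.outputs.output,
                    model_blessing=model_validator.outputs.blessing,
                    push_destination=pusher_pb2.PushDestination(
                        filesystem=pusher_pb2.PushDestination.Filesystem(
                            base_directory='/path/to/serving_model')))  # placeholder

    return pipeline.Pipeline(
        pipeline_name='example_pipeline',
        pipeline_root='/path/to/pipeline_root',                    # placeholder
        components=[example_gen, statistics_gen, infer_schema, validate_stats,
                    transform, trainer, model_analyzer, model_validator, pusher],
        enable_cache=True,
        metadata_db_root='/path/to/metadata')                      # placeholder
```

In the tutorial, the pipeline object returned by this function is then handed to the Airflow runner in that TFX version, which produces the DAG pictured above.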
[Screenshot: StatisticsGen output, an example of the metadata shown for each feature by the StatisticsGen
module.]
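The same statistics can also be generated and browsed interactively outside the pipeline with TensorFlow Data Validation, the library that StatisticsGen wraps. A minimal sketch, assuming local CSV files as input:

```python
# Minimal sketch of using TensorFlow Data Validation directly (the library
# behind StatisticsGen / SchemaGen / ExampleValidator). CSV paths are placeholders.
import tensorflow_data_validation as tfdv

# Compute descriptive statistics for every feature in the dataset.
train_stats = tfdv.generate_statistics_from_csv(data_location='train.csv')

# Infer a schema (feature types, required/optional, value domains) from the stats.
schema = tfdv.infer_schema(statistics=train_stats)

# Validate new data against that schema to surface anomalies such as missing
# values, unexpected types, or out-of-domain categories.
eval_stats = tfdv.generate_statistics_from_csv(data_location='eval.csv')
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)

# In a notebook, these render interactive views similar to the one shown above.
tfdv.visualize_statistics(train_stats)
tfdv.display_schema(schema)
tfdv.display_anomalies(anomalies)
```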
That all sounds very cool.
What are some other pros and cons of TFX?
Pros:
• Organization & Version Control - The biggest pro is the organization that it brings to the pipeline, with a clear,
version-controllable approach to the whole machine learning process.
• Focus with Integration - It also allows developers to focus on one aspect of the pipeline at a time (or different
developers can own different elements) in an organized, coherent fashion.
• Data Lineage - It provides data lineage features, so you always know what transformations have been
performed on your data by the time of model training or other stages in the pipeline. Data lineage, FTW!
• Consistent Transforms at Different Stages with Powerful Feature Types - The Transform capabilities are very
empowering, and the template's demonstrated use of _CATEGORICAL_FEATURE_KEYS,
_DENSE_FLOAT_FEATURE_KEYS, and _BUCKET_FEATURE_KEYS allows data scientists/engineers to easily
identify feature types and build some really sophisticated models without even realizing it (see the sketch after
this list). Concretely, TensorFlow’s DNNLinearCombinedClassifier can treat different features differently in order
to build a model that passes appropriate features (like continuous numerical features) through a neural network
while letting other features (like categorical dummy variables) bypass that network.

Cons:
• Challenging for Quick, Individual One-Off Projects - This is a complex system that an organization would need
to adopt at minimum on an entire project level.
• Learning Curve Does Not Start at Ground Level - While we believe discretizing the pipeline into different
modules actually makes things easier to understand, this is still starting at "Floor 3" rather than the "Ground
Floor" and thus needs teams able to understand and work with all the components. You're not just building a
simple test example here, but an entire production pipeline.
• New Software - It is still new, so no doubt we will see some changes over the next year.
• Underlying Architecture in Development - It uses Apache Beam, which isn't fully released for Python 3 yet
(although it looks like we are close), so there may be some environment reconfigurations required as it develops.
• Supports Only the TensorFlow Framework - This currently seems to be applicable only to TensorFlow, so
models built in PyTorch, Caffe, MXNet, etc. may not fit in well with an organization that adopts this
methodology.
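To make the Transform point above concrete, here is a minimal sketch of the kind of preprocessing_fn the templates encourage, written with tf.Transform. The feature key lists and constants are placeholders rather than values from the actual template.

```python
# Minimal sketch of a tf.Transform preprocessing_fn in the style encouraged by
# the TFX templates. Feature key lists and constants below are placeholders.
import tensorflow_transform as tft

_DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare']              # placeholder names
_CATEGORICAL_FEATURE_KEYS = ['payment_type', 'company']         # placeholder names
_BUCKET_FEATURE_KEYS = ['pickup_latitude', 'pickup_longitude']  # placeholder names
_VOCAB_SIZE = 1000
_OOV_SIZE = 10
_BUCKET_COUNT = 10


def preprocessing_fn(inputs):
    """Maps raw features to transformed features. The same function is applied
    at training time and at serving time, which keeps the two consistent."""
    outputs = {}

    # Scale continuous features to z-scores using full-pass statistics.
    for key in _DENSE_FLOAT_FEATURE_KEYS:
        outputs[key] = tft.scale_to_z_score(inputs[key])

    # Build a vocabulary over categorical strings and map them to integer ids.
    for key in _CATEGORICAL_FEATURE_KEYS:
        outputs[key] = tft.compute_and_apply_vocabulary(
            inputs[key], top_k=_VOCAB_SIZE, num_oov_buckets=_OOV_SIZE)

    # Quantile-bucketize selected numeric features.
    for key in _BUCKET_FEATURE_KEYS:
        outputs[key] = tft.bucketize(inputs[key], num_buckets=_BUCKET_COUNT)

    return outputs
```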
Deployment Methodology
We do not recommend teams start with a blank file when coding a TFX pipeline. There are lots of
interdependencies and calls to the metadata store as each module looks up the details it needs to interact with
the model. We recommend always beginning with one of Google's example pipelines and modifying it for the
current model as needed.
Thoughts on TFX
TFX seems very powerful for machine learning teams.
One metaphor the Google team has used to describe TensorFlow Hub, which is part of this new TensorFlow
ecosystem, is that these tools bring to the machine learning community what the introduction of version control
brought to software engineering. That may be more literally true of TensorFlow Hub, but the concept still applies
here: TFX turns a machine learning engineer’s manual actions into code and allows the entire pipeline to be
documented and checked into version control.
How do the two platforms compare feature by feature?

Underlying Philosophy & Architecture
• Kubeflow: A suite of open-source tools from different teams, packaged to run machine learning pipelines on a
Kubernetes cluster. It includes a pipeline server (Kubeflow Pipelines) and metadata storage databases.
• TFX: Google’s internal tools for managing ML pipelines, now being released publicly. They run as an Apache
Beam pipeline with a supporting SQL MetadataStore database and are deployed by Airflow or Kubeflow
Pipelines.

Data cleanup / Ingest
• Kubeflow: Nuclio functions, Minio.
• TFX: The StatisticsGen module uses TensorFlow Data Validation.

Model training
• Kubeflow: Training using modules such as MPI, MXNet, PyTorch, and TensorFlow Training.
• TFX: The Trainer module makes heavy use of the TensorFlow API for training.

Hyperparameter tuning
• Kubeflow: Available via the Katib component.
• TFX: Still in development. It is possible to call AI Platform (by setting the Trainer’s `executor_class` to an AI
Platform Executor), but the plumbing to connect that to the Trainer module sounds like it is still in the works.

Accelerated Hardware
• Kubeflow: GPU support (supports TensorRT). The Chainer Training component can be used for CUDA
operations.
• TFX: TF Serving supports GPU acceleration by converting the model to TensorRT.

Future Outlook
• Kubeflow: A core module, ksonnet, is being replaced, so we expect significant changes.
• TFX: Minor changes likely due to its newness.
Model Operationalization
This section outlines things to consider when putting a model pipeline into production.
How do you know when it’s time to retrain your model?
The right time to retrain your model will naturally depend upon your specific business needs, but there are a few
signs that it might be time:
YOU HAVE ACCESS TO MORE DIVERSE DATA THAT IS MORE REPRESENTATIVE OF THE BREADTH OF DATA ON WHICH
YOU WANT TO BE ABLE TO PREDICT
If you’re getting new data from users, it is very likely more diverse than the original data on which you trained
your model. Taking a computer vision model as an example, new data may include more lighting conditions,
more races and ages of people, and many more device types capturing the images. Retraining on more diverse
data that represents the kind of data on which you would like to be able to predict can help your model perform
better in those real-world cases.
When training a new version of a model, it should always “pass inspection” before being pushed to
production. It is also important to keep track of how your model improves across versions. Here’s how the two
systems approach this:
Kubeflow
Kubeflow Pipelines have an excellent feature for comparing accuracy metrics of several runs, such as ROC AUC
score, accuracy score, etc. In this regard, one of the best components included with Kubeflow is Seldon. Seldon
allows us to compare two models through A/B testing and push the best model for serving based on the
results.
TFX
TensorFlow Extended has two modules dedicated to this concept of model evaluation.
Evaluator
The Evaluator is designed to provide graphs and statistics that help humans interpret the model. As mentioned
in the TFX module descriptions above, it uses TF Model Analysis to let you dive into your model’s performance
and see on which subsets of your data it performs well and where it can be improved. This is where a data
scientist or engineer evaluating a model (or a series of model versions) would focus.
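In a notebook, that sliced analysis can be browsed with TensorFlow Model Analysis directly. A small sketch, where the output path and slicing column are placeholders:

```python
# Small sketch of browsing Evaluator output with TensorFlow Model Analysis.
# The output path and slicing column are placeholders.
import tensorflow_model_analysis as tfma

# Load the evaluation results that the Evaluator component wrote out.
eval_result = tfma.load_eval_result(output_path='/path/to/evaluator/output')

# In a Jupyter notebook, render metrics broken down by a feature slice
# (e.g. per hour of day) to see where the model performs well and where it struggles.
tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')
```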
ModelValidator
The ModelValidator is more of an automated gatekeeper. It compares each newly trained model to the last
approved model and only pushes the new model to production if it is equal or better according to the metric
you are tracking. This is what an automated system would use when models are retrained and pushed out
regularly.
How does model inference scale in production, especially for real time
predictions via API?
Both Kubeflow and TFX produce models that are ready to be hosted in a number of deployment scenarios. For
most online use cases, that will mean serving a model as a pod in a Kubernetes cluster that can be autoscaled
to support increased traffic.
Kubeflow initially supported the containerization of models through a component called Seldon. It now
supports deploying models to containers through Seldon and TensorFlow Serving (TF Serving), Google’s newer
model-containerization system. TF Serving models can be accessed via REST or gRPC calls.
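As an example of the REST path, a served model can be queried with a simple JSON request. The host, model name, and feature payload below are placeholders:

```python
# Minimal sketch of calling a TF Serving model over its REST API.
# Host, port, model name, and feature values are placeholders.
import requests

payload = {
    "instances": [
        {"trip_miles": 2.5, "payment_type": "Cash"},   # one example per entry
    ]
}

# TF Serving's REST endpoint follows the /v1/models/<name>:predict convention.
response = requests.post(
    "http://tf-serving-service:8501/v1/models/my_model:predict",
    json=payload,
)
response.raise_for_status()
print(response.json()["predictions"])
```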
TensorFlow Extended also outputs models that are easily deployed to TF Serving containers. It can also deploy
models to TensorFlow.js to serve via Javascript in a browser. TensorFlow.js works surprisingly well on even
computer vision models using the webcam and would be a very fast option for many models served online.
TensorFlow Serving pods can be deployed via Kubernetes so that autoscaling kicks in as CPU utilization creeps
up. Both pipelines also support running containerized models on the GPU for hardware-accelerated
predictions, in which case a custom metric can be defined to trigger autoscaling based on GPU utilization.
Final Recommendations
We think both of these technologies are very interesting, and that as they continue to develop, it will become
easier for teams to incorporate machine learning into even more aspects of their businesses.
That being said, some critical development is currently in progress for both frameworks in the months to come
(Kubeflow replacing ksonnet with Kustomize and TFX getting full support for Python 3 on Apache Beam and
reaching the official launch of TensorFlow 2.0). We do feel like both platforms are ready for more experimental
R&D projects, however, and that learning one or both would help teams be prepared for the stable launches to
come.
That said, there are some cases where we would deploy one or the other (or both together) right now:
• We would use Kubeflow if we needed to train a large, distributed model across many machines,
particularly if it were on-premises
• We would use Kubeflow if we were considering an infrastructure change and wanted to make sure our
pipelines were well-prepared to run on another cloud provider or hardware deployment
• We would use TFX if our system were designed to automatically retrain and deploy a new model at
regular intervals, such as nightly, as it learns from additional user data
• We would use TFX if we trained many versions of similar models within our organization (for example
many topic clustering models focusing on different domains or types of text input) and wanted a
smooth, repeatable process for training
Both technologies are interesting, and we expect to see much more development and maturation of this
machine learning pipeline space in the years to come.