Comparison: Kubeflow vs. TFX
www.springml.com
© 2019
Introduction
In this report, we compare two technologies that have come out of Google for managing machine learning
pipelines.
The first is Kubeflow, which has been in development since 2018. It originated as a way of bringing the ideas of
TFX (at the time used only internally at Google) to the public via open source tools, and it continues to change as
open source tools come and go.
The second is TensorFlow Extended (TFX) itself. Google announced that it would be making TFX available to the
public at the end of 2018.
Kubeflow
The main mission of Kubeflow is to make scaling machine learning (ML) models and deploying them to
production as simple as possible across diverse environments, by letting Kubernetes do what it’s great at:
• Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster
<-> production cluster)
• Deploying and managing loosely-coupled microservices
• Scaling based on demand
Pipeline web server: The Pipeline web server gathers data from various services to display relevant views: the list of pipelines
currently running, the history of pipeline executions, the list of data artifacts, debugging information about individual pipeline runs,
and the execution status of individual pipeline runs.
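For reference, this is roughly how a pipeline gets into that web server in the first place: it is defined and compiled with the Kubeflow Pipelines SDK (kfp) and then uploaded through the UI or submitted with the client API. The pipeline name, container images, and arguments below are placeholders, so treat this as a minimal sketch rather than a reference implementation.

```python
# Minimal sketch of defining and compiling a Kubeflow Pipeline with the kfp SDK.
# The images, paths, and pipeline name are placeholders for illustration only.
import kfp
from kfp import dsl


@dsl.pipeline(
    name='example-training-pipeline',
    description='Toy two-step pipeline: preprocess data, then train a model.'
)
def example_pipeline():
    # Each step runs as its own container in the cluster; wiring an output of
    # one step into the next lets the Pipelines UI track artifacts and lineage.
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',   # placeholder image
        arguments=['--input', 'gs://my-bucket/data.csv',
                   '--output', '/tmp/clean.csv'],
        file_outputs={'clean_data': '/tmp/clean.csv'},
    )
    dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train:latest',        # placeholder image
        arguments=['--data', preprocess.outputs['clean_data']],
    )


if __name__ == '__main__':
    # Produces an archive that can be uploaded through the Pipelines web UI,
    # or submitted programmatically with kfp.Client().
    kfp.compiler.Compiler().compile(example_pipeline, 'example_pipeline.tar.gz')
```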
Deploying Kubeflow Pipelines
The steps to deploy Kubeflow Pipelines are enumerated in our “Deploying Kubeflow Pipelines” document.
What are some pros and cons of Kubeflow?
Pros:
• Kubeflow makes it very easy for data scientists to get distributed training and serving on a cluster without
worrying about infrastructure.
• A variety of open-source tools have been combined to bring together several good ideas.
• Kubeflow introduced the concept of Fairing, which makes it easier for data scientists to launch jobs directly
from Jupyter notebooks alongside their model implementations (see the sketch after this list).
• Kubeflow supports multiple frameworks such as TensorFlow, PyTorch, MXNet, etc.

Cons:
• Ksonnet, the prime component for configuration in Kubeflow, will be discontinued in future versions and the
community will need to replace it. This will cause some major changes in the Kubeflow code.
• There are several issues with the sample pipelines and notebooks on the Kubeflow website; these issues are
still being fixed.
• The variety of open-source tool options means the user must be aware of the tools available and make
informed decisions about which components to use in any given deployment, which leads to a less clearly
defined, universal pipeline.
• The variety of tools also means the user needs to learn multiple technologies with different languages to
utilize all of their desired tools within Kubeflow.
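To illustrate the Fairing point above: the general idea is that a training function written in a notebook can be packaged into a container image and submitted to the cluster as a job without leaving the notebook. The snippet below is only a rough sketch from memory of the early Kubeflow Fairing API; the registry, base image, and exact configuration calls are assumptions and may differ from the current release.

```python
# Rough sketch of the Kubeflow Fairing idea: wrap a local training function so
# it runs as a job on the cluster. API names and arguments here are assumptions
# based on early Fairing releases and may not match the current package exactly.
from kubeflow import fairing


def train():
    # Ordinary training code a data scientist would write in a notebook cell.
    print('training the model...')


# Build an image containing this code, push it to a registry (placeholder),
# and configure the job deployer so the function runs on Kubernetes.
fairing.config.set_builder('append',
                           registry='gcr.io/my-project',                    # placeholder
                           base_image='tensorflow/tensorflow:1.13.1-py3')   # placeholder
fairing.config.set_deployer('job')

# Wrap the local function; calling the wrapper submits it to the cluster.
remote_train = fairing.config.fn(train)
remote_train()
```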
What runs the pipeline?
The pipeline can be run by Airflow or Kubeflow Pipelines. Airflow seems to be more prominently
positioned in Google's messaging, but that may be because of current adoption in the field.
[Pipeline diagram: the TFX pipeline as it appears within an Airflow job, with the components CsvExampleGen,
StatisticsGen, SchemaGen, ExampleValidator, Transform, Trainer, Evaluator, ModelValidator, and Pusher.]
CsvExampleGen: ingests data from raw sources like CSVs, BigQuery, or TFRecords and splits that data into
training and evaluation sets.

StatisticsGen: uses TensorFlow Data Validation to provide a window into the raw dataset, calculating
descriptive statistics to identify missing data, values outside expected ranges, etc. This view also allows a data
scientist to examine the features and check for other data quality problems like post-treatment variables.

SchemaGen: is one of the simpler components of the pipeline. It examines the statistics created by
StatisticsGen and creates a data schema that represents the data type of each feature column and whether it is
a required field. It creates the schema automatically, but developers are expected to review and modify the
schema to confirm its accuracy.

ExampleValidator: looks for anomalies and missing values in the dataset. For example, it can detect
training-serving skew and data drift as it compares training data to evaluation data or new data coming in
during serving.

Transform: performs consistent feature engineering on all subsets of the dataset (i.e. train, dev, test). Here we
engineer our features, vocabularies, embeddings, etc., and can define features as categorical, dense, and
bucketized for use in different kinds of complex models. The same Transform module applies to both training
data and incoming data in production, which ensures that the transformations are consistent across both
datasets.

Trainer: trains the model using TensorFlow Estimators.

Evaluator: performs deep analysis of the training results. In addition to standard model evaluation statistics, it
can show statistics on user-defined slices of the data. This can help test whether your model is performing
poorly on a particular subgroup of input data, as well as provide insight into what additional data you need to
gather or additional features you need to engineer in order to improve. Evaluator also helps track performance
over time to see how model iteration is improving performance. This module is based on TF Model Analysis.

ModelValidator: is like an automated gatekeeper that ensures new versions of the model are "good enough" to
be pushed to production, especially when new models are trained and served automatically. It takes two
models: the last good evaluated model and the current model coming from training. It "blesses" the new model
if performance is the same or better and fails it if it is not better than the last one. It also performs this
evaluation on new data, which helps make sure the model is ready for production data rather than simply
overfitting the training data.

Pusher: deploys the model to serving infrastructure such as TensorFlow Serving, TensorFlow.js, and TF Hub.
Example code to deploy a TFX pipeline to Airflow can be found in this taxi_pipeline.py example. That pipeline is
supported by this taxi_utils.py library. Those files come from Google’s official TFX Developer Tutorial.
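For orientation, the overall shape of that example is roughly as follows. This is a condensed sketch based on the structure of the 0.13-era taxi_pipeline.py; the paths and step counts are placeholders, and some argument names may differ in newer TFX releases.

```python
# Condensed sketch of a TFX pipeline in the style of Google's taxi_pipeline.py
# example (TFX ~0.13). Paths are placeholders; argument names may differ in
# newer TFX releases.
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer,
                            Evaluator, ModelValidator, Pusher)
from tfx.orchestration import pipeline
from tfx.proto import pusher_pb2, trainer_pb2
from tfx.utils.dsl_utils import csv_input


def _create_pipeline():
    examples = csv_input('/path/to/csv/data')                      # placeholder path
    example_gen = CsvExampleGen(input_base=examples)
    statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
    infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
    validate_stats = ExampleValidator(stats=statistics_gen.outputs.output,
                                      schema=infer_schema.outputs.output)
    transform = Transform(input_data=example_gen.outputs.examples,
                          schema=infer_schema.outputs.output,
                          module_file='taxi_utils.py')             # preprocessing_fn lives here
    trainer = Trainer(module_file='taxi_utils.py',
                      transformed_examples=transform.outputs.transformed_examples,
                      schema=infer_schema.outputs.output,
                      transform_output=transform.outputs.transform_output,
                      train_args=trainer_pb2.TrainArgs(num_steps=10000),
                      eval_args=trainer_pb2.EvalArgs(num_steps=5000))
    model_analyzer = Evaluator(examples=example_gen.outputs.examples,
                               model_exports=trainer.outputs.output)
    model_validator = ModelValidator(examples=example_gen.outputs.examples,
                                     model=trainer.outputs.output)
    pusher = Pusher(model_export=trainer.outputs.output,
                    model_blessing=model_validator.outputs.blessing,
                    push_destination=pusher_pb2.PushDestination(
                        filesystem=pusher_pb2.PushDestination.Filesystem(
                            base_directory='/path/to/serving_model')))  # placeholder

    return pipeline.Pipeline(
        pipeline_name='example_pipeline',
        pipeline_root='/path/to/pipeline_root',                    # placeholder
        components=[example_gen, statistics_gen, infer_schema, validate_stats,
                    transform, trainer, model_analyzer, model_validator, pusher],
        enable_cache=True,
        metadata_db_root='/path/to/metadata')                      # placeholder
```

In the tutorial, the pipeline object returned by this function is then handed to the Airflow runner in that TFX version, which produces the DAG pictured above.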
[Screenshot: StatisticsGen output, an example of the metadata shown for each feature by the StatisticsGen
module.]
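The same statistics can also be generated and browsed interactively outside the pipeline with TensorFlow Data Validation, the library that StatisticsGen wraps. A minimal sketch, assuming local CSV files as input:

```python
# Minimal sketch of using TensorFlow Data Validation directly (the library
# behind StatisticsGen / SchemaGen / ExampleValidator). CSV paths are placeholders.
import tensorflow_data_validation as tfdv

# Compute descriptive statistics for every feature in the dataset.
train_stats = tfdv.generate_statistics_from_csv(data_location='train.csv')

# Infer a schema (feature types, required/optional, value domains) from the stats.
schema = tfdv.infer_schema(statistics=train_stats)

# Validate new data against that schema to surface anomalies such as missing
# values, unexpected types, or out-of-domain categories.
eval_stats = tfdv.generate_statistics_from_csv(data_location='eval.csv')
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)

# In a notebook, these render interactive views similar to the one shown above.
tfdv.visualize_statistics(train_stats)
tfdv.display_schema(schema)
tfdv.display_anomalies(anomalies)
```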
That all sounds very cool.
What are some other pros and cons of TFX?
Pros:
• Organization & Version Control - The biggest pro is the organization that it brings to the pipeline, with a clear,
version-controllable approach to the whole machine learning process.
• Focus with Integration - It also allows developers to focus on one aspect of the pipeline at a time (or different
developers can own different elements) in an organized, coherent fashion.
• Data Lineage - It provides data lineage features, so you always know what transformations have been
performed on your data by the time of model training or other stages in the pipeline. Data lineage, FTW!
• Consistent Transforms at Different Stages with Powerful Feature Types - The Transform capabilities are very
empowering, and the template's demonstrated use of _CATEGORICAL_FEATURE_KEYS,
_DENSE_FLOAT_FEATURE_KEYS, and _BUCKET_FEATURE_KEYS allows data scientists/engineers to easily
identify feature types and build some really sophisticated models without even realizing it (see the sketch after
this list). Concretely, TensorFlow’s DNNLinearCombinedClassifier can treat different features differently in order
to build a model that passes appropriate features (like continuous numerical features) through a neural network
while letting other features (like categorical dummy variables) bypass that network.

Cons:
• Challenging for Quick, Individual One-Off Projects - This is a complex system that an organization would need
to adopt at minimum on an entire project level.
• Learning Curve Does Not Start at Ground Level - While we believe discretizing the pipeline into different
modules actually makes things easier to understand, this is still starting at "Floor 3" rather than the "Ground
Floor" and thus needs teams able to understand and work with all the components. You're not just building a
simple test example here, but an entire production pipeline.
• New Software - It is still new, so no doubt we will see some changes over the next year.
• Underlying Architecture in Development - It uses Apache Beam, which isn't fully released for Python 3 yet
(although it looks like we are close), so there may be some environment reconfigurations required as it develops.
• Supports Only the TensorFlow Framework - This currently seems to be applicable only to TensorFlow, so
models built in PyTorch, Caffe, MXNet, etc. may not fit in well with an organization that adopts this
methodology.
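To make the Transform point above concrete, here is a minimal sketch of the kind of preprocessing_fn the templates encourage, written with tf.Transform. The feature key lists and constants are placeholders rather than values from the actual template.

```python
# Minimal sketch of a tf.Transform preprocessing_fn in the style encouraged by
# the TFX templates. Feature key lists and constants below are placeholders.
import tensorflow_transform as tft

_DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare']              # placeholder names
_CATEGORICAL_FEATURE_KEYS = ['payment_type', 'company']         # placeholder names
_BUCKET_FEATURE_KEYS = ['pickup_latitude', 'pickup_longitude']  # placeholder names
_VOCAB_SIZE = 1000
_OOV_SIZE = 10
_BUCKET_COUNT = 10


def preprocessing_fn(inputs):
    """Maps raw features to transformed features. The same function is applied
    at training time and at serving time, which keeps the two consistent."""
    outputs = {}

    # Scale continuous features to z-scores using full-pass statistics.
    for key in _DENSE_FLOAT_FEATURE_KEYS:
        outputs[key] = tft.scale_to_z_score(inputs[key])

    # Build a vocabulary over categorical strings and map them to integer ids.
    for key in _CATEGORICAL_FEATURE_KEYS:
        outputs[key] = tft.compute_and_apply_vocabulary(
            inputs[key], top_k=_VOCAB_SIZE, num_oov_buckets=_OOV_SIZE)

    # Quantile-bucketize selected numeric features.
    for key in _BUCKET_FEATURE_KEYS:
        outputs[key] = tft.bucketize(inputs[key], num_buckets=_BUCKET_COUNT)

    return outputs
```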
Deployment Methodology
We do not recommend teams start with a blank file when coding a TFX pipeline. There are lots of
interdependencies and calls to the metadata store as each module looks up the details it needs to interact with
the model. We recommend always beginning with one of Google's example pipelines and modifying it for the
current model as needed.
Thoughts on TFX
TFX seems very powerful for machine learning teams.
One metaphor the Google team has used to describe TensorFlow Hub, which is part of this new TensorFlow
ecosystem, is that these tools bring to the machine learning community what the introduction of version control
brought to software engineering. That may be more literally true of TensorFlow Hub, but the concept still applies
here: TFX turns a machine learning engineer’s manual actions into code and allows the entire pipeline to be
documented and checked into version control.
How do the two platforms compare feature by feature?

Underlying Philosophy & Architecture
• Kubeflow: A suite of open-source tools from different teams, packaged to run machine learning pipelines on a
Kubernetes cluster. It includes a pipeline server (Kubeflow Pipelines) and metadata storage databases.
• TFX: Google’s internal tools for managing ML pipelines, now being released publicly. They run as an Apache
Beam pipeline with a supporting SQL MetadataStore database and are deployed by Airflow or Kubeflow
Pipelines.

Data cleanup / Ingest
• Kubeflow: Nuclio functions, Minio.
• TFX: The StatisticsGen module uses TensorFlow Data Validation.

Model training
• Kubeflow: Training using modules such as MPI, MXNet, PyTorch, and TensorFlow Training.
• TFX: The Trainer module makes heavy use of the TensorFlow API for training.

Hyperparameter tuning
• Kubeflow: Available via the Katib component.
• TFX: Still in development. It is possible to call AI Platform (by setting the Trainer’s `executor_class` to an AI
Platform Executor), but the plumbing to connect that to the Trainer module sounds like it is still in the works.

Accelerated Hardware
• Kubeflow: GPU support (supports TensorRT). The Chainer Training component can be used for CUDA
operations.
• TFX: TF Serving supports GPU acceleration by converting the model to TensorRT.

Future Outlook
• Kubeflow: A core module, ksonnet, is being replaced, so we expect significant changes.
• TFX: Minor changes likely due to its newness.
Model Operationalization
This section outlines things to consider when putting a model pipeline into production.
How do you know when it’s time to retrain your model?
The right time to retrain your model will naturally depend upon your specific business needs, but there are a few
signs that it might be time:
YOU HAVE ACCESS TO MORE DIVERSE DATA THAT IS MORE REPRESENTATIVE OF THE BREADTH OF DATA ON WHICH
YOU WANT TO BE ABLE TO PREDICT
If you’re getting new data from users, it is very likely more diverse than the original data on which you trained
your model. Taking a computer vision model as an example, new data may include more lighting conditions,
more races and ages of people, and many more device types capturing the images. Retraining on more diverse
data that represents the kind of data on which you would like to be able to predict can help your model perform
better in those real-world cases.
When training a new version of a model, it should always “pass inspection” before being pushed to
production. It is also important to keep track of how your model improves across versions. Here’s how the two
systems approach this:
Kubeflow
Kubeflow Pipelines have an excellent feature for comparing accuracy metrics of several runs, such as ROC AUC
score, accuracy score, etc. In this regard, one of the best components included with Kubeflow is Seldon. Seldon
allows us to compare two models through A/B testing and push the best model for serving based on the
results.
TFX
TensorFlow Extended has two modules dedicated to this concept of model evaluation.
Evaluator
The Evaluator is designed to provide graphs and statistics that help humans interpret the model. As mentioned
in the TFX module descriptions above, it uses TF Model Analysis to let you dive into your model’s performance
and see on which subsets of your data it performs well and where it can be improved. This is where a data
scientist or engineer evaluating a model (or a series of model versions) would focus.
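In a notebook, that sliced analysis can be browsed with TensorFlow Model Analysis directly. A small sketch, where the output path and slicing column are placeholders:

```python
# Small sketch of browsing Evaluator output with TensorFlow Model Analysis.
# The output path and slicing column are placeholders.
import tensorflow_model_analysis as tfma

# Load the evaluation results that the Evaluator component wrote out.
eval_result = tfma.load_eval_result(output_path='/path/to/evaluator/output')

# In a Jupyter notebook, render metrics broken down by a feature slice
# (e.g. per hour of day) to see where the model performs well and where it struggles.
tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')
```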
ModelValidator
The ModelValidator is more of an automated gatekeeper. It compares each newly trained model to the last
approved model and only pushes the new model to production if it is equal or better according to the metric
you are tracking. This is what an automated system would use when models are retrained and pushed out
regularly.
How does model inference scale in production, especially for real time
predictions via API?
Both Kubeflow and TFX produce models that are ready to be hosted in a number of deployment scenarios. For
most online use cases, that will mean serving a model as a pod in a Kubernetes cluster that can be autoscaled
to support increased traffic.
Kubeflow initially supported the containerization of models through a component called Seldon. It now
supports deploying models to containers through Seldon and TensorFlow Serving (TF Serving), Google’s newer
model-containerization system. TF Serving models can be accessed via REST or gRPC calls.
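As an example of the REST path, a served model can be queried with a simple JSON request. The host, model name, and feature payload below are placeholders:

```python
# Minimal sketch of calling a TF Serving model over its REST API.
# Host, port, model name, and feature values are placeholders.
import requests

payload = {
    "instances": [
        {"trip_miles": 2.5, "payment_type": "Cash"},   # one example per entry
    ]
}

# TF Serving's REST endpoint follows the /v1/models/<name>:predict convention.
response = requests.post(
    "http://tf-serving-service:8501/v1/models/my_model:predict",
    json=payload,
)
response.raise_for_status()
print(response.json()["predictions"])
```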
TensorFlow Extended also outputs models that are easily deployed to TF Serving containers. It can also deploy
models to TensorFlow.js to serve via Javascript in a browser. TensorFlow.js works surprisingly well on even
computer vision models using the webcam and would be a very fast option for many models served online.
TensorFlow Serving pods can be deployed via Kubernetes so that autoscaling kicks in as CPU utilization creeps
up. Both pipelines also support running containerized models on the GPU for hardware-accelerated
predictions, in which case a custom metric can be defined to trigger autoscaling based on GPU utilization.
Final Recommendations
We think both of these technologies are very interesting, and that as they continue to develop, it will become
easier for teams to incorporate machine learning into even more aspects of their businesses.
That being said, some critical development is currently in progress for both frameworks in the months to come
(Kubeflow replacing ksonnet with Kustomize and TFX getting full support for Python 3 on Apache Beam and
reaching the official launch of TensorFlow 2.0). We do feel like both platforms are ready for more experimental
R&D projects, however, and that learning one or both would help teams be prepared for the stable launches to
come.
That said, there are some cases where we would deploy one or the other (or both together) right now:
• We would use Kubeflow if we needed to train a large, distributed model across many machines,
particularly if it were on-premises
• We would use Kubeflow if we were considering an infrastructure change and wanted to make sure our
pipelines were well-prepared to run on another cloud provider or hardware deployment
• We would use TFX if our system were designed to automatically retrain and deploy a new model at
regular intervals, such as nightly, as it learns from additional user data
• We would use TFX if we trained many versions of similar models within our organization (for example
many topic clustering models focusing on different domains or types of text input) and wanted a
smooth, repeatable process for training
Both technologies are interesting, and we expect to see much more development and maturation of this
machine learning pipeline space in the years to come.