{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2024-04-30T10:32:27.699899Z", "iopub.status.busy": "2024-04-30T10:32:27.699312Z", "iopub.status.idle": "2024-04-30T10:32:27.703221Z", "shell.execute_reply": "2024-04-30T10:32:27.702625Z" }, "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Better ML Engineering with ML Metadata\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "Assume a scenario where you set up a production ML pipeline to classify penguins. The pipeline ingests your training data, trains and evaluates a model, and pushes it to production.\n", "\n", "However, when you later try using this model with a larger dataset that contains different kinds of penguins, you observe that your model does not behave as expected and starts classifying the species incorrectly.\n", "\n", "At this point, you are interested in knowing:\n", "\n", "* What is the most efficient way to debug the model when the only available artifact is the model in production?\n", "* Which training dataset was used to train the model?\n", "* Which training run led to this erroneous model?\n", "* Where are the model evaluation results?\n", "* Where to begin debugging?\n", "\n", "[ML Metadata (MLMD)](https://github.com/google/ml-metadata) is a library that leverages the metadata associated with ML models to help you answer these questions and more. A helpful analogy is to think of this metadata as the equivalent of logging in software development. MLMD enables you to reliably track the artifacts and lineage associated with the various components of your ML pipeline.\n", "\n", "In this tutorial, you set up a TFX Pipeline to create a model that classifies penguins into three species based on the body mass and the length and depth of their culmens, and the length of their flippers. You then use MLMD to track the lineage of pipeline components." ] }, { "cell_type": "markdown", "metadata": { "id": "3rGF8hLibz6p" }, "source": [ "## TFX Pipelines in Colab\n", "\n", "Colab is a lightweight development environment which differs significantly from a production environment. In production, you may have various pipeline components like data ingestion, transformation, model training, run histories, etc. across multiple, distributed systems. 
For this tutorial, you should be aware that significant differences exist in orchestration and metadata storage: both are handled locally within Colab. Learn more about TFX in Colab [here](https://www.tensorflow.org/tfx/tutorials/tfx/components_keras#background).\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup\n", "\n", "First, we install and import the necessary packages, set up paths, and download data." ] }, { "cell_type": "markdown", "metadata": { "id": "lko0xn8JxI6F" }, "source": [ "### Upgrade Pip\n", "\n", "To avoid upgrading Pip on a local system, first check that the notebook is running in Colab. Local systems can of course be upgraded separately." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:27.707482Z", "iopub.status.busy": "2024-04-30T10:32:27.707004Z", "iopub.status.idle": "2024-04-30T10:32:27.715175Z", "shell.execute_reply": "2024-04-30T10:32:27.714530Z" }, "id": "7pXW--mlxQhY" }, "outputs": [], "source": [ "try:\n", "  import colab\n", "  !pip install --upgrade pip\n", "except:\n", "  pass" ] }, { "cell_type": "markdown", "metadata": { "id": "mQV-Cget1S8t" }, "source": [ "### Install and import TFX" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:27.718738Z", "iopub.status.busy": "2024-04-30T10:32:27.718283Z", "iopub.status.idle": "2024-04-30T10:32:37.922014Z", "shell.execute_reply": "2024-04-30T10:32:37.920818Z" }, "id": "82jOhrcA36YA" }, "outputs": [], "source": [ "!pip install -q tfx" ] }, { "cell_type": "markdown", "metadata": { "id": "q5p3LRwkZRbj" }, "source": [ "### Import packages" ] }, { "cell_type": "markdown", "metadata": { "id": "w1oayJjlQZxS" }, "source": [ "#### Did you restart the runtime?\n", "\n", "If you are using Google Colab, the first time that you run\n", "the cell above, you must restart the runtime by clicking\n", 
"the \"RESTART RUNTIME\" button above or using the \"Runtime > Restart\n", "runtime ...\" menu. This is because of the way that Colab\n", "loads packages." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:37.926675Z", "iopub.status.busy": "2024-04-30T10:32:37.926343Z", "iopub.status.idle": "2024-04-30T10:32:42.578343Z", "shell.execute_reply": "2024-04-30T10:32:42.577517Z" }, "id": "zknUh9LrZZf2" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-04-30 10:32:39.287985: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-04-30 10:32:39.288034: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-04-30 10:32:39.289482: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n" ] } ], "source": [ "import os\n", "import tempfile\n", "import urllib.request\n", "import pandas as pd\n", "\n", "import tensorflow_model_analysis as tfma\n", "from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext" ] }, { "cell_type": "markdown", "metadata": { "id": "OD2cRhwM3ez2" }, "source": [ "Check the TFX and MLMD versions." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:42.583193Z", "iopub.status.busy": "2024-04-30T10:32:42.582222Z", "iopub.status.idle": "2024-04-30T10:32:43.832615Z", "shell.execute_reply": "2024-04-30T10:32:43.831906Z" }, "id": "z1ut9Wy_Qf1Q" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TFX version: 1.15.0\n", "MLMD version: 1.15.0\n" ] } ], "source": [ "from tfx import v1 as tfx\n", "print('TFX version: {}'.format(tfx.__version__))\n", "import ml_metadata as mlmd\n", "print('MLMD version: {}'.format(mlmd.__version__))" ] }, { "cell_type": "markdown", "metadata": { "id": "UhNtHfuxCGVy" }, "source": [ "## Download the dataset\n", "\n", "In this colab, we use the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/articles/intro.html), which can be found on [GitHub](https://github.com/allisonhorst/palmerpenguins). We processed the dataset by leaving out any incomplete records, dropping the `island` and `sex` columns, and converting the labels to `int32`. The dataset contains 334 records, each with a penguin's body mass, culmen length and depth, and flipper length. You use this data to classify penguins into one of three species." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:43.836302Z", "iopub.status.busy": "2024-04-30T10:32:43.835663Z", "iopub.status.idle": "2024-04-30T10:32:43.977164Z", "shell.execute_reply": "2024-04-30T10:32:43.976398Z" }, "id": "B_NibNnjzGHu" }, "outputs": [ { "data": { "text/plain": [ "('/tmpfs/tmp/tfx-data4bx2jr3d/penguins_processed.csv',\n", " )" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'\n", "_data_root = tempfile.mkdtemp(prefix='tfx-data')\n", "_data_filepath = os.path.join(_data_root, \"penguins_processed.csv\")\n", "urllib.request.urlretrieve(DATA_PATH, _data_filepath)" ] }, { "cell_type": "markdown", "metadata": { "id": "8NXg2bGA19HJ" }, "source": [ "## Create an InteractiveContext\n", "\n", "To run TFX components interactively in this notebook, create an `InteractiveContext`. The `InteractiveContext` uses a temporary directory with an ephemeral MLMD database instance. Note that calls to `InteractiveContext` are no-ops outside the Colab environment.\n", "\n", "In general, it is a good practice to group similar pipeline runs under a `Context`." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:43.980477Z", "iopub.status.busy": "2024-04-30T10:32:43.980232Z", "iopub.status.idle": "2024-04-30T10:32:43.985196Z", "shell.execute_reply": "2024-04-30T10:32:43.984584Z" }, "id": "bytrDFKh40mi" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le as root for pipeline outputs.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/metadata.sqlite.\n" ] } ], "source": [ "interactive_context = InteractiveContext()" ] }, { "cell_type": "markdown", "metadata": { "id": "e-58fa9S6Nao" }, "source": [ "## Construct the TFX Pipeline\n", "\n", "A TFX pipeline consists of several components that perform different aspects of the ML workflow. In this notebook, you create and run the `ExampleGen`, `StatisticsGen`, `SchemaGen`, and `Trainer` components, then use the `Evaluator` and `Pusher` components to evaluate and push the trained model. \n", "\n", "Refer to the [components tutorial](https://www.tensorflow.org/tfx/tutorials/tfx/components_keras) for more information on TFX pipeline components." ] }, { "cell_type": "markdown", "metadata": { "id": "urh3FTb81yyM" }, "source": [ "Note: Constructing a TFX Pipeline by setting up the individual components involves a lot of boilerplate code. For the purpose of this tutorial, it is fine if you do not fully understand every line of code in the pipeline setup. 
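
The component chain described above can be sketched without TFX installed. This is a minimal illustration, not TFX's own orchestration logic: the `UPSTREAM` mapping and the `execution_order` helper are hypothetical names, and the inputs listed for `Evaluator` and `Pusher` are assumptions, since only the first four components are wired up in this part of the notebook. It uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which every component runs after the components whose artifacts it consumes.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring for illustration only: each component maps to the
# upstream components whose output artifacts it consumes. The Evaluator and
# Pusher entries are assumptions, not taken from the pipeline code.
UPSTREAM = {
    'ExampleGen': [],
    'StatisticsGen': ['ExampleGen'],
    'SchemaGen': ['StatisticsGen'],
    'Trainer': ['ExampleGen', 'SchemaGen'],
    'Evaluator': ['ExampleGen', 'Trainer'],
    'Pusher': ['Trainer', 'Evaluator'],
}


def execution_order(upstream):
    """Return a run order in which every producer precedes its consumers."""
    return list(TopologicalSorter(upstream).static_order())


order = execution_order(UPSTREAM)
print(' -> '.join(order))
# prints: ExampleGen -> StatisticsGen -> SchemaGen -> Trainer -> Evaluator -> Pusher
```

Because each component in this sketch depends on the one before it, the order is fully determined. The same producer and consumer relationships are what a lineage query later walks in reverse, from a model back to the data it was trained on.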
" ] }, { "cell_type": "markdown", "metadata": { "id": "bnnq7Gf8CHZJ" }, "source": [ "### Instantiate and run the ExampleGen Component" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:43.988338Z", "iopub.status.busy": "2024-04-30T10:32:43.988114Z", "iopub.status.idle": "2024-04-30T10:32:45.014587Z", "shell.execute_reply": "2024-04-30T10:32:45.013975Z" }, "id": "H9zaBZh3C_9x" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.\n" ] }, { "data": { "application/javascript": [ "\n", " if (typeof window.interactive_beam_jquery == 'undefined') {\n", " var jqueryScript = document.createElement('script');\n", " jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n", " jqueryScript.type = 'text/javascript';\n", " jqueryScript.onload = function() {\n", " var datatableScript = document.createElement('script');\n", " datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n", " datatableScript.type = 'text/javascript';\n", " datatableScript.onload = function() {\n", " window.interactive_beam_jquery = jQuery.noConflict(true);\n", " window.interactive_beam_jquery(document).ready(function($){\n", " \n", " });\n", " }\n", " document.head.appendChild(datatableScript);\n", " };\n", " document.head.appendChild(jqueryScript);\n", " } else {\n", " window.interactive_beam_jquery(document).ready(function($){\n", " \n", " });\n", " }" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as 
it could be.\n" ] }, { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f843034df10
.execution_id: 1
.component: CsvExampleGen at 0x7f82e19042b0
  .inputs: {}
  .outputs:
    ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
      .type_name: Examples
      ._artifacts:
        [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
          .type: <class 'tfx.types.standard_artifacts.Examples'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
          .span: 0
          .split_names: ["train", "eval"]
          .version: 0
  .exec_properties:
    ['input_base']: /tmpfs/tmp/tfx-data4bx2jr3d
    ['input_config']: {"splits": [{"name": "single_split", "pattern": "*"}]}
    ['output_config']: {"split_config": {"splits": [{"hash_buckets": 2, "name": "train"}, {"hash_buckets": 1, "name": "eval"}]}}
    ['output_data_format']: 6
    ['output_file_format']: 5
    ['custom_config']: None
    ['range_config']: None
    ['span']: 0
    ['version']: None
    ['input_fingerprint']: split:single_split,num_files:1,total_bytes:25648,xor_checksum:1714473163,sum_checksum:1714473163
.component.inputs: {}
.component.outputs:
  ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
    .type_name: Examples
    ._artifacts:
      [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
        .type: <class 'tfx.types.standard_artifacts.Examples'>
        .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
        .span: 0
        .split_names: ["train", "eval"]
        .version: 0
" ], "text/plain": [ "ExecutionResult(\n", " component_id: CsvExampleGen\n", " execution_id: 1\n", " outputs:\n", " examples: OutputChannel(artifact_type=Examples, producer_component_id=CsvExampleGen, output_key=examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "example_gen = tfx.components.CsvExampleGen(input_base=_data_root)\n", "interactive_context.run(example_gen)" ] }, { "cell_type": "markdown", "metadata": { "id": "nqxye_p1DLmf" }, "source": [ "### Instantiate and run the StatisticsGen Component" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:45.018341Z", "iopub.status.busy": "2024-04-30T10:32:45.017734Z", "iopub.status.idle": "2024-04-30T10:32:48.091605Z", "shell.execute_reply": "2024-04-30T10:32:48.090914Z" }, "id": "s67sHU_vDRds" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f82e04edac0
.execution_id: 2
.component: StatisticsGen at 0x7f82e1904580
  .inputs:
    ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
      .type_name: Examples
      ._artifacts:
        [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
          .type: <class 'tfx.types.standard_artifacts.Examples'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
          .span: 0
          .split_names: ["train", "eval"]
          .version: 0
  .outputs:
    ['statistics']: Channel of type 'ExampleStatistics' (1 artifact) at 0x7f82e1904460
      .type_name: ExampleStatistics
      ._artifacts:
        [0]: Artifact of type 'ExampleStatistics' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2) at 0x7f82e03122e0
          .type: <class 'tfx.types.standard_artifacts.ExampleStatistics'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2
          .span: 0
          .split_names: ["train", "eval"]
  .exec_properties:
    ['stats_options_json']: None
    ['exclude_splits']: []
.component.inputs:
  ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
    .type_name: Examples
    ._artifacts:
      [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
        .type: <class 'tfx.types.standard_artifacts.Examples'>
        .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
        .span: 0
        .split_names: ["train", "eval"]
        .version: 0
.component.outputs:
  ['statistics']: Channel of type 'ExampleStatistics' (1 artifact) at 0x7f82e1904460
    .type_name: ExampleStatistics
    ._artifacts:
      [0]: Artifact of type 'ExampleStatistics' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2) at 0x7f82e03122e0
        .type: <class 'tfx.types.standard_artifacts.ExampleStatistics'>
        .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2
        .span: 0
        .split_names: ["train", "eval"]
" ], "text/plain": [ "ExecutionResult(\n", " component_id: StatisticsGen\n", " execution_id: 2\n", " outputs:\n", " statistics: OutputChannel(artifact_type=ExampleStatistics, producer_component_id=StatisticsGen, output_key=statistics, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "statistics_gen = tfx.components.StatisticsGen(\n", " examples=example_gen.outputs['examples'])\n", "interactive_context.run(statistics_gen)" ] }, { "cell_type": "markdown", "metadata": { "id": "xib9oRb_ExjJ" }, "source": [ "### Instantiate and run the SchemaGen Component" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:48.095578Z", "iopub.status.busy": "2024-04-30T10:32:48.094904Z", "iopub.status.idle": "2024-04-30T10:32:48.131204Z", "shell.execute_reply": "2024-04-30T10:32:48.130532Z" }, "id": "csmD4CSUE3JT" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f82e1e22070
.execution_id: 3
.component: SchemaGen at 0x7f82dbb62c10
  .inputs:
    ['statistics']: Channel of type 'ExampleStatistics' (1 artifact) at 0x7f82e1904460
      .type_name: ExampleStatistics
      ._artifacts:
        [0]: Artifact of type 'ExampleStatistics' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2) at 0x7f82e03122e0
          .type: <class 'tfx.types.standard_artifacts.ExampleStatistics'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2
          .span: 0
          .split_names: ["train", "eval"]
  .outputs:
    ['schema']: Channel of type 'Schema' (1 artifact) at 0x7f82dbb62be0
      .type_name: Schema
      ._artifacts:
        [0]: Artifact of type 'Schema' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3) at 0x7f83b7952670
          .type: <class 'tfx.types.standard_artifacts.Schema'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3
  .exec_properties:
    ['infer_feature_shape']: 1
    ['exclude_splits']: []
.component.inputs:
  ['statistics']: Channel of type 'ExampleStatistics' (1 artifact) at 0x7f82e1904460
    .type_name: ExampleStatistics
    ._artifacts:
      [0]: Artifact of type 'ExampleStatistics' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2) at 0x7f82e03122e0
        .type: <class 'tfx.types.standard_artifacts.ExampleStatistics'>
        .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/StatisticsGen/statistics/2
        .span: 0
        .split_names: ["train", "eval"]
.component.outputs:
  ['schema']: Channel of type 'Schema' (1 artifact) at 0x7f82dbb62be0
    .type_name: Schema
    ._artifacts:
      [0]: Artifact of type 'Schema' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3) at 0x7f83b7952670
        .type: <class 'tfx.types.standard_artifacts.Schema'>
        .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3
" ], "text/plain": [ "ExecutionResult(\n", " component_id: SchemaGen\n", " execution_id: 3\n", " outputs:\n", " schema: OutputChannel(artifact_type=Schema, producer_component_id=SchemaGen, output_key=schema, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "infer_schema = tfx.components.SchemaGen(\n", " statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)\n", "interactive_context.run(infer_schema)" ] }, { "cell_type": "markdown", "metadata": { "id": "_pYNlw7BHUjP" }, "source": [ "### Instantiate and run the Trainer Component\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:48.134580Z", "iopub.status.busy": "2024-04-30T10:32:48.134027Z", "iopub.status.idle": "2024-04-30T10:32:48.136981Z", "shell.execute_reply": "2024-04-30T10:32:48.136381Z" }, "id": "MTxf8xs_kKfG" }, "outputs": [], "source": [ "# Define the module file for the Trainer component\n", "trainer_module_file = 'penguin_trainer.py'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:48.140230Z", "iopub.status.busy": "2024-04-30T10:32:48.139772Z", "iopub.status.idle": "2024-04-30T10:32:48.144902Z", "shell.execute_reply": "2024-04-30T10:32:48.144322Z" }, "id": "f3nLHEmUkRUw" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing penguin_trainer.py\n" ] } ], "source": [ "%%writefile {trainer_module_file}\n", "\n", "# Define the training algorithm for the Trainer module file\n", "import os\n", "from typing import List, Text\n", "\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "\n", "from tfx import v1 as tfx\n", "from tfx_bsl.public import tfxio\n", "\n", "from tensorflow_metadata.proto.v0 import schema_pb2\n", "\n", "# 
Features used for classification - culmen length and depth, flipper length,\n", "# body mass, and species.\n", "\n", "_LABEL_KEY = 'species'\n", "\n", "_FEATURE_KEYS = [\n", "    'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g'\n", "]\n", "\n", "\n", "def _input_fn(file_pattern: List[Text],\n", "              data_accessor: tfx.components.DataAccessor,\n", "              schema: schema_pb2.Schema, batch_size: int) -> tf.data.Dataset:\n", "  return data_accessor.tf_dataset_factory(\n", "      file_pattern,\n", "      tfxio.TensorFlowDatasetOptions(\n", "          batch_size=batch_size, label_key=_LABEL_KEY), schema).repeat()\n", "\n", "\n", "def _build_keras_model():\n", "  inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]\n", "  d = keras.layers.concatenate(inputs)\n", "  d = keras.layers.Dense(8, activation='relu')(d)\n", "  d = keras.layers.Dense(8, activation='relu')(d)\n", "  outputs = keras.layers.Dense(3)(d)\n", "  model = keras.Model(inputs=inputs, outputs=outputs)\n", "  model.compile(\n", "      optimizer=keras.optimizers.Adam(1e-2),\n", "      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", "      metrics=[keras.metrics.SparseCategoricalAccuracy()])\n", "  return model\n", "\n", "\n", "def run_fn(fn_args: tfx.components.FnArgs):\n", "  schema = schema_pb2.Schema()\n", "  tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema)\n", "  train_dataset = _input_fn(\n", "      fn_args.train_files, fn_args.data_accessor, schema, batch_size=10)\n", "  eval_dataset = _input_fn(\n", "      fn_args.eval_files, fn_args.data_accessor, schema, batch_size=10)\n", "  model = _build_keras_model()\n", "  model.fit(\n", "      train_dataset,\n", "      epochs=int(fn_args.train_steps / 20),\n", "      steps_per_epoch=20,\n", "      validation_data=eval_dataset,\n", "      validation_steps=fn_args.eval_steps)\n", "  model.save(fn_args.serving_model_dir, save_format='tf')" ] }, { "cell_type": "markdown", "metadata": { "id": "qcmSNiqq5QaV" }, "source": [ "Run the `Trainer` component." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:48.147866Z", "iopub.status.busy": "2024-04-30T10:32:48.147607Z", "iopub.status.idle": "2024-04-30T10:32:57.658216Z", "shell.execute_reply": "2024-04-30T10:32:57.657423Z" }, "id": "4AzsMk7oflMg" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "running bdist_wheel\n", "running build\n", "running build_py\n", "creating build\n", "creating build/lib\n", "copying penguin_trainer.py -> build/lib\n", "installing to /tmpfs/tmp/tmp2bjhph4h\n", "running install\n", "running install_lib\n", "copying build/lib/penguin_trainer.py -> /tmpfs/tmp/tmp2bjhph4h\n", "running install_egg_info\n", "running egg_info\n", "creating tfx_user_code_Trainer.egg-info\n", "writing tfx_user_code_Trainer.egg-info/PKG-INFO\n", "writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt\n", "writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt\n", "writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'\n", "reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'\n", "writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'\n", "Copying tfx_user_code_Trainer.egg-info to /tmpfs/tmp/tmp2bjhph4h/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3.9.egg-info\n", "running install_scripts\n", "creating /tmpfs/tmp/tmp2bjhph4h/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL\n", "creating '/tmpfs/tmp/tmp1r3ydm1_/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl' and adding '/tmpfs/tmp/tmp2bjhph4h' to it\n", "adding 'penguin_trainer.py'\n", "adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/METADATA'\n", "adding 
'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL'\n", "adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/top_level.txt'\n", "adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/RECORD'\n", "removing /tmpfs/tmp/tmp2bjhph4h\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.\n", "!!\n", "\n", " ********************************************************************************\n", " Please avoid running ``setup.py`` directly.\n", " Instead, use pypa/build, pypa/installer or other\n", " standards-based tools.\n", "\n", " See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.\n", " ********************************************************************************\n", "\n", "!!\n", " self.initialize_options()\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Processing /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/_wheels/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Installing collected packages: tfx-user-code-Trainer\n", "Successfully installed tfx-user-code-Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx_bsl/tfxio/tf_example_record.py:343: parse_example_dataset (from tensorflow.python.data.experimental.ops.parsing_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use `tf.data.Dataset.map(tf.io.parse_example(...))` instead.\n" ] }, { "name": 
"stderr", "output_type": "stream", "text": [ "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx_bsl/tfxio/tf_example_record.py:343: parse_example_dataset (from tensorflow.python.data.experimental.ops.parsing_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use `tf.data.Dataset.map(tf.io.parse_example(...))` instead.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "I0000 00:00:1714473175.420733 172568 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/20 [>.............................] - ETA: 26s - loss: 1.0610 - sparse_categorical_accuracy: 0.1000" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "19/20 [===========================>..] - ETA: 0s - loss: 0.9684 - sparse_categorical_accuracy: 0.6842 " ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "20/20 [==============================] - 2s 17ms/step - loss: 0.9629 - sparse_categorical_accuracy: 0.7000 - val_loss: 0.8934 - val_sparse_categorical_accuracy: 0.7600\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 2/5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/20 [>.............................] 
- ETA: 0s - loss: 0.7837 - sparse_categorical_accuracy: 0.9000" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "20/20 [==============================] - 0s 9ms/step - loss: 0.7868 - sparse_categorical_accuracy: 0.7650 - val_loss: 0.7069 - val_sparse_categorical_accuracy: 0.7700\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 3/5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/20 [>.............................] - ETA: 0s - loss: 0.6424 - sparse_categorical_accuracy: 0.9000" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "20/20 [==============================] - 0s 9ms/step - loss: 0.5864 - sparse_categorical_accuracy: 0.8150 - val_loss: 0.5397 - val_sparse_categorical_accuracy: 0.7800\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 4/5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/20 [>.............................] 
- ETA: 0s - loss: 0.6114 - sparse_categorical_accuracy: 0.7000" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "20/20 [==============================] - 0s 10ms/step - loss: 0.4492 - sparse_categorical_accuracy: 0.8150 - val_loss: 0.4520 - val_sparse_categorical_accuracy: 0.7800\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 5/5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/20 [>.............................] - ETA: 0s - loss: 0.2346 - sparse_categorical_accuracy: 0.9000" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "20/20 [==============================] - 0s 9ms/step - loss: 0.4016 - sparse_categorical_accuracy: 0.7900 - val_loss: 0.3730 - val_sparse_categorical_accuracy: 0.8200\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4/Format-Serving/assets\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4/Format-Serving/assets\n" ] }, { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f84302e7d60
.execution_id: 4
.component: Trainer at 0x7f82dbc13220
  .inputs:
    ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
      .type_name: Examples
      ._artifacts:
        [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
          .type: <class 'tfx.types.standard_artifacts.Examples'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
          .span: 0
          .split_names: ["train", "eval"]
          .version: 0
    ['schema']: Channel of type 'Schema' (1 artifact) at 0x7f82dbb62be0
      .type_name: Schema
      ._artifacts:
        [0]: Artifact of type 'Schema' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3) at 0x7f83b7952670
          .type: <class 'tfx.types.standard_artifacts.Schema'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3
  .outputs:
    ['model']: Channel of type 'Model' (1 artifact) at 0x7f82dbb62580
      .type_name: Model
      ._artifacts:
        [0]: Artifact of type 'Model' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4) at 0x7f84302a0eb0
          .type: <class 'tfx.types.standard_artifacts.Model'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4
    ['model_run']: Channel of type 'ModelRun' (1 artifact) at 0x7f82dbb624f0
      .type_name: ModelRun
      ._artifacts:
        [0]: Artifact of type 'ModelRun' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model_run/4) at 0x7f82dbb642b0
          .type: <class 'tfx.types.standard_artifacts.ModelRun'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model_run/4
  .exec_properties:
    ['train_args']: { "num_steps": 100 }
    ['eval_args']: { "num_steps": 50 }
    ['module_file']: None
    ['run_fn']: None
    ['trainer_fn']: None
    ['custom_config']: null
    ['module_path']: penguin_trainer@/tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/_wheels/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl
.component.inputs: identical to .inputs above (same 'examples' and 'schema' channels)
.component.outputs: identical to .outputs above (same 'model' and 'model_run' channels)
" ], "text/plain": [ "ExecutionResult(\n", " component_id: Trainer\n", " execution_id: 4\n", " outputs:\n", " model: OutputChannel(artifact_type=Model, producer_component_id=Trainer, output_key=model, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)\n", " model_run: OutputChannel(artifact_type=ModelRun, producer_component_id=Trainer, output_key=model_run, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainer = tfx.components.Trainer(\n", " module_file=os.path.abspath(trainer_module_file),\n", " examples=example_gen.outputs['examples'],\n", " schema=infer_schema.outputs['schema'],\n", " train_args=tfx.proto.TrainArgs(num_steps=100),\n", " eval_args=tfx.proto.EvalArgs(num_steps=50))\n", "interactive_context.run(trainer)" ] }, { "cell_type": "markdown", "metadata": { "id": "gdCq5c0f5MyA" }, "source": [ "### Evaluate and push the model\n", "\n", "Use the `Evaluator` component to evaluate and 'bless' the model before using the `Pusher` component to push the model to a serving directory." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:57.662473Z", "iopub.status.busy": "2024-04-30T10:32:57.661655Z", "iopub.status.idle": "2024-04-30T10:32:57.665851Z", "shell.execute_reply": "2024-04-30T10:32:57.665130Z" }, "id": "NDx-fTUb6RUU" }, "outputs": [], "source": [ "_serving_model_dir = os.path.join(tempfile.mkdtemp(),\n", " 'serving_model/penguins_classification')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:57.669565Z", "iopub.status.busy": "2024-04-30T10:32:57.668981Z", "iopub.status.idle": "2024-04-30T10:32:57.673777Z", "shell.execute_reply": "2024-04-30T10:32:57.673159Z" }, "id": "PpS4-wCf6eLR" }, "outputs": [], "source": [ "eval_config = tfma.EvalConfig(\n", " model_specs=[\n", " tfma.ModelSpec(label_key='species', signature_name='serving_default')\n", " ],\n", " metrics_specs=[\n", " tfma.MetricsSpec(metrics=[\n", " tfma.MetricConfig(\n", " class_name='SparseCategoricalAccuracy',\n", " threshold=tfma.MetricThreshold(\n", " value_threshold=tfma.GenericValueThreshold(\n", " lower_bound={'value': 0.6})))\n", " ])\n", " ],\n", " slicing_specs=[tfma.SlicingSpec()])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:32:57.677062Z", "iopub.status.busy": "2024-04-30T10:32:57.676804Z", "iopub.status.idle": "2024-04-30T10:33:02.545378Z", "shell.execute_reply": "2024-04-30T10:33:02.544764Z" }, "id": "kFuH1YTh8vSf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:112: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use eager execution and: \n", 
"`tf.data.TFRecordDataset(path)`\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:112: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use eager execution and: \n", "`tf.data.TFRecordDataset(path)`\n" ] }, { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f84302f69a0
.execution_id: 5
.component: Evaluator at 0x7f82dbdbbc70
  .inputs:
    ['examples']: Channel of type 'Examples' (1 artifact) at 0x7f82e19334f0
      .type_name: Examples
      ._artifacts:
        [0]: Artifact of type 'Examples' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1) at 0x7f82eacf42b0
          .type: <class 'tfx.types.standard_artifacts.Examples'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/CsvExampleGen/examples/1
          .span: 0
          .split_names: ["train", "eval"]
          .version: 0
    ['model']: Channel of type 'Model' (1 artifact) at 0x7f82dbb62580
      .type_name: Model
      ._artifacts:
        [0]: Artifact of type 'Model' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4) at 0x7f84302a0eb0
          .type: <class 'tfx.types.standard_artifacts.Model'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4
    ['schema']: Channel of type 'Schema' (1 artifact) at 0x7f82dbb62be0
      .type_name: Schema
      ._artifacts:
        [0]: Artifact of type 'Schema' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3) at 0x7f83b7952670
          .type: <class 'tfx.types.standard_artifacts.Schema'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/SchemaGen/schema/3
  .outputs:
    ['evaluation']: Channel of type 'ModelEvaluation' (1 artifact) at 0x7f82dbda8af0
      .type_name: ModelEvaluation
      ._artifacts:
        [0]: Artifact of type 'ModelEvaluation' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/evaluation/5) at 0x7f82e052f280
          .type: <class 'tfx.types.standard_artifacts.ModelEvaluation'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/evaluation/5
    ['blessing']: Channel of type 'ModelBlessing' (1 artifact) at 0x7f82dbda89d0
      .type_name: ModelBlessing
      ._artifacts:
        [0]: Artifact of type 'ModelBlessing' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/blessing/5) at 0x7f82dbc01b50
          .type: <class 'tfx.types.standard_artifacts.ModelBlessing'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/blessing/5
  .exec_properties:
    ['eval_config']: {
      "metrics_specs": [
        {
          "metrics": [
            {
              "class_name": "SparseCategoricalAccuracy",
              "threshold": {
                "value_threshold": {
                  "lower_bound": 0.6
                }
              }
            }
          ]
        }
      ],
      "model_specs": [
        {
          "label_key": "species",
          "signature_name": "serving_default"
        }
      ],
      "slicing_specs": [
        {}
      ]
    }
    ['feature_slicing_spec']: None
    ['fairness_indicator_thresholds']: null
    ['example_splits']: null
    ['module_file']: None
    ['module_path']: None
.component.inputs: identical to .inputs above (same 'examples', 'model', and 'schema' channels)
.component.outputs: identical to .outputs above (same 'evaluation' and 'blessing' channels)
" ], "text/plain": [ "ExecutionResult(\n", " component_id: Evaluator\n", " execution_id: 5\n", " outputs:\n", " evaluation: OutputChannel(artifact_type=ModelEvaluation, producer_component_id=Evaluator, output_key=evaluation, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)\n", " blessing: OutputChannel(artifact_type=ModelBlessing, producer_component_id=Evaluator, output_key=blessing, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluator = tfx.components.Evaluator(\n", " examples=example_gen.outputs['examples'],\n", " model=trainer.outputs['model'],\n", " schema=infer_schema.outputs['schema'],\n", " eval_config=eval_config)\n", "interactive_context.run(evaluator)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.549448Z", "iopub.status.busy": "2024-04-30T10:33:02.548790Z", "iopub.status.idle": "2024-04-30T10:33:02.583883Z", "shell.execute_reply": "2024-04-30T10:33:02.583264Z" }, "id": "NCV9gcCQ966W" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
ExecutionResult at 0x7f82d807dd30
.execution_id: 6
.component: Pusher at 0x7f82dbb9b7f0
  .inputs:
    ['model']: Channel of type 'Model' (1 artifact) at 0x7f82dbb62580
      .type_name: Model
      ._artifacts:
        [0]: Artifact of type 'Model' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4) at 0x7f84302a0eb0
          .type: <class 'tfx.types.standard_artifacts.Model'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Trainer/model/4
    ['model_blessing']: Channel of type 'ModelBlessing' (1 artifact) at 0x7f82dbda89d0
      .type_name: ModelBlessing
      ._artifacts:
        [0]: Artifact of type 'ModelBlessing' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/blessing/5) at 0x7f82dbc01b50
          .type: <class 'tfx.types.standard_artifacts.ModelBlessing'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Evaluator/blessing/5
  .outputs:
    ['pushed_model']: Channel of type 'PushedModel' (1 artifact) at 0x7f82e0120df0
      .type_name: PushedModel
      ._artifacts:
        [0]: Artifact of type 'PushedModel' (uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Pusher/pushed_model/6) at 0x7f82e0120220
          .type: <class 'tfx.types.standard_artifacts.PushedModel'>
          .uri: /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43.981209-5usg33le/Pusher/pushed_model/6
  .exec_properties:
    ['push_destination']: { "filesystem": { "base_directory": "/tmpfs/tmp/tmp4qm6blq5/serving_model/penguins_classification" } }
    ['custom_config']: null
.component.inputs: identical to .inputs above (same 'model' and 'model_blessing' channels)
.component.outputs: identical to .outputs above (same 'pushed_model' channel)
" ], "text/plain": [ "ExecutionResult(\n", " component_id: Pusher\n", " execution_id: 6\n", " outputs:\n", " pushed_model: OutputChannel(artifact_type=PushedModel, producer_component_id=Pusher, output_key=pushed_model, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False))" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pusher = tfx.components.Pusher(\n", " model=trainer.outputs['model'],\n", " model_blessing=evaluator.outputs['blessing'],\n", " push_destination=tfx.proto.PushDestination(\n", " filesystem=tfx.proto.PushDestination.Filesystem(\n", " base_directory=_serving_model_dir)))\n", "interactive_context.run(pusher)" ] }, { "cell_type": "markdown", "metadata": { "id": "9K7RzdBzkru7" }, "source": [ "Running the TFX pipeline populates the MLMD Database. In the next section, you use the MLMD API to query this database for metadata information." ] }, { "cell_type": "markdown", "metadata": { "id": "6GRCGQu7RguC" }, "source": [ "## Query the MLMD Database\n", "\n", "The MLMD database stores three types of metadata: \n", "\n", "* Metadata about the pipeline and lineage information associated with the pipeline components\n", "* Metadata about artifacts that were generated during the pipeline run\n", "* Metadata about the executions of the pipeline\n", "\n", "A typical production environment pipeline serves multiple models as new data arrives. When you encounter erroneous results in served models, you can query the MLMD database to isolate the erroneous models. You can then trace the lineage of the pipeline components that correspond to these models to debug your models" ] }, { "cell_type": "markdown", "metadata": { "id": "o0xVYqAkJybK" }, "source": [ "Set up the metadata (MD) store with the `InteractiveContext` defined previously to query the MLMD database." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.587743Z", "iopub.status.busy": "2024-04-30T10:33:02.587231Z", "iopub.status.idle": "2024-04-30T10:33:02.594668Z", "shell.execute_reply": "2024-04-30T10:33:02.594056Z" }, "id": "P1p38etAv0kC" }, "outputs": [], "source": [ "connection_config = interactive_context.metadata_connection_config\n", "store = mlmd.MetadataStore(connection_config)\n", "\n", "# All TFX artifacts are stored in the base directory\n", "base_dir = connection_config.sqlite.filename_uri.split('metadata.sqlite')[0]" ] }, { "cell_type": "markdown", "metadata": { "id": "uq-1ep4suvuZ" }, "source": [ "Create some helper functions to view the data from the MD store." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.598044Z", "iopub.status.busy": "2024-04-30T10:33:02.597520Z", "iopub.status.idle": "2024-04-30T10:33:02.601588Z", "shell.execute_reply": "2024-04-30T10:33:02.601009Z" }, "id": "q1ib8yStu6CW" }, "outputs": [], "source": [ "def display_types(types):\n", " # Helper function to render dataframes for the artifact and execution types\n", " table = {'id': [], 'name': []}\n", " for a_type in types:\n", " table['id'].append(a_type.id)\n", " table['name'].append(a_type.name)\n", " return pd.DataFrame(data=table)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.604934Z", "iopub.status.busy": "2024-04-30T10:33:02.604448Z", "iopub.status.idle": "2024-04-30T10:33:02.609017Z", "shell.execute_reply": "2024-04-30T10:33:02.608380Z" }, "id": "HmqzYZcV3UG5" }, "outputs": [], "source": [ "def display_artifacts(store, artifacts):\n", " # Helper function to render dataframes for the input artifacts\n", " table = {'artifact id': [], 'type': [], 'uri': []}\n", " for a in artifacts:\n", " table['artifact id'].append(a.id)\n", " artifact_type = 
store.get_artifact_types_by_id([a.type_id])[0]\n", " table['type'].append(artifact_type.name)\n", " table['uri'].append(a.uri.replace(base_dir, './'))\n", " return pd.DataFrame(data=table)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.612072Z", "iopub.status.busy": "2024-04-30T10:33:02.611710Z", "iopub.status.idle": "2024-04-30T10:33:02.616219Z", "shell.execute_reply": "2024-04-30T10:33:02.615643Z" }, "id": "iBdGCZ0CMJDO" }, "outputs": [], "source": [ "def display_properties(store, node):\n", " # Helper function to render dataframes for artifact and execution properties\n", " table = {'property': [], 'value': []}\n", " for k, v in node.properties.items():\n", " table['property'].append(k)\n", " table['value'].append(\n", " v.string_value if v.HasField('string_value') else v.int_value)\n", " for k, v in node.custom_properties.items():\n", " table['property'].append(k)\n", " table['value'].append(\n", " v.string_value if v.HasField('string_value') else v.int_value)\n", " return pd.DataFrame(data=table)" ] }, { "cell_type": "markdown", "metadata": { "id": "1B-jRNH0M0k4" }, "source": [ "First, query the MD store for a list of all its stored `ArtifactTypes`." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.619510Z", "iopub.status.busy": "2024-04-30T10:33:02.619110Z", "iopub.status.idle": "2024-04-30T10:33:02.628520Z", "shell.execute_reply": "2024-04-30T10:33:02.627859Z" }, "id": "6zXSQL8s5dyL" }, "outputs": [ { "data": { "text/html": [ "
   id  name
0  14  Examples
1  16  ExampleStatistics
2  18  Schema
3  20  Model
4  21  ModelRun
5  23  ModelEvaluation
6  24  ModelBlessing
7  26  PushedModel
" ], "text/plain": [ " id name\n", "0 14 Examples\n", "1 16 ExampleStatistics\n", "2 18 Schema\n", "3 20 Model\n", "4 21 ModelRun\n", "5 23 ModelEvaluation\n", "6 24 ModelBlessing\n", "7 26 PushedModel" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_types(store.get_artifact_types())" ] }, { "cell_type": "markdown", "metadata": { "id": "quOsBgtM3r7S" }, "source": [ "Next, query all `PushedModel` artifacts." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.631621Z", "iopub.status.busy": "2024-04-30T10:33:02.631224Z", "iopub.status.idle": "2024-04-30T10:33:02.639479Z", "shell.execute_reply": "2024-04-30T10:33:02.638768Z" }, "id": "bUv_EI-bEMMu" }, "outputs": [ { "data": { "text/html": [ "
   artifact id  type         uri
0  8            PushedModel  ./Pusher/pushed_model/6
" ], "text/plain": [ " artifact id type uri\n", "0 8 PushedModel ./Pusher/pushed_model/6" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pushed_models = store.get_artifacts_by_type(\"PushedModel\")\n", "display_artifacts(store, pushed_models)" ] }, { "cell_type": "markdown", "metadata": { "id": "UecjkVOqJCBE" }, "source": [ "Query the MD store for the latest pushed model. This tutorial has only one pushed model. " ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.642924Z", "iopub.status.busy": "2024-04-30T10:33:02.642367Z", "iopub.status.idle": "2024-04-30T10:33:02.649240Z", "shell.execute_reply": "2024-04-30T10:33:02.648608Z" }, "id": "N8tPvRtcPTrU" }, "outputs": [ { "data": { "text/html": [ "
   property            value
0  tfx_version         1.15.0
1  pushed_destination  /tmpfs/tmp/tmp4qm6blq5/serving_model/penguins_...
2  name                pushed_model:2024-04-30T10:33:02.551347
3  producer_component  Pusher
4  pushed              1
5  pushed_version      1714473182
" ], "text/plain": [ " property value\n", "0 tfx_version 1.15.0\n", "1 pushed_destination /tmpfs/tmp/tmp4qm6blq5/serving_model/penguins_...\n", "2 name pushed_model:2024-04-30T10:33:02.551347\n", "3 producer_component Pusher\n", "4 pushed 1\n", "5 pushed_version 1714473182" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pushed_model = pushed_models[-1]\n", "display_properties(store, pushed_model)" ] }, { "cell_type": "markdown", "metadata": { "id": "f5Mz4vfP6wHO" }, "source": [ "One of the first steps in debugging a pushed model is to look at which trained model is pushed and to see which training data is used to train that model. \n", "\n", "MLMD provides traversal APIs to walk through the provenance graph, which you can use to analyze the model provenance. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.652489Z", "iopub.status.busy": "2024-04-30T10:33:02.652001Z", "iopub.status.idle": "2024-04-30T10:33:02.656394Z", "shell.execute_reply": "2024-04-30T10:33:02.655799Z" }, "id": "BLfydQVxOwf3" }, "outputs": [], "source": [ "def get_one_hop_parent_artifacts(store, artifacts):\n", " # Get a list of artifacts within a 1-hop of the artifacts of interest\n", " artifact_ids = [artifact.id for artifact in artifacts]\n", " executions_ids = set(\n", " event.execution_id\n", " for event in store.get_events_by_artifact_ids(artifact_ids)\n", " if event.type == mlmd.proto.Event.OUTPUT)\n", " artifacts_ids = set(\n", " event.artifact_id\n", " for event in store.get_events_by_execution_ids(executions_ids)\n", " if event.type == mlmd.proto.Event.INPUT)\n", " return [artifact for artifact in store.get_artifacts_by_id(artifacts_ids)]" ] }, { "cell_type": "markdown", "metadata": { "id": "3G0e0WIE9e9w" }, "source": [ "Query the parent artifacts for the pushed model." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.659468Z", "iopub.status.busy": "2024-04-30T10:33:02.659165Z", "iopub.status.idle": "2024-04-30T10:33:02.667611Z", "shell.execute_reply": "2024-04-30T10:33:02.666962Z" }, "id": "pOEFxucJQ1i6" }, "outputs": [ { "data": { "text/html": [ "
   artifact id  type           uri
0  4            Model          ./Trainer/model/4
1  7            ModelBlessing  ./Evaluator/blessing/5
" ], "text/plain": [ " artifact id type uri\n", "0 4 Model ./Trainer/model/4\n", "1 7 ModelBlessing ./Evaluator/blessing/5" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parent_artifacts = get_one_hop_parent_artifacts(store, [pushed_model])\n", "display_artifacts(store, parent_artifacts)" ] }, { "cell_type": "markdown", "metadata": { "id": "pJror5mf-W0M" }, "source": [ "Query the properties for the model." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.670854Z", "iopub.status.busy": "2024-04-30T10:33:02.670435Z", "iopub.status.idle": "2024-04-30T10:33:02.677163Z", "shell.execute_reply": "2024-04-30T10:33:02.676513Z" }, "id": "OSCb0bg6Qmj4" }, "outputs": [ { "data": { "text/html": [ "
   property            value
0  name                model:2024-04-30T10:32:48.149673
1  tfx_version         1.15.0
2  producer_component  Trainer
" ], "text/plain": [ " property value\n", "0 name model:2024-04-30T10:32:48.149673\n", "1 tfx_version 1.15.0\n", "2 producer_component Trainer" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exported_model = parent_artifacts[0]\n", "display_properties(store, exported_model)" ] }, { "cell_type": "markdown", "metadata": { "id": "phz1hfzc_UcK" }, "source": [ "Query the upstream artifacts for the model." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.680495Z", "iopub.status.busy": "2024-04-30T10:33:02.680064Z", "iopub.status.idle": "2024-04-30T10:33:02.687912Z", "shell.execute_reply": "2024-04-30T10:33:02.687321Z" }, "id": "nx_-IVhjRGA4" }, "outputs": [ { "data": { "text/html": [ "
   artifact id  type      uri
0  1            Examples  ./CsvExampleGen/examples/1
1  3            Schema    ./SchemaGen/schema/3
" ], "text/plain": [ " artifact id type uri\n", "0 1 Examples ./CsvExampleGen/examples/1\n", "1 3 Schema ./SchemaGen/schema/3" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_parents = get_one_hop_parent_artifacts(store, [exported_model])\n", "display_artifacts(store, model_parents)" ] }, { "cell_type": "markdown", "metadata": { "id": "00jqfk6o_niu" }, "source": [ "Get the training data the model trained with." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.691182Z", "iopub.status.busy": "2024-04-30T10:33:02.690693Z", "iopub.status.idle": "2024-04-30T10:33:02.697101Z", "shell.execute_reply": "2024-04-30T10:33:02.696503Z" }, "id": "2nMECsKvROEX" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " property value\n", "0 split_names [\"train\", \"eval\"]\n", "1 input_fingerprint split:single_split,num_files:1,total_bytes:256...\n", "2 tfx_version 1.15.0\n", "3 file_format tfrecords_gzip\n", "4 payload_format FORMAT_TF_EXAMPLE\n", "5 span 0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "used_data = model_parents[0]\n", "display_properties(store, used_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "GgTMTaew_3Fe" }, "source": [ "Now that you have the training data that the model was trained on, query the database again to find the training step (execution). Start by querying the MD store for a list of the registered execution types." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.700585Z", "iopub.status.busy": "2024-04-30T10:33:02.700206Z", "iopub.status.idle": "2024-04-30T10:33:02.707296Z", "shell.execute_reply": "2024-04-30T10:33:02.706660Z" }, "id": "8cBKQsScaD9a" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " id name\n", "0 12 tfx.components.example_gen.csv_example_gen.com...\n", "1 15 tfx.components.statistics_gen.component.Statis...\n", "2 17 tfx.components.schema_gen.component.SchemaGen\n", "3 19 tfx.components.trainer.component.Trainer\n", "4 22 tfx.components.evaluator.component.Evaluator\n", "5 25 tfx.components.pusher.component.Pusher" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_types(store.get_execution_types())" ] }, { "cell_type": "markdown", "metadata": { "id": "wxcue6SggQ_b" }, "source": [ "The training step is the `ExecutionType` named `tfx.components.trainer.component.Trainer`. Traverse the MD store to get the trainer run that corresponds to the pushed model." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T10:33:02.710852Z", "iopub.status.busy": "2024-04-30T10:33:02.710454Z", "iopub.status.idle": "2024-04-30T10:33:02.719404Z", "shell.execute_reply": "2024-04-30T10:33:02.718728Z" }, "id": "Ned8BxHzaunk" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " property value\n", "0 run_fn None\n", "1 trainer_fn None\n", "2 state complete\n", "3 run_id 2024-04-30T10:32:48.149673\n", "4 custom_config null\n", "5 eval_args {\\n \"num_steps\": 50\\n}\n", "6 pipeline_root /tmpfs/tmp/tfx-interactive-2024-04-30T10_32_43...\n", "7 module_file None\n", "8 module_path penguin_trainer@/tmpfs/tmp/tfx-interactive-202...\n", "9 component_id Trainer\n", "10 pipeline_name interactive-2024-04-30T10_32_43.981209\n", "11 train_args {\\n \"num_steps\": 100\\n}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def find_producer_execution(store, artifact):\n", "  executions_ids = set(\n", "      event.execution_id\n", "      for event in store.get_events_by_artifact_ids([artifact.id])\n", "      if event.type == mlmd.proto.Event.OUTPUT)\n", "  return store.get_executions_by_id(executions_ids)[0]\n", "\n", "trainer = find_producer_execution(store, exported_model)\n", "display_properties(store, trainer)" ] }, { "cell_type": "markdown", "metadata": { "id": "CYzlTckHClxC" }, "source": [ "## Summary\n", "\n", "In this tutorial, you learned how to use MLMD to trace the lineage of your TFX pipeline components and resolve issues.\n", "\n", "To learn more about how to use MLMD, check out these additional resources:\n", "\n", "* [MLMD API documentation](https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd)\n", "* [MLMD guide](https://www.tensorflow.org/tfx/guide/mlmd)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "mlmd_tutorial.ipynb", "private_outputs": true, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 0 }