Healthwatch for VMware Tanzu 2.3
You can find the most up-to-date technical documentation on the VMware by Broadcom website at:
https://techdocs.broadcom.com/
VMware by Broadcom
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
Copyright © 2025 Broadcom. All Rights Reserved. The term “Broadcom” refers to Broadcom Inc. and/or its
subsidiaries. For more information, go to https://www.broadcom.com. All trademarks, trade names, service marks,
and logos referenced herein belong to their respective companies.
Contents
Healthwatch for VMware Tanzu
    Product Snapshot
    Overview of the Healthwatch Tile
    Overview of the Healthwatch Exporter Tile
        Healthwatch Exporter for Tanzu Platform for Cloud Foundry
        Healthwatch Exporter for TKGI Tile
    Healthwatch v2.3 Limitations
    Assumed Risks of Using Healthwatch v2.3
Prerequisites
    Download and Install a Tile Using Platform Automation
    Configure and Deploy Your Tile Using the om CLI
Configuring
    Configuring Healthwatch
        Configure the Healthwatch Tile
        Assign AZs and Networks
        Configure Prometheus
        (Optional) Configure Alertmanager
        (Optional) Configure Grafana
        (Optional) Configure Grafana Authentication
        (Optional) Configure Grafana Dashboards
        (Optional) Configure Canary URLs
        (Optional) Configure Remote Write
        (Optional) Configure TKGI Cluster Discovery
        (Optional) Configure Errands
        (Optional) Configure Syslog
        (Optional) Configure Resources
        (Optional) Configure for OpenTelemetry
        Deploy Healthwatch
        Next Steps
Healthwatch for VMware Tanzu
This topic provides an overview of Healthwatch for VMware Tanzu features and functionality. For
information about new features and breaking changes, see the Healthwatch Release Notes.
Tanzu Application Service is now called Tanzu Platform for Cloud Foundry. The current
version of Tanzu Platform for Cloud Foundry is 10.x.
Healthwatch allows you to monitor metrics related to the functionality of your Tanzu Operations Manager
platform.
A complete Healthwatch installation includes the Healthwatch tile and at least one Healthwatch Exporter
tile. There are Healthwatch Exporter tiles for both the VMware Tanzu Platform for Cloud Foundry and
VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) runtimes.
You must install a Healthwatch Exporter tile on each Tanzu Operations Manager foundation you want to
monitor. You can install the Healthwatch tile on the same foundation or on a different foundation, depending
on your desired monitoring configuration.
You can also configure the Healthwatch Exporter tiles to expose metrics to a service or database located
outside your Tanzu Operations Manager foundation, such as an external time-series database (TSDB) or an
installation of the Healthwatch tile on a separate Tanzu Operations Manager foundation. This does not
require you to install the Healthwatch tile on the same Tanzu Operations Manager foundation as the
Healthwatch Exporter tiles.
For a detailed explanation of the Healthwatch architecture, a list of open ports required for each component,
and possible configurations for monitoring metrics with Tanzu Operations Manager or an external service or
database, see Healthwatch Architecture.
Product Snapshot

The following table provides version and version support information about Healthwatch:

Version | v2.3.3
Overview of the Healthwatch Tile
Healthwatch deploys instances of Prometheus and Grafana. The Prometheus instance scrapes and stores
metrics from the Healthwatch Exporter tiles and allows you to configure alerts with Alertmanager.
Healthwatch then exports the collected metrics to dashboards in the Grafana UI, allowing you to visualize
the data with charts and graphs and create customized dashboards for long-term monitoring and
troubleshooting.
Prometheus: Scrapes the /metrics endpoints of the Healthwatch Exporter tiles, collecting metrics related to the functionality of platform and runtime-level components, including:
- Service level indicators (SLIs) for the BOSH Director
- Counter, gauge, and container metrics for Tanzu Platform for CF from the Loggregator Firehose
Grafana: Allows you to visualize the collected metrics data in charts and graphs, as well as create customized dashboards for easier monitoring and troubleshooting
Alertmanager: Manages and sends alerts according to the alerting rules you configure
Overview of the Healthwatch Exporter Tile
Healthwatch Exporter sends metrics through the Loggregator Firehose to a Prometheus exposition endpoint
on the associated metric exporter VMs. The Prometheus instance that exists within your metrics monitoring
system then scrapes the exposition endpoints on the metric exporter VMs and imports those metrics into
your monitoring system.
Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Healthwatch Exporter for Tanzu Platform for CF exposes the following metrics related to the functionality of
Tanzu Platform for CF components, Tanzu Platform for CF apps, and the Healthwatch Exporter tile:
- Counter, gauge, and container metrics for Tanzu Platform for CF from the Loggregator Firehose
- Super value metrics (SVMs)
- Healthwatch component metrics for the VMs deployed by Healthwatch Exporter for Tanzu Platform for Cloud Foundry
The Prometheus instance that exists within your metrics monitoring system then scrapes the Prometheus
exposition endpoints on the metric exporter VMs and imports those metrics into your monitoring system.
Healthwatch Exporter for TKGI

Healthwatch Exporter for TKGI exposes the following metrics related to the functionality of TKGI
components and the Healthwatch Exporter for TKGI tile:
Currently the OpenTelemetry Collector is co-located on the Tanzu Platform for Cloud Foundry VMs
defined within the Tanzu Platform for Cloud Foundry tile, which means that it can only collect
metrics from VMs running within the Tanzu Platform for Cloud Foundry tile. As a result, metrics
from other service tiles are not available in Healthwatch when you enable the OpenTelemetry
Collector.
With Healthwatch v2.3, you can collect metrics using an OpenTelemetry Collector, but be aware
that the OpenTelemetry Collector used in Tanzu Platform for Cloud Foundry is a beta version.
Healthwatch Release Notes
This topic contains release notes for Healthwatch for VMware Tanzu v2.3.
For information about the risks and limitations of Healthwatch v2.3, see Assumed Risks of Using
Healthwatch v2.3 and Healthwatch v2.3 Limitations.
For more information about the new v2.3 features, see New Features.
Tanzu Application Service is now called Tanzu Platform for Cloud Foundry. The current
version of Tanzu Platform for Cloud Foundry is 10.x.
v2.3.3
Release Date: June 22, 2025
[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data.
Component versions:
- Prometheus 2.55.0
- Grafana 10.3.10
- Alertmanager 0.27.0
- PXC 1.0.40
v2.3.2
Release Date: April 15, 2025
[Feature] OpenTelemetry (OTel) dashboards are now added into Grafana default dashboards (if
OTel is enabled).
[Bug Fix] Duplicate VM details in the Foundation Job Details dashboard are now fixed for Tanzu
Platform for Cloud Foundry.
[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data. See No Data for Healthwatch
Exporter Troubleshooting.
[Security Fix] The following CVE was fixed by upgrading Grafana release version:
CVE-2024-8118
Component versions:
- Prometheus 2.55.0
- Grafana 10.3.10
- Alertmanager 0.27.0
- PXC 1.0.34
v2.3.1
Release Date: October 3, 2024
[Feature] Healthwatch 2.3.1 can now receive metrics through OpenTelemetry (OTel), even if
the Firehose is enabled.
[Bug Fix] Duplicate VM details in the Foundation Job Details dashboard are now fixed.
[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data. See No Data for Healthwatch
Exporter Troubleshooting.
v2.3.0
Release Date: June 15, 2024
[Breaking Changes] Applicable only when using the OpenTelemetry Collector. You will notice
changes in the Healthwatch metrics when you start collecting metrics using the OpenTelemetry
Collector. The changes are listed below:
- Metrics from the Firehose-specific components, such as the reverse log proxy, are not
available after enabling the OpenTelemetry Collector.
- Currently, the OpenTelemetry Collector is co-located on TAS VMs and only collects metrics
from these VMs. As a result, metrics from other tiles are not collected by the
OpenTelemetry Collector.
- The Prometheus exporter in the OpenTelemetry Collector does not allow leading
underscores, except in private metrics. Therefore, the OpenTelemetry Collector removes
any leading underscores in metric names.
- The SVM forwarder VM does not work with OpenTelemetry. Switch off the SVM
forwarder VM if it is switched on.
- The metrics listed in the table below have the prefix grafana_ added to the metric
name when you use the OpenTelemetry Collector.

Metric name with Firehose | Metric name with the OpenTelemetry Collector
access_evaluation_duration_bucket | grafana_access_evaluation_duration_bucket
access_evaluation_duration_count | grafana_access_evaluation_duration_count
access_evaluation_duration_sum | grafana_access_evaluation_duration_sum
access_permissions_duration_bucket | grafana_access_permissions_duration_bucket
access_permissions_duration_count | grafana_access_permissions_duration_count
access_permissions_duration_sum | grafana_access_permissions_duration_sum
[Feature] Healthwatch 2.3.0 supports using the OpenTelemetry Collector (instead of the Firehose) to
collect metrics. For more information, see Configure for OpenTelemetry. Once the OpenTelemetry
Collector is configured, Healthwatch automatically switches to collecting metrics from the
OpenTelemetry Collector. Healthwatch expects OpenTelemetry Collector data on port 65331.
[Feature] Healthwatch 2.3.0 is upgraded to Grafana 10 and provides the new features offered by
Grafana 10.
[Security Fix] The following CVE was fixed by upgrading the Grafana release version: CVE-2023-49569
Component versions:
- Prometheus 2.52.0
- Grafana 10.1.10
- Alertmanager 0.27.0
- PXC 1.0.29
New Features
Healthwatch v2.3 includes the following features:
Runs a suite of service level indicator (SLI) tests to test the functionality of the TKGI API and
collects metrics from those tests in the TKGI Control Plane dashboard in the Grafana UI. For
more information, see TKGI SLI Exporter VM in Healthwatch Metrics.
Separates Diego capacity metrics by isolation segment in the Diego/Capacity dashboard in the
Grafana UI.
No longer displays metrics from BOSH smoke test deployments in the Jobs and Job Details
dashboards in the Grafana UI.
Allows you to include optional dashboards for the VMware Tanzu RabbitMQ for VMs (Tanzu
RabbitMQ) and VMware Tanzu for MySQL on Cloud Foundry tiles in the Grafana UI.
Allows you to monitor super value metrics (SVMs). For more information about SVMs, see
Configure Prometheus in Configuring Healthwatch, SVM Forwarder VM - Healthwatch Component
Metrics in Healthwatch Metrics, and SVM Forwarder VM - Platform Metrics in Healthwatch
Metrics.
Automatically detects which version of Tanzu Platform for CF or TKGI is installed on your Tanzu
Operations Manager foundation and creates the appropriate dashboard in the Grafana UI.
Allows you to use the Tanzu Operations Manager syslog forwarding feature to forward log
messages from Healthwatch component VMs to an external destination for troubleshooting, such
as a remote server or external syslog aggregation service. For more information about how to
configure syslog forwarding, see (Optional) Configure Syslog in Configuring Healthwatch.
Known Issues
This section contains known issues you might encounter.
Healthwatch Architecture
This topic describes the architecture of the Healthwatch for VMware Tanzu, Healthwatch Exporter for
VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes Grid
Integrated Edition (TKGI) tiles. This topic also describes the possible configurations for monitoring metrics
across multiple VMware Tanzu Operations Manager foundations.
There are three tiles that form the Healthwatch architecture: Healthwatch, Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, and Healthwatch Exporter for TKGI.
A complete Healthwatch installation includes the Healthwatch tile, as well as at least one Healthwatch
Exporter tile. However, you can deploy and use each tile separately as part of an alternate monitoring
configuration.
You must install a Healthwatch Exporter tile on each Tanzu Operations Manager foundation you want to
monitor. You can install the Healthwatch tile on the same Tanzu Operations Manager foundation or on a
different Tanzu Operations Manager foundation, depending on your desired monitoring configuration.
You can also configure the Healthwatch Exporter tiles to expose metrics to a service or database located
outside your Tanzu Operations Manager foundation, such as an external time-series database (TSDB) or an
installation of the Healthwatch tile on a separate Tanzu Operations Manager foundation. This does not
require you to install the Healthwatch tile.
For a detailed explanation of the architecture for each tile, a list of open ports required for each component,
and the possible configurations for monitoring metrics across Tanzu Operations Manager foundations, see
the following sections:
Configuration Options
Healthwatch Tile

The Prometheus instance scrapes and stores metrics from the Prometheus endpoints on the metric
exporter VMs that the Healthwatch Exporter tiles deploy. Prometheus also allows you to configure alerts
with Alertmanager.
Healthwatch then exports these metrics to dashboards in the Grafana UI, where you can see the data in
charts and graphs. You can also use Grafana to create customized dashboards for long-term monitoring and
troubleshooting.
The MySQL instance that the Healthwatch tile deploys stores only your Grafana settings,
and does not store any time-series data.
The diagram below illustrates how metrics travel from the Healthwatch Exporter tiles through Prometheus
and to Grafana. It also shows how metrics travel through Prometheus to Alertmanager.
High Availability
You can deploy the Healthwatch tile in high availability (HA) mode with three MySQL nodes and two MySQL
Proxy nodes, or in non-HA mode with one MySQL node and one MySQL Proxy node.
Component Scaling
Healthwatch deploys two Prometheus VMs by default to create an HA Prometheus instance. If you do not
need Prometheus to be HA, you can scale the Prometheus instance horizontally down to one Prometheus VM.
To further scale the Prometheus instance, you can scale it vertically by increasing the disk size of each VM
in the Prometheus instance.
Healthwatch deploys a single Grafana VM by default. If you want to make the Grafana instance HA, you
can scale the Grafana instance horizontally.
If you do not want to use any Grafana instances in your Healthwatch deployment, you can set the number
of Grafana, MySQL, and MySQL Proxy instances for your Healthwatch deployment to 0 in the Resource
Config pane of the Healthwatch tile.
For more information about scaling Healthwatch resources, see Healthwatch Components and Resource
Requirements.
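For orientation, running Healthwatch without Grafana maps to a resource-config fragment like this minimal om sketch. The job names (grafana, pxc, pxc-proxy) are taken from the automation example later in this document; treat the fragment as an assumption to verify against your tile version:

resource-config:
  grafana:
    instances: 0      # removes the Grafana instance
  pxc:
    instances: 0      # MySQL stores only Grafana settings, so it can go too
  pxc-proxy:
    instances: 0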
This component ... | Must communicate with ... | Default TCP Port | Notes
grafana | External data sources | Varies |
grafana | External authentication | Varies |
grafana | External SMTP server | Varies |
blackbox-exporter | External canary target URLs | N/A | Additional networking rules may be required, depending on your external canary target URL configuration.
tsdb (for TKGI cluster discovery) | For each cluster: Kube API Server | 8443 | You need to open these ports only if you configure TKGI cluster discovery.
tsdb (for TKGI cluster discovery) | Kube Controller Manager | 10252 |
tsdb (for TKGI cluster discovery) | Kube Scheduler | 10251 |
tsdb (for TKGI cluster discovery) | etcd (Telegraf output plugin) | 10200 |
Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Healthwatch Exporter for Tanzu Platform for Cloud Foundry sends metrics through the Loggregator Firehose
to a Prometheus exposition endpoint on the associated metric exporter VMs. The Prometheus instance in
your metrics monitoring system then scrapes the exposition endpoints on the metric exporter VMs and
imports those metrics into your monitoring system.
You can scale the VMs that Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys vertically,
but you should not scale them horizontally.
This component ... | Must communicate with ... | Default TCP Port
Metric exporter VMs | UAA | 443
Healthwatch Exporter for TKGI

The Prometheus instance in your metrics monitoring system then scrapes the Prometheus exposition
endpoints on the metric exporter VMs and imports those metrics into your monitoring system.
You can scale the VMs that Healthwatch Exporter for TKGI deploys vertically, but you should not scale
them horizontally.
Configuration Options
Healthwatch can be configured in multiple ways, allowing you to monitor metrics across a variety of
platform and foundation configurations. The sections below describe the most common configuration
scenarios:
Monitoring Tanzu Platform for Cloud Foundry on a Single Tanzu Operations Manager Foundation
Monitoring Tanzu Platform for Cloud Foundry and TKGI on a Single Tanzu Operations Manager
Foundation
For more information about installing and configuring the Healthwatch tile and Healthwatch Exporter for
Tanzu Platform for Cloud Foundry, see the following topics:
Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline
Configuring Healthwatch
For more information about installing and configuring the Healthwatch tile and Healthwatch Exporter for
TKGI, see the following topics:
Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline
Configuring Healthwatch
You can install the Healthwatch tile, Healthwatch Exporter for Tanzu Platform for
Cloud Foundry, and Healthwatch Exporter for TKGI on the same foundation. The Healthwatch tile
automatically detects Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter
for TKGI on the same Tanzu Operations Manager foundation and adds scrape jobs for both Healthwatch
Exporter tiles to the Prometheus instance.
For more information about installing and configuring the Healthwatch tile, Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, and Healthwatch Exporter for TKGI, see the following topics:
Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline
Configuring Healthwatch
When you configure direct scraping for your multi-foundation Healthwatch deployment, the Prometheus
instance in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes metrics
directly from the metric exporter VMs deployed by the Healthwatch Exporter tiles installed on the Tanzu
Operations Manager foundation you monitor.
To configure your Healthwatch deployment to monitor several Tanzu Operations Manager foundations from
a single monitoring Tanzu Operations Manager foundation using direct scraping, see Configure Multi-
Foundation Monitoring Using Direct Scraping.
When you configure federation for your multi-foundation Healthwatch deployment, the Prometheus instance
in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes a subset of metrics
from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor.
To configure your Healthwatch deployment to monitor several Tanzu Operations Manager foundations from
a single monitoring Tanzu Operations Manager foundation using federation, see Configure Multi-Foundation
Monitoring Using Federation.
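Federation uses the standard Prometheus /federate endpoint. As a rough sketch only (this is the generic Prometheus federation pattern, not Healthwatch's exact generated configuration; the match[] selector and target address are placeholders you must adapt), a scrape job that federates all metrics from a monitored foundation's Prometheus instance could look like:

job_name: federate-foundation-1
metrics_path: /federate          # standard Prometheus federation endpoint
scheme: https
params:
  'match[]':
  - '{job=~".+"}'                # selects which series to federate; narrow this in practice
static_configs:
- targets:
  - "MONITORED-PROMETHEUS-ADDRESS:9090"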
Upgrading Healthwatch
Healthwatch uses the open-source components Prometheus, Grafana, and Alertmanager to scrape, store,
and view metrics, and to configure alerts.
For information about the new Healthwatch features, see the Healthwatch Release Notes.
To upgrade Healthwatch:
1. Review the limitations of Healthwatch v2.3. For more information, see Healthwatch Limitations.
2. Review the configuration options for Healthwatch to determine which tiles you must install on your
Tanzu Operations Manager foundations. For more information, see Configuration Options in
Healthwatch Architecture.
3. Install the Healthwatch tile and Healthwatch Exporter tiles on the Tanzu Operations Manager
foundations you want to monitor according to the configuration you identified in the previous step.
For more information about installing the Healthwatch tile and Healthwatch Exporter tiles, see the
topic that applies to your configuration:
4. Configure the Healthwatch component VMs through the Healthwatch tile and Healthwatch Exporter
tile UIs in the Tanzu Operations Manager Installation Dashboard and deploy the tiles.
For more information about configuring and deploying the tiles, see:
Configuring Healthwatch
Installing
The topics in this section describe how to install the Healthwatch for VMware Tanzu, Healthwatch Exporter
for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes
Grid Integrated Edition (TKGI) tiles:
To install, configure, and deploy these tiles through an automated pipeline, see Installing,
Configuring, and Deploying a Tile Through an Automated Pipeline.
Installing a Tile Manually

To manually install the Healthwatch and Healthwatch Exporter tiles, you must download and install the tiles
from the Broadcom Support Portal.
After you have installed the Healthwatch and Healthwatch Exporter tiles, you can configure them through
the Tanzu Ops Manager Installation Dashboard. See Next Steps after completing the installation.
There are risks to using Healthwatch, including missed email notifications, overwritten
dashboards, and minor data loss during upgrades. For more information about how to
prepare for or prevent these problems, see Assumed Risks of Using Healthwatch.
2. Click the name of the tile you want to install to download the .pivotal file for the tile.
5. Select the .pivotal file that you downloaded from Broadcom Support Portal.
6. Click Open. If the tile is successfully uploaded, it appears in the product list beneath the Import a
Product button.
7. After the upload is complete, click the + icon next to the tile listing to add the tile to your staging
area.
Next Steps
After you have successfully installed the Healthwatch tile and the Healthwatch Exporter tiles for the Tanzu
Operations Manager foundations you want to monitor, continue to one of the following topics to configure
each of the tiles you installed:
To configure and deploy the Healthwatch Exporter tile for Tanzu Platform for Cloud Foundry, see
Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry.
To configure and deploy the Healthwatch Exporter for TKGI tile, see Configuring Healthwatch
Exporter for TKGI.
To install the Healthwatch and Healthwatch Exporter tiles manually, see Installing a Tile
Manually.
Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Automated pipelines allow you to install, configure, and deploy Tanzu Operations Manager tiles through
automation scripts. For more information, see the Platform Automation documentation.
There are risks to using Healthwatch, including missed email notifications, overwritten
dashboards, and minor data loss during upgrades. For more information about how to
prepare for or prevent these problems, see Assumed Risks of Using Healthwatch.
Prerequisites
Before you use an automated pipeline to install, configure, and deploy a tile, you must have the following:
An existing Concourse pipeline. For an example pipeline configuration, see the Platform Automation
documentation.
The Platform Automation Toolkit Docker image imported into Docker. For more information, see the
Platform Automation documentation.
Download and Install a Tile Using Platform Automation

1. Create a configuration file for the download-product task of your automated pipeline. This
configuration file fetches the tile you want to install from the Broadcom Support Portal. Copy and paste
one of the following sets of properties into your configuration file.
For Healthwatch:
---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-[^pas|pks].*pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*

For Healthwatch Exporter for Tanzu Platform for Cloud Foundry:
---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-pas-*.pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*

For Healthwatch Exporter for TKGI:
---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-pks-*.pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*
2. Upload and stage the tile to the Tanzu Ops Manager Installation Dashboard by adding the
upload-stemcell and upload-and-stage-product jobs to your configuration file, as sketched below.
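A hedged sketch of how the staging job might appear in a Concourse pipeline follows. The resource names are placeholders, and the task file path and parameters follow the Platform Automation documentation for recent versions, so verify them against the version you use:

jobs:
- name: upload-and-stage-healthwatch
  plan:
  - get: platform-automation-image   # Platform Automation Toolkit image
  - get: platform-automation-tasks   # Platform Automation Toolkit tasks
  - get: healthwatch-product         # the .pivotal file fetched by download-product
  - get: env                         # holds env.yml with Tanzu Operations Manager credentials
  - task: upload-and-stage-product
    image: platform-automation-image
    file: platform-automation-tasks/tasks/upload-and-stage-product.yml
    input_mapping:
      product: healthwatch-product
    params:
      ENV_FILE: env/env.yml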
Configure and Deploy Your Tile Using the om CLI

Unless you are an advanced user, VMware recommends configuring and deploying your tile manually the
first time, so you can see which properties in the automation script map to the configuration settings
present in the tile UI before you modify and re-deploy the tile with the om CLI. For more information about
configuring and deploying tiles with the om CLI, see the om repository on GitHub.
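Assuming you have an env.yml file with your Tanzu Operations Manager credentials, a configuration file like the example below is typically applied with a command such as om --env env.yml configure-product --config healthwatch.yml, and then deployed with om --env env.yml apply-changes --product-name p-healthwatch2. Verify the exact flag names against your om version.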
The following is an example of an automation script that configures and deploys the Healthwatch tile:
product-name: p-healthwatch2
product-properties:
  .properties.scrape_configs:
    value:
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: foundation1
        metrics_path: /metrics
        scheme: https
        static_configs:
        - targets:
          - "1.2.3.4:9090"
          - "5.6.7.8:9090"
      server_name: pasexporter
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: foundation2
        metrics_path: /metrics
        scheme: https
        static_configs:
        - targets:
          - "9.10.11.12:9090"
      server_name: pasexporter
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----
  .properties.enable_basic_auth:
    selected_option: enabled
    value: enabled
  .properties.grafana_authentication:
    selected_option: uaa
    value: uaa
  .tsdb.canary_exporter_port:
    value: 9115
  .tsdb.scrape_interval:
    value: 15s
network-properties:
  network:
    name: subnet1
  other_availability_zones:
  - name: us-central1-f
  - name: us-central1-c
  - name: us-central1-b
  singleton_availability_zone:
    name: us-central1-f
resource-config:
  pxc:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
  pxc-proxy:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
  tsdb:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 1
  grafana:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
errand-config:
  smoke-test:
    post-deploy-state: true
  update-admin-password:
    post-deploy-state: true
Configuring
The topics in this section describe how to configure the Healthwatch for VMware Tanzu, Healthwatch
Exporter for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) tiles:
Configuring Healthwatch
Multi-Foundation Monitoring
Optional Configuration
Configuring Alerting
Configuring Healthwatch
This topic describes how to manually configure and deploy the Healthwatch for VMware Tanzu tile.
To install, configure, and deploy Healthwatch through an automated pipeline, see Installing,
Configuring, and Deploying a Tile Through an Automated Pipeline.
The Healthwatch tile monitors metrics across one or more Tanzu Operations Manager foundations by
scraping metrics from Healthwatch Exporter tiles installed on each foundation. For more information about
the architecture of the Healthwatch tile, see Healthwatch Tile in Healthwatch Architecture.
After installing Healthwatch, you configure Healthwatch component VMs, including the configuration files
associated with them, through the tile UI. You can also configure errands and system logging, scale VM
instances up or down, and configure load balancers for multiple VM instances.
To quickly deploy the Healthwatch tile to ensure that it deploys successfully before you fully configure it,
you only need to configure the Assign AZs and Networks pane. What follows is an overview of the
configure and deploy procedure.
1. Go to the Healthwatch tile in the Tanzu Ops Manager Installation Dashboard. For more information,
see Configure the Healthwatch Tile.
2. Assign jobs to your Availability Zones (AZs) and networks. For more information, see Assign AZs
and Networks.
3. Configure the Prometheus pane. For more information, see Configure Prometheus.
4. (Optional) Configure the Alertmanager pane. For more information, see (Optional) Configure
Alertmanager.
5. (Optional) Configure the Grafana pane. For more information, see (Optional) Configure Grafana.
6. (Optional) Configure the Grafana Authentication pane. For more information, see (Optional)
Configure Grafana Authentication.
7. (Optional) Configure the Grafana Dashboards pane. For more information, see (Optional) Configure
Grafana Dashboards.
8. (Optional) Configure the Canary URLs pane. For more information, see (Optional) Configure Canary
URLs.
9. (Optional) Configure the Remote Write pane. For more information, see (Optional) Configure
Remote Write.
10. (Optional) Configure the TKGI Cluster Discovery pane. For more information, see (Optional)
Configure TKGI Cluster Discovery.
11. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.
12. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.
13. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.
14. (Optional) Configure for OpenTelemetry. For more information, see (Optional) Configure for
OpenTelemetry.
After you have configured and deployed the Healthwatch tile, you can configure and deploy the Healthwatch
Exporter tiles for the Tanzu Operations Manager foundations you want to monitor. For more information, see
Next Steps.
Assign AZs and Networks

1. Select Assign AZs and Networks.
2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.
3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.
4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile.
5. Click Save.
Configure Prometheus
In the Prometheus pane, you configure the Prometheus instance in the Healthwatch tile to scrape metrics
from the Healthwatch Exporter tiles installed on each Tanzu Operations Manager foundation, as well as any
external services or databases from which you want to collect metrics.
The values that you configure in the Prometheus pane also configure their corresponding properties in the
Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Prometheus Configuration Options, and the Prometheus documentation.
1. Select Prometheus.
2. For Scrape interval, specify the frequency at which you want the Prometheus instance to scrape
Prometheus exposition endpoints for metrics. The Prometheus instance scrapes all Prometheus
exposition endpoints at once through a global scrape. You can enter a value string that specifies
ns, us, µs, ms, s, m, or h. To scrape detailed metrics without consuming too much storage, VMware
recommends using the default value of 15s (15 seconds).
3. (Optional) To configure the Prometheus instance to scrape metrics from the Healthwatch Exporter
tiles installed on other Tanzu Operations Manager foundations or from external services or
databases, configure additional scrape jobs under Additional scrape jobs. You can configure
scrape jobs for any app or service that exposes metrics using a Prometheus exposition format,
such as Concourse CI. For more information about Prometheus exposition formats, see the
Prometheus documentation.
1. Click Add.
2. For Scrape job configuration parameters, provide the configuration YAML for the scrape
job you want to configure. This job can use any of the properties defined by Prometheus
except those in the tls_config section. Do not prefix the configuration YAML with a dash.
For example:
job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:9090"
  - "5.6.7.8:9090"
For the job_name property, do not use the following job names:
Healthwatch-view-pas-exporter
Healthwatch-view-pks-exporter
tsdb
grafana
pks-master-kube-scheduler
pks-master-kube-controller-manager
3. (Optional) To allow the Prometheus instance to communicate with the server for your
external service or database over TLS:
1. For Certificate and private key for TLS, provide a certificate and private key for
the Prometheus instance to use for TLS connections to the server for your external
service or database.
2. For CA certificate for TLS, provide a certificate for the certificate authority (CA)
that the server for your external service or database uses to verify TLS
certificates.
3. For Target server name, enter the name of the server for your external service or
database as it appears on the server’s TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by
a self-signed CA certificate or a certificate that is signed by a self-signed CA
certificate, select the Skip TLS certificate verification checkbox. When this
checkbox is selected, the Prometheus instance does not verify the identity of the
server for your external service or database. This checkbox is unselected by
default.
If you are using the OpenTelemetry Collector, this step does not apply.
4. (Optional) For Static IP addresses for Prometheus VMs, enter a comma-separated list of valid
static IP addresses that you want to reserve for the Prometheus instance. You must enter a
separate IP address for each VM in the Prometheus instance. These IP addresses must not be
within the reserved IP ranges you configured in the BOSH Director tile. To find the IP addresses of
the Prometheus VMs:
1. Go to the Status tab of the Healthwatch tile.
2. In the TSDB row, record the IP addresses of each Prometheus VM from the IPs column.
The Prometheus instance includes two VMs by default. For more information
about viewing or scaling your VMs, see Healthwatch Components and Resource
Requirements.
5. Click Save.
(Optional) Configure Grafana
The values that you configure in the Grafana pane also configure their corresponding properties in the
Grafana configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Grafana, and the Grafana documentation.
1. Select Grafana.
2. Under Grafana UI route, configure the route used to access the Grafana UI by selecting one of the
following options:
Automatically configure in TAS for VMs: If you are installing Healthwatch on a Tanzu
Operations Manager foundation with Tanzu Platform for Cloud Foundry installed,
Healthwatch automatically configures a route for the Grafana UI in Tanzu Platform for
Cloud Foundry. VMware recommends selecting this option when available. You access the
Grafana UI by navigating to https://grafana.sys.DOMAIN in a browser window, where
DOMAIN is the system domain you configured in the Domains pane of the Tanzu Platform
for Cloud Foundry tile. For more information, see the Tanzu Platform for Cloud Foundry
documentation.
Manually configure: Reveals the configuration fields described in the following steps,
where you manually configure the URL and TLS settings for the Grafana UI. To manually
configure the URL and TLS settings for the Grafana UI:
1. For Grafana root URL, enter the URL used to access the Grafana UI. Configuring
this field allows a generic OAuth provider or UAA to redirect users to the Grafana
UI. Alertmanager also uses this URL to generate links to the Grafana UI in alert
messages.
After you deploy the Healthwatch tile for the first time, you must configure a DNS
entry for the Grafana instance in the console for your IaaS using this root URL and
the IP address of either the Grafana VMs or the load balancer associated with the
Grafana instance. The Grafana instance listens on either port 443 or 80, depending
on whether you provide a TLS certificate in the following Certificate and private
key for HTTPS fields. For more information about configuring DNS entries for the
Grafana instance, see Configuring DNS for the Grafana Instances.
in CA certificate for HTTPS. You can generate a self-signed certificate using the
Tanzu Operations Manager root CA, but if you do, your browser warns you that
your CA is invalid every time you access the Grafana UI.
4. Click Generate.
12. For Certificate and private key for HTTPS, provide the Tanzu
Operations Manager root CA certificate that you downloaded in a
previous step.
3. Under Grafana email alerts, choose whether to configure email alerts from the Grafana instance.
VMware recommends using Alertmanager to configure and manage alerts in Healthwatch. If you
require additional or alternative alerts, you can configure the SMTP server for the Grafana instance
to send email alerts.
1. Select Configure.
2. For SMTP server host name, enter the host name of your SMTP server.
3. For SMTP server port, enter the port of your SMTP server.
6. (Optional) To allow the Grafana instance to skip TLS certificate verification when
communicating with your SMTP server over TLS, select the Skip TLS certificate
verification checkbox. When this checkbox is selected, the Grafana instance
does not verify the identity of your SMTP server. This checkbox is unselected by
default.
7. For From address, enter the sender email address that appears on outgoing email
alerts.
8. For From name, enter the sender name that appears on outgoing email alerts.
9. For EHLO client ID, enter the name for the client identity that your SMTP server
uses when sending EHLO commands.
10. For Certificate and private key for TLS, enter a certificate and private key for the
Grafana instance to use for TLS connections to your SMTP server.
To disallow email alerts from the Grafana instance, select Do not configure. Email alerts
are disallowed by default. For more information, see the Grafana documentation.
4. Under HTTP and HTTPS proxy request settings, you choose whether to allow the Grafana
instance to make HTTP and HTTPS requests through proxy servers:
You need to configure proxy settings only if you are deploying Healthwatch in an
air-gapped environment and want to configure alert channels to external
addresses, such as the external Slack webhook.
To allow the Grafana instance to make HTTP and HTTPS requests through a proxy server:
1. Select Configure.
2. For HTTP proxy URL, enter the URL for your HTTP proxy server. The Grafana
instance sends all HTTP and HTTPS requests to this URL, except those from
hosts you configure in the HTTPS proxy URL and Excluded hosts fields.
3. For HTTPS proxy URL, enter the URL for your HTTPS proxy server. The Grafana
instance sends all HTTPS requests to this URL, except those from hosts you
configure in the Excluded hosts field.
4. For Excluded hosts, enter a comma-separated list of the hosts you want to
exclude from proxying. VMware recommends including *.bosh and the range of
your internal network IP addresses so the Grafana instance can still access the
Prometheus instance without going through the proxy server.
To disallow the Grafana instance from making HTTP and HTTPS requests through proxy
servers, select Do not configure. HTTP and HTTPS proxy requests are disallowed by
default.
5. (Optional) For Static IP addresses for Grafana VMs, enter a comma-separated list of valid static
IP addresses that you want to reserve for the Grafana instance. These IP addresses must not be
within the reserved IP ranges you configured in the BOSH Director tile.
6. (Optional) If you want to use Grafana legacy alerting instead of the new Grafana Alerting, select the
Opt out of Grafana Alerting checkbox. Note that this deletes any alerts and changes
made in Grafana Alerting.
7. (Optional) If you want to disable the gravatar, select the Disable gravatar checkbox.
8. (Optional) To log all access to Grafana, select the Enable router logging checkbox. This allows
auditing of all traffic into the system.
9. Click Save.
(Optional) Configure Grafana Dashboards

Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
Tanzu Platform for Cloud Foundry. To specify the version of Tanzu Platform for CF for
which you want the Grafana instance to create dashboards, use the Version selector.
Select one of the following options:
The version of Tanzu Platform for Cloud Foundry that is installed on your Tanzu
Operations Manager foundation.
Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from Tanzu Platform for Cloud Foundry.
Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
TKGI. To specify the version of TKGI for which you want the Grafana instance to create
dashboards, use the Version dropdown. Select one of the following options:
The version of TKGI that is installed on your Tanzu Operations Manager
foundation.
Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from TKGI.
4. Under Tanzu SQL for VMs, select one of the following options:
Include: The Grafana instance creates a dashboard in the Grafana UI for metrics from
Tanzu SQL for VMs.
Exclude: The Grafana instance does not create a dashboard in the Grafana UI for metrics
from Tanzu SQL for VMs.
Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
Tanzu RabbitMQ.
Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from Tanzu RabbitMQ.
(Optional) Configure Canary URLs

The Canary URLs pane configures the Blackbox Exporters in the Prometheus instance. For more
information, see the Blackbox exporter repository on GitHub.
The Blackbox Exporters in the Prometheus instance run canary tests on the fully-qualified domain name
(FQDN) of your Tanzu Operations Manager deployment by default. The results from these canary tests
appear in the Ops Manager Health dashboard in the Grafana UI.
1. Select Canary URLs.
2. For Port, specify the port that the Blackbox Exporter exposes to the Prometheus instance. The
default port is 9115. You do not need to specify a different port unless port 9115 is already in use
on the Prometheus instance.
3. (Optional) Under Additional target URLs, you can configure additional canary target URLs. The
Prometheus instance runs continuous canary tests to these URLs and records the results. To
configure additional canary target URLs:
1. Click Add.
2. For URL, enter the URL to which you want the Prometheus instance to send canary tests.
The Prometheus instance automatically creates scrape jobs for these URLs. You
do not need to create additional scrape jobs for them in the Prometheus pane.
4. Click Save.
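Healthwatch creates the probe scrape jobs for these URLs automatically. For orientation only, a standard Prometheus blackbox exporter probe job follows this well-known pattern; the job name, module, and target here are illustrative assumptions, not Healthwatch's generated values:

job_name: canary-probe-example
metrics_path: /probe             # blackbox exporter probe endpoint
params:
  module: [http_2xx]             # probe module; expects an HTTP 2xx response
static_configs:
- targets:
  - https://canary.example.com   # the canary target URL
relabel_configs:
- source_labels: [__address__]   # pass the target URL as the probe parameter
  target_label: __param_target
- source_labels: [__param_target]
  target_label: instance
- target_label: __address__
  replacement: "localhost:9115"  # blackbox exporter host:port (default port 9115)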
(Optional) Configure Remote Write

The values that you configure in the Remote Write pane also configure their corresponding properties in the
Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Remote Write, and the Prometheus documentation.
3. For Remote storage URL, enter the URL for your remote storage endpoint. For example,
https://REMOTE-STORAGE-FQDN, where REMOTE-STORAGE-FQDN is the FQDN of your remote
storage endpoint.
4. For Remote timeout, enter the amount of time, in seconds, that the Prometheus VM tries to make a
request to your remote storage endpoint before the request fails.
5. If your remote storage endpoint requires a username and password for login, configure the following
fields:
1. For Remote storage username, enter the username that the Prometheus instance uses to
log in to your remote storage endpoint.
2. For Remote storage password, enter the password that the Prometheus instance uses to
log in to your remote storage endpoint.
6. If your remote storage endpoint requires a bearer token for login, enter the bearer token that the
Prometheus instance uses to log in to your remote storage endpoint in Bearer token.
If you configure a bearer token for the Prometheus instance to use when logging in
to your remote storage endpoint, you cannot also configure a username and
password.
7. (Optional) To allow the Prometheus instance to communicate with the server for your remote
storage endpoint over TLS:
1. For Certificate and private key for TLS, provide a certificate and private key for the
Prometheus instance to use for TLS connections to your remote storage endpoint.
2. For CA certificate for TLS, provide the certificate for the CA that the server for your
remote storage endpoint uses to verify TLS certificates.
3. For Remote storage server name, enter the name of the server for your remote storage
endpoint as it appears on the server’s TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected, the
Prometheus instance does not verify the identity of the server for your remote storage
endpoint. This checkbox is unselected by default.
8. (Optional) To allow the Prometheus instance to make HTTP or HTTPS requests to your remote
storage endpoint through a proxy server, enter the URL for your proxy server in Proxy URL.
9. You can configure more granular settings for writing to your remote storage endpoint by specifying
additional parameters for the shards containing in-memory queues that read from the write-ahead
log in the Prometheus instance. To configure additional parameters for these shards:
1. For Queue capacity, enter how many samples your remote storage endpoint can queue in
memory per shard before the Prometheus instance blocks the queue from reading from the
write-ahead log.
2. For Minimum shards per queue, enter the minimum number of shards the Prometheus
instance can use for each remote write queue. This number is also the number of shards
the Prometheus instance uses when remote write begins after each deployment of the
Healthwatch tile.
3. For Maximum shards per queue, enter the maximum number of shards the Prometheus
instance can use for each remote write queue.
4. For Maximum samples per send, enter the maximum number of samples the Prometheus
instance can send to a shard at a time.
5. For Maximum batch wait time, enter in seconds the maximum amount of time the
Prometheus instance can wait before sending a batch of samples to a shard, whether that
shard has reached the limit configured in Maximum samples per send or not.
6. For Minimum backoff time, enter in milliseconds the minimum amount of time the
Prometheus instance can wait before retrying a failed request to your remote storage
endpoint.
7. For Maximum backoff time, enter in milliseconds the maximum amount of time the
Prometheus instance can wait before retrying a failed request to your remote storage
endpoint.
For more information about configuring these queue parameters, see the Prometheus
documentation.
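These pane fields correspond to the standard remote_write block in the Prometheus configuration file. The sketch below uses standard Prometheus option names; the values are placeholders or common Prometheus defaults, not Healthwatch-specific settings, so verify them against your Prometheus version:

remote_write:
- url: https://REMOTE-STORAGE-FQDN      # Remote storage URL
  remote_timeout: 30s                   # Remote timeout
  basic_auth:                           # username/password; mutually exclusive with a bearer token
    username: remote-user
    password: remote-pass
  tls_config:
    server_name: remote-storage.example.com   # Remote storage server name
    insecure_skip_verify: false               # Skip TLS certificate verification
  queue_config:
    capacity: 2500                # Queue capacity
    min_shards: 1                 # Minimum shards per queue
    max_shards: 200               # Maximum shards per queue
    max_samples_per_send: 500     # Maximum samples per send
    batch_send_deadline: 5s       # Maximum batch wait time
    min_backoff: 30ms             # Minimum backoff time
    max_backoff: 5s               # Maximum backoff time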
(Optional) Configure Errands

In the Errands pane, you can select On to always run an errand or Off to never run it.
For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.
1. Select Errands.
2. (Optional) Choose whether to always run or never run the following errands:
Smoke Test Errand: Verifies that the Grafana and Prometheus instances are running.
Update Grafana Admin Password: Updates the administrator password for the Grafana
UI.
3. Click Save.
(Optional) Configure Syslog

1. Select Syslog.
2. Under Do you want to configure Syslog forwarding?, select one of the following options:
Yes: Allows syslog forwarding and allows you to edit the following configuration fields.
3. For Address, enter the IP address or DNS domain name of your external destination.
5. For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport
protocol Healthwatch uses to forward system logs to your external destination.
2. For Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.
3. For SSL Certificate, enter the TLS certificate for your external destination.
7. (Optional) For Environment identifier, enter an identifier for your environment. This identifier is
included in the log lines.
8. (Optional) For Queue Size, specify the number of log messages Healthwatch can hold in a buffer
at a time before sending them to your external destination. The default value is 100000.
9. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.
10. (Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in
RainerScript syntax. For more information about custom syslog rules, see the Tanzu Platform for
Cloud Foundry documentation. For more information about RainerScript syntax, see the rsyslog
documentation.
(Optional) Configure Resources

2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:
Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job.
3. Ensure that the Internet Connected checkbox is unselected. Activating this checkbox
gives VMs a public IP address that allows outbound Internet access.
4. Click Save.
(Optional) Configure for OpenTelemetry

1. Go to the Healthwatch Exporter tile in the Tanzu Ops Manager Installation Dashboard.
1. Select Settings > Resource Config.
1. Set the TAS Counter Exporter and TAS Gauge Exporter configurations to 0
because they are used only by the Firehose.
2. Click Save.
2. Save the certificate content. The certificate and key are used when configuring the
OpenTelemetry Collector.
3. Click DOWNLOAD ROOT CA CERT. The Tanzu Operations Manager root CA is used when
configuring the OpenTelemetry Collector.
3. Open the VMware Tanzu Platform for Cloud Foundry for VMs tile.
1. You can disable the Enable V1 Firehose and Enable V2 Firehose configurations
if there are no other dependencies on Firehose data. These fields don't exist on
Tanzu Platform for CF.
3. Scroll to the bottom of the OpenTelemetry configuration and add a collector for
Healthwatch.
4. For TAS for VMs 6.0, Healthwatch expects a prometheus OpenTelemetry exporter
that supports mTLS and sends data on port 65331. For example:
prometheus/healthwatch:
  endpoint: ":65331"
  add_metric_suffixes: false
  tls:
    ca_pem: "CA-CERT"
    cert_pem: "CERT_PEM"
    key_pem: "PRIVATE_KEY_PEM"
Where:
- CA-CERT is the Tanzu Operations Manager root CA certificate.
- CERT_PEM is the cert_pem of the Healthwatch OTel mTLS credential.
- PRIVATE_KEY_PEM is the private_key_pem of the Healthwatch OTel mTLS credential.
2. For Tanzu Platform for Cloud Foundry 10.0, you can configure certificates under
OpenTelemetry Collector Secrets and refer to them in the OTel configuration. For example:
exporters:
  prometheus/healthwatch:
    endpoint: ":65331"
    add_metric_suffixes: false
    tls:
      ca_pem: '{{ .healthwatch.ca }}'
      cert_pem: '{{ .healthwatch.cert }}'
      key_pem: '{{ .healthwatch.key }}'
service:
  pipelines:
    metrics:
      exporters:
      - prometheus/healthwatch
Add Secrets:
3. For Certificate Authority, enter the Tanzu Operations Manager root CA certificate (CA-CERT).
4. For Client Certificate PEM, enter the cert_pem of the Healthwatch OTel mTLS credential.
5. For Client Certificate Private Key PEM, enter the private_key_pem of the Healthwatch OTel
mTLS credential.
6. Remove the escaped newline characters (\n) from the certificates you copy, for example with
awk '{gsub(/\\n/,"\n")}1' <file_name> or printf -- "<CERT_DATA>".
7. Click Save.
4. If you made changes to the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile
configuration in Settings > TAS for VMs Metric Exporter VMs > Filter out custom application
metrics, deploy your changes to Healthwatch as explained in the next section.
Deploy Healthwatch
To complete your installation of the Healthwatch tile:
Next Steps
After you have successfully installed the Healthwatch tile, continue to one of the following topics to
configure and deploy the Healthwatch Exporter tiles for the Tanzu Operations Manager foundations you
want to monitor:
If you have Tanzu Platform for Cloud Foundry installed on a Tanzu Operations Manager foundation
you want to monitor, see Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry.
If you have TKGI installed on a Tanzu Operations Manager foundation you want to monitor, see
Configuring Healthwatch Exporter for TKGI.
Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

To install, configure, and deploy Healthwatch Exporter for Tanzu Platform for Cloud Foundry through an
automated pipeline, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.
When installed on a Tanzu Operations Manager foundation you want to monitor, Healthwatch Exporter for
Tanzu Platform for Cloud Foundry deploys metric exporter VMs to generate each type of metric related to
the health of your Tanzu Platform for CF deployment. Healthwatch Exporter for Tanzu Platform for Cloud
Foundry sends metrics through the Loggregator Firehose to a Prometheus exposition endpoint on the
associated metric exporter VMs. The Prometheus instance in your metrics monitoring system then scrapes
the exposition endpoints on the metric exporter VMs and imports those metrics into your monitoring
system. For more information about the architecture of the Healthwatch Exporter for Tanzu Platform for
Cloud Foundry tile, see Healthwatch Exporter for Tanzu Platform for Cloud Foundry in Healthwatch
Architecture.
After installing Healthwatch Exporter for Tanzu Platform for Cloud Foundry, you configure the metric
exporter VMs deployed by Healthwatch Exporter for Tanzu Platform for Cloud Foundry through the tile UI.
You can also configure errands and system logging, and you can scale VM instances up or down and
configure load balancers for multiple VM instances.
If you want to quickly deploy the Healthwatch Exporter for Tanzu Platform for Cloud
Foundry tile to ensure that it deploys successfully before you fully configure it, you only
need to configure the Assign AZs and Networks and BOSH Health Metric Exporter VM
panes.
To configure and deploy the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile:
1. Go to the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile in the Tanzu Ops
Manager Installation Dashboard.
2. Assign jobs to your availability zones (AZs) and networks. For more information, see Assign AZs
and Networks.
3. (Optional) Configure the TAS for VMs Metric Exporter VMs pane. For more information, see
(Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs.
4. Configure the BOSH Health Metric Exporter VM pane. For more information, see Configure the
BOSH Health Metric Exporter VM.
5. (Optional) Configure the BOSH Deployment Metric Exporter VM pane. For more information, see
(Optional) Configure the BOSH Deployment Metric Exporter VM.
6. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.
7. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.
8. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.
9. From the Tanzu Ops Manager Installation Dashboard, deploy the Healthwatch Exporter for Tanzu
Platform for Cloud Foundry tile. For more information, see Deploy Healthwatch Exporter for Tanzu
Platform for Cloud Foundry.
10. After you have finished installing, configuring, and deploying Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, configure a scrape job for Healthwatch Exporter for Tanzu Platform for
Cloud Foundry in the Prometheus VM in your monitoring system. For more information, see
Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry.
You don't need to configure a scrape job for installations of Healthwatch Exporter
for Tanzu Platform for Cloud Foundry that are on the same Tanzu Operations
Manager foundation as your Healthwatch for VMware Tanzu tile. The Prometheus
instance in the Healthwatch tile automatically discovers and scrapes Healthwatch
Exporter tiles that are installed on the same Tanzu Operations Manager foundation
as the Healthwatch tile.
2. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile.
2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.
3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.
4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile. For more information about Tanzu Platform for Cloud Foundry networks, see the
Tanzu Operations Manager documentation.
5. (Optional) If you want to assign jobs to a service network in addition to your runtime network, select
it from the Services Network dropdown. For more information about Tanzu Platform for Cloud
Foundry service networks, see the Tanzu Operations Manager documentation.
6. Click Save.
You can also deploy two other VMs: the Tanzu Platform for Cloud Foundry service level indicator (SLI)
exporter VM and the certificate expiration metric exporter VM.
The IP addresses you configure in the TAS for VMs Metric Exporter VMs pane must not
be within the reserved IP ranges you configured in the BOSH Director tile.
2. (Optional) For Static IP address for counter metric exporter VM, enter a valid static IP address
that you want to reserve for the counter metric exporter VM.
3. (Optional) For Static IP address for gauge metric exporter VM, enter a valid static IP address
that you want to reserve for the gauge metric exporter VM.
4. (Optional) For Static IP address for TAS for VMs SLI exporter VM, enter a valid static IP address
that you want to reserve for the Tanzu Platform for Cloud Foundry SLI exporter VM. The Tanzu
Platform for Cloud Foundry SLI exporter VM generates SLIs that allow you to monitor whether the
core functions of the Cloud Foundry Command-Line Interface (cf CLI) are working as expected. The
cf CLI allows developers to create and manage apps through Tanzu Platform for Cloud Foundry. For
more information, see Tanzu Platform for Cloud Foundry SLI Exporter VM in Healthwatch Metrics.
5. (Optional) For Static IP address for certificate expiration metric exporter VM, enter a valid static
IP address that you want to reserve for the certificate expiration metric exporter VM. The certificate
expiration metric exporter VM collects metrics that show when certificates in your Tanzu
Operations Manager deployment are due to expire. For more information, see Certificate Expiration
Metric Exporter VM and Monitoring Certificate Expiration.
If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the certificate expiration metric exporter VM to zero instances in
the Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise,
the two certificate expiration metric exporter VMs create redundant sets of
metrics.
6. (Optional) If your Tanzu Operations Manager deployment uses self-signed certificates, select the
Skip TLS certificate verification for certificate metric exporter VM checkbox. When this
checkbox is selected, the certificate expiration metric exporter VM does not verify the identity of
the Tanzu Operations Manager VM. This checkbox is unselected by default.
7. Under cf CLI version, select from the dropdown the version of the cf CLI that your Tanzu Platform
for Cloud Foundry deployment uses:
If you have TAS for VMs v4.0 or later, or Tanzu Platform for Cloud Foundry installed, select
CF CLI 8. This allows the SLI exporter VM to run SLI tests for cf CLI v8.
Tanzu Application Service versions 4.0 and later are supported. This includes
Tanzu Platform for Cloud Foundry 10.x+.
8. (Optional) If Metric Registrar is configured in your Tanzu Platform for Cloud Foundry tile, and
you do not want Healthwatch to scrape custom application metrics, select the Filter out custom
application metrics checkbox.
9. Click Save.
2. Under Availability zone, select the AZ on which you want Healthwatch Exporter for Tanzu Platform
for Cloud Foundry to deploy the BOSH health metric exporter VM.
3. Under VM type, select from the dropdown the type of VM you want Healthwatch Exporter for Tanzu
Platform for Cloud Foundry to deploy.
4. Click Save.
If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the BOSH health metric exporter VM to zero instances in the
Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise, the
two sets of BOSH health metric exporter VM metrics cause a 401 error in your
BOSH Director deployment, and one set of metrics reports that the BOSH Director
is down in the Grafana UI. For more information, see BOSH Health Metrics Cause
Errors When Two Healthwatch Exporter Tiles Are Installed in Troubleshooting
Healthwatch.
2. (Optional) For UAA client credentials, enter the username and secret for the UAA client that the
BOSH deployment metric exporter VM uses to access the BOSH Director VM. For more
information, see Create a UAA Client for the BOSH Deployment Metric Exporter VM.
3. (Optional) For Static IP address for BOSH deployment metric exporter VM, enter a valid static
IP address that you want to reserve for the BOSH deployment metric exporter VM. This IP address
must not be within the reserved IP ranges you configured in the BOSH Director tile.
4. Click Save.
If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the BOSH deployment metric exporter VM to zero instances in the
Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise, the two BOSH
deployment metric exporter VMs create redundant sets of metrics.
To create a UAA client for the BOSH deployment metric exporter VM:
2. Record the IP address for the BOSH Director VM and the login and administrator credentials for the
BOSH Director UAA instance. For more information about internal authentication settings for your
Tanzu Operations Manager deployment, see the Tanzu Operations Manager documentation.
3. Record the IP address in the IPs column of the BOSH Director row.
5. In the Uaa Admin Client Credentials row of the BOSH Director section, click
Link to Credential.
6. Record the value of password. This value is the secret for Uaa Admin Client
Credentials.
8. In the Uaa Login Client Credentials row of the BOSH Director section, click Link
to Credential.
9. Record the value of password. This value is the secret for Uaa Login Client
Credentials.
1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.
2. Click Settings.
9. Record the IP address in the IPs column of the BOSH Director row.
11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.
12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.
3. SSH into the Tanzu Operations Manager VM by following the procedure in the Tanzu Operations
Manager documentation.
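4. Target the UAA instance on the BOSH Director VM. The following is a minimal sketch using
the UAA CLI (uaac); the UAA port and the root CA certificate path are assumptions based on a
standard Tanzu Operations Manager deployment:
# Target the Director UAA (assumes UAA on port 8443 and the root CA at this path)
uaac target https://BOSH-DIRECTOR-IP:8443 --ca-cert /var/tempest/workspaces/default/root_ca_certificate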
Where BOSH-DIRECTOR-IP is the IP address for the BOSH Director VM that you recorded from the
Status tab in the BOSH Director tile in an earlier step.
If your Tanzu Operations Manager deployment uses internal authentication, log in to the
UAA instance by running:
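# Sketch, assuming the standard Tanzu Operations Manager UAA login client named "login";
# you are prompted for the UAA admin client username and secret in a later step
uaac token owner get login -s UAA-LOGIN-CLIENT-SECRET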
Where UAA-LOGIN-CLIENT-SECRET is the secret you recorded from the Uaa Login Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.
If your Tanzu Operations Manager deployment uses SAML or LDAP, log in to the UAA
instance by running:
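# Sketch; BOSH-UAA-CLIENT-USERNAME is an assumption -- use the identity value from the
# same Uaa Bosh Client Credentials row in the BOSH Director tile
uaac token client get BOSH-UAA-CLIENT-USERNAME -s BOSH-UAA-CLIENT-SECRET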
Where BOSH-UAA-CLIENT-SECRET is the secret you recorded from the Uaa Bosh Client
Credentials row in the Credentials tab in the BOSH Director tile in a previous step.
6. When prompted, enter the UAA administrator client username admin and the secret you recorded
from the Uaa Admin Client Credentials row in the Credentials tab in the BOSH Director tile in a
previous step.
7. Create a UAA client for the BOSH deployment metric exporter VM by running:
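# Sketch; the grant types, scope, and authorities shown are assumptions based on the
# read-only BOSH access that the BOSH deployment metric exporter VM needs
uaac client add CLIENT-USERNAME \
  --secret CLIENT-SECRET \
  --authorized_grant_types client_credentials,refresh_token \
  --scope bosh.read \
  --authorities bosh.read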
Where:
CLIENT-USERNAME is the username you want to set for the UAA client.
CLIENT-SECRET is the secret you want to set for the UAA client.
9. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile.
11. For UAA client credentials, enter the username and secret for the UAA client you just created.
In the Errands pane, you can select On to always run an errand or Off to never run it.
For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.
1. Select Errands.
2. (Optional) Choose whether to always run or never run the following errands:
Smoke Tests: Verifies that the metric exporter VMs are running.
Cleanup: Deletes any existing BOSH deployments created by the BOSH health metric
exporter VM for running SLI tests.
Remove CF SLI User: Deletes the user account that the Tanzu Platform for Cloud Foundry
SLI exporter VM creates to run the Tanzu Platform for Cloud Foundry SLI test suite. For
more information, see Tanzu Platform for Cloud Foundry SLI Exporter VM.
3. Click Save.
1. Select Syslog.
2. (Optional) Under Do you want to configure Syslog forwarding?, select one of the following
options:
Yes: Allows syslog forwarding and allows you to edit the configuration fields described
below.
3. For Address, enter the IP address or DNS domain name of your external destination.
5. For Transport Protocol, select TCP or UDP from the dropdown menu. This determines which
transport protocol Healthwatch Exporter for Tanzu Platform for Cloud Foundry uses to forward
system logs to your external destination.
2. In Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.
3. In SSL Certificate, enter the TLS certificate for your external destination.
7. (Optional) In Queue Size, specify the number of log messages Healthwatch Exporter for Tanzu
Platform for Cloud Foundry can hold in a buffer at a time before sending them to your external
destination. The default value is 100000.
8. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.
2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:
Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job
3. Ensure that the Internet Connected checkbox is unselected. Selecting this checkbox
gives VMs a public IP address that allows outbound Internet access.
4. (Optional) The instance count for the SVM Forwarder VM is set to 0 by default. This VM emits
Healthwatch-generated super value metrics (SVMs) into the Loggregator Firehose. To deploy the
SVM Forwarder VM, increase the instance count by selecting from the Instances dropdown. You
do not need to deploy this VM unless you use a third-party nozzle that can export the SVMs to an
external system, such as a remote server or a syslog aggregation service. For more information
about the SVM Forwarder VM, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM
- Healthwatch Component Metrics in Healthwatch Metrics.
If you installed the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile
before installing the Healthwatch tile, you may need to re-deploy Healthwatch
Exporter after deploying the SVM Forwarder VM. For more information, see Deploy
Healthwatch Exporter for Tanzu Platform for Cloud Foundry.
5. (Optional) Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the counter and
gauge metric exporter VMs by default. If you do not want to collect both of these metric types, set
the instance count to 0 for the VMs associated with the metrics you do not want to collect.
6. Click Save.
If you monitor metrics using the Healthwatch tile on a Tanzu Operations Manager foundation, see
Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in
Healthwatch.
You don't need to configure a scrape job for installations of Healthwatch Exporter
for Tanzu Platform for Cloud Foundry that are on the same Tanzu Operations
Manager foundation as your Healthwatch tile. The Prometheus instance in the
Healthwatch tile automatically discovers and scrapes Healthwatch Exporter tiles
that are installed on the same Tanzu Operations Manager foundation as the
Healthwatch tile.
If you monitor metrics using a service or database located outside your Tanzu Operations Manager
foundation, such as from an external TSDB, see Configure a Scrape Job for Healthwatch Exporter
for Tanzu Platform for Cloud Foundry in an External Monitoring System.
1. Open network communication paths from your external service or database to the metric exporter
VMs in Healthwatch Exporter for Tanzu Platform for Cloud Foundry. The procedure to open these
network paths differs depending on your Tanzu Operations Manager foundation’s IaaS. For a list of
TCP ports used by each metric exporter VM, see Required Networking Rules for Healthwatch
Exporter for Tanzu Platform for Cloud Foundry in Healthwatch Architecture.
2. In the scrape_config section of the Prometheus configuration file, create a scrape job for your
Tanzu Operations Manager foundation. Under static_config, specify the TCP ports of each
metric exporter VM as static targets for the IP address of your external service or database. For
example:
job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:8443"
  - "1.2.3.4:25555"
  - "1.2.3.4:443"
  - "1.2.3.4:8082"
For more information about the scrape_config section of the Prometheus configuration file, see
the Prometheus documentation. For more information about the static_config section of the
Prometheus configuration file, see the Prometheus documentation.
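Before you add the scrape job, you can spot-check that a metric exporter endpoint is reachable
over mTLS. The following curl sketch assumes you saved the Healthwatch Exporter Client Mtls
certificate and key and the Tanzu Operations Manager root CA to the local file names shown:
# Fetch a sample of metrics from one metric exporter VM
curl --cacert ops-manager-root-ca.pem \
  --cert healthwatch-exporter-client.pem \
  --key healthwatch-exporter-client.key \
  https://1.2.3.4:8443/metrics | head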
To install, configure, and deploy Healthwatch Exporter for TKGI through an automated pipeline, see
Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.
When installed on a Tanzu Operations Manager foundation you want to monitor, Healthwatch Exporter for
TKGI deploys metric exporter VMs to generate service level indicators (SLIs) related to the health of your
TKGI deployment. The Prometheus instance in your metrics monitoring system then scrapes the
Prometheus exposition endpoints on the metric exporter VMs and imports those metrics into your
monitoring system. For more information about the architecture of the Healthwatch Exporter for TKGI tile,
see Healthwatch Exporter for TKGI in Healthwatch Architecture.
After installing Healthwatch Exporter for TKGI, you configure the metric exporter VMs it deploys
through the tile UI. You can also configure errands and system logging, and
you can scale VM instances up or down and configure load balancers for multiple VM instances.
If you want to quickly deploy the Healthwatch Exporter for TKGI tile to ensure that it
deploys successfully before you fully configure it, you only need to configure the Assign
AZ and Networks and BOSH Health Metric Exporter VM panes.
1. Go to the Healthwatch Exporter for TKGI tile in the Tanzu Ops Manager Installation Dashboard.
2. Assign jobs to your availability zones (AZs) and networks. For more information, see Assign AZs
and Networks.
3. (Optional) Configure the TKGI Metric Exporter VMs pane. For more information, see (Optional)
Configure TKGI and Certificate Expiration Metric Exporter VMs.
4. (Optional) Configure the TKGI SLI Exporter VM pane. For more information, see (Optional)
Configure TKGI SLI Exporter VMs.
5. Configure the BOSH Health Metric Exporter VM pane. For more information, see Configure the
BOSH Health Metric Exporter VM.
6. (Optional) Configure the BOSH Deployment Metric Exporter VM pane. For more information, see
(Optional) Configure the BOSH Deployment Metric Exporter VM.
7. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.
8. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.
9. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.
10. From the Tanzu Ops Manager Installation Dashboard, deploy the Healthwatch Exporter for TKGI
tile. For more information, see Deploy Healthwatch Exporter for TKGI.
11. After you have finished installing, configuring, and deploying Healthwatch Exporter for TKGI,
configure a scrape job for Healthwatch Exporter for TKGI in the Prometheus instance in your
monitoring system. For more information, see Configure a Scrape Job for Healthwatch Exporter for
TKGI.
You don't need to configure a scrape job for installations of Healthwatch Exporter
for TKGI that are on the same Tanzu Operations Manager foundation as your
Healthwatch for VMware Tanzu tile. The Prometheus instance in the Healthwatch
tile automatically discovers and scrapes Healthwatch Exporter tiles that are
installed on the same Tanzu Operations Manager foundation as the Healthwatch tile.
2. Click the Healthwatch Exporter for Tanzu Kubernetes Grid - Integrated tile.
2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.
3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.
4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile. For more information about TKGI networks, see the Tanzu Operations Manager
documentation.
5. (Optional) If you want to assign jobs to a service network in addition to your runtime network, select
it from the Services Network dropdown. For more information about TKGI service networks, see
the Tanzu Operations Manager documentation.
6. Click Save.
The IP addresses you configure in the TKGI Metric Exporter VMs pane must not be within
the reserved IP ranges you configured in the BOSH Director tile.
2. (Optional) For Static IP address for TKGI metric exporter VM, enter a valid static IP address that
you want to reserve for the TKGI metric exporter VM. The TKGI metric exporter VM collects health
metrics from the BOSH Director. For more information, see TKGI Metric Exporter VM in
Healthwatch Metrics.
3. (Optional) For Static IP address for certificate expiration metric exporter VM, enter a valid static
IP address that you want to reserve for the certificate expiration metric exporter VM. The certificate
expiration metric exporter VM collects metrics that show when certificates in your Tanzu
Operations Manager deployment are due to expire. For more information, see Certificate Expiration
Metric Exporter VM and Monitoring Certificate Expiration.
If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for
Tanzu Platform for Cloud Foundry installed on the same Tanzu Operations
Manager foundation, scale the certificate expiration metric exporter VM to zero
instances in the Resource Config pane in one of the Healthwatch Exporter tiles.
Otherwise, the two certificate expiration metric exporter VMs create redundant
sets of metrics.
4. (Optional) If your Tanzu Operations Manager deployment uses self-signed certificates, select the
Skip TLS certificate verification checkbox. When this checkbox is selected, the certificate
expiration metric exporter VM does not verify the identity of the Tanzu Operations Manager VM.
This checkbox is unselected by default.
5. Click Save.
2. (Optional) For Static IP address for TKGI SLI exporter VM, enter a valid static IP address that
you want to reserve for the TKGI SLI exporter VM. This IP address must not be within the
reserved IP ranges you configured in the BOSH Director tile.
3. For SLI test frequency, enter in seconds how frequently you want the TKGI SLI exporter VM to run
SLI tests.
4. (Optional) To allow TKGI SLI exporter VM to communicate with the TKGI API over TLS, configure
one of the following options:
To configure the TKGI SLI exporter VM to use a self-signed certificate authority (CA)
certificate or a certificate that is signed by a self-signed CA certificate when
communicating with the TKGI API over TLS:
1. For CA certificate for TLS, provide the CA certificate. If you provide a self-signed
CA certificate, it must be for the same CA that signs the certificate in the TKGI
API.
To configure the TKGI SLI exporter VM to skip TLS certificate verification when
communicating with the TKGI API over TLS, leave the CA certificate for TLS field blank.
The Skip TLS certificate verification checkbox is selected and not configurable by
default. When this checkbox is selected, the TKGI SLI exporter VM does not verify the
identity of the TKGI API. VMware does not recommend skipping TLS certificate verification
in a production environment.
5. Click Save.
2. Under Availability zone, select the AZ on which you want Healthwatch Exporter for TKGI to deploy
the BOSH health metric exporter VM.
3. Under VM type, select from the dropdown the type of VM you want Healthwatch Exporter for TKGI
to deploy.
4. Click Save.
If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for Tanzu
Platform for Cloud Foundry installed on the same Tanzu Operations Manager foundation,
scale the BOSH health metric exporter VM to zero instances in the Resource Config pane
in one of the Healthwatch Exporter tiles. Otherwise, the two sets of BOSH health metric
exporter VM metrics cause a 401 error in your BOSH Director deployment, and one set of
metrics reports that the BOSH Director is down in the Grafana UI. For more information,
see BOSH Health Metrics Cause Errors When Two Healthwatch Exporter Tiles Are
Installed in Troubleshooting Healthwatch.
2. (Optional) For UAA client credentials, enter the username and secret for the UAA client that the
BOSH deployment metric exporter VM uses to access the BOSH Director VM. For more
information, see Create a UAA Client for the BOSH Deployment Metric Exporter VM.
3. (Optional) For Static IP address for BOSH deployment metric exporter VM, enter a valid static
IP address that you want to reserve for the BOSH deployment metric exporter VM. This IP address
must not be within the reserved IP ranges you configured in the BOSH Director tile.
4. Click Save.
If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for Tanzu
Platform for Cloud Foundry installed on the same Tanzu Operations Manager foundation,
scale the BOSH deployment metric exporter VM to zero instances in the Resource Config
pane in one of the Healthwatch Exporter tiles. Otherwise, the two BOSH deployment metric
exporter VMs create redundant sets of metrics.
To create a UAA client for the BOSH deployment metric exporter VM:
2. Record the IP address for the BOSH Director VM and the login and administrator credentials for the
BOSH Director UAA instance. For more information about internal authentication settings for your
Tanzu Operations Manager deployment, see the Tanzu Operations Manager documentation.
3. Record the IP address in the IPs column of the BOSH Director row.
5. In the Uaa Admin Client Credentials row of the BOSH Director section, click
Link to Credential.
6. Record the value of password. This value is the secret for Uaa Admin Client
Credentials.
8. In the Uaa Login Client Credentials row of the BOSH Director section, click Link
to Credential.
9. Record the value of password. This value is the secret for Uaa Login Client
Credentials.
1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.
2. Click Settings.
9. Record the IP address in the IPs column of the BOSH Director row.
11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.
12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.
3. SSH into the Tanzu Operations Manager VM by following the procedure in the Tanzu Operations
Manager documentation.
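4. Target the UAA instance on the BOSH Director VM. The following is a minimal sketch using
the UAA CLI (uaac); the UAA port and the root CA certificate path are assumptions based on a
standard Tanzu Operations Manager deployment:
# Target the Director UAA (assumes UAA on port 8443 and the root CA at this path)
uaac target https://BOSH-DIRECTOR-IP:8443 --ca-cert /var/tempest/workspaces/default/root_ca_certificate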
Where BOSH-DIRECTOR-IP is the IP address for the BOSH Director VM that you recorded from the
Status tab in the BOSH Director tile in an earlier step.
If your Tanzu Operations Manager deployment uses internal authentication, log in to the
UAA instance by running:
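# Sketch, assuming the standard Tanzu Operations Manager UAA login client named "login";
# you are prompted for the UAA admin client username and secret in a later step
uaac token owner get login -s UAA-LOGIN-CLIENT-SECRET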
Where UAA-LOGIN-CLIENT-SECRET is the secret you recorded from the Uaa Login Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.
If your Tanzu Operations Manager deployment uses SAML or LDAP, log in to the UAA
instance by running:
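# Sketch; BOSH-UAA-CLIENT-USERNAME is an assumption -- use the identity value from the
# same Uaa Bosh Client Credentials row in the BOSH Director tile
uaac token client get BOSH-UAA-CLIENT-USERNAME -s BOSH-UAA-CLIENT-SECRET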
Where BOSH-UAA-CLIENT-SECRET is the secret you recorded from the Uaa Bosh Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.
6. When prompted, enter the UAA administrator client username admin and the secret you recorded
from the Uaa Admin Client Credentials row in the Credentials tab in the BOSH Director tile in an
earlier step.
7. Create a UAA client for the BOSH deployment metric exporter VM by running:
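# Sketch; the grant types, scope, and authorities shown are assumptions based on the
# read-only BOSH access that the BOSH deployment metric exporter VM needs
uaac client add CLIENT-USERNAME \
  --secret CLIENT-SECRET \
  --authorized_grant_types client_credentials,refresh_token \
  --scope bosh.read \
  --authorities bosh.read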
Where:
CLIENT-USERNAME is the username you want to set for the UAA client.
CLIENT-SECRET is the secret you want to set for the UAA client.
9. Click the Healthwatch Exporter for Tanzu Kubernetes Grid - Integrated tile.
11. For UAA client credentials, enter the username and secret for the UAA client you just created.
In the Errands pane, you can select On to always run an errand or Off to never run it.
For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.
1. Select Errands.
2. (Optional) This tile has only one errand: choose whether to always run or never run the Smoke
Tests errand. This errand verifies that the metric exporter VMs are running.
3. Click Save.
1. Select Syslog.
2. Under Do you want to configure Syslog forwarding?, select one of the following options:
Yes: Allows syslog forwarding and allows you to edit the configuration fields described below.
3. For Address, enter the IP address or DNS domain name of your external destination.
5. For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport
protocol Healthwatch Exporter for TKGI uses to forward system logs to your external destination.
2. In Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.
3. In SSL Certificate, paste in the TLS certificate for your external destination.
7. (Optional) For Queue Size, specify the number of log messages Healthwatch Exporter for TKGI
can hold in a buffer at a time before sending them to your external destination. The default value is
100000.
8. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.
2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:
Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job
3. Ensure that the Internet Connected checkbox is unselected. Selecting this checkbox
gives VMs a public IP address that allows outbound Internet access.
4. Click Save.
If you monitor metrics using the Healthwatch tile on a Tanzu Operations Manager foundation, see
Configure a Scrape Job for Healthwatch Exporter for TKGI in Healthwatch.
You don't need to configure a scrape job for installations of Healthwatch Exporter
for TKGI that are on the same Tanzu Operations Manager foundation as your
Healthwatch tile. The Prometheus instance in the Healthwatch tile automatically
discovers and scrapes Healthwatch Exporter tiles that are installed on the same
Tanzu Operations Manager foundation as the Healthwatch tile.
If you monitor metrics using a service or database located outside your Tanzu Operations Manager
foundation, such as from an external TSDB, see Configure a Scrape Job for Healthwatch Exporter
for TKGI in an External Monitoring System.
1. Open network communication paths from your external service or database to the metric exporter
VMs in Healthwatch Exporter for TKGI. The procedure to open these network paths differs
depending on your Tanzu Operations Manager foundation’s IaaS. For a list of TCP ports used by
each metric exporter VM, see Required Networking Rules for Healthwatch Exporter for TKGI in
Healthwatch Architecture.
2. In the scrape_config section of the Prometheus configuration file, create a scrape job for your
Tanzu Operations Manager foundation. Under static_config, specify the TCP ports of each
metric exporter VM as static targets for the IP address of your external service or database. For
example:
job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:8443"
  - "1.2.3.4:25555"
  - "1.2.3.4:443"
  - "1.2.3.4:25595"
  - "1.2.3.4:9021"
For more information about the scrape_config section of the Prometheus configuration file, see
the Prometheus documentation. For more information about the static_config section of the
Prometheus configuration file, see the Prometheus documentation.
You can monitor several Tanzu Operations Manager foundations that have VMware Tanzu Platform for
Cloud Foundry or VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) installed from a Healthwatch tile
that you install on a separate Tanzu Operations Manager foundation.
There are two ways to monitor several Tanzu Operations Manager foundations from a single monitoring
Tanzu Operations Manager foundation:
Direct scraping: The Prometheus instance in the Healthwatch deployment on your monitoring
Tanzu Operations Manager foundation scrapes metrics directly from the metric exporter VMs
deployed by the Healthwatch Exporter tiles installed on the Tanzu Operations Manager foundation
you monitor. Direct scraping allows you to easily scrape metrics from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor and store them in a single
Prometheus instance. For more information, see Configuring Multi-Foundation Monitoring Through
Direct Scraping.
Federation: The Prometheus instance in the Healthwatch deployment on your monitoring Tanzu
Operations Manager foundation federates metrics from the Prometheus instances in the
Healthwatch deployments on the Tanzu Operations Manager foundations you monitor. Federation
allows you to monitor a subset of metrics from multiple Tanzu Operations Manager foundations
without storing all metrics from those Tanzu Operations Manager foundations in a single
Prometheus instance. For more information, see Configuring Multi-Foundation Monitoring Through
Federation.
With both methods, you can label the metrics with the name of the Tanzu Operations Manager foundation
from which they were collected. This allows you to see all metrics for a specific Tanzu Operations Manager
foundation or compare certain metrics across Tanzu Operations Manager foundations.
Direct scraping allows you to easily scrape the metrics you want to monitor from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor. If you want to monitor component metrics
and SLIs related to the health of your Tanzu Platform for CF or TKGI deployments, and you do not want to
monitor metrics for Kubernetes clusters for any TKGI deployments, VMware recommends configuring direct
scraping for your multi-foundation Healthwatch deployment.
However, the Prometheus instance in the Healthwatch deployment on your monitoring Tanzu Operations
Manager foundation cannot directly scrape metrics for Kubernetes clusters created through TKGI
deployments on other Tanzu Operations Manager foundations. If you want to also scrape metrics for
Kubernetes clusters for TKGI deployments on the Tanzu Operations Manager foundations you monitor, you
must monitor your multi-foundation Healthwatch deployment through federation instead. For more
information, see Configure Federation for TKGI in the next section.
To configure direct scraping for your multi-foundation Healthwatch deployment, you must install the
Healthwatch tile on your monitoring Tanzu Operations Manager foundation and only the Healthwatch
Exporter tile for Tanzu Platform for Cloud Foundry or the Healthwatch Exporter tile for TKGI on the Tanzu
Operations Manager foundations you want to monitor.
To configure direct scraping for your multi-foundation Healthwatch deployment, see Configuring Direct
Scraping for Multi-Foundation Monitoring.
When you configure federation for your multi-foundation Healthwatch deployment, the Prometheus instance
in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes a subset of metrics
from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor.
Federation allows you to monitor a subset of metrics from multiple Tanzu Operations Manager
foundations without storing all metrics from those Tanzu Operations Manager foundations in a single
Prometheus instance. Because federation allows you to choose which metrics the Healthwatch deployment
on your monitoring Tanzu Operations Manager foundation receives, you can monitor a large number of
Tanzu Operations Manager foundations without overwhelming the Prometheus instance in the Healthwatch
deployment on your monitoring Tanzu Operations Manager foundation. If you want to monitor component
metrics, SLIs related to the health of your Tanzu Platform for CF or TKGI deployments, and metrics for
Kubernetes clusters for TKGI deployments, or if you want to monitor a large number of Tanzu Operations
Manager foundations, VMware recommends configuring federation for your multi-foundation Healthwatch
deployment.
Federating all metrics from a Tanzu Operations Manager foundation you monitor
negatively affects the performance of the Prometheus instance in the Healthwatch tile
installed on your monitoring Tanzu Operations Manager foundation, sometimes even
causing it to crash. To avoid this, VMware recommends limiting federation to only certain
metrics, such as service level indicator (SLI) metrics, from each Tanzu Operations
Manager foundation you monitor. For more information about the metrics you can collect,
see Healthwatch Metrics.
Federation also reduces firewall and network complexity for your multi-foundation Healthwatch deployment,
since the Prometheus instance in the Healthwatch tile on your monitoring Tanzu Operations Manager
foundation scrapes metrics only from the Prometheus instance on each of the Tanzu Operations Manager
foundations you monitor, rather than from each metric exporter VM deployed by the Healthwatch Exporter
tile on each of the Tanzu Operations Manager foundations you monitor.
To configure federation for your multi-foundation Healthwatch deployment, you must install the Healthwatch
tile on your monitoring Tanzu Operations Manager foundation and on each Tanzu Operations Manager
foundation you want to monitor, in addition to installing the Healthwatch Exporter tile on each Tanzu
Operations Manager foundation you want to monitor. Then, you must configure the Healthwatch tile on your
monitoring Tanzu Operations Manager foundation to federate metrics from the Prometheus installed on the
Tanzu Operations Manager foundations you want to monitor.
To configure federation for your multi-foundation Healthwatch deployment, see Configuring Federation for
Multi-Foundation Monitoring.
When you configure direct scraping for your multi-foundation Healthwatch deployment, the Prometheus
instance in the Healthwatch tile on a monitoring VMware Tanzu Operations Manager foundation scrapes
metrics directly from the metric exporter VMs deployed by the Healthwatch Exporter tiles installed on the
Tanzu Operations Manager foundation you monitor.
Direct scraping allows you to easily scrape the metrics you want to monitor from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor.
To configure direct scraping for your multi-foundation Healthwatch deployment, you must install the
Healthwatch tile on your monitoring Tanzu Operations Manager foundation and only the Healthwatch
Exporter for Tanzu Platform for CF tile or Healthwatch Exporter for TKGI tile on the Tanzu Operations
Manager foundations you want to monitor.
1. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:
Configuring Healthwatch
2. Install and configure either Healthwatch Exporter for Tanzu Platform for Cloud Foundry or
Healthwatch Exporter for TKGI on each Tanzu Operations Manager foundation you want to monitor.
To install and configure a Healthwatch Exporter tile, see the following topics:
3. For each Healthwatch Exporter tile you installed and configured, open the ports for the metric
exporter VMs that the Healthwatch Exporter tile deploys in the user console for your IaaS. For more
information about the ports you must open for each metric exporter VM, see either Networking
Rules for Healthwatch Exporter for Tanzu Platform for Cloud Foundry or Networking Rules for
Healthwatch Exporter for TKGI in Healthwatch Architecture.
4. Add a scrape job for each Healthwatch Exporter tile in the Prometheus pane of the Healthwatch
tile that you installed on your monitoring Tanzu Operations Manager foundation. To add a scrape job
for a Healthwatch Exporter tile:
1. Retrieve the Tanzu Operations Manager root certificate authority (CA) for the Tanzu
Operations Manager foundation you want to monitor. For more information, see the Tanzu
Operations Manager documentation.
2. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager
foundation you want to monitor.
3. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry or Healthwatch
Exporter for Tanzu Kubernetes Grid - Integrated tile, depending on which Healthwatch
Exporter tile you installed on the Tanzu Operations Manager foundation you want to
monitor.
5. In the row for Healthwatch Exporter Client Mtls, click Link to Credential.
7. In a browser window, navigate to the user console for your Tanzu Operations Manager
deployment’s IaaS.
8. In the user console for your IaaS, record the public IP addresses of the metric exporter
VMs deployed by the Healthwatch Exporter tile you installed on the Tanzu Operations
Manager foundation you want to monitor, depending on which metrics you want to monitor
for that foundation:
For Healthwatch Exporter for Tanzu Platform for Cloud Foundry, record the public
IP addresses of any or all of the following metric exporter VMs:
pas-exporter-counter, the counter metric exporter VM
For Healthwatch Exporter for TKGI, record the public IP addresses of any or all of
the following metric exporter VMs:
To find the public IP addresses of deployed VMs in the user console for your IaaS, see the
documentation for your IaaS:
AWS: To find the public IP address of a Linux instance, see the AWS
documentation for Linux instances of Amazon EC2. To find the public IP address
for a Windows instance, see the AWS documentation for Windows instances of
Amazon EC2.
Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.
GCP: To find the public IP address for a GCP VM, see the GCP documentation.
vSphere: To find the public IP address of a vSphere VM, see the vSphere
documentation.
9. Go to the Tanzu Ops Manager Installation Dashboard for your monitoring Tanzu Operations
Manager foundation.
For Healthwatch Exporter for Tanzu Platform for Cloud Foundry, provide
configuration parameters similar to the following example:
job_name: FOUNDATION-NAME
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "COUNTER-EXPORTER-VM-IP-ADDRESS:9090"
  - "GAUGE-EXPORTER-VM-IP-ADDRESS:9090"
  - "SLI-EXPORTER-VM-IP-ADDRESS:9090"
  - "CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS:9090"
Where FOUNDATION-NAME is the name you want to give the scrape job for this foundation, and
each ...-VM-IP-ADDRESS placeholder is the public IP address that you recorded for the
corresponding metric exporter VM.
For Healthwatch Exporter for TKGI, provide configuration parameters similar to the
following example:
job_name: FOUNDATION-NAME
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "TKGI-EXPORTER-VM-IP-ADDRESS:9090"
  - "CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS:9090"
  - "SLI-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS:9090"
Where FOUNDATION-NAME is the name you want to give the scrape job for this foundation, and
each ...-VM-IP-ADDRESS placeholder is the public IP address that you recorded for the
corresponding metric exporter VM.
14. In Certificate and private key for TLS, enter the certificate and private key from
Healthwatch Exporter Client Mtls that you recorded from the Credentials tab in the
Healthwatch Exporter tile in a previous step.
15. In CA certificate for TLS, enter the Tanzu Operations Manager root CA that you retrieved
in a previous step.
16. In Target server name, enter the server name to use when verifying the TLS
certificates: the name of the server that facilitates TLS communication between the
Prometheus instance in the Healthwatch tile and the metric exporter VMs that the
Healthwatch Exporter tile deploys. If the CN or SAN on the TLS certificate does not match
the URL or IP address of the target server, enter the name that appears on the certificate.
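To confirm which name appears on a certificate, one option is the following openssl sketch
(requires OpenSSL 1.1.1 or later; the server-cert.pem file name is illustrative):
# Print the subject (CN) and subject alternative names of a certificate
openssl x509 -in server-cert.pem -noout -subject -ext subjectAltName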
When you configure your Healthwatch deployment to federate metrics, the Prometheus instance in the
Healthwatch tile on a monitoring VMware Tanzu Operations Manager foundation scrapes a subset of
metrics from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor. This is useful if you want to monitor a subset of metrics from multiple Tanzu
Operations Manager foundations without storing all metrics from those Tanzu Operations Manager
foundations in a single Prometheus instance. Because federation allows you to choose which metrics the
Healthwatch deployment on your monitoring Tanzu Operations Manager foundation receives, you can
monitor a large number of Tanzu Operations Manager foundations without overwhelming the Prometheus
instance in the Healthwatch deployment on your monitoring Tanzu Operations Manager foundation.
To configure federation for your Healthwatch deployment, you must install the Healthwatch tile on your
monitoring Tanzu Operations Manager foundation and on each Tanzu Operations Manager foundation you
want to monitor, in addition to installing the Healthwatch Exporter tile on each Tanzu Operations Manager
foundation you want to monitor. Then, you must configure the Healthwatch tile on your monitoring Tanzu
Operations Manager foundation to federate metrics from the Prometheus installed on the Tanzu Operations
Manager foundations you want to monitor. If you want to federate metrics from Tanzu Operations Manager
foundations with TKGI installed, you must also configure TKGI cluster discovery on the Tanzu Operations
Manager foundations you want to monitor.
1. Set up your multi-foundation deployment for federation by following the procedure in the section for
your runtime:
2. Configure scrape jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu
Operations Manager foundations you want to monitor. See Configure Scrape Jobs.
If your multi-foundation Healthwatch deployment contains one or more highly available (HA)
Healthwatch deployments, see Federation for a Highly Available Healthwatch Deployment.
Federating all metrics from a Tanzu Operations Manager foundation you monitor
negatively affects the performance of the Prometheus instance in the Healthwatch
tile installed on your monitoring Tanzu Operations Manager foundation, sometimes
even causing it to crash. To avoid this, VMware recommends federating only
certain metrics, such as service level indicator (SLI) metrics, from each Tanzu
Operations Manager foundation you monitor. For more information about the
metrics you can collect, see Healthwatch Metrics.
1. Install and configure the Healthwatch and Healthwatch Exporter tiles on each Tanzu Operations
Manager foundation you want to monitor. To install and configure the Healthwatch and Healthwatch
Exporter tiles, see the following topics:
Configuring Healthwatch
2. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:
Configuring Healthwatch
3. In the Healthwatch tile on your monitoring Tanzu Operations Manager foundation, configure scrape
jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu Operations Manager
foundations you want to monitor. See Configure Scrape Jobs.
For a Healthwatch deployment on one Tanzu Operations Manager foundation to receive metrics for
Kubernetes clusters created through TKGI deployments on other Tanzu Operations Manager foundations,
you must configure the Healthwatch Exporter for TKGI deployment on those Tanzu Operations Manager
foundations to federate metrics to the Prometheus instance in the Healthwatch deployment on the Tanzu
Operations Manager foundation you use to monitor the other Tanzu Operations Manager foundations. If you
do not configure federation for TKGI deployments on the Tanzu Operations Manager foundations you want
to monitor, the Healthwatch Exporter for TKGI deployments on those Tanzu Operations Manager
foundations can only send component metrics and SLIs related to the health of those TKGI deployments.
To configure TKGI deployments on multiple Tanzu Operations Manager foundations to federate metrics to a
single monitoring Tanzu Operations Manager foundation:
1. Install and configure the Healthwatch and Healthwatch Exporter for TKGI tiles on each Tanzu
Operations Manager foundation you want to monitor. To install and configure the Healthwatch and
Healthwatch Exporter for TKGI tiles, see the following topics:
Configuring Healthwatch
2. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:
Configuring Healthwatch
3. Configure TKGI cluster discovery in the Healthwatch tile on each Tanzu Operations Manager
foundation you want to monitor. Do not configure TKGI cluster discovery in the Healthwatch tile on
your monitoring foundation. To configure TKGI cluster discovery on the Tanzu Operations Manager
foundations you want to monitor, see Configuring TKGI Cluster Discovery.
4. In the Healthwatch tile on your monitoring Tanzu Operations Manager foundation, configure scrape
jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu Operations Manager
foundations you want to monitor. To configure these scrape jobs, see Configure Scrape Jobs.
1. For each Tanzu Operations Manager foundation you want to monitor, open port 4450 for the
Prometheus instance in the Healthwatch tile in the user console for your IaaS. For more
information, see the documentation for your IaaS.
4. In the Promxy Client Mtls row of the TSDB section, click Link to Credential.
5. Record the values of private_key_pem and cert_pem. These values are the private key
and certificate for Promxy Client mTLS.
6. Retrieve the certificate for the Tanzu Operations Manager root certificate authority (CA) of
the Tanzu Operations Manager foundation you want to monitor. For more information, see
the Tanzu Operations Manager documentation.
7. Go to the Tanzu Ops Manager Installation Dashboard for your monitoring Tanzu Operations
Manager foundation.
9. Select Prometheus.
11. For Scrape job configuration parameters, provide, in YAML format, the configuration
parameters for a scrape job for the Prometheus instance in the Healthwatch tile on the
Tanzu Operations Manager foundation you want to monitor. In the example below, the
scrape job federates all metrics with names that match the regular expression
^metric_name_regex.* from the Prometheus instance at the IP address listed under the
targets property:
job_name: example-job-name
scheme: https
metrics_path: '/federate'
params:
  'match[]':
  - '{__name__=~"^metric_name_regex.*"}'
static_configs:
- targets:
  - 'source-tsdb-1:4450'
  - 'source-tsdb-2:4450'
If you have configured a load balancer or DNS entry for the Prometheus
instance, include the IP address for your load balancer or DNS entry in
each target listed under the targets property instead of the IP address for
the Prometheus instance.
12. For Certificate and private key for TLS, enter the certificate and private key you recorded
from the Promxy Client mTLS row in the Credentials tab in the Healthwatch tile installed
on the Tanzu Operations Manager foundation you want to monitor in a previous step.
13. For CA certificate for TLS, enter the Tanzu Operations Manager root CA certificate for the
Tanzu Operations Manager foundation you want to monitor that you recorded in a previous
step.
16. Test your federation configuration. See Test your Federation Configuration.
If you configure the tile through an automated pipeline instead, the same scrape job appears in
the om configuration file similar to the following example:
product-properties:
  .properties.scrape_configs:
    value:
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: example-job-name
        scheme: https
        metrics_path: '/federate'
        params:
          'match[]':
          - '{__name__=~"^my_metric_name_regex.*"}'
        static_configs:
        - targets:
          - 'source-prometheus-1:4450'
      server_name: promxy
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----
For more information, see Configure and Deploy your Tile Using the om CLI in Installing, Configuring, and
Deploying a Tile Through an Automated Pipeline.
3. On the left side of the Grafana UI homepage, click the Explore icon. An empty Explore tab
appears.
4. In the query field to the right of the Metrics browser menu tab, enter up.
6. Under Table, review the query results. If your federation configuration is working, the job column
includes the job_name from the scrape jobs you configured for each Tanzu Operations Manager
foundation you monitor in Configure Federation.
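You can also exercise the /federate endpoint directly. The following curl sketch assumes you
saved the Promxy Client Mtls certificate and key and the source foundation's Tanzu Operations
Manager root CA to the local file names shown:
# Request the federated series matching the scrape job's match[] expression
curl -G --cacert source-root-ca.pem \
  --cert promxy-client.pem \
  --key promxy-client.key \
  'https://SOURCE-PROMETHEUS-IP:4450/federate' \
  --data-urlencode 'match[]={__name__=~"^metric_name_regex.*"}'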
In a highly available (HA) Healthwatch deployment, the Prometheus instance runs as two VMs,
each of which stores a copy of the metrics data. There are two ways to federate metrics from
such a deployment, each with its own pros and cons:
When federating metrics, you can configure the Prometheus instance in the Healthwatch tile on
your monitoring Tanzu Operations Manager foundation to scrape both copies of that data from the
Prometheus instance in the Healthwatch tile on each Tanzu Operations Manager foundation you
monitor.
To do this, include both VMs of each Prometheus instance on the Tanzu Operations Manager
foundations you want to monitor in the scrape job configuration parameters. While including both
VMs creates duplicate sets of metrics, it also ensures that you do not lose metrics data if one of
the VMs goes down. However, doubling the number of metrics that the Prometheus instance
collects also negatively affects its performance.
Alternatively, you can create load balancers or DNS entries in your IaaS user console for the
Prometheus instances on each Tanzu Operations Manager foundation you monitor, then include the
IP addresses for each load balancer or DNS entry in the targets listed under the targets property
in your scrape job configuration parameters. For more information, see Configure Scrape Jobs.
In both cases, VMware recommends configuring static IP addresses for both VMs in each of the
Prometheus instances. For more information about configuring static IP addresses for Prometheus
instances, see Configure Prometheus in Configuring Healthwatch.
The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through
the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus
instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI
API, and the TKGI tile.
1. Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see
Configure TKGI Cluster Discovery in Healthwatch below.
2. Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more
information, see Configure TKGI below.
If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see
Troubleshooting TKGI Cluster Discovery Failure below.
To collect additional BOSH system metrics related to TKGI and view them in the Grafana
UI, you must install and configure the Healthwatch Exporter for TKGI on your Tanzu
Operations Manager foundations with TKGI installed. To install the Healthwatch Exporter
for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for
TKGI tile, see Configuring Healthwatch Exporter for TKGI.
On: This option allows TKGI cluster discovery and reveals the configuration fields
described in the steps below. TKGI cluster discovery is allowed by default when TKGI is
installed on your Tanzu Operations Manager foundation.
5. For Discovery interval, enter in seconds how frequently you want the Prometheus instance
to detect and scrape TKGI clusters. The minimum value is 60.
6. (Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS,
configure one of the following options:
7. Click Save.
Configure TKGI
After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the
Prometheus instance to scrape metrics from TKGI clusters.
To configure TKGI:
5. Select the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging
metrics to Healthwatch.
6. Select the Include Kubernetes Controller Manager metrics checkbox to allow TKGI to send
Kubernetes Controller Manager metrics to Healthwatch.
7. If you are using TKGI v1.14.2 or later, select the Include Kubernetes Scheduler metrics
checkbox to allow TKGI to send Kubernetes Scheduler metrics to Healthwatch.
8. For Setup Telegraf Outputs, provide the following TOML configuration file:
[[outputs.prometheus_client]]
listen = ":10200"
metric_version = 2
You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf
metrics from your TKGI clusters. For more information about creating a configuration file in TKGI,
see the TKGI documentation.
If you are configuring TKGI v1.12 or earlier, remove metric_version = 2 from the
TOML configuration file. TKGI v1.12 and earlier are out of support. Consider
upgrading to at least v1.17, which is currently the oldest supported version.
9. Click Save.
2. For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create
the roles required to allow the Prometheus instance to scrape metrics from your TKGI
clusters:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthwatch
rules:
- resources:
  - pods/proxy
  - pods
  - nodes
  - nodes/proxy
  - namespace/pods
  - endpoints
  - services
  verbs:
  - get
  - watch
  - list
  apiGroups:
  - ""
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthwatch
roleRef:
  apiGroup: ""
  kind: ClusterRole
  name: healthwatch
subjects:
- apiGroup: ""
  kind: User
  name: healthwatch
If (Optional) Add-ons - Use with caution already contains other API resource definitions,
append the above YAML snippet to the end of the existing resource definitions, followed by
a newline character.
13. Ensure that the Upgrade all clusters errand is running. Running this errand configures your TKGI
clusters with the roles you created in the (Optional) Add-ons - Use with caution field of the plans
you monitor in a previous step.
On a Tanzu Operations Manager foundation that has VMware Tanzu Platform for Cloud Foundry installed,
Healthwatch can automatically configure a DNS record for the Grafana VM with the Gorouter in Tanzu
Platform for CF. For more information, see (Optional) Configure Grafana in Configuring Healthwatch.
On a Tanzu Operations Manager foundation that does not have Tanzu Platform for CF installed, you must
manually configure a DNS entry for either the public IP address of a single Grafana VM or the load balancer
in front of the Grafana instance.
To configure a DNS entry for the load balancer associated with the Grafana instance:
1. Ensure that you have associated a load balancer with the Grafana instance. For more information,
see Configure Resources in Configuring Healthwatch.
2. Find the public IP address for your load balancer. You may need to assign a public or elastic IP
address to your load balancer if it does not already have one. For more information, see the
documentation for your IaaS:
AWS: If your Tanzu Operations Manager deployment is on AWS, skip this step. You do not
need the public IP address of your load balancer for Grafana to configure a DNS entry in
your Amazon DNS.
Azure: For more information about finding the public IP address of your Azure load
balancer, see the Azure documentation.
GCP: For more information about finding the public IP address of your GCP load balancer,
see the GCP documentation.
OpenStack: For more information about assigning a floating IP address to your OpenStack
load balancer, see the OpenStack documentation.
vSphere: For more information about finding the public IP address of your vSphere load
balancer, see the vSphere documentation.
3. Create an A record in your DNS server named grafana that points to the public IP address of the
load balancer that you recorded in the previous step. For more information, see the documentation
for your IaaS:
AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.
Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.
GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.
OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.
vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.
5. Ensure that the Grafana UI login page appears as expected by navigating to the URL that you
configured in your DNS entry in a web browser. When you see the Grafana UI login page, you have
successfully created a DNS entry.
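For example, assuming you named the A record grafana in a hypothetical example.com zone, you can first confirm that the record resolves before testing in a browser:

dig +short grafana.example.com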
1. In the user console for your IaaS, find the public IP address for the Grafana VM. For more
information, see the documentation for your IaaS:
AWS: To find the public IP address of a Linux instance, see the AWS documentation for
Linux instances of Amazon EC2. To find the public IP address for a Windows instance, see
the AWS documentation for Windows instances of Amazon EC2.
Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.
GCP: To find the public IP address for a GCP VM, see the GCP documentation.
vSphere: To find the public IP address of a vSphere VM, see the vSphere documentation.
3. Create an A record in your DNS server named grafana that points to the public IP address of the
Grafana VM that you recorded in the previous step. For more information, see the documentation
for your IaaS:
AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.
Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.
GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.
OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.
vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.
5. Ensure that the Grafana UI login page appears as expected by navigating to the URL that you
configured in your DNS entry in a web browser. When you see the Grafana UI login page, you have
successfully created a DNS entry.
You create firewall policies in the console for your Tanzu Operations Manager deployment’s IaaS. To create
a firewall policy for the Grafana instance, see the section for your IaaS:
4. For Security group name, enter the name you want to give the security group. For example,
grafana-port-access.
6. For VPC, select from the dropdown the VPC where the Grafana instance is deployed.
15. Select the checkbox next to the security group you created for the Grafana instance.
For more information about creating a firewall policy in AWS for a Linux instance, see the AWS
documentation for Linux instances of Amazon EC2. For more information about creating a firewall policy in
AWS for a Windows instance, see the AWS documentation for Windows instances of Amazon EC2.
3. Click Add.
4. Create a resource group for the Grafana instance. For more information, see the Azure
documentation.
7. For Name, enter the name you want to give the rule collection. For example, grafana-port-access.
11. Under IP addresses, configure the following fields for your first rule:
6. For Destination address, enter the public IP address of the Grafana instance or the load
balancer for the Grafana instance.
12. Under IP addresses, configure the following fields for your second rule:
6. For Destination address, enter the public IP address of the Grafana instance or the load
balancer for the Grafana instance.
For more information about creating a firewall policy in Azure, see the Azure documentation.
3. For Network, select from the dropdown the network where the Grafana instance is
deployed.
3. For Network, select from the dropdown the network where the Grafana instance is
deployed.
For more information about creating a firewall policy in GCP, see the GCP documentation.
1. Log in to vSphere.
5. Select Manage.
6. Select Firewall.
4. For Destination, enter the public IP address for the Grafana instance or the load balancer
for the Grafana instance.
4. For Destination, enter the public IP address for the Grafana instance or the load balancer
for the Grafana instance.
For more information about adding an NSX Edge firewall rule, see the vSphere documentation.
By default, users log in to the Grafana UI using basic authentication. With basic authentication, all users log
in to the Grafana UI using the username admin and the administrator login credentials found in the
Credentials tab of the Healthwatch for VMware Tanzu tile.
However, you can configure the Grafana UI to use another authentication method alongside or instead of
basic authentication. This allows users to use their own credentials to log in to the Grafana UI.
The sections in this topic describe how to configure different authentication methods for users to log in to
the Grafana UI:
Basic authentication. For more information, see Configuring Basic Authentication below.
Generic OAuth, UAA, or LDAP. For more information, see Configuring Other Authentication
Methods below.
The values that you configure in the Grafana Authentication pane also configure their corresponding
properties in the Grafana configuration file. For more information, see Overview of Configuration Files in
Healthwatch in Configuration File Reference Guide, Grafana Authentication in Configuration File Reference
Guide, and the Grafana documentation.
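For example, the Generic OAuth fields described later in this topic map to properties in the [auth.generic_oauth] section of the Grafana configuration file. The following is a minimal sketch with hypothetical values:

[auth.generic_oauth]
enabled = true
client_id = example-client
client_secret = example-secret
scopes = openid,profile,email
auth_url = https://auth.example.com/oauth/authorize
token_url = https://auth.example.com/oauth/token
api_url = https://auth.example.com/userinfo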
5. Click Save.
7. In the Admin Login Password row of the Grafana section, click Link to Credential.
8. Record the value of password. This value is the password that all users must use to log in to the
Grafana UI.
5. Click Save.
For example, if you want to limit the number of users who can log in to the Grafana UI with the Grafana
administrator credentials found in the Credentials tab of the Healthwatch tile, but still allow non-
administrator users to log in to the Grafana UI, you can configure one of the following authentication
methods in addition to basic authentication:
Generic OAuth. For more information, see Configure Generic OAuth Authentication below.
User Account and Authentication (UAA). For more information, see Configure UAA Authentication
below.
If you want to only allow users with Grafana administrator credentials to log in to the Grafana UI, you can
configure the Grafana UI to use only basic authentication. For more information, see Configure Only Basic
Authentication below.
6. For Client ID, enter the client ID of your OAuth provider. The method to retrieve this client ID differs
depending on your OAuth provider. To find the client ID of your OAuth provider, see the Grafana
documentation.
7. For Client secret, enter the client secret of your OAuth provider. The method to retrieve this client
secret differs depending on your OAuth provider. To find the client secret of your OAuth provider,
see the Grafana documentation.
8. For Scopes, enter a comma-separated list of scopes that the OAuth provider adds to the user’s
token when they log in to the Grafana UI. These scopes differ depending on your OAuth provider.
To find the scopes for your OAuth provider, see the Grafana documentation.
9. For Authorization URL, enter the authorization URL of the server for your OAuth provider.
10. For Token URL, enter the token URL of the server for your OAuth provider.
11. For API URL, enter the API URL of the server for your OAuth provider.
12. For Logout URL, enter the URL to which users are redirected after logging out of the Grafana UI.
13. (Optional) For Email attribute, enter the attribute that contains the email address of the user. For
more information, see the Grafana documentation.
14. For Grafana domain, enter the domain of the Grafana instance. You must configure this field if
your OAuth provider requires a callback URL that uses the root URL of the Grafana instance. If
your OAuth provider does not require a callback URL, do not configure this field.
15. (Optional) To allow new users to create a new Grafana account when they log in with their existing
OAuth credentials for the first time, select the Allow new accounts with existing OAuth
credentials checkbox. This checkbox is selected by default. Unselecting this checkbox prevents
users without a pre-existing Grafana account from creating a new Grafana account or logging in to
the Grafana UI with their existing OAuth credentials.
16. (Optional) For Allowed domains, enter a comma-separated list of domains. Configuring this field
limits Grafana UI access to users who belong to one or more of the listed domains.
17. For Allowed teams, enter a comma-separated list of teams. Configuring this field limits Grafana UI
access to users who belong to one or more of the listed teams. Configure this field if your OAuth
provider allows you to separate users into teams. If your OAuth provider does not allow you to
separate users into teams, do not configure this field.
18. For Allowed organizations, enter a comma-separated list of organizations. Configuring this field
limits Grafana UI access to users who belong to one or more of the listed organizations. Configure
this field if your OAuth provider allows you to separate users into organizations. If your OAuth
provider does not allow you to separate users into organizations, do not configure this field.
19. For Allowed groups, enter a comma-separated list of groups. Configuring this field limits Grafana
UI access to users who belong to one or more of the listed groups. Configure this field if your
OAuth provider allows you to separate users into groups. If your OAuth provider does not allow you
to separate users into groups, do not configure this field.
20. (Optional) For Role attribute path, enter a JMESPath string that maps users to Grafana roles. For
example, contains(scope[*], 'healthwatch.admin') && 'Admin' || contains(scope[*],
'healthwatch.edit') && 'Editor' || contains(scope[*], 'healthwatch.read') &&
'Viewer'.
21. (Optional) To prevent users who are not mapped to a valid Grafana role from accessing the Grafana
UI, select the Deny access to users without Grafana roles checkbox. This checkbox is
unselected by default. Unselecting this checkbox assigns the Viewer role to users who cannot be
mapped to a valid Grafana role by the string configured in the Role attribute path field.
22. (Optional) To allow the Grafana instance to communicate with the server for your OAuth provider
over TLS:
1. For Certificate and private key for TLS, provide a certificate and private key for the
Grafana instance to use for TLS connections to the server for your OAuth provider.
2. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that the
server for your OAuth provider uses to verify TLS certificates.
3. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected, the
Grafana instance does not verify the identity of the server for your OAuth provider. This
checkbox is unselected by default.
Healthwatch can automatically configure authentication with the UAA instance of the runtime that is
installed on the same Tanzu Operations Manager foundation as the Healthwatch tile, either VMware Tanzu
Platform for Cloud Foundry or Tanzu Kubernetes Grid Integrated Edition (TKGI). If you want to configure
authentication with the UAA instance of Tanzu Platform for CF or TKGI installed on the same Tanzu
Operations Manager foundation, follow the procedure below. If you want to configure authentication with the
UAA instance of a runtime that is installed on a different Tanzu Operations Manager foundation, you must
manually configure it using the fields described in Configure Generic OAuth Authentication above.
If you are configuring authentication with the UAA instance for a TKGI deployment,
Healthwatch does not add the UAA administrator user account to the
healthwatch.admin group by default. If you want to log in to the Grafana UI using
the UAA administrator credentials, you must manually add the UAA administrator
user account to the healthwatch.admin group.
Where UAA-URL is the URL of the UAA instance with which you want to configure
authentication. For UAA instances for Tanzu Platform for CF, this URL is usually
https://login.SYSTEM-DOMAIN, where SYSTEM-DOMAIN is the domain you configured in
the System domain field in the Domains pane of the Tanzu Platform for Cloud Foundry
tile. For TKGI, this URL is usually https://TKGI-API-URL:8443, where TKGI-API-URL is
the URL of the TKGI API.
Where:
USERNAME is the username of the user to which you want to assign a Grafana role.
3. Click Save.
5. For Host address, enter the network address of your LDAP server host.
6. (Optional) For Port, enter the port for your LDAP server host. The default port is 389 when the Use
TLS checkbox is selected, or 636 when the Use TLS checkbox is unselected.
7. (Optional) To allow new users to create a new Grafana account when they log in with their existing
LDAP credentials for the first time, select the Allow new accounts with existing LDAP
credentials checkbox. This checkbox is selected by default. Unselecting this checkbox prevents
users without a pre-existing Grafana account from creating a new Grafana account or logging in to
the Grafana UI with their existing LDAP credentials.
8. (Optional) To allow the LDAP server to communicate with the Grafana instance over TLS when
authenticating user credentials, select the Use TLS checkbox. This checkbox is unselected by
default.
9. (Optional) To allow the LDAP server to run the STARTTLS command when communicating with the
Grafana instance over TLS, select the Use STARTTLS checkbox. This checkbox is unselected by
default.
10. (Optional) To allow the Grafana instance to skip TLS certificate verification when communicating
with the LDAP server over TLS, select the Skip TLS certificate verification checkbox. This
checkbox is unselected by default.
11. (Optional) For Bind DN, enter the distinguished name (DN) for binding to the LDAP server. For
example, cn=admin,dc=grafana,dc=org.
12. (Optional) For Bind password, enter the password for binding to the LDAP server. For example,
grafana.
13. (Optional) For User search filter, enter a regex string that defines LDAP user search criteria. For
example, (cn=%s).
14. (Optional) For Search base DNs, enter an array of base DNs in the LDAP directory tree from which
any LDAP user search begins. The typical LDAP search base matches your domain name. For
example, dc=grafana,dc=org.
15. (Optional) For Group search filter, enter a regex string that defines LDAP group search criteria.
For example, (&(objectClass=posixGroup)(memberUid=%s)).
16. (Optional) For Group search base DNs, enter an array of base DNs in the LDAP directory tree from
which any LDAP group search begins. For example, ou=groups,dc=grafana,dc=org.
17. (Optional) For Group search filter user attribute, enter a value that defines which user attribute is
substituted for %s in the regex string you entered for Group search filter. You can use the value of
any property listed in Server attributes below. The default value is the value of the username
property.
18. (Optional) For Server attributes, enter, in TOML format, tables of the LDAP attributes that your
LDAP server uses. Each table must use the table name [servers.attributes]. For an example,
see the sketch following this procedure.
19. (Optional) For Server group mappings, enter in TOML format an array of tables of LDAP groups
mapped to Grafana orgs and roles. Each table must use the table name
[[servers.group_mappings]]. For more information, see the Grafana documentation.
20. (Optional) To allow the Grafana instance to communicate with your LDAP server over TLS:
1. For Certificate and private key for TLS, provide a certificate and private key for the
Grafana instance to use for TLS connections to your LDAP server.
2. For CA certificate for TLS, provide a certificate for the CA that your LDAP server uses to
verify TLS certificates.
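The following is a minimal sketch of the TOML tables for the Server attributes and Server group mappings fields, assuming typical LDAP attribute names and a hypothetical administrators group:

[servers.attributes]
name = "givenName"
surname = "sn"
username = "cn"
member_of = "memberOf"
email = "mail"

[[servers.group_mappings]]
group_dn = "cn=admins,ou=groups,dc=grafana,dc=org"
org_role = "Admin"

[[servers.group_mappings]]
group_dn = "*"
org_role = "Viewer"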
Optional Configuration
The topics in this section describe how to configure optional features in the Healthwatch for VMware Tanzu,
Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware
Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles:
Configuring Alerting
Configuring Alerting
This topic explains how to configure alerting in Healthwatch for VMware Tanzu.
Overview
In Healthwatch, you can configure the Prometheus instance to send alerts to Alertmanager according to
alerting rules you configure. Alertmanager then manages those alerts by removing duplicate alerts, grouping
alerts together, and routing those groups to alert receiver integrations such as email, PagerDuty, or Slack.
Alertmanager also silences and inhibits alerts according to the alerting rules you configure.
Configure Alerting
In the Alertmanager pane, you configure alerting rules, routing rules, and alert receivers for Alertmanager to
use.
The values that you configure in the Alertmanager pane also configure their corresponding properties in the
Alertmanager configuration file. For more information, see Overview of Configuration Files in Healthwatch in
Configuration File Reference Guide, Configuring the Alertmanager Configuration File in Configuration File
Reference Guide, and the Prometheus documentation.
3. Select Alertmanager.
4. For Alerting rules, provide in YAML format the rule statements that define which alerts
Alertmanager sends to your alert receivers:
1. The following YAML files contain alerting rules for VMware Tanzu Platform for Cloud
Foundry and VMware Tanzu Kubernetes Grid Integrated Edition (TKGI). Choose the YAML
file below that corresponds to your runtime and replace OPS_MANAGER_URL with the fully-
qualified domain name (FQDN) of your Tanzu Operations Manager deployment:
Tanzu Platform for Cloud Foundry
TKGI
2. Modify the YAML file according to the observability requirements for your Tanzu Operations
Manager foundation.
For more information about rule statements for Alertmanager, see the Prometheus
documentation and the sketch following this procedure.
5. For Routing rules, provide in YAML format the route block that defines where Alertmanager sends
alerts, how frequently Alertmanager sends alerts, and how Alertmanager groups alerts together. The
following example shows a possible set of routing rules:
receiver: 'example-receiver'
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [cluster, alertname]
group_by gathers all alerts with the same label into a single alert. For
example, including cluster in the group_by property groups together all
alerts from the same cluster. You can see the labels for the metrics that
the Prometheus instance collects in the Explore tab of the Grafana UI.
You must define all route configuration parameters. For more information about the parameters you
must provide, see the Prometheus documentation.
6. (Optional) For Inhibit rules, provide in YAML format the rule statements that define which alerts
Alertmanager does not send to your alert receivers. For more information, see the Prometheus
documentation.
7. Configure the alert receivers that you specified in Routing rules in a previous step. For more
information, see Configure Alert Receivers below.
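The Alerting rules field accepts standard Prometheus rule groups. The following is a minimal sketch, assuming a hypothetical alert that fires when any scrape target reports as down for five minutes:

groups:
- name: example-rules
  rules:
  - alert: JobUnhealthy
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.job }} has been down for 5 minutes."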
You can also configure custom alert receiver integrations that are not natively supported by Alertmanager
through webhook receivers. For more information about configuring custom alert receiver integrations, see
the Prometheus documentation.
If you configure two or more alert receivers with the same name, Alertmanager merges them into a single
alert receiver. For more information, see Combining Alert Receivers below.
The following sections describe how to configure each type of alert receiver:
If you want to provide authentication and TLS communication settings for your alert
receivers, you must provide them in the associated alert receiver configuration fields
described in the sections below. If the base configuration YAML for your alert receivers
includes fields for authentication and TLS communication settings, do not include them
when you provide the configuration YAML for your alert receivers in the Alert receiver
configuration parameters fields.
2. For Alert receiver name, enter the name you want to give your email alert receiver. The name you
enter in this field must match the name you specified in the route block you entered in the Routing
rules field in Configure Alerting above.
3. For Alert receiver configuration parameters, provide the configuration parameters for your email
alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a
possible set of configuration parameters:
to: 'operator1@example.org'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587
At minimum, your configuration parameters must include the to, from, and smarthost properties.
The other properties you must include depend on both the SMTP server for which you are
configuring an alert receiver and the needs of your Tanzu Operations Manager foundation. For more
information about the properties you can include in this configuration, see the Prometheus
documentation.
If you exclude the html and headers properties or leave them blank,
Healthwatch automatically populates them with a default template. To view
the default email template for Healthwatch, see email_template.yml on
GitHub.
4. (Optional) To configure SMTP authentication between Alertmanager and your email alert receiver,
configure one of the following fields:
If your SMTP server uses basic authentication, enter the authentication password for your
SMTP server in SMTP server authentication password.
If your SMTP server uses CRAM_MD5 authentication, enter the authentication secret for
your SMTP server in SMTP server authentication secret.
5. (Optional) To allow Alertmanager to communicate with your email alert receiver over TLS, configure
the following fields:
1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to your SMTP server.
2. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that your
SMTP server uses to verify TLS certificates.
3. For SMTP server name, enter the name of the SMTP server as it appears on the server’s
TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of your SMTP server. This checkbox is
unselected by default.
For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.
2. For Alert receiver name, enter the name you want to give your PagerDuty alert receiver. The name
you enter in this field must match the name you specified in the route block you entered in the
Routing rules field in Configure Alerting above.
3. For Alert receiver configuration parameters, provide the configuration parameters for your
PagerDuty alert receiver in YAML format. Do not prefix the YAML with a dash. The following
example shows a possible set of configuration parameters:
url: https://api.pagerduty.com/api/v2/alerts
client: '{{ template "pagerduty.example.client" . }}'
client_url: '{{ template "pagerduty.example.clientURL" . }}'
description: '{{ template "pagerduty.example.description" .}}'
severity: 'error'
The properties you must include depend on both the PagerDuty instance for which you are
configuring an alert receiver and the needs of your Tanzu Operations Manager foundation. For more
information about the properties you can include in this configuration, see the Prometheus
documentation.
If you selected Events API v2 as your integration type in PagerDuty, enter your PagerDuty
integration key in Routing key.
If you selected Prometheus as your integration type in PagerDuty, enter your PagerDuty
integration key in Service key.
5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the PagerDuty
API, configure one of the following options:
To configure the HTTP client to authenticate the PagerDuty API using basic authentication,
enter the username and password associated with the HTTP client in Basic authentication
credentials.
To configure the HTTP client to authenticate the PagerDuty API using a bearer token, enter
the bearer token associated with the HTTP client in Bearer token.
For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.
6. (Optional) To allow Alertmanager to communicate with your PagerDuty alert receiver over TLS,
configure the following fields:
1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the PagerDuty API server.
2. For CA certificate for TLS, provide a certificate for the CA that the PagerDuty API server
uses to verify TLS certificates.
3. For PagerDuty server name, enter the name of the PagerDuty API server as it appears on
the server’s TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the PagerDuty API server. This checkbox is
unselected by default.
For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.
2. For Alert receiver name, enter the name you want to give your Slack alert receiver. The name you
enter in this field must match the name you specified in the route block you entered in the Routing
rules field in Configure Alerting above.
3. (Optional) For Alert receiver configuration parameters, provide the configuration parameters for
your Slack alert receiver in YAML format. Do not prefix the YAML with a dash. The following
example shows a possible set of configuration parameters:
channel: '#operators'
username: 'Example Alerting Integration'
The properties you must include depend on both the Slack instance for which you are configuring
an alert receiver and the needs of your Tanzu Operations Manager foundation. For more information
about the properties you can include in this configuration, see the Prometheus
documentation.
Do not include the Slack API URL or HTTP client properties in the configuration
parameters for your Slack alert receiver. You can configure these
properties in the next steps of this procedure.
4. For Slack API URL, enter the webhook URL for your Slack instance from your Slack app directory.
5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server for
your Slack instance, configure one of the following options:
To configure the HTTP client to authenticate the server for your Slack instance using basic
authentication, enter the username and password associated with the HTTP client in Basic
authentication credentials.
To configure the HTTP client to authenticate the server for your Slack instance using a
bearer token, enter the bearer token associated with the HTTP client in Bearer token.
For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.
6. (Optional) To allow Alertmanager to communicate with your Slack alert receiver over TLS, configure
the following fields:
1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the server for your Slack instance.
2. For CA certificate for TLS, provide a certificate for the CA that the server for your Slack
instance uses to verify TLS certificates.
3. For Slack server name, enter the name of the server for your Slack instance as it appears
on the server’s TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the server for your Slack instance. This
checkbox is unselected by default.
For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.
2. For Alert receiver name, enter the name you want to give your webhook alert receiver. The name
you enter in this field must match the name you specified in the route block you entered in the
Routing rules field in Configure Alerting above.
3. For Alert receiver configuration parameters, provide the configuration parameters for your
webhook alert receiver in YAML format. Do not prefix the YAML with a dash. The following example
shows a possible set of configuration parameters:
url: https://example.com/data/12345
max_alerts: 0
The properties you must include depend on both the webhook for which you are configuring an alert
receiver and the needs of your Tanzu Operations Manager foundation. For more information about
the properties you can include in this configuration, see the Prometheus documentation.
You can also configure custom alert receiver integrations that are not
natively supported by Alertmanager through webhook alert receivers. For
more information about configuring custom alert receiver integrations, see
the Prometheus documentation.
4. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server that
processes your webhook, configure one of the following options:
To configure the HTTP client to authenticate the server that processes your webhook using
basic authentication, enter the username and password associated with the HTTP client in
Basic authentication credentials.
To configure the HTTP client to authenticate the server that processes your webhook using
a bearer token, enter the bearer token associated with the HTTP client in Bearer token.
For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.
5. (Optional) To allow Alertmanager to communicate with your webhook alert receiver over TLS,
configure the following fields:
1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the server that processes your webhook.
2. For CA certificate for TLS, provide a certificate for the CA that the server that processes
your webhook uses to verify TLS certificates.
3. For Webhook server name, enter the name of the server that processes your webhook as
it appears on the server’s TLS certificate.
4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the server that processes your webhook. This
checkbox is unselected by default.
For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.
6. Click Save.
If you configure two or more alert receivers with the same name, Alertmanager merges them into a single
alert receiver. For example, if you configure:
One alert receiver named “Foundation” containing two email configurations and a PagerDuty
configuration
One alert receiver named “Clusters” containing one email configuration
The example below shows how Alertmanager combines the alert receivers described above in its
configuration file:
receivers:
- name: 'Foundation'
  email_configs:
  - to: 'operator1@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."
  - to: 'operator2@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."
  pagerduty_configs:
  - url: https://api.pagerduty.com/api/v2/alerts
    client: '{{ template "pagerduty.example.client" . }}'
    client_url: '{{ template "pagerduty.example.clientURL" . }}'
    description: '{{ template "pagerduty.example.description" . }}'
    severity: 'error'
- name: 'Clusters'
  email_configs:
  - to: 'operator1@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."
Silence Alerts
Alertmanager includes a command-line tool called amtool. You can use amtool to temporarily silence
Alertmanager alerts without modifying your alerting rules. For more information about how to use amtool,
see the Alertmanager documentation on GitHub.
You can also use the Alertmanager UI to view and silence alerts. To access the Alertmanager UI, see
Viewing the Alertmanager UI in Troubleshooting Healthwatch.
1. SSH into one of the Prometheus VMs deployed by the Healthwatch tile. Alertmanager replicates
any changes you make in one Prometheus VM to all other Prometheus VMs. To SSH into one of
the Prometheus VMs, see the Tanzu Operations Manager documentation.
cd /var/vcap/jobs/alertmanager/packages/alertmanager/bin
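Then list the alerts that Alertmanager currently manages. Run:

./amtool alert query --alertmanager.url=ALERTMANAGER-URL

ALERTMANAGER-URL is a placeholder for your Alertmanager endpoint; the alert query subcommand and --alertmanager.url flag come from the upstream amtool documentation.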
This command returns a list of all currently running alerts that includes detailed information about
each alert, including the name of the alert and the Prometheus instance on which it runs.
You can also query the list of alerts by name and instance to view specific alerts.
Where ALERT-NAME is the name of the alert you want to silence. You can query the exact
name of the alert, or you can query a partial name and include the regular expression .* to
see all alerts that include the partial name, such as in the following example:
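./amtool alert query "alertname=~ALERT-NAME.*" --alertmanager.url=ALERTMANAGER-URL

The matcher syntax shown follows the upstream amtool documentation.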
Where INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence alerts.
Where:
INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence an alert.
4. Run one of the following commands to silence either a specific alert or all alerts:
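The following sketch shows both variants, using placeholders and matcher syntax from the upstream amtool documentation:

./amtool silence add "alertname=ALERT-NAME" --comment "COMMENT" --alertmanager.url=ALERTMANAGER-URL
./amtool silence add "alertname=~.+" --comment "COMMENT" --alertmanager.url=ALERTMANAGER-URL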
Where:
INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence an alert.
Where:
~.+ is a regular expression that includes all alerts in the silence you set.
Where:
ALERT-NAME is the name of the alert you want to silence.
COMMENT is a note about why the alert is being silenced.
5. Record the ID string from the output. You can use this ID to unmute the alert.
6. To unmute the alert, expire the silence by its ID. Run:
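./amtool silence expire SILENCE-ID --alertmanager.url=ALERTMANAGER-URL

Where SILENCE-ID is the ID string you recorded in the previous step, and ALERTMANAGER-URL is a placeholder for your Alertmanager endpoint. The silence expire subcommand and --alertmanager.url flag come from the upstream amtool documentation.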
7. For more information, run amtool --help or see the Alertmanager documentation on GitHub.
This topic describes how to monitor the expiration of VMware Tanzu Operations Manager certificates using
metrics collected by the Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles.
Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy the
certificate expiration metric exporter VM, cert-expiration-exporter. The certificate expiration metric
exporter VM uses the om CLI to send a GET request with the query parameter ?expires_within=1y to the
/api/v0/deployed/certificates Tanzu Operations Manager API endpoint. The Tanzu Operations
Manager API then returns the expiration dates of all certificates that are due to expire within the next year.
The Prometheus instance in your Healthwatch deployment scrapes the certificate expiration metrics from
the certificate expiration metric exporter VM and sends them to Grafana. For more information about the
/api/v0/deployed/certificates endpoint, see the Tanzu Operations Manager API documentation.
You cannot configure the certificate expiration metric exporter VM to specify a different
time period when it sends a GET request to the /api/v0/deployed/certificates
endpoint.
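For example, you can query the same endpoint yourself with the om CLI, assuming om is already targeted at your Tanzu Operations Manager deployment and authenticated:

om curl -p "/api/v0/deployed/certificates?expires_within=1y"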
If your BOSH Director deployment uses custom CAs, you can configure them in the Trusted Certificates
field in the Security pane of the BOSH Director tile. Configuring custom CAs in the Trusted Certificates
field allows all BOSH-deployed components in your deployment to trust custom root certificates. For more
information about this field, see the Tanzu Operations Manager documentation.
If any CAs or leaf certificates for your Tanzu Operations Manager foundation are due to expire soon, rotate
them before they expire to avoid downtime for your foundation. To rotate CAs and leaf certificates, see the
Tanzu Operations Manager documentation.
To configure a static IP address for the certificate expiration metric exporter VM, see the configuration topic
for your Healthwatch Exporter tile:
(Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs.
This topic describes how to configure authentication with a User Account and Authentication (UAA) instance
on a different VMware Tanzu Operations Manager foundation for users to log in to the Grafana UI. This
configuration is in the context of Healthwatch for VMware Tanzu.
If you want to configure authentication with the UAA instance of a runtime that is installed on a different
Tanzu Operations Manager foundation, you must select Generic OAuth and configure it manually through
the Grafana Authentication pane.
1. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager foundation
with the UAA instance with which you want to configure authentication for the Grafana UI.
2. Click the VMware Tanzu Platform for Cloud Foundry or Tanzu Kubernetes Grid Integrated
Edition tile, depending on which runtime is installed on this Tanzu Operations Manager foundation.
4. View and record the credentials to log in to the UAA instance for the runtime installed on this Tanzu
Operations Manager foundation:
1. In the Admin Client Credentials row of the UAA section, click Link to
Credential.
2. Record the value of password. This value is the secret for Admin Client
Credentials.
For TKGI:
1. In the Pks Uaa Management Admin Client row, click Link to Credential.
2. Record the value of secret. This value is the secret for Pks Uaa Management
Admin Client.
5. Target the server for the UAA instance for the runtime installed on this Tanzu Operations Manager
foundation using the User Account and Authentication Command Line Interface (UAAC). Run:
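uaac target UAA-URL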
Where UAA-URL is the URL of the UAA instance with which you want to configure authentication.
For UAA instances for Tanzu Platform for CF, this URL is usually https://login.SYSTEM-DOMAIN,
where SYSTEM-DOMAIN is the domain you configured in the System domain field in the Domains
pane of the Tanzu Platform for Cloud Foundry tile. For TKGI, this URL is usually https://TKGI-
API-URL:8443, where TKGI-API-URL is the URL of the TKGI API.
For more information about the UAAC, see the Tanzu Platform for Cloud Foundry documentation.
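6. Log in to the UAA instance as the admin client. Run:

uaac token client get admin -s UAA-ADMIN-CLIENT-SECRET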
Where UAA-ADMIN-CLIENT-SECRET is the UAA administrator client secret you recorded from the
Credentials tab in the runtime tile in a previous step.
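7. Create a UAA client for the Grafana instance. For example (the client name, scopes, and grant
types shown are illustrative, and the redirect path follows the Grafana Generic OAuth
convention):

uaac client add healthwatch-grafana \
  --secret CLIENT-SECRET \
  --authorized_grant_types authorization_code,refresh_token \
  --scope openid,healthwatch.read,healthwatch.edit,healthwatch.admin \
  --redirect_uri PROTOCOL://GRAFANA-ROOT-URL/login/generic_oauth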
Where:
CLIENT-SECRET is the secret you want to set for the UAA client.
PROTOCOL is either http or https, depending on the protocol you configured the Grafana
instance to use in the Grafana pane of the Healthwatch tile.
GRAFANA-ROOT-URL is the root URL for the Grafana instance that you use to access the
Grafana UI.
8. If you are using TKGI, you must manually create UAA user groups to map to administrator, editor,
and viewer permissions for Grafana. Run:
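For example, using the healthwatch.* group names that the Role attribute path string in this topic expects:

uaac group add healthwatch.admin
uaac group add healthwatch.edit
uaac group add healthwatch.read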
If you are using Tanzu Platform for CF, you added the UAA client to UAA user groups mapped to
administrator, editor, and viewer permissions for Grafana in the previous step. Continue to the next
step.
9. Create a user account for the UAA client you created in a previous step to log in to the Grafana
instance. Run:
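uaac user add USERNAME -p SECRET --emails EMAIL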
Where:
USERNAME is the username you want to set for the user account.
SECRET is the secret you want to set for the user account.
EMAIL is the email address you want to associate with the user account.
10. Assign user permissions to the user account you created in the previous step by running:
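For example, to grant Grafana administrator permissions (the group name follows the healthwatch.* scopes used in this topic):

uaac member add healthwatch.admin USERNAME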
Where:
USERNAME is the username you set for the user account you created in the previous step.
1. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager foundation
with the Grafana instance for which you want to configure UAA authentication.
5. For Provider name, enter a name that identifies the UAA instance with which you want to configure
authentication. For example, UAA.
6. For Client ID, enter the client ID of the UAA client that was created for the UAA instance with which
you want to configure authentication in Create a UAA Client for the Grafana Instance above.
7. For Client secret, enter the client secret of the UAA client that was created for the UAA instance
with which you want to configure authentication in Create a UAA Client for the Grafana Instance
above.
9. For Authorization URL, enter the authorization URL for your runtime:
10. For Token URL, enter the token URL for your runtime:
11. For API URL, enter http://localhost:3002/userinfo. This is the URL of a local proxy server
that Healthwatch can use to translate the UAA token into a format that is compatible with Grafana.
12. To allow new users to create a new Grafana account when they log in with their existing UAA
credentials for the first time, select the Allow new accounts with existing OAuth credentials
checkbox. This checkbox is selected by default. Unselecting this checkbox prevents users without
a pre-existing Grafana account from creating a new Grafana account or logging in to the Grafana UI
with their existing UAA credentials.
13. For Role attribute path, enter the following JMESPath string to map users to Grafana roles:
contains(scope[*], 'healthwatch.admin') && 'Admin' || contains(scope[*],
'healthwatch.edit') && 'Editor' || contains(scope[*], 'healthwatch.read') &&
'Viewer'.
14. (Optional) To prevent users who are not mapped to a valid Grafana role from accessing the Grafana
UI, select the Deny access to users without Grafana roles checkbox. This checkbox is
unselected by default. Unselecting this checkbox assigns the Viewer role to users who cannot be
mapped to a valid Grafana role by the string configured in the Role attribute path field.
15. (Optional) To allow the Grafana instance to communicate with the server for your OAuth provider
over TLS:
1. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that the
UAA instance with which you want to configure authentication uses to verify TLS
certificates. You must configure this field if the UAA instance with which you want to
configure authentication uses a TLS certificate that is signed by an untrusted authority.
The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through
the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus
instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI
API, and the TKGI tile.
1. Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see
Configure TKGI Cluster Discovery in Healthwatch below.
2. Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more
information, see Configure TKGI below.
If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see
Troubleshooting TKGI Cluster Discovery Failure below.
To collect additional BOSH system metrics related to TKGI and view them in the Grafana
UI, you must install and configure the Healthwatch Exporter for TKGI on your Tanzu
Operations Manager foundations with TKGI installed. To install the Healthwatch Exporter
for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for
TKGI tile, see Configuring Healthwatch Exporter for TKGI.
On: This option allows TKGI cluster discovery and reveals the configuration fields
described in the steps below. TKGI cluster discovery is allowed by default when TKGI is
installed on your Tanzu Operations Manager foundation.
5. For Discovery interval, enter, in seconds, how frequently you want the Prometheus instance
to detect and scrape TKGI clusters. The minimum value is 60.
6. (Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS,
configure one of the following options:
If you select the Skip TLS certificate verification checkbox, the Prometheus instance
does not verify the identity of the TKGI API. This checkbox is unselected by default.
VMware does not recommend skipping TLS certificate verification in a production
environment.
7. Click Save.
Configure TKGI
After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the
Prometheus instance to scrape metrics from TKGI clusters.
To configure TKGI:
5. Select the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging
metrics to Healthwatch.
6. Select the Include Kubernetes Controller Manager metrics checkbox to allow TKGI to send
Kubernetes Controller Manager metrics to Healthwatch.
7. If you are using TKGI v1.14.2 or later, select the Include Kubernetes Scheduler metrics
checkbox to allow TKGI to send Kubernetes Scheduler metrics to Healthwatch.
8. For Setup Telegraf Outputs, provide the following TOML configuration file:
[[outputs.prometheus_client]]
listen = ":10200"
metric_version = 2
You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf
metrics from your TKGI clusters. For more information about creating a configuration file in TKGI,
see the TKGI documentation.
If you are configuring TKGI v1.12 or earlier, remove metric_version = 2 from the
TOML configuration file. TKGI v1.12 and earlier are out of support. Consider
upgrading to at least v1.17, which is currently the oldest supported version.
9. Click Save.
2. For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create
the roles required to allow the Prometheus instance to scrape metrics from your TKGI
clusters:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthwatch
rules:
- resources:
  - pods/proxy
  - pods
  - nodes
  - nodes/proxy
  - namespace/pods
  - endpoints
  - services
  verbs:
  - get
  - watch
  - list
  apiGroups:
  - ""
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthwatch
roleRef:
  apiGroup: ""
  kind: ClusterRole
  name: healthwatch
subjects:
- apiGroup: ""
  kind: User
  name: healthwatch
If (Optional) Add-ons - Use with caution already contains other API resource definitions,
append the above YAML snippet to the end of the existing resource definitions, followed by
a newline character.
13. Ensure that the Upgrade all clusters errand is running. Running this errand configures your TKGI
clusters with the roles you created in the (Optional) Add-ons - Use with caution field of the plans
you monitor in a previous step.
This topic describes the Healthwatch for VMware Tanzu component configuration file properties that you
configure through the Healthwatch tile.
Through the Healthwatch tile, you configure properties in the configuration files through the following panes:
Prometheus. For more information, see Configuring the Prometheus Configuration File below.
Alertmanager. For more information, see Configuring the Alertmanager Configuration File below.
Grafana. For more information, see Configuring the Grafana Configuration File below.
For more information about configuring these properties, see Configure Prometheus in Configuring
Healthwatch and the Prometheus documentation.
Remote Write
The following table lists which configuration options in the Remote Write pane in the Healthwatch tile
configure properties in the Prometheus configuration file:
For more information about configuring these properties, see Configure Remote Write in Configuring
Healthwatch and the Prometheus documentation.
Alert receiver configuration parameters: receivers - email_configs. See Configure an Email Alert Receiver in Configuring Alerting and the Prometheus documentation.
Certificate and private key for TLS: cert_file, key_file under receivers - email_configs - tls_config.
Alert receiver configuration parameters: receivers - slack_configs. See Configure a Slack Alert Receiver in Configuring Alerting and the Prometheus documentation.
Certificate and private key for TLS: cert_file, key_file under receivers - slack_configs - http_config - tls_config.
Alert receiver configuration parameters: receivers - webhook_configs. See Configure a Webhook Alert Receiver in Configuring Alerting and the Prometheus documentation.
For more information about configuring these properties, see Configuring Alerting and the Prometheus
documentation.
Generic OAuth
UAA
LDAP
Email alerts
For more information about configuring these properties, see Configure Grafana in Configuring Healthwatch,
Configuring Grafana Authentication, and the Grafana documentation.
This topic describes how to access and use your Healthwatch for VMware Tanzu dashboards in the Grafana
user interface (UI).
Healthwatch uses Grafana to visualize metrics data in charts and graphs. Once you have installed and
configured the Healthwatch and Healthwatch Exporter tiles, you can log in to the Grafana UI to view
dashboards that show how your Tanzu Operations Manager foundations are performing.
Each dashboard contains charts and graphs called panels. Each dashboard and panel contains detailed
descriptions of the metrics they display and how to troubleshoot your Tanzu Operations Manager
foundations based on those metrics.
A - Filters: Use these dropdowns to filter the metrics that the dashboard displays. Not all
dashboards have filters.
B - Dashboard Header: Click this header to see a description of the metrics that the dashboard
displays, how to use those metrics for troubleshooting, and links to further documentation.
C - Information Icon: Hover over this icon to see a description of the metrics that the panel
displays and how to use those metrics for troubleshooting.
You can edit a dashboard by copying the dashboard and editing the dashboard copy. To further customize
your dashboards, see the Grafana documentation.
1. In your web browser, navigate to the URL that you configured in the DNS entry for the Grafana
instance. For more information, see Configuring DNS for the Grafana Instance.
2. Follow one of the procedures below to log in to the Grafana UI, according to the authentication
method you configured in the Healthwatch tile:
For Password, enter the password for the Grafana UI that you find in the
Healthwatch tile. To find the password for the Grafana UI:
1. Go to the Tanzu Ops Manager Installation Dashboard.
5. Record the value of password. This value is the password that all
users must use to log in to the Grafana UI.
Generic OAuth: On the Grafana UI login page, click OAuth to log in with your OAuth
credentials.
UAA: On the Grafana UI login page, click UAA to log in with your UAA credentials.
LDAP: On the Grafana UI login page, click LDAP to log in with your LDAP credentials.
For more information about configuring an authentication method for the Grafana UI, see
Configuring Grafana Authentication.
3. On the left side of the Grafana UI homepage, hover over the Dashboards icon on the menu bar. A
navigation menu appears.
Click either the name or the expansion arrow on a folder to expand a list of the dashboards
it contains.
Click the list icon to view a single list of all dashboards contained in all folders. The folder
icon is selected by default and groups your dashboards by folder in expandable and
collapsible lists.
For detailed descriptions of each default dashboard in the Grafana UI, see Default Dashboards in the
Grafana UI below.
The following list describes the default Healthwatch dashboards you may see in the Grafana UI:
All Jobs: Contains metrics related to the percentage of healthy VMs in your BOSH
Director, runtime, Healthwatch, and Healthwatch Exporter tile deployments. You can filter
these metrics by deployment.
BOSH Director Health: Contains metrics related to the health of the BOSH Director.
Canary App Health: Contains metrics related to the availability of the canary app that the
Blackbox Exporter uses to run canary tests. You can filter these metrics by canary URL.
Certificate Expiration: Contains metrics related to when certificates for your Tanzu
Operations Manager deployment are due to expire. For more information, see Monitoring
Certificate Expiration.
Job Details: Contains metrics related to the functionality of each component and VM in
your runtime, Healthwatch, and Healthwatch Exporter tile deployments. You can filter these
metrics by health status, deployment, job instance, and VM. If your Tanzu Operations
Manager foundation has neither Tanzu Platform for Cloud Foundry nor VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) installed, you see this dashboard upon logging in
to the Grafana UI.
Ops Manager Health: Contains metrics related to the availability of your Tanzu Operations
Manager deployment.
System at a Glance: Contains an overview of metrics related to the health of your Tanzu
Operations Manager deployment. You see this dashboard upon logging in to the Grafana
UI.
This folder only appears if you configure Healthwatch to monitor your Tanzu SQL
for VMs tile. For more information, see (Optional) Configure Grafana Dashboards in
Configuring Healthwatch.
MySQL Overview: Contains metrics related to the percentage of healthy VMware Tanzu
SQL with MySQL for VMs (Tanzu SQL for VMs) clusters and nodes in your deployment.
This folder only appears if you configure Healthwatch to monitor your Tanzu
RabbitMQ tile. For more information, see (Optional) Configure Grafana Dashboards
in Configuring Healthwatch.
This folder only appears if you install and configure Healthwatch Exporter for
Tanzu Platform for Cloud Foundry on one or more of your Tanzu Operations
Manager foundations. For more information about Healthwatch Exporter, see
Healthwatch Exporter for Tanzu Platform for Cloud Foundry Architecture in
Healthwatch Architecture and Configuring Healthwatch Exporter for Tanzu Platform
for Cloud Foundry.
App Instances: Contains metrics related to the number and functionality of apps and tasks
running on your Tanzu Platform for CF deployments.
CLI Health: Contains metrics related to the functionality of the Cloud Foundry Command-
Line Interface (cf CLI).
Diego/Capacity: Contains metrics related to the capacity of all Diego Cells in your Tanzu
Platform for CF deployments. You can filter these metrics by memory chunk size and disk
size.
Logging and Metrics Pipeline: Contains metrics related to the health and functionality of
the Firehose in your Tanzu Platform for CF deployments.
Router: Contains metrics related to the health and functionality of the Gorouters in your
Tanzu Platform for CF deployments.
Tanzu Platform for Cloud Foundry SLOs: Contains metrics related to the availability of
the cf push command and the canary apps for your Tanzu Platform for CF deployments.
You can filter these metrics by cf push uptime SLO target and canary app uptime SLO
target.
TAS MySQL Health: Contains metrics related to the health of the MySQL databases for
your Tanzu Platform for CF deployments.
UAA: Contains metrics related to the health and functionality of the UAA instances in your
Tanzu Platform for CF deployments.
Usage Service: Contains metrics related to the health and functionality of the Usage
Service in your Tanzu Platform for CF deployments.
If you do not see any data in the Usage Service dashboard, make sure
that you have configured the Metric Registrar in the Tanzu Platform for
Cloud Foundry tile. If you do not configure the Metric Registrar, the Usage
Service API cannot emit metrics to the Loggregator Firehose, and
Healthwatch Exporter for Tanzu Platform for Cloud Foundry cannot collect
them. For more information, see the Tanzu Platform for Cloud Foundry
documentation.
This folder appears only if you installed and configured Healthwatch Exporter for
Tanzu Kubernetes Grid Integrated Edition (TKGI) on one or more of your Tanzu
Operations Manager foundations. For more information about Healthwatch Exporter
for TKGI, see Healthwatch Exporter for TKGI Architecture in Healthwatch
Architecture and Configuring Healthwatch Exporter for TKGI.
Kubernetes API Server: Contains metrics related to the functionality of the Kubernetes
API Server instances in your TKGI clusters. You can filter these metrics by cluster and
instance.
Kubernetes Cluster Detail: Contains metrics related to the functionality of the nodes in
your TKGI clusters. You can filter these metrics by cluster. If your Tanzu Operations
Manager foundation has only TKGI installed, you see this dashboard upon logging in to the
Grafana UI.
Kubernetes Etcd: Contains metrics related to the functionality of the etcd instances in
your TKGI clusters. You can filter these metrics by cluster and instance.
Kubernetes Nodes: Contains metrics related to the functionality of your TKGI cluster
nodes. You can filter these metrics by cluster and instance.
TKGI Control Plane: Contains metrics related to the functionality of the TKGI Control
Plane.
Healthwatch Metrics
This topic describes the metrics that Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry
and Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) generate.
BOSH SLIs
Platform Metrics
Each metric exporter VM exposes these metrics and SLIs on a Prometheus exposition endpoint, /metrics.
The Prometheus instance within your metrics monitoring system then scrapes the /metrics endpoint on
each metric exporter VM and imports those metrics into your monitoring system. You can configure the
frequency at which the Prometheus instance scrapes the /metrics endpoints in the Prometheus pane of
the Healthwatch for VMware Tanzu tile. To configure the scrape interval for the Prometheus instance, see
Configure Prometheus in Configuring Healthwatch.
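Healthwatch generates these scrape jobs for you. Conceptually, each job in the Prometheus configuration file resembles the following sketch, in which the job name, target address, and interval are illustrative only:

scrape_configs:
  - job_name: metric-exporter          # illustrative job name
    metrics_path: /metrics             # the exposition endpoint on each exporter VM
    scrape_interval: 15s               # set in the Prometheus pane of the Healthwatch tile
    static_configs:
      - targets:
          - EXPORTER-VM-ADDRESS:PORT   # placeholder for a metric exporter VM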
The name of each metric is in PromQL format. For more information, see the Prometheus documentation.
BOSH SLIs
In a VMware Tanzu Operations Manager foundation, the BOSH Director manages the VMs that each tile
deploys. If the BOSH Director fails or is not responsive, the VMs that the BOSH Director manages also fail.
Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy two
VMs that continuously test the functionality of the BOSH Director: the BOSH health metric exporter VM and
the BOSH deployment metric exporter VM:
The BOSH health metric exporter VM, bosh-health-exporter, creates a BOSH deployment called bosh-
health every ten minutes. This BOSH deployment deploys another VM, bosh-health-check, that runs a
suite of SLI tests to validate the functionality of the BOSH Director. After the SLI tests are complete, the
BOSH health metric exporter VM collects the metrics from the bosh-health-check VM, then deletes the
bosh-health deployment and the bosh-health-check VM.
The following table describes each metric the BOSH health metric exporter VM generates:
bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}: The number of seconds the
BOSH health SLI test suite takes to run, grouped by how many ran in less than a certain amount of time.
This metric is also called a bucket of BOSH health SLI test suite duration metrics.
bosh_sli_duration_seconds_sum{exported_job="bosh-health-exporter"}: The total value of the duration
metrics across all BOSH health SLI test suite duration metric buckets.
bosh_sli_failures_total{exported_job="bosh-health-exporter"}: The total number of times the BOSH health
SLI test suite fails. A failed test suite is one in which any number of tests within the test suite fail.
bosh_sli_runs_total{exported_job="bosh-health-exporter"}: The total number of times the BOSH health SLI
test suite runs. To see the failure rate, divide the value of
bosh_sli_failures_total{exported_job="bosh-health-exporter"} by the value of
bosh_sli_runs_total{exported_job="bosh-health-exporter"}.
bosh_sli_task_duration_seconds_sum{exported_job="bosh-health-exporter"}: The total value of the duration
metrics across all task duration metric buckets.
The following table describes each metric the BOSH deployment metric exporter VM generates:
bosh_sli_duration_seconds_sum{exported_job="bosh-deployments-exporter"}: The total value of the
duration metrics across all BOSH deployment check duration metric buckets.
bosh_sli_task_duration_seconds_sum{exported_job="bosh-deployments-exporter"}: The total value of the
duration metrics across all task duration metric buckets.
Platform Metrics
Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy VMs
that generate metrics regarding the health of several Tanzu Operations Manager and runtime components.
You can use the following platform metrics to calculate percent availability and error budgets:
Prometheus VM
Developers create and manage apps on Tanzu Platform for Cloud Foundry using the Cloud Foundry
Command Line Interface (cf CLI). Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the
Tanzu Platform for Cloud Foundry SLI exporter VM, pas-sli-exporter, which continuously tests the
functionality of the cf CLI.
The following table describes each metric the Tanzu Platform for Cloud Foundry SLI exporter VM generates:
tas_sli_duration_seconds_sum: The total value of the duration metrics across all Tanzu Platform for CF
SLI test suite duration metric buckets.
tas_sli_task_duration_seconds_sum: The total value of the duration metrics across all task duration
metric buckets.
The following table describes each metric the TKGI SLI exporter VM generates:
tkgi_sli_duration_seconds_bucket: The number of seconds the TKGI SLI test suite takes to run, grouped
by how many ran in less than a certain amount of time. This metric is also called a bucket of TKGI SLI
test suite duration metrics.
tkgi_sli_duration_seconds_sum: The total value of the duration metrics across all TKGI SLI test suite
duration metric buckets.
tkgi_sli_exporter_status: The health status of the TKGI SLI exporter VM. A value of 1 indicates that the
TKGI SLI exporter VM is running and healthy.
tkgi_sli_failures_total: The total number of times the TKGI SLI test suite fails.
tkgi_sli_run_duration_seconds: The number of seconds the TKGI SLI test suite takes to run.
tkgi_sli_runs_total: The total number of times the TKGI SLI test suite runs. To see the failure rate of
tkgi_sli_runs_total, divide the value of tkgi_sli_failures_total by the value of tkgi_sli_runs_total.
tkgi_sli_task_duration_seconds_sum: The total value of the duration metrics across all task duration
metric buckets.
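As with the BOSH SLI metrics, you can compute a recent failure rate rather than a lifetime ratio; a PromQL sketch using a one-hour window:

rate(tkgi_sli_failures_total[1h]) / rate(tkgi_sli_runs_total[1h])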
The following table describes the metric the certificate expiration metric exporter VM generates:
Metric Description
Prometheus VM
In the Canary URLs pane of the Healthwatch tile, you configure target URLs to which the Blackbox
Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge
the overall health and accessibility of an app, runtime, or deployment.
On the Prometheus VM, tsdb, the Blackbox Exporter job, blackbox-exporter, generates canary test
metrics.
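Healthwatch manages this wiring internally. For orientation only, a Blackbox Exporter probe job in a Prometheus configuration file typically follows this pattern, in which the job name, module, target URL, and exporter address are placeholders:

scrape_configs:
  - job_name: canary-probe
    metrics_path: /probe
    params:
      module: [http_2xx]                # a Blackbox Exporter probe module
    static_configs:
      - targets:
          - https://canary.example.com  # a canary target URL
    relabel_configs:
      - source_labels: [__address__]    # pass the target URL to the exporter
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115     # address of the Blackbox Exporter itself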
The following table describes each metric the Blackbox Exporters in the Prometheus instance generate:
probe_dns_serial: The serial number of the DNS zone for your canary target URL.
probe_http_ssl: Whether the canary test used TLS for the final redirect. A value of 0 indicates that the
canary test did not use TLS for the final redirect. A value of 1 indicates that the canary test did use
TLS for the final redirect.
probe_ssl_last_chain_expiry_timestamp_seconds: The last TLS chain expiration for the canary test URL
in Unix time.
probe_ssl_last_chain_info: Information about the TLS leaf certificate for the canary test URL.
probe_tls_version_info: The TLS version the canary test uses, or NaN when unknown.
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to
the Loggregator Firehose. For more information about SVMs related to Healthwatch component metrics, see
SVM Forwarder VM - Healthwatch Component Metrics below.
The following table describes each platform metric the SVM Forwarder VM sends to the Loggregator
Firehose:
health_check_bosh_director_success: Whether the BOSH SLI test suite that the BOSH health metric
exporter VM ran succeeded or failed. A value of 0 indicates that the BOSH SLI test suite failed. A value
of 1 indicates that the BOSH SLI test suite succeeded.
Healthwatch Exporter for Tanzu Platform for Cloud Foundry Metric Exporter VMs
The following table describes each metric the TKGI metric exporter VM collects and converts:
healthwatch_boshExporter_ingressLatency_seconds_sum: The total value of the metrics across all ingress
latency metric buckets.
Each of the following metric exporter VMs collects and converts a single metric type from the Loggregator
Firehose. The names of the metric exporter VMs correspond to the types of metrics they collect and
convert:
The counter metric exporter VM, pas-exporter-counter, collects counter metrics from the Loggregator
Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the counter metric exporter VM collects and converts:
Metric Description
The gauge metric exporter VM, pas-exporter-gauge, collects gauge metrics from the Loggregator
Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the gauge metric exporter VM collects and converts:
Metric Description
The following table describes each metric the /metrics endpoint on each metric exporter VM generates:
Metric Description
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to
the Loggregator Firehose. For more information about SVMs related to platform metrics, see SVM
Forwarder VM - Platform Metrics above.
The following table describes each Healthwatch component metric the SVM Forwarder VM sends to the
Loggregator Firehose:
Metric Description
Troubleshooting Healthwatch
This topic describes how to troubleshoot problems and known issues that may arise when deploying or
operating Healthwatch for VMware Tanzu, Healthwatch Exporter for VMware Tanzu Platform for Cloud
Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI).
1. Run:
bosh deployments
This command returns a list of all BOSH deployments that are currently running.
3. Run:
Where DEPLOYMENT-NAME is the name of your Healthwatch deployment that you recorded in the
previous step.
8. Record the certificate and private key for Tsdb Client Mtls.
9. Add the certificate and private key for Tsdb Client Mtls that you recorded in the previous step to
the keystore for your operating system.
2. Create a cert.key file containing the Tsdb Client Mtls private key.
3. Change the access permissions on the certificate and private key files to 0600.
For example:
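chmod 0600 cert.pem cert.key

This assumes the certificate and private key files from the previous steps are named cert.pem and cert.key; adjust the file names to match yours.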
4. To import the Tsdb Client Mtls private key into the macOS keychain:
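For example, using the macOS security CLI and assuming the cert.key file from step 2 and the default login keychain:

security import cert.key -k ~/Library/Keychains/login.keychain-db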
5. To import the Tsdb Client Mtls certificate into the macOS keychain:
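For example, again assuming the default login keychain and a certificate file named cert.pem:

security import cert.pem -k ~/Library/Keychains/login.keychain-db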
10. In a web browser, navigate to localhost:9090. If your browser prompts you to specify which
certificate to use for mTLS, select the certificate you added to your operating system keystore.
On macOS:
3. In the displayed dialog, select the Tsdb Client Mtls certificate you added to the
macOS keychain and click OK.
4. When prompted, provide the keychain access password, and select Always
Allow.
1. Run:
bosh deployments
This command returns a list of all BOSH deployments that are currently running.
3. Run:
Where DEPLOYMENT-NAME is the name of your Healthwatch deployment that you recorded in the
previous step.
The Smoke Tests errand fails because the Prometheus instance fails to scrape metrics from the Grafana
instance. Potential causes of this failure include:
There is a network issue between the Prometheus instance and Grafana instance.
The Grafana instance uses a certificate that does not match the certificate authority (CA) you
configured in the Grafana pane in the Healthwatch tile. This could occur because the CA you
configured in the Grafana pane is either a self-signed certificate or a different CA from the one that
generated the certificate. As a result, the Prometheus instance does not trust the certificate that
the Grafana instance uses. For more information about configuring a CA for the Grafana instance,
see (Optional) Configure Grafana in Configuring Healthwatch.
To find out why the Prometheus instance fails to scrape metrics from the Grafana instance:
1. Log in to one of the VMs in the Prometheus instance by following the procedure in the Tanzu
Operations Manager documentation.
The lastError field in the command output describes the reason for the Prometheus instance
failing to scrape the Grafana instance.
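The scrape-target status that includes the lastError field is exposed by the Prometheus HTTP API. As a sketch, from a VM in the Prometheus instance (depending on your configuration, you may need to present the Tsdb Client Mtls certificate and private key, for example with the --cert and --key flags of curl):

curl -s http://localhost:9090/api/v1/targets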
This error appears when the TKGI metric exporter VM cannot verify that the certificate chain of the UAA
server for the BOSH Director is valid. To allow the TKGI metric exporter VM to connect to the BOSH
Director, you must correct any certificate chain errors.
To check for certificate chain errors in the UAA server for the BOSH Director:
1. Log in to the TKGI metric exporter VM by following the procedure in the Tanzu Operations Manager
documentation.
4. Run:
If the command returns an OK message, the certificate is trusted and has a valid certificate chain. If
the command returns any other message, see the OpenSSL documentation to troubleshoot.
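If you need to inspect the chain by hand, a command similar to the following sketch can help, where BOSH-DIRECTOR-IP is a placeholder and port 8443 assumes the default port of the UAA server on the BOSH Director:

openssl s_client -connect BOSH-DIRECTOR-IP:8443 -showcerts < /dev/null

In the output, a Verify return code of 0 (ok) indicates a valid chain.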
13959.admin, bosh.teams.p-healthwatch2-pas-exporter-b3a337d7ec4cca94f166.admin"}'
This occurs because both Healthwatch Exporter tiles deploy a BOSH health metric exporter VM, and both
BOSH health metric exporter VMs are named bosh-health-exporter. This causes the two sets of metrics
to conflict with each other.
To address this, you must scale the BOSH health metric exporter VM down to zero instances in one of the
Healthwatch Exporter tiles.
To scale the BOSH health metric exporter VM down to zero instances in one of the Healthwatch Exporter
tiles:
2. Click the Healthwatch Exporter for Tanzu Kubernetes Grid Integrated Edition tile or the Healthwatch
Exporter for Tanzu Platform for Cloud Foundry tile.
4. In the Bosh Health Exporter row, select 0 from the Instances dropdown.
5. Click Save.
To fix this issue, either upgrade to Healthwatch v2.2.1 or configure an OIDC provider as the identity provider
for your Kubernetes clusters in the TKGI tile. This cleans up the service accounts that the TKGI SLI
exporter VM creates in future TKGI SLI tests, but does not clean up existing service accounts from
previous TKGI SLI tests. For more information about configuring an OIDC provider in TKGI, see the TKGI
documentation.
VMware recommends that you manually delete existing service accounts from previous TKGI SLI tests if
running the tkgi get-credentials command returns an error similar to the following example:
Error: Status: 500; ErrorMessage: nil; Description: Create Binding: Timed out waiting
for secrets; ResponseError: nil
Manually deleting service accounts also deletes the secrets and ClusterRoleBindings associated with
those service accounts.
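To delete a service account manually, you can run a kubectl command similar to the following sketch, where SERVICE-ACCOUNT-NAME is a placeholder for a service account that a previous TKGI SLI test created:

kubectl delete serviceaccount SERVICE-ACCOUNT-NAME -n NAMESPACE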
Where:
NAMESPACE is the namespace that contains the service account you want to delete.
To fix this issue, either upgrade to Healthwatch v2.2.1 or manually clean up the snapshots. To manually
clean up the snapshots:
1. Log in to the Prometheus VM you want to clean up by following the procedure in the Tanzu
Operations Manager documentation.
2. Run:
sudo -i
# Delete all stored Prometheus snapshots:
rm -rf /var/vcap/store/prometheus/snapshots/*
# Verify that the snapshots directory is now empty:
cd /var/vcap/store/prometheus/snapshots
ls
To find out why the Prometheus instance fails to scrape metrics from your TKGI clusters, see Diagnose
Prometheus Scrape Job Failure below.
You are using TKGI v1.10.0 or v1.10.1. For more information, see No Data on Kubernetes Nodes
Dashboard for TKGI v1.10 below.
You are using TKGI v1.12. For more information, see No Data on Kubernetes Nodes Dashboard for
TKGI v1.12 in Healthwatch Release Notes.
You are using TKGI to monitor Windows clusters. For more information, see No Data on
Kubernetes Nodes Dashboard for Windows Clusters in Healthwatch Release Notes.
The Prometheus instance in the Healthwatch tile could not detect or create scrape jobs for the
clusters, causing TKGI cluster discovery to fail. For more information, see Configure DNS for Your
TKGI Cluster below.
To find out why the Prometheus instance fails to scrape metrics from your TKGI clusters:
1. Log in to one of the VMs in the Prometheus instance by following the procedure in the Tanzu
Operations Manager documentation.
3. Find the scrape jobs for your TKGI clusters. The lastError field describes the reason for the
Prometheus instance failing to scrape your TKGI clusters.
To fix this issue, upgrade to TKGI v1.10.2 or later. For more information about upgrading to TKGI v1.10.2 or
later, see the TKGI documentation.
Where:
This occurs because the TKGI API cannot access your TKGI clusters from the Internet. To resolve this
issue, you must configure a DNS entry for the control plane of each of your TKGI clusters in the console for
your IaaS.
To configure DNS entries for the control planes of your TKGI clusters:
1. Find the IP addresses and hostnames of the control plane of each of your TKGI clusters. For more
information, see the TKGI documentation.
2. Record the Kubernetes Master IP(s) and Kubernetes Master Host from the output you viewed in
the previous step. For more information, see the TKGI documentation.
4. For each TKGI cluster, find the public IP address of the VM that has an internal IP address
matching the Kubernetes Master IP(s) you recorded in a previous step. For more information, see
the documentation for your IaaS:
AWS: To find the public IP address of a Linux instance, see the AWS documentation for
Linux instances of Amazon EC2. To find the public IP address for a Windows instance, see
the AWS documentation for Windows instances of Amazon EC2.
Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.
GCP: To find the public IP address for a GCP VM, see the GCP documentation.
vSphere: To find the public IP address of a vSphere VM, see the vSphere documentation.
5. For each TKGI cluster, create an A record in your DNS server that points to the public IP address of
the control plane of the TKGI cluster that you recorded in the previous step. For more information,
see the documentation for your IaaS:
AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.
Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.
GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.
OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.
vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.
Up: The current health of the Prometheus endpoint on the metric exporter VM. A value of 1
indicates that the Prometheus endpoint is healthy. A value of 0 or missing data indicates that either
the Prometheus endpoint is unresponsive or the Prometheus instance failed to scrape the
Prometheus endpoint. For more information, see the Prometheus documentation.
Exporter SLO: The percentage of time that the Healthwatch Exporter tile was up and running over
the selected time period.
Error Budget Remaining: How many minutes are left in the error budget before exceeding the
selected Uptime SLO Target over the selected time period.
Minutes of Downtime: How many minutes the Healthwatch Exporter tiles were down during the
selected time period.
CPU Usage: A graph of the cpu_usage_user metric, showing the percentage of CPU used over the
selected time period. You can use this graph to determine whether the amount of CPU used by
Healthwatch Exporter for Tanzu Platform for Cloud Foundry is reaching capacity.
This topic describes the Healthwatch for VMware Tanzu tile components and the resource requirements for
the Healthwatch tile.
For information about the metric exporter VMs that the Healthwatch Exporter for VMware Tanzu Platform for
Cloud Foundry and Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles
deploy, see Healthwatch Metrics.
Prometheus: The Prometheus instance scrapes and stores metrics from the Healthwatch Exporter
tiles, allows you to configure alerts with Alertmanager, and sends canary tests to target URLs with
Blackbox Exporter.
Grafana: The Healthwatch tile exports collected metrics to dashboards in the Grafana UI, allowing
you to visualize the data with charts and graphs and create customized dashboards for long-term
monitoring and troubleshooting.
MySQL: MySQL is used only to store your Grafana settings and does not store any time series
data.
MySQL Proxy: MySQL Proxy routes client connections to healthy MySQL nodes.
The Healthwatch tile automatically selects the instance type that is best suited for each job based on the
available resources for your deployment.
For more information about Healthwatch components, see Healthwatch Component VMs and Resource
Requirements for the Healthwatch Tile below.
By default, the Healthwatch tile deploys two Prometheus VMs and only one each of the Grafana, MySQL,
and MySQL Proxy VMs. For information about scaling these resources, see Scale Healthwatch below.
Prometheus (job name: tsdb): Collects metrics related to the functionality of platform-level and
runtime-level components.
MySQL Proxy (job name: pxc-proxy): Routes client connections to healthy MySQL nodes and away from
unhealthy MySQL nodes.
Prometheus: 2 instances, 4 CPUs, 16 GB RAM, 5 GB ephemeral disk, 512 GB persistent disk
Grafana: 1 instance, 1 CPU, 4 GB RAM, 5 GB ephemeral disk, 5 GB persistent disk
MySQL: 1 instance, 1 CPU, 4 GB RAM, 5 GB ephemeral disk, 10 GB persistent disk
MySQL Proxy: 1 instance, 1 CPU, 4 GB RAM, 5 GB ephemeral disk, 5 GB persistent disk
Scale Healthwatch
By default, the Healthwatch tile deploys two Prometheus VMs, one Grafana VM, one MySQL VM, and one
MySQL Proxy VM.
2. Scale Healthwatch tile resources. You can scale Healthwatch tile resources either vertically or
horizontally:
Prometheus: You can scale the Prometheus instance vertically only. Do not scale
Prometheus horizontally.
MySQL: If you make Grafana HA, VMware recommends scaling your MySQL instance to
three VMs.
MySQL Proxy: If you make Grafana HA, VMware recommends scaling your MySQL Proxy
instance to two VMs.
To remove Grafana from your Healthwatch deployment, set the number of Grafana,
MySQL, and MySQL Proxy instances to 0. For more information, see Remove
Grafana from Healthwatch below.
For more information about vertical and horizontal scaling, see Scaling platform capacity in the Tanzu
Platform for Cloud Foundry documentation.
1. Complete the steps in Scale Healthwatch Component VMs with the following HA configurations:
Prometheus: Healthwatch deploys two Prometheus VMs by default. With two VMs in the
Prometheus instance, Prometheus and Alertmanager are HA by default.
If you remove Grafana from your Healthwatch deployment, scale Grafana, MySQL, and
MySQL Proxy to 0 at the same time. MySQL is used only to store Grafana settings and
MySQL Proxy is used only to route client connections to healthy MySQL nodes in an HA
Grafana deployment. Neither component is necessary if you have not deployed Grafana.