
Healthwatch for VMware Tanzu 2.3

You can find the most up-to-date technical documentation on the VMware by Broadcom website at:

https://techdocs.broadcom.com/

VMware by Broadcom
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com

Copyright © 2025 Broadcom. All Rights Reserved. The term “Broadcom” refers to Broadcom Inc. and/or its
subsidiaries. For more information, go to https://www.broadcom.com. All trademarks, trade names, service marks,
and logos referenced herein belong to their respective companies.


Contents

Healthwatch for VMware Tanzu
    Product Snapshot
    Overview of the Healthwatch Tile
    Overview of the Healthwatch Exporter tile
    Healthwatch Exporter for Tanzu Platform for Cloud Foundry
    Healthwatch Exporter for TKGI Tile
    Healthwatch v2.3 Limitations
    Assumed Risks of Using Healthwatch v2.3

Healthwatch v2.3 Release Notes
    v2.3.3
    v2.3.2
    v2.3.1
    v2.3.0
    New Features
    Known Issues
    Healthwatch exporter dashboard showing no data

Healthwatch Architecture
    Healthwatch Tile Architecture
    High Availability
    Component Scaling
    Networking Rules for the Healthwatch Tile
    Healthwatch Exporter for Tanzu Platform for Cloud Foundry Architecture
    Networking Rules for Healthwatch Exporter for Tanzu Platform for Cloud Foundry
    Healthwatch Exporter for TKGI Architecture
    Networking Rules for Healthwatch Exporter for TKGI
    Configuration Options
    Monitoring Tanzu Platform for Cloud Foundry on a Single Tanzu Operations Manager Foundation
    Monitoring TKGI on a Single Tanzu Operations Manager Foundation
    Monitoring Tanzu Platform for Cloud Foundry and TKGI on a Single Tanzu Operations Manager Foundation
    Configure Multi-Foundation Monitoring Using Direct Scraping
    Configure Multi-Foundation Monitoring Using Federation

Upgrading Healthwatch

Installing

Installing a Tile Manually
    Install a Tile Manually
    Next Steps

Installing, Configuring, and Deploying a Tile Through an Automated Pipeline
    Prerequisites
    Download and Install a Tile Using Platform Automation
    Configure and Deploy your Tile Using the om CLI

Configuring

Configuring Healthwatch
    Configure the Healthwatch Tile
    Assign AZs and Networks
    Configure Prometheus
    (Optional) Configure Alertmanager
    (Optional) Configure Grafana
    (Optional) Configure Grafana Authentication
    (Optional) Configure Grafana Dashboards
    (Optional) Configure Canary URLs
    (Optional) Configure Remote Write
    (Optional) Configure TKGI Cluster Discovery
    (Optional) Configure Errands
    (Optional) Configure Syslog
    (Optional) Configure Resources
    (Optional) Configure for OpenTelemetry
    Deploy Healthwatch
    Next Steps

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry
    Configure the Healthwatch Exporter for Tanzu Platform for Cloud Foundry Tile
    Assign AZs and Networks
    (Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs
    Configure the BOSH Health Metric Exporter VM
    (Optional) Configure the BOSH Deployment Metric Exporter VM
    Create a UAA Client for the BOSH Deployment Metric Exporter VM
    (Optional) Configure Errands
    (Optional) Configure Syslog
    (Optional) Configure Resources
    Deploy Healthwatch Exporter for Tanzu Platform for Cloud Foundry
    Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry
    Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in Healthwatch
    Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in an External Monitoring System

Configuring Healthwatch Exporter for TKGI
    Configure the Healthwatch Exporter for TKGI Tile
    Assign AZs and Networks
    (Optional) Configure TKGI and Certificate Expiration Metric Exporter VMs
    (Optional) Configure the TKGI SLI Exporter VM
    Configure the BOSH Health Metric Exporter VM
    (Optional) Configure the BOSH Deployment Metric Exporter VM
    Create a UAA Client for the BOSH Deployment Metric Exporter VM
    (Optional) Configure Errands
    (Optional) Configure Syslog
    (Optional) Configure Resources
    Deploy Healthwatch Exporter for TKGI
    Configure a Scrape Job for Healthwatch Exporter for TKGI
    Configure a Scrape Job for Healthwatch Exporter for TKGI in Healthwatch
    Configure a Scrape Job for Healthwatch Exporter for TKGI in an External Monitoring System

Configuring Multi-Foundation Monitoring
    Configuring Multi-Foundation Monitoring Through Direct Scraping
    Configuring Multi-Foundation Monitoring Through Federation

Configuring Direct Scraping for Multi-Foundation Monitoring
    Configure Direct Scraping

Configuring Federation for Multi-Foundation Monitoring
    Set up your Multi-Foundation Tanzu Platform for Cloud Foundry Deployment
    Set up your Multi-Foundation TKGI Deployment
    Configure Scrape Jobs
    Automating the Healthwatch tile configuration
    Test your Federation Configuration
    Federation for a High Availability Healthwatch Deployment

Configuring TKGI Cluster Discovery
    Overview of TKGI Cluster Discovery
    Configure TKGI Cluster Discovery in Healthwatch
    Configure TKGI
    Troubleshooting TKGI Cluster Discovery Failure

Configuring DNS for the Grafana Instance
    Overview of DNS for the Grafana instance
    Configure DNS for a Load Balancer
    Configure DNS for a Single Grafana VM

Creating a Firewall Policy for the Grafana Instance
    Overview of Firewall Policies for the Grafana Instance
    Create a Firewall Policy in AWS
    Create a Firewall Policy in Azure
    Create a Firewall Policy in GCP
    Create a Firewall Policy in vSphere NSX-V

Configuring Grafana Authentication
    Overview of Grafana Authentication
    Configuring Basic Authentication
    Allow Basic Authentication
    Disallow Basic Authentication
    Configuring Other Authentication Methods
    Configure Generic OAuth Authentication
    Configure UAA Authentication
    Configure LDAP Authentication

Optional Configuration

Configuring Alerting
    Overview
    Configure Alerting
    Configure Alert Receivers
    Configure an Email Alert Receiver
    Configure a PagerDuty Alert Receiver
    Configure a Slack Alert Receiver
    Configure a Webhook Alert Receiver
    Combining Alert Receivers
    Silence Alerts

Monitoring Certificate Expiration
    Overview of Certificate Expiration Monitoring
    Reserving a Static IP Address for the Certificate Expiration Metric Exporter VM

Configuring Authentication with a UAA Instance on a Different Tanzu Operations Manager Foundation
    Overview of UAA Authentication with the Grafana UI
    Create a UAA Client for the Grafana Instance
    Configure the Grafana UI

Configuring TKGI Cluster Discovery
    Overview of TKGI Cluster Discovery
    Configure TKGI Cluster Discovery in Healthwatch
    Configure TKGI
    Troubleshooting TKGI Cluster Discovery Failure

Configuration File Reference Guide
    Overview of Configuration Files in Healthwatch
    Configuring the Prometheus Configuration File
    Prometheus Configuration Options
    Remote Write
    Configuring the Alertmanager Configuration File
    Configuring the Grafana Configuration File

Using Healthwatch Dashboards in the Grafana UI
    Using Dashboards in the Grafana UI
    View Your Healthwatch Dashboards
    Default Dashboards in the Grafana UI

Healthwatch Metrics
    Overview of Healthwatch Metrics
    BOSH SLIs
    BOSH Health Metric Exporter VM
    BOSH Deployment Metric Exporter VM
    Platform Metrics
    Tanzu Platform for Cloud Foundry SLI Exporter VM
    TKGI SLI Exporter VM
    Certificate Expiration Metric Exporter VM
    Prometheus VM
    SVM Forwarder VM - Platform Metrics
    Healthwatch Component Metrics
    TKGI Metric Exporter VM
    Healthwatch Exporter for Tanzu Platform for Cloud Foundry Metric Exporter VMs
    Prometheus Exposition Endpoint
    SVM Forwarder VM - Healthwatch Component Metrics

Troubleshooting Healthwatch
    Accessing VM UIs for Troubleshooting
    Access the Prometheus UI
    Access the Alertmanager UI
    Troubleshooting Known Issues
    Smoke Tests Errand Fails When Deploying Healthwatch
    TKGI Metric Exporter VM Fails to Connect to the BOSH Director
    BOSH Health Metrics Cause Errors When Two Healthwatch Exporter Tiles Are Installed
    Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts
    BBR Backup Snapshots Fill Disk Space on Prometheus VMs
    Troubleshooting Missing TKGI Cluster Metrics
    Diagnose Prometheus Scrape Job Failure
    No Data on Kubernetes Nodes Dashboard for TKGI v1.10
    Configure DNS for Your TKGI Clusters
    Troubleshooting Healthwatch Exporter Tiles Using Grafana UI Dashboards
    Viewing Healthwatch Exporter Tile Metrics
    Troubleshooting Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Healthwatch Components and Resource Requirements
    Overview of Healthwatch Components
    Healthwatch Component VMs
    Resource Requirements for the Healthwatch Tile
    Scale Healthwatch
    Scale Healthwatch Component VMs
    Configure Healthwatch for High Availability
    Remove Grafana from Healthwatch


Healthwatch for VMware Tanzu

This topic provides an overview of Healthwatch for VMware Tanzu features and functionality. For
information about new features and breaking changes, see the Healthwatch Release Notes.

Tanzu Application Service is now called Tanzu Platform for Cloud Foundry. The current
version of Tanzu Platform for Cloud Foundry is 10.x.

Healthwatch allows you to monitor metrics related to the functionality of your Tanzu Operations Manager
platform.

A complete Healthwatch installation includes the Healthwatch tile, and at least one Healthwatch Exporter
tile. There are Healthwatch Exporter tiles for both the VMware Tanzu Platform for Cloud Foundry and
VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) runtimes.

You must install a Healthwatch Exporter tile on each Tanzu Operations Manager foundation you want to
monitor. You can install the Healthwatch tile on the same foundation or on a different foundation, depending
on your desired monitoring configuration.

You can also configure the Healthwatch Exporter tiles to expose metrics to a service or database located
outside your Tanzu Operations Manager foundation, such as an external time-series database (TSDB) or an
installation of the Healthwatch tile on a separate Tanzu Operations Manager foundation. This does not
require you to install the Healthwatch tile on the same Tanzu Operations Manager foundation as the
Healthwatch Exporter tiles.

For a detailed explanation of the Healthwatch architecture, a list of open ports required for each component,
and possible configurations for monitoring metrics with Tanzu Operations Manager or an external service or
database, see Healthwatch Architecture.

Product Snapshot
The following table provides version and version support information about Healthwatch:

| Element                                                            | Compatible versions                  |
|--------------------------------------------------------------------|--------------------------------------|
| Version                                                            | v2.3.3                               |
| Release date                                                       | July 22, 2025                        |
| Tanzu Operations Manager versions                                  | v3.1                                 |
| TAS for VMs versions                                               | v4.0, v6.0                           |
| Tanzu Platform for Cloud Foundry (Tanzu Platform for CF) versions  | v10.x                                |
| TKGI versions                                                      | v1.16 to v1.22                       |
| IaaS support                                                       | AWS, Azure, GCP, OpenStack, vSphere  |

Only supported versions are listed in the table.


Tanzu Operations Manager 2.10 is compatible, but is out of support.
TAS for VMs v5.0 is compatible, but is out of support.

Overview of the Healthwatch Tile


The Healthwatch tile monitors metrics from one or more Tanzu Operations Manager foundations by scraping
them from Healthwatch Exporter tiles installed on each foundation.

Healthwatch deploys instances of Prometheus and Grafana. The Prometheus instance scrapes and stores
metrics from the Healthwatch Exporter tiles and allows you to configure alerts with Alertmanager.

Healthwatch then exports the collected metrics to dashboards in the Grafana UI, allowing you to visualize
the data with charts and graphs and create customized dashboards for long-term monitoring and
troubleshooting.

Healthwatch includes the following features:

- Prometheus:
  - Scrapes the /metrics endpoints of the VMs deployed by Healthwatch Exporter tiles, collecting metrics
    related to the functionality of platform-level and runtime-level components, including:
    - Service level indicators (SLIs) for the BOSH Director
    - SLIs for Tanzu Platform for CF components
    - SLIs for TKGI components
    - When Tanzu Operations Manager certificates are due to expire
    - Canary URL tests for Tanzu Platform for CF apps
    - Counter, gauge, and container metrics for Tanzu Platform for CF from the Loggregator Firehose
    - Super value metrics (SVMs)
    - BOSH system metrics for TKGI
    - VMs deployed by Healthwatch Exporter tiles
  - Stores metrics for up to six weeks
  - Can write to remote storage in addition to its local TSDB (see the example configuration after this list)
- Grafana: Allows you to visualize the collected metrics data in charts and graphs, as well as create
  customized dashboards for easier monitoring and troubleshooting
- Alertmanager: Manages and sends alerts according to the alerting rules you configure
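For example, a Prometheus scrape job that collects metrics from a metric exporter VM, together with an
optional remote_write block, might look like the following. This is a minimal sketch that uses standard
Prometheus configuration syntax; the job name, target address, and remote storage URL are placeholders
rather than values from a real deployment:

scrape_configs:
- job_name: healthwatch-exporter       # hypothetical job name
  metrics_path: /metrics
  scheme: https
  static_configs:
  - targets:
    - "10.0.4.5:9090"                  # placeholder metric exporter VM address

remote_write:
- url: "https://remote-tsdb.example.com/api/v1/write"   # placeholder remote storage endpoint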

Overview of the Healthwatch Exporter tile


The Healthwatch Exporter tile can be used with the two supported runtimes:


Tanzu Platform for Cloud Foundry

Tanzu Kubernetes Grid Integrated Edition

Healthwatch Exporter for Tanzu Platform for Cloud Foundry


The Healthwatch Exporter tile for Tanzu Platform for Cloud Foundry deploys metric exporter VMs to
generate each type of metric related to the health of your Tanzu Platform for CF deployment.

Healthwatch Exporter sends metrics through the Loggregator Firehose to a Prometheus exposition endpoint
on the associated metric exporter VMs. The Prometheus instance that exists within your metrics monitoring
system then scrapes the exposition endpoints on the metric exporter VMs and imports those metrics into
your monitoring system.

Healthwatch Exporter for Tanzu Platform for CF exposes the following metrics related to the functionality of
Tanzu Platform for CF components, Tanzu Platform for CF apps, and the Healthwatch Exporter tile:

SLIs for Tanzu Platform for CF components

Canary URL tests for Tanzu Platform for CF apps

Counter, gauge, and container metrics for Tanzu Platform for CF from the Loggregator Firehose

SVMs

VMs deployed by Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Healthwatch Exporter for TKGI Tile


The Healthwatch Exporter for TKGI tile deploys metric exporter VMs to generate SLIs related to the health
of your TKGI deployment.

The Prometheus instance that exists within your metrics monitoring system then scrapes the Prometheus
exposition endpoints on the metric exporter VMs and imports those metrics into your monitoring system.

Healthwatch Exporter for TKGI exposes the following metrics related to the functionality of TKGI
components and the Healthwatch Exporter for TKGI tile:

SLIs for TKGI components

BOSH system metrics for TKGI

VMs deployed by Healthwatch Exporter for TKGI

Healthwatch v2.3 Limitations


Healthwatch v2.3 has the following limitation:

Currently, the OpenTelemetry Collector is co-located on the VMs defined within the Tanzu Platform for
Cloud Foundry tile, which means that it can only collect metrics from those VMs. As a result, metrics
from other service tiles are not available in Healthwatch when you enable the OpenTelemetry Collector.

Assumed Risks of Using Healthwatch v2.3


The following problem can arise when using Healthwatch v2.3:


With Healthwatch v2.3, you can collect metrics using an OpenTelemetry Collector, but be aware
that the OpenTelemetry Collector used in Tanzu Platform for Cloud Foundry is a beta version.


Healthwatch v2.3 Release Notes

This topic contains release notes for Healthwatch for VMware Tanzu v2.3.

For information about the risks and limitations of Healthwatch v2.3, see Assumed Risks of Using
Healthwatch v2.3 and Healthwatch v2.3 Limitations.
For more information about the new v2.3 features, see New Features.

Tanzu Application Service is now called Tanzu Platform for Cloud Foundry. The current
version of Tanzu Platform for Cloud Foundry is 10.x.

v2.3.3
Release Date: June 22, 2025

[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data.

Dependencies updated in this release:

- bpm-release: Updated to v1.4.20
- bosh-cli: Updated to v1.4.20
- loggregator-agent-release: Updated to v1.4.20
- TKGI CLI: Updated to v1.22.0
- pxc-release: Updated to v1.0.40
- Stemcell: Updated to the latest Ubuntu Jammy stemcell (1.842)

Healthwatch v2.3.3 uses the following open-source component versions:

| Component    | Packaged version |
|--------------|------------------|
| Prometheus   | 2.55.0           |
| Grafana      | 10.3.10          |
| Alertmanager | 0.27.0           |
| PXC          | 1.0.40           |

v2.3.2
Release Date: April 15, 2025


[Feature] OpenTelemetry (OTel) dashboards are now added to the Grafana default dashboards (if
OTel is enabled).

[Bug Fix] Duplicate VM details in the Foundation Job Details dashboard are now fixed for Tanzu
Platform for Cloud Foundry.

[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data. See No Data for Healthwatch
Exporter Troubleshooting.

[Security Fix] The following CVE was fixed by upgrading the Grafana release version:
CVE-2024-8118

Healthwatch v2.3.2 uses the following open-source component versions:

| Component    | Packaged version |
|--------------|------------------|
| Prometheus   | 2.55.0           |
| Grafana      | 10.3.10          |
| Alertmanager | 0.27.0           |
| PXC          | 1.0.34           |

v2.3.1
Release Date: October 3, 2024

[Feature] Healthwatch 2.3.1 can now receive metrics through OpenTelemetry (OTel), even if the
Firehose is enabled.

[Bug Fix] Duplicate VM details in the Foundation Job Details dashboard are now fixed.

[Known Issue] If you have only TKGI v1.16 or later installed, the Healthwatch Exporter
troubleshooting dashboard in the Grafana UI shows no data. See No Data for Healthwatch
Exporter Troubleshooting.

v2.3.0
Release Date: June 15, 2024

[Breaking Changes] Applicable only when using the OpenTelemetry Collector. When you start
collecting metrics with the OpenTelemetry Collector, you will notice the following changes in
Healthwatch metrics:

Metrics from Firehose-specific components, such as the Reverse Log Proxy, are not
available after you enable the OpenTelemetry Collector.

Currently, the OpenTelemetry Collector is co-located on TAS VMs and only collects metrics
from these VMs. As a result, metrics from other tiles are not collected by the
OpenTelemetry Collector.

The Prometheus exporter in the OpenTelemetry Collector does not allow leading
underscores, except in private metrics. Therefore, the OpenTelemetry Collector removes
any leading underscores from metric names.


The SVM forwarder VM does not work with OpenTelemetry. You need to switch off the
SVM forwarder VM if it is switched on.

The metrics listed in the table below have the prefix grafana_ added to the beginning of the metric
name when you use the OpenTelemetry Collector.

| Metric name with the Firehose       | Metric name with the OpenTelemetry Collector |
|-------------------------------------|----------------------------------------------|
| access_evaluation_duration_bucket   | grafana_access_evaluation_duration_bucket    |
| access_evaluation_duration_count    | grafana_access_evaluation_duration_count     |
| access_evaluation_duration_sum      | grafana_access_evaluation_duration_sum       |
| access_permissions_duration_bucket  | grafana_access_permissions_duration_bucket   |
| access_permissions_duration_count   | grafana_access_permissions_duration_count    |
| access_permissions_duration_sum     | grafana_access_permissions_duration_sum      |

[Feature] Healthwatch 2.3.0 supports using the OpenTelemetry Collector (instead of the Firehose) to
collect metrics. For more information, see Configure for OpenTelemetry. After the OpenTelemetry
Collector is configured, Healthwatch automatically switches to collecting metrics from the
OpenTelemetry Collector. Healthwatch expects OpenTelemetry Collector data on port 65331.

[Feature] Healthwatch 2.3.0 is upgraded to Grafana 10 and provides the new features offered by
Grafana 10.

[Security Fix] The following CVE was fixed by upgrading the Grafana release version: CVE-2023-49569

Healthwatch v2.3.0 uses the following open-source component versions:

| Component    | Packaged version |
|--------------|------------------|
| Prometheus   | 2.52.0           |
| Grafana      | 10.1.10          |
| Alertmanager | 0.27.0           |
| PXC          | 1.0.29           |

New Features
Healthwatch v2.3 includes the following features:

Runs a suite of service level indicator (SLI) tests against the TKGI API and collects metrics
from those tests in the TKGI Control Plane dashboard in the Grafana UI. For more information,
see TKGI SLI Exporter VM in Healthwatch Metrics.

Separates Diego capacity metrics by isolation segment in the Diego/Capacity dashboard in the
Grafana UI.

No longer displays metrics from BOSH smoke test deployments in the Jobs and Job Details
dashboards in the Grafana UI.


Allows you to include optional dashboards for the VMware Tanzu RabbitMQ for VMs (Tanzu
RabbitMQ) and VMware Tanzu for MySQL on Cloud Foundry tiles in the Grafana UI.

Allows you to monitor super value metrics (SVMs). For more information about SVMs, see
Configure Prometheus in Configuring Healthwatch, SVM Forwarder VM - Healthwatch Component
Metrics in Healthwatch Metrics, and SVM Forwarder VM - Platform Metrics in Healthwatch
Metrics.

Automatically detects which version of Tanzu Platform for CF or TKGI is installed on your Tanzu
Operations Manager foundation and creates the appropriate dashboard in the Grafana UI.

Allows you to use the Tanzu Operations Manager syslog forwarding feature to forward log
messages from Healthwatch component VMs to an external destination for troubleshooting, such
as a remote server or external syslog aggregation service. For more information about how to
configure syslog forwarding, see (Optional) Configure Syslog in Configuring Healthwatch.

Known Issues
This section contains known issues you might encounter.

Healthwatch exporter dashboard showing no data


This known issue applies if you have TKGI installed in your TAS for VMs/Tanzu Platform for Cloud
Foundry environment. Metrics for the pks-sli-exporter VM are not shown in the troubleshooting
dashboard. However, you continue to see metrics from the pas-sli-exporter VM.


Healthwatch Architecture

This topic describes the architecture of the Healthwatch for VMware Tanzu, Healthwatch Exporter for
VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes Grid
Integrated Edition (TKGI) tiles. This topic also describes the possible configurations for monitoring metrics
across multiple VMware Tanzu Operations Manager foundations.

There are three tiles that form the Healthwatch architecture: Healthwatch, Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, and Healthwatch Exporter for TKGI.

A complete Healthwatch installation includes the Healthwatch tile, as well as at least one Healthwatch
Exporter tile. However, you can deploy and use each tile separately as part of an alternate monitoring
configuration.

You must install a Healthwatch Exporter tile on each Tanzu Operations Manager foundation you want to
monitor. You can install the Healthwatch tile on the same Tanzu Operations Manager foundation or on a
different Tanzu Operations Manager foundation, depending on your desired monitoring configuration.

You can also configure the Healthwatch Exporter tiles to expose metrics to a service or database located
outside your Tanzu Operations Manager foundation, such as an external time-series database (TSDB) or an
installation of the Healthwatch tile on a separate Tanzu Operations Manager foundation. This does not
require you to install the Healthwatch tile.

For a detailed explanation of the architecture for each tile, a list of open ports required for each component,
and the possible configurations for monitoring metrics across Tanzu Operations Manager foundations, see
the following sections:

Healthwatch Tile Architecture

Healthwatch Exporter for Tanzu Platform for Cloud Foundry Architecture

Healthwatch Exporter for TKGI Architecture

Configuration Options

Healthwatch Tile Architecture


When you install the Healthwatch tile, Healthwatch deploys instances of Prometheus, Grafana, and
MySQL. Healthwatch also deploys an Nginx proxy in front of the Prometheus instance for load-balancing.

The Prometheus instance scrapes and stores metrics from the Prometheus endpoints on the metric
exporter VMs that the Healthwatch Exporter tiles deploy. Prometheus also allows you to configure alerts
with Alertmanager.

Healthwatch then exports these metrics to dashboards in the Grafana UI, where you can see the data in
charts and graphs. You can also use Grafana to create customized dashboards for long-term monitoring and
troubleshooting.


The MySQL instance that the Healthwatch tile deploys stores only your Grafana settings,
and does not store any time-series data.

The diagram below illustrates how metrics travel from the Healthwatch Exporter tiles through Prometheus
and to Grafana. It also shows how metrics travel through Prometheus to Alertmanager.

High Availability
You can deploy the Healthwatch tile in high availability (HA) mode with three MySQL nodes and two MySQL
Proxy nodes, or in non-HA mode with one MySQL node and one MySQL Proxy node.

Component Scaling
Healthwatch deploys two Prometheus VMs by default to create an HA Prometheus instance. If you do not
need Prometheus to be HA, you can scale the Prometheus instance down to one Prometheus VM. To
further scale the Prometheus instance, you can scale it vertically by increasing the disk size of each VM
in the Prometheus instance.

Healthwatch deploys a single Grafana VM by default. If you want to make the Grafana instance HA, you
can scale the Grafana instance horizontally.

If you do not want to use any Grafana instances in your Healthwatch deployment, you can set the number
of Grafana, MySQL, and MySQL Proxy instances for your Healthwatch deployment to 0 in the Resource
Config pane of the Healthwatch tile.

For more information about scaling Healthwatch resources, see Healthwatch Components and Resource
Requirements.
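
If you manage the tile through an automation pipeline, the same scaling decisions can be expressed in the
resource-config section of an om product configuration file (the full format is shown later in Installing,
Configuring, and Deploying a Tile Through an Automated Pipeline). The following is a minimal sketch; the
instance counts and disk size are illustrative assumptions, not recommended values:

resource-config:
  tsdb:
    instances: 1              # a single, non-HA Prometheus VM
    persistent_disk:
      size_mb: "102400"       # scale vertically by increasing disk size
  grafana:
    instances: 2              # scale horizontally for an HA Grafana instance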

Networking Rules for the Healthwatch Tile


The following table describes the ports you must open for each Healthwatch component:

| This component...                 | Must communicate with...                                                                                   | Default TCP port                         | Notes                                                                                                       |
|-----------------------------------|------------------------------------------------------------------------------------------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| grafana                           | tsdb                                                                                                       | 4449                                     | Additional networking rules may be required for any external connections listed. For example, 443 for UAA. |
| grafana                           | pxc-proxy                                                                                                  | 3306                                     |                                                                                                             |
| grafana                           | External alerting URLs                                                                                     |                                          |                                                                                                             |
| grafana                           | External data sources                                                                                      |                                          |                                                                                                             |
| grafana                           | External authentication                                                                                    |                                          |                                                                                                             |
| grafana                           | External SMTP server                                                                                       |                                          |                                                                                                             |
| blackbox-exporter                 | External canary target URLs                                                                                | N/A                                      | Additional networking rules may be required, depending on your external canary target URL configuration.   |
| tsdb                              | blackbox-exporter                                                                                          | 9090                                     |                                                                                                             |
| tsdb                              | All VMs deployed by Healthwatch Exporter tiles                                                             |                                          |                                                                                                             |
| tsdb (for TKGI cluster discovery) | For each cluster: Kube API Server, Kube Controller Manager, Kube Scheduler, etcd (Telegraf output plugin)  | 8443, 10252, 10251, 10200 (respectively) | You need to open these ports only if you configure TKGI cluster discovery.                                 |

Healthwatch Exporter for Tanzu Platform for Cloud Foundry Architecture
The Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile deploys metric exporter VMs to
generate each type of metric related to the health of your Tanzu Platform for Cloud Foundry deployment.

Healthwatch Exporter for Tanzu Platform for Cloud Foundry sends metrics through the Loggregator Firehose
to a Prometheus exposition endpoint on the associated metric exporter VMs. The Prometheus instance in
your metrics monitoring system then scrapes the exposition endpoints on the metric exporter VMs and
imports those metrics into your monitoring system.

You can scale the VMs that Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys vertically,
but you should not scale them horizontally.

Networking Rules for Healthwatch Exporter for Tanzu Platform for Cloud Foundry
The following table describes the ports you must open for each Healthwatch Exporter for Tanzu Platform for
Cloud Foundry component:


| This component...          | Must communicate with...       | Default TCP port |
|----------------------------|--------------------------------|------------------|
| bosh-deployments-exporter  | BOSH Director UAA              | 8443             |
| bosh-deployments-exporter  | BOSH Director                  | 25555            |
| bosh-health-exporter       | BOSH Director UAA              | 8443             |
| bosh-health-exporter       | BOSH Director                  | 25555            |
| cert-expiration-exporter   | Tanzu Operations Manager       | 443              |
| pas-exporter-counter       | Reverse Log Proxy (RLP) nozzle | 8082             |
| pas-exporter-gauge         | RLP nozzle                     | 8082             |
| tsdb                       | OpenTelemetry Collector        | 65331            |
| pas-sli-exporter           | CAPI                           | 443              |
| pas-sli-exporter           | UAA                            | 443              |

Healthwatch Exporter for TKGI Architecture


The Healthwatch Exporter for TKGI tile deploys metric exporter VMs to generate SLIs related to the health
of your TKGI deployment.

The Prometheus instance in your metrics monitoring system then scrapes the Prometheus exposition
endpoints on the metric exporter VMs and imports those metrics into your monitoring system.

You can scale the VMs that Healthwatch Exporter for TKGI deploys vertically, but you should not scale
them horizontally.

Networking Rules for Healthwatch Exporter for TKGI


The following table describes the ports you must open for each Healthwatch Exporter for TKGI component:

| This component...          | Must communicate with...  | Default TCP port |
|----------------------------|---------------------------|------------------|
| bosh-deployments-exporter  | BOSH Director UAA         | 8443             |
| bosh-deployments-exporter  | BOSH Director             | 25555            |
| bosh-health-exporter       | BOSH Director UAA         | 8443             |
| bosh-health-exporter       | BOSH Director             | 25555            |
| cert-expiration-exporter   | Tanzu Operations Manager  | 443              |
| pks-sli-exporter           | TKGI API UAA              | 8443             |
| pks-sli-exporter           | TKGI API                  | 9021             |

Configuration Options


Healthwatch can be configured in multiple ways, allowing you to monitor metrics across a variety of
platform and foundation configurations. The sections below describe the most common configuration
scenarios:

Monitoring Tanzu Platform for Cloud Foundry on a Single Tanzu Operations Manager Foundation

Monitoring TKGI on a Single Tanzu Operations Manager Foundation

Monitoring Tanzu Platform for Cloud Foundry and TKGI on a Single Tanzu Operations Manager
Foundation

Configure Multi-Foundation Monitoring Using Direct Scraping

Configure Multi-Foundation Monitoring Using Federation

Monitoring Tanzu Platform for Cloud Foundry on a Single Tanzu Operations Manager Foundation

If you only want to monitor a single Tanzu Operations Manager foundation that has Tanzu Platform for
Cloud Foundry installed, install the Healthwatch tile and Healthwatch Exporter for Tanzu Platform for
Cloud Foundry on the same foundation. The Healthwatch tile automatically detects Healthwatch Exporter
for Tanzu Platform for Cloud Foundry on the same Tanzu Operations Manager foundation and adds a
scrape job for it to the Prometheus instance.

For more information about installing and configuring the Healthwatch tile and Healthwatch Exporter for
Tanzu Platform for Cloud Foundry, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline

Configuring Healthwatch

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Monitoring TKGI on a Single Tanzu Operations Manager Foundation


If you only want to monitor a single Tanzu Operations Manager foundation that has TKGI installed, install
the Healthwatch tile and Healthwatch Exporter for TKGI on the same foundation. The Healthwatch tile
automatically detects Healthwatch Exporter for TKGI on the same Tanzu Operations Manager foundation
and adds a scrape job for Healthwatch Exporter for TKGI to the Prometheus instance.

For more information about installing and configuring the Healthwatch tile and Healthwatch Exporter for
TKGI, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline

Configuring Healthwatch

Configuring Healthwatch Exporter for TKGI

Monitoring Tanzu Platform for Cloud Foundry and TKGI on a Single Tanzu Operations Manager Foundation

If you only want to monitor a single Tanzu Operations Manager foundation that has both Tanzu Platform for
Cloud Foundry and TKGI installed, install the Healthwatch tile, Healthwatch Exporter for Tanzu Platform for
Cloud Foundry, and Healthwatch Exporter for TKGI on the same foundation. The Healthwatch tile
automatically detects Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter
for TKGI on the same Tanzu Operations Manager foundation and adds scrape jobs for both Healthwatch
Exporter tiles to the Prometheus instance.

For more information about installing and configuring the Healthwatch tile, Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, and Healthwatch Exporter for TKGI, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated
Pipeline

Configuring Healthwatch

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Configuring Healthwatch Exporter for TKGI

Configure Multi-Foundation Monitoring Using Direct Scraping


You can monitor several Tanzu Operations Manager foundations from a single monitoring Tanzu Operations
Manager foundation using direct scraping.

When you configure direct scraping for your multi-foundation Healthwatch deployment, the Prometheus
instance in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes metrics
directly from the metric exporter VMs deployed by the Healthwatch Exporter tiles installed on the Tanzu
Operations Manager foundation you monitor.

To configure your Healthwatch deployment to monitor several Tanzu Operations Manager foundations from
a single monitoring Tanzu Operations Manager foundation using direct scraping, see Configure Multi-
Foundation Monitoring Using Direct Scraping.

Configure Multi-Foundation Monitoring Using Federation


You can monitor several Tanzu Operations Manager foundations from a single monitoring Tanzu Operations
Manager foundation using federation.

When you configure federation for your multi-foundation Healthwatch deployment, the Prometheus instance
in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes a subset of metrics
from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor.

To configure your Healthwatch deployment to monitor several Tanzu Operations Manager foundations from
a single monitoring Tanzu Operations Manager foundation using federation, see Configure Multi-Foundation
Monitoring Using Federation.
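
As a rough illustration, a federation scrape job added to the monitoring foundation's Healthwatch tile might
use Prometheus's standard /federate endpoint. This is a minimal sketch; the job name, match[] selector,
and target address are placeholders, not values from a real Healthwatch deployment:

job_name: foundation1-federation          # hypothetical job name
metrics_path: /federate
params:
  'match[]':
  - '{job=~".+"}'                         # placeholder selector for the metrics to federate
scheme: https
static_configs:
- targets:
  - "PROMETHEUS-ADDRESS-ON-MONITORED-FOUNDATION:9090"   # placeholder address and port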


Upgrading Healthwatch

This topic describes how to upgrade Healthwatch for VMware Tanzu.

Healthwatch uses the open-source components Prometheus, Grafana, and Alertmanager to scrape, store,
and view metrics, and to configure alerts.

For information about the new Healthwatch features, see the Healthwatch Release Notes.

To upgrade Healthwatch:

1. Review the limitations and risks of using Healthwatch. See:

Healthwatch Limitations

Assumed Risks of Using Healthwatch

2. Review the configuration options for Healthwatch to determine which tiles you must install on your
Tanzu Operations Manager foundations. For more information, see Configuration Options in
Healthwatch Architecture.

3. Install the Healthwatch tile and Healthwatch Exporter tiles on the Tanzu Operations Manager
foundations you want to monitor according to the configuration you identified in the previous step.
For more information about installing the Healthwatch tile and Healthwatch Exporter tiles, see the
topic that applies to your configuration:

Installing a Tile Manually

Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

4. Configure the Healthwatch component VMs through the Healthwatch tile and Healthwatch Exporter
tile UIs in the Tanzu Operations Manager Installation Dashboard and deploy the tiles.
For more information about configuring and deploying the tiles, see:

Configuring Healthwatch

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Configuring Healthwatch Exporter for TKGI


Installing

The topics in this section describe how to install the Healthwatch for VMware Tanzu, Healthwatch Exporter
for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes
Grid Integrated Edition (TKGI) tiles:

Installing a Tile Manually

Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Installing a Tile Manually


This topic describes how to manually install the Healthwatch for VMware Tanzu, Healthwatch Exporter for
VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes Grid
Integrated Edition (TKGI) tiles.

To install, configure, and deploy these tiles through an automated pipeline, see Installing,
Configuring, and Deploying a Tile Through an Automated Pipeline.

To manually install the Healthwatch and Healthwatch Exporter tiles, you must download and install the tiles
from the Broadcom Support Portal.

After you have installed the Healthwatch and Healthwatch Exporter tiles, you can configure them through
the Tanzu Ops Manager Installation Dashboard. See Next Steps after completing the installation.

There are risks to using Healthwatch, including missed email notifications, overwritten
dashboards, and minor data loss during upgrades. For more information about how to
prepare for or prevent these problems, see Assumed Risks of Using Healthwatch.

Install a Tile Manually


To install the Healthwatch, Healthwatch Exporter for Tanzu Platform for Cloud Foundry, or Healthwatch
Exporter for TKGI tile:

1. Go to the Healthwatch release you want on the Broadcom Support Portal.

2. Click the name of the tile you want to install to download the .pivotal file for the tile.

3. Log in to the Tanzu Ops Manager Installation Dashboard.

4. Click Import a Product.

5. Select the .pivotal file that you downloaded from Broadcom Support Portal.


6. Click Open. If the tile is successfully uploaded, it appears in the product list beneath the Import a
Product button.

7. After the upload is complete, click the + icon next to the tile listing to add the tile to your staging
area.

For more information, see the Tanzu Operations Manager documentation.
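
As an alternative to the Import a Product button, you can upload and stage a downloaded tile with the om
CLI. The following commands are a sketch; the env.yml file, the .pivotal file name, and the version shown
are placeholders for your own environment configuration and downloaded release:

# Upload the .pivotal file to Tanzu Operations Manager
om --env env.yml upload-product --product healthwatch-2.3.3.pivotal

# Stage the uploaded tile so that it appears in the Installation Dashboard
om --env env.yml stage-product --product-name p-healthwatch2 --product-version 2.3.3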

Next Steps
After you have successfully installed the Healthwatch tile and the Healthwatch Exporter tiles for the Tanzu
Operations Manager foundations you want to monitor, continue to one of the following topics to configure
each of the tiles you installed:

To configure and deploy the Healthwatch tile, see Configuring Healthwatch.

To configure and deploy the Healthwatch Exporter tile for Tanzu Platform for Cloud Foundry, see
Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry.

To configure and deploy the Healthwatch Exporter for TKGI tile, see Configuring Healthwatch
Exporter for TKGI.

Installing, Configuring, and Deploying a Tile Through an Automated Pipeline
This topic describes how to install, configure, and deploy the Healthwatch for VMware Tanzu, Healthwatch
Exporter for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) tiles through an automated pipeline.

To install the Healthwatch and Healthwatch Exporter tiles manually, see Installing a Tile
Manually.

Automated pipelines allow you to install, configure, and deploy Tanzu Operations Manager tiles through
automation scripts. For more information, see the Platform Automation documentation.

There are risks to using Healthwatch, including missed email notifications, overwritten
dashboards, and minor data loss during upgrades. For more information about how to
prepare for or prevent these problems, see Assumed Risks of Using Healthwatch.

Prerequisites
Before you use an automated pipeline to install, configure, and deploy a tile, you must have the following:

An existing Concourse pipeline. For an example pipeline configuration, see the Platform Automation
documentation.

A fully-configured Tanzu Operations Manager foundation

A fully-configured BOSH Director instance


The Platform Automation Toolkit Docker image imported into Docker. For more information, see the
Platform Automation documentation.

Download and Install a Tile Using Platform Automation


As you follow these instructions, refer to the Platform Automation documentation for details.

To download and install a tile using Platform Automation:

1. Create a configuration file for the download-product task of your automated pipeline. This
configuration file fetches the tile you want to install from Broadcom Support portal. Copy and paste
one of the following sets of properties into your configuration file.

For Healthwatch:

---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-[^pas|pks].*pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*

For Healthwatch Exporter for Tanzu Platform for Cloud Foundry:

---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-pas-*.pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*

For Healthwatch Exporter for TKGI:

---
pivnet-api-token: token
pivnet-file-glob: "healthwatch-pks-*.pivotal"
pivnet-product-slug: p-healthwatch
product-version-regex: 2.3.*

2. Upload and stage the tile to the Tanzu Ops Manager Installation Dashboard by adding the
upload-stemcell and upload-and-stage-product tasks to your pipeline configuration, as sketched below.
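
A sketch of how the staging step might be wired into a Concourse job follows. The resource names
(platform-automation-image, platform-automation-tasks, healthwatch-product, env) are assumptions modeled
on the examples in the Platform Automation documentation, not names required by Healthwatch; a similar
step that runs tasks/upload-stemcell.yml uploads the stemcell:

jobs:
- name: upload-and-stage-healthwatch
  plan:
  - get: platform-automation-image
    params: {unpack: true}
  - get: platform-automation-tasks
    params: {unpack: true}
  - get: healthwatch-product          # output of the download-product task
  - get: env                          # config repository that contains env.yml
  - task: upload-and-stage-product
    image: platform-automation-image
    file: platform-automation-tasks/tasks/upload-and-stage-product.yml
    input_mapping:
      product: healthwatch-product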

Configure and Deploy your Tile Using the om CLI


After you download and install your tile, you can configure and deploy it using the om CLI.

Unless you are an advanced user, VMware recommends configuring and deploying your tile manually the
first time, so you can see which properties in the automation script map to the configuration settings
present in the tile UI before you modify and re-deploy the tile with the om CLI. For more information about
configuring and deploying tiles with the om CLI, see the om repository on GitHub.

The following is an example of an automation script that configures and deploys the Healthwatch tile:

product-name: p-healthwatch2
product-properties:
  .properties.scrape_configs:
    value:
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: foundation1
        metrics_path: /metrics
        scheme: https
        static_configs:
        - targets:
          - "1.2.3.4:9090"
          - "5.6.7.8:9090"
      server_name: pasexporter
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: foundation2
        metrics_path: /metrics
        scheme: https
        static_configs:
        - targets:
          - "9.10.11.12:9090"
      server_name: pasexporter
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----
  .properties.enable_basic_auth:
    selected_option: enabled
    value: enabled
  .properties.grafana_authentication:
    selected_option: uaa
    value: uaa
  .tsdb.canary_exporter_port:
    value: 9115
  .tsdb.scrape_interval:
    value: 15s
network-properties:
  network:
    name: subnet1
  other_availability_zones:
  - name: us-central1-f
  - name: us-central1-c
  - name: us-central1-b
  singleton_availability_zone:
    name: us-central1-f
resource-config:
  pxc:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
  pxc-proxy:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
  tsdb:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 1
  grafana:
    instances: automatic
    persistent_disk:
      size_mb: automatic
    instance_type:
      id: automatic
    internet_connected: true
    max_in_flight: 5
errand-config:
  smoke-test:
    post-deploy-state: true
  update-admin-password:
    post-deploy-state: true
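
As a rough illustration of the deployment commands themselves, if you save the configuration above
as healthwatch-config.yml and keep your Tanzu Operations Manager target and credentials in an
env.yml file, you could apply it with om commands along these lines. The file names here are
assumptions, and the om repository on GitHub documents the full set of flags:

# Stage the configuration for the Healthwatch tile
om --env env.yml configure-product --config healthwatch-config.yml

# Deploy only the Healthwatch tile
om --env env.yml apply-changes --product-name p-healthwatch2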

Configuring

The topics in this section describe how to configure the Healthwatch for VMware Tanzu, Healthwatch
Exporter for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) tiles:

Configuring Healthwatch

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Configuring Healthwatch Exporter for TKGI

Multi-Foundation Monitoring

Configuring Multi-Foundation Monitoring

Configuring Direct Scraping for Multi-Foundation Monitoring

Configuring Federation for Multi-Foundation Monitoring

Configuring TKGI Cluster Discovery

Configuring DNS for the Grafana Instance

Creating a Firewall Policy for the Grafana Instance

Configuring Grafana Authentication

Optional Configuration

Configuring Alerting

Monitoring Certificate Expiration

Configuring Authentication with a UAA Instance on a Different Operations Manager Foundation

Configuration File Reference Guide

Configuring Healthwatch
This topic describes how to manually configure and deploy the Healthwatch for VMware Tanzu tile.

To install, configure, and deploy Healthwatch through an automated pipeline, see Installing,
Configuring, and Deploying a Tile Through an Automated Pipeline.

The Healthwatch tile monitors metrics across one or more Tanzu Operations Manager foundations by
scraping metrics from Healthwatch Exporter tiles installed on each foundation. For more information about
the architecture of the Healthwatch tile, see Healthwatch Tile in Healthwatch Architecture.

After installing Healthwatch, you configure Healthwatch component VMs, including the configuration files
associated with them, through the tile UI. You can also configure errands and system logging, scale VM
instances up or down, and configure load balancers for multiple VM instances.

To quickly deploy the Healthwatch tile and verify that it deploys successfully before you fully configure it,
you only need to configure the Assign AZs and Networks pane. What follows is an overview of the
configuration and deployment procedure.

To configure and deploy the Healthwatch tile:

1. Go to the Healthwatch tile in the Tanzu Ops Manager Installation Dashboard. For more information,
see Configure the Healthwatch Tile.

2. Assign jobs to your Availability Zones (AZs) and networks. For more information, see Assign AZs
and Networks.

3. Configure the Prometheus pane. For more information, see Configure Prometheus.

4. (Optional) Configure the Alertmanager pane. For more information, see (Optional) Configure
Alertmanager.

5. (Optional) Configure the Grafana pane. For more information, see (Optional) Configure Grafana.

6. (Optional) Configure the Grafana Authentication pane. For more information, see (Optional)
Configure Grafana Authentication.

7. (Optional) Configure the Grafana Dashboards pane. For more information, see (Optional) Configure
Grafana Dashboards.

8. (Optional) Configure the Canary URLs pane. For more information, see (Optional) Configure Canary
URLs.

9. (Optional) Configure the Remote Write pane. For more information, see (Optional) Configure
Remote Write.

10. (Optional) Configure the TKGI Cluster Discovery pane. For more information, see (Optional)
Configure TKGI Cluster Discovery.

11. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.

12. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.

13. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.

14. (Optional) Configure for OpenTelemetry. For more information, see (Optional) Configure for
OpenTelemetry.

15. Deploy Healthwatch. For more information, see Deploy Healthwatch.

After you have configured and deployed the Healthwatch tile, you can configure and deploy the Healthwatch
Exporter tiles for the Tanzu Operations Manager foundations you want to monitor. For more information, see
Next Steps.

Configure the Healthwatch Tile


To navigate to the Healthwatch tile:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

Assign AZs and Networks


In the Assign AZs and Networks pane, you assign jobs to your AZs and networks.

To configure the Assign AZs and Networks pane:

1. Select Assign AZs and Networks.

2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.

3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.

4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile.

5. Click Save.

Configure Prometheus
In the Prometheus pane, you configure the Prometheus instance in the Healthwatch tile to scrape metrics
from the Healthwatch Exporter tiles installed on each Tanzu Operations Manager foundation, as well as any
external services or databases from which you want to collect metrics.

The values that you configure in the Prometheus pane also configure their corresponding properties in the
Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Prometheus Configuration Options, and the Prometheus documentation.
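
For reference, the values from this pane land in the Prometheus configuration file in roughly the
following shape. This is an illustrative sketch with placeholder values, not the exact file that
Healthwatch generates:

global:
  scrape_interval: 15s          # set by the Scrape interval field
scrape_configs:
- job_name: foundation-1        # added by an entry under Additional scrape jobs
  metrics_path: /metrics
  scheme: https
  static_configs:
  - targets:
    - "1.2.3.4:9090"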

To configure the Prometheus pane:

1. Select Prometheus.

2. For Scrape interval, specify the frequency at which you want the Prometheus instance to scrape
Prometheus exposition endpoints for metrics. The Prometheus instance scrapes all Prometheus
exposition endpoints at once through a global scrape. You can enter a value string that specifies
ns, us, µs, ms, s, m, or h. To scrape detailed metrics without consuming too much storage, VMware
recommends using the default value of 15s (15 seconds).

3. (Optional) To configure the Prometheus instance to scrape metrics from the Healthwatch Exporter
tiles installed on other Tanzu Operations Manager foundations or from external services or
databases, configure additional scrape jobs under Additional scrape jobs. You can configure
scrape jobs for any app or service that exposes metrics using a Prometheus exposition format,
such as Concourse CI. For more information about Prometheus exposition formats, see the
Prometheus documentation.

The Prometheus instance automatically discovers and scrapes Healthwatch


Exporter tiles that are installed on the same Tanzu Operations Manager foundation
as the Healthwatch tile. You do not need to configure scrape jobs for these
Healthwatch Exporter tiles. You only need to configure scrape jobs for Healthwatch
Exporter tiles that are installed on other Tanzu Operations Manager foundations.

1. Click Add.

2. For Scrape job configuration parameters, provide the configuration YAML for the scrape
job you want to configure. This job can use any of the properties defined by Prometheus
except those in the tls_config section. Do not prefix the configuration YAML with a dash.
For example:

job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
- "1.2.3.4:9090"
- "5.6.7.8:9090"

For more information, see the Prometheus documentation.

For the job_name property, do not use the following job names:
Healthwatch-view-pas-exporter

Healthwatch-view-pks-exporter

tsdb

grafana

pks-master-kube-scheduler

pks-master-kube-controller-manager

3. (Optional) To allow the Prometheus instance to communicate with the server for your
external service or database over TLS:

1. For Certificate and private key for TLS, provide a certificate and private key for
the Prometheus instance to use for TLS connections to the server for your external
service or database.

2. For CA certificate for TLS, provide a certificate for the certificate authority (CA)
that the server for your external service or database uses to verify TLS
certificates.

3. For Target server name, enter the name of the server for your external service or
database as it appears on the server’s TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by
a self-signed CA certificate or a certificate that is signed by a self-signed CA
certificate, select the Skip TLS certificate verification checkbox. When this
checkbox is selected, the Prometheus instance does not verify the identity of the
server for your external service or database. This checkbox is unselected by
default.

4. For Chunk size to calculate Diego_AvailableFreeChunksDisk SVM, enter the size, in MB,
that you want to specify for free chunks of disk. The default value is 6144. Healthwatch
uses this free chunk size to calculate the available free disk chunks super value metric
(SVM), which it then uses to calculate the Diego_AvailableFreeChunksDisk metric. If you
configure Healthwatch Exporter for Tanzu Platform for Cloud Foundry to deploy the SVM
Forwarder VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksDisk metric
back into the Loggregator Firehose so third-party nozzles can send it to external
destinations, such as a remote server or external aggregation service. For more information
about SVMs, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM -
Healthwatch Component Metrics in Healthwatch Metrics. For more information about
deploying the SVM Forwarder VM, see (Optional) Configure Resources in Configuring
Healthwatch Exporter for Tanzu Platform for Cloud Foundry. For a worked example of how
the chunk size affects this SVM, see the note after this procedure.

If you are using the OpenTelemetry Collector, this step does not apply.

5. For Chunk size to calculate Diego_AvailableFreeChunksMemory SVM, enter the size, in MB,
that you want to specify for free chunks of memory. The default value is 4096. Healthwatch
uses this free chunk size to calculate the available free memory chunks SVM, which it then
uses to calculate the Diego_AvailableFreeChunksMemory metric. If you configure
Healthwatch Exporter for Tanzu Platform for Cloud Foundry to deploy the SVM Forwarder
VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksMemory metric back into
the Loggregator Firehose so third-party nozzles can send it to external destinations, such as
a remote server or external aggregation service. For more information about SVMs, see SVM
Forwarder VM - Platform Metrics and SVM Forwarder VM - Healthwatch Component Metrics
in Healthwatch Metrics. For more information about deploying the SVM Forwarder VM, see
(Optional) Configure Resources in Configuring Healthwatch Exporter for Tanzu Platform for
Cloud Foundry.

If you are using the OpenTelemetry Collector, this step does not apply.

4. (Optional) For Static IP addresses for Prometheus VMs, enter a comma-separated list of valid
static IP addresses that you want to reserve for the Prometheus instance. You must enter a
separate IP address for each VM in the Prometheus instance. These IP addresses must not be
within the reserved IP ranges you configured in the BOSH Director tile. To find the IP addresses of
the Prometheus VMs:

1. Select the Status tab.

2. In the TSDB row, record the IP addresses of each Prometheus VM from the IPs column.

The Prometheus instance includes two VMs by default. For more information
about viewing or scaling your VMs, see Healthwatch Components and Resource
Requirements.

5. Click Save.
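
As a worked example of the chunk-size settings in this pane: with the default values, a Diego Cell
reporting 20,480 MB of free disk and 9,216 MB of free memory contributes roughly 3 free disk chunks
(20,480 / 6144, rounded down) and 2 free memory chunks (9,216 / 4096, rounded down) to the
corresponding SVMs. This arithmetic is only an illustration of how the chunk size maps free capacity
to chunk counts; the exact calculation Healthwatch performs may differ.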

(Optional) Configure Alertmanager


In the Alertmanager pane, you configure alerting for Healthwatch. To configure alerting for Healthwatch,
you configure the alerting rules that Alertmanager follows and the alert receivers to which Alertmanager
sends alerts.

To configure the Alertmanager pane, see Configuring Alerting.

(Optional) Configure Grafana


In the Grafana pane, you configure the route for the Grafana UI. You can also configure email alerts and
HTTP and HTTPS proxy request settings for the Grafana instance.

The values that you configure in the Grafana pane also configure their corresponding properties in the
Grafana configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Grafana, and the Grafana documentation.

To configure the Grafana pane:

1. Select Grafana.

2. Under Grafana UI route, configure the route used to access the Grafana UI by selecting one of the
following options:

Automatically configure in TAS for VMs: If you are installing Healthwatch on a Tanzu
Operations Manager foundation with Tanzu Platform for Cloud Foundry installed,
Healthwatch automatically configures a route for the Grafana UI in Tanzu Platform for
Cloud Foundry. VMware recommends selecting this option when available. You access the
Grafana UI by navigating to https://grafana.sys.DOMAIN in a browser window, where
DOMAIN is the system domain you configured in the Domains pane of the Tanzu Platform
for Cloud Foundry tile. For more information, see the Tanzu Platform for Cloud Foundry
documentation.

Manually configure: Reveals the configuration fields described in the following steps,
where you manually configure the URL and TLS settings for the Grafana UI. To manually
configure the URL and TLS settings for the Grafana UI:

1. For Grafana root URL, enter the URL used to access the Grafana UI. Configuring
this field allows a generic OAuth provider or UAA to redirect users to the Grafana
UI. Alertmanager also uses this URL to generate links to the Grafana UI in alert
messages.

Healthwatch does not automatically assign a default root URL to the Grafana UI.
You must manually configure a root URL for the Grafana UI in the Grafana root
URL field.

After you deploy the Healthwatch tile for the first time, you must configure a DNS
entry for the Grafana instance in the console for your IaaS using this root URL and
the IP address of either the Grafana VMs or the load balancer associated with the
Grafana instance. The Grafana instance listens on either port 443 or 80, depending
on whether you provide a TLS certificate in the following Certificate and private
key for HTTPS fields. For more information about configuring DNS entries for the
Grafana instance, see Configuring DNS for the Grafana Instance.

2. (Optional) To allow HTTPS connections to one or more Grafana instances, you must
provide a certificate and private key for the Grafana instance to use for TLS
connections in Certificate and private key for HTTPS. VMware recommends also
providing a certificate signed by a trusted third-party CA in CA certificate for
HTTPS. You can generate a self-signed certificate using the Tanzu Operations
Manager root CA, but if you do, your browser warns you that your CA is invalid
every time you access the Grafana UI.

To use a certificate signed by a trusted third-party CA:

1. In Certificate and private key for HTTPS, provide a certificate and private
key for the Grafana instance to use for TLS connections.

2. In CA certificate for HTTPS, provide the certificate of the third-party CA
that signs the certificate you provided in the previous step.

To generate a self-signed certificate from the Tanzu Operations Manager root CA:

1. Under Certificate and private key for HTTPS, click Change.

2. Click Generate RSA Certificate.

3. In the Generate RSA Certificate pop-up window, enter *.DOMAIN,
where DOMAIN is the domain of the DNS entry that you configured
for the Grafana instance. For example, if the DNS entry you
configured for the Grafana instance is grafana.example.com,
enter *.example.com. For more information about configuring a
DNS entry for the Grafana instance, see Configuring DNS for the
Grafana Instance.

4. Click Generate.

5. Go to the Tanzu Ops Manager Installation Dashboard.

6. From the dropdown in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard, click Settings.

7. Select Advanced Options.

8. Click DOWNLOAD ROOT CA CERT.

9. Return to the Tanzu Ops Manager Installation Dashboard.

10. Click the Healthwatch tile.

11. Select Grafana.

12. For CA certificate for HTTPS, provide the Tanzu Operations Manager
root CA certificate that you downloaded in a previous step.

3. (Optional) To configure additional cipher suites for TLS connections to the
Grafana instance, enter a comma-separated list of ciphers in Additional ciphers
for TLS. For a list of supported cipher suites, see cipher_suites.go in the Go
repository on GitHub.

3. Under Grafana email alerts, choose whether to configure email alerts from the Grafana instance.
VMware recommends using Alertmanager to configure and manage alerts in Healthwatch. If you
require additional or alternative alerts, you can configure the SMTP server for the Grafana instance
to send email alerts.

To allow email alerts from the Grafana instance:


1. Select Configure.

2. For SMTP server host name, enter the host name of your SMTP server.

3. For SMTP server port, enter the port of your SMTP server.

4. For SMTP server username, enter your SMTP authentication username.

5. For SMTP server password, enter your SMTP authentication password.

6. (Optional) To allow the Grafana instance to skip TLS certificate verification when
communicating with your SMTP server over TLS, select the Skip TLS certificate
verification checkbox. When this checkbox is selected, the Grafana instance
does not verify the identity of your SMTP server. This checkbox is unselected by
default.

7. For From address, enter the sender email address that appears on outgoing email
alerts.

8. For From name, enter the sender name that appears on outgoing email alerts.

9. For EHLO client ID, enter the name for the client identity that your SMTP server
uses when sending EHLO commands.

10. For Certificate and private key for TLS, enter a certificate and private key for the
Grafana instance to use for TLS connections to your SMTP server.

To disallow email alerts from the Grafana instance, select Do not configure. Email alerts
are disallowed by default. For more information, see the Grafana documentation.

4. Under HTTP and HTTPS proxy request settings, you choose whether to allow the Grafana
instance to make HTTP and HTTPS requests through proxy servers:

You need to configure proxy settings only if you are deploying Healthwatch in an
air-gapped environment and want to configure alert channels to external
addresses, such as an external Slack webhook.

To allow the Grafana instance to make HTTP and HTTPS requests through a proxy server:
1. Select Configure.

2. For HTTP proxy URL, enter the URL for your HTTP proxy server. The Grafana
instance sends all HTTP and HTTPS requests to this URL, except those from
hosts you configure in the HTTPS proxy URL and Excluded hosts fields.

3. For HTTPS proxy URL, enter the URL for your HTTPS proxy server. The Grafana
instance sends all HTTPS requests to this URL, except those from hosts you
configure in the Excluded hosts field.

4. For Excluded hosts, enter a comma-separated list of the hosts you want to
exclude from proxying. VMware recommends including *.bosh and the range of
your internal network IP addresses so the Grafana instance can still access the
Prometheus instance without going through the proxy server. For example,
*.bosh,10.0.0.0/8,*.example.com allows the Grafana instance to access all
BOSH DNS addresses, all internal network IP addresses within 10.0.0.0/8, and
all hosts matching *.example.com directly, without going through the proxy
server.

To disallow the Grafana instance from making HTTP and HTTPS requests through proxy
servers, select Do not configure. HTTP and HTTPS proxy requests are disallowed by
default.

5. (Optional) For Static IP addresses for Grafana VMs, enter a comma-separated list of valid static
IP addresses that you want to reserve for the Grafana instance. These IP addresses must not be
within the reserved IP ranges you configured in the BOSH Director tile.

6. (Optional) If you want to use Grafana legacy alerting instead of the newer Grafana Alerting, select
the Opt out of Grafana Alerting checkbox. Note that selecting this checkbox deletes any alerts and
changes made in Grafana Alerting.

7. (Optional) If you want to disable the gravatar, select the Disable gravatar checkbox.

8. (Optional) To log all access to Grafana, select the Enable router logging checkbox. This allows
auditing of all traffic into the system.

9. Click Save.

(Optional) Configure Grafana Authentication


In the Grafana Authentication pane, you configure how users log in to the Grafana UI.

To configure the Grafana Authentication pane, see Configuring Grafana Authentication.

(Optional) Configure Grafana Dashboards


In the Grafana Dashboards pane, you configure which dashboards the Grafana instance creates in the
Grafana UI. The Grafana instance can create dashboards for metrics from Tanzu Platform for Cloud
Foundry, TKGI, VMware Tanzu SQL with MySQL for VMs (Tanzu SQL for VMs), and VMware Tanzu
RabbitMQ for VMs (Tanzu RabbitMQ). For more information about these dashboards, see Default
Dashboards in the Grafana UI in Using Healthwatch Dashboards in the Grafana UI.

To configure the Grafana Dashboards pane:

1. Select Grafana Dashboards.

2. Under TAS for VMs, select one of the following options:

Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
Tanzu Platform for Cloud Foundry. To specify the version of Tanzu Platform for CF for
which you want the Grafana instance to create dashboards, use the Version selector.
Select one of the following options:

The version of Tanzu Platform for Cloud Foundry that is installed on your Tanzu
Operations Manager foundation.

Auto-detect: The Grafana instance automatically discovers and creates
dashboards for the version of Tanzu Platform for Cloud Foundry that is installed on
your Tanzu Operations Manager foundation.

If you choose to include Tanzu Platform for Cloud Foundry dashboards,
you must configure Tanzu Platform for CF to forward system metrics to
the Loggregator Firehose. Otherwise, no metrics appear in the Router
dashboard in the Grafana UI. For more information, see Troubleshooting
Missing Router Metrics in Troubleshooting Healthwatch.

Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from Tanzu Platform for Cloud Foundry.

3. Under TKGI, select one of the following options:

Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
TKGI. To specify the version of TKGI for which you want the Grafana instance to create
dashboards, use the Version dropdown. Select one of the following options:
The version of TKGI that is installed on your Tanzu Operations Manager
foundation.

Auto-detect: The Grafana instance automatically discovers and creates
dashboards for the version of TKGI that is installed on your Tanzu Operations
Manager foundation.

Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from TKGI.

4. Under Tanzu SQL for VMs, select one of the following options:

Include: The Grafana instance creates a dashboard in the Grafana UI for metrics from
Tanzu SQL for VMs.

Exclude: The Grafana instance does not create a dashboard in the Grafana UI for metrics
from Tanzu SQL for VMs.

5. Under Tanzu RabbitMQ, select one of the following options:

Include: The Grafana instance creates dashboards in the Grafana UI for metrics from
Tanzu RabbitMQ.

If you choose to include Tanzu RabbitMQ dashboards, set the Metrics
polling interval field in the Tanzu RabbitMQ tile to -1. This prevents the
Tanzu RabbitMQ tile from sending duplicate metrics to the Loggregator
Firehose. To configure this field, see the Tanzu RabbitMQ documentation.

Exclude: The Grafana instance does not create dashboards in the Grafana UI for metrics
from Tanzu RabbitMQ.

(Optional) Configure Canary URLs


In the Canary URLs pane, you configure target URLs to which the Blackbox Exporters in the Prometheus
instance send canary tests. Testing a canary target URL allows you to gauge the overall health and
accessibility of an app, runtime, or deployment.

The Canary URLs pane configures the Blackbox Exporters in the Prometheus instance. For more
information, see the Blackbox exporter repository on GitHub.

The Blackbox Exporters in the Prometheus instance run canary tests on the fully-qualified domain name
(FQDN) of your Tanzu Operations Manager deployment by default. The results from these canary tests
appear in the Ops Manager Health dashboard in the Grafana UI.
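
To illustrate the mechanism, a Prometheus scrape job that probes a canary URL through a Blackbox
Exporter generally looks like the following. This is the generic Blackbox Exporter probing pattern
with placeholder values, not necessarily the exact job that Healthwatch generates for you:

job_name: canary-example
metrics_path: /probe
params:
  module: [http_2xx]              # probe over HTTP and expect a 2xx response
static_configs:
- targets:
  - https://canary.example.com    # the canary URL being tested
relabel_configs:
- source_labels: [__address__]
  target_label: __param_target    # pass the canary URL to the exporter as ?target=
- target_label: __address__
  replacement: localhost:9115     # scrape the Blackbox Exporter itself (default port 9115)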

To configure the Canary URLs pane:

1. Select Canary URLs.

2. For Port, specify the port that the Blackbox Exporter exposes to the Prometheus instance. The
default port is 9115. You do not need to specify a different port unless port 9115 is already in use
on the Prometheus instance.

3. (Optional) Under Additional target URLs, you can configure additional canary target URLs. The
Prometheus instance runs continuous canary tests to these URLs and records the results. To
configure additional canary target URLs:

1. Click Add.

2. For URL, enter the URL to which you want the Prometheus instance to send canary tests.

The Prometheus instance automatically creates scrape jobs for these URLs. You
do not need to create additional scrape jobs for them in the Prometheus pane.

4. Click Save.

(Optional) Configure Remote Write


In the Remote Write pane, you can configure the Prometheus instance to write to remote storage, in
addition to its local time series database (TSDB). Healthwatch stores monitoring data for six weeks before
deleting it. Configuring remote write allows Healthwatch to store data that is older than six weeks in a
remote database or storage endpoint. For a list of compatible remote databases and storage endpoints, see
the Prometheus documentation.

The values that you configure in the Remote Write pane also configure their corresponding properties in the
Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch,
Remote Write, and the Prometheus documentation.
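
For reference, the fields in this pane correspond to a remote_write block in the Prometheus
configuration file. The following is a hedged sketch with placeholder values that shows how the
pane's settings map onto that block; the file that Healthwatch generates may differ in detail:

remote_write:
- url: https://remote-storage.example.com/api/v1/write   # Remote storage URL
  remote_timeout: 30s                                     # Remote timeout
  basic_auth:                                             # Remote storage username and password
    username: remote-user
    password: example-password
  tls_config:
    server_name: remote-storage.example.com               # Remote storage server name
    insecure_skip_verify: false                           # Skip TLS certificate verification
  queue_config:
    capacity: 2500                                        # Queue capacity
    min_shards: 1                                         # Minimum shards per queue
    max_shards: 200                                       # Maximum shards per queue
    max_samples_per_send: 500                             # Maximum samples per send
    batch_send_deadline: 5s                               # Maximum batch wait time
    min_backoff: 30ms                                     # Minimum backoff time
    max_backoff: 5s                                       # Maximum backoff time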

To configure the Remote Write pane:

1. Select Remote Write.

2. Under Remote Write destinations, click Add.

3. For Remote storage URL, enter the URL for your remote storage endpoint. For example,
https://REMOTE-STORAGE-FQDN, where REMOTE-STORAGE-FQDN is the FQDN of your remote
storage endpoint.

4. In Remote timeout, enter in seconds the amount of time that the Prometheus VM tries to make a
request to your remote storage endpoint before the request fails.

5. If your remote storage endpoint requires a username and password for login, configure the following
fields:

1. For Remote storage username, enter the username that the Prometheus instance uses to
log in to your remote storage endpoint.

2. For Remote storage password, enter the password that the Prometheus instance uses to
log in to your remote storage endpoint.

If you configure a username and password for the Prometheus instance to use when
logging in to your remote storage endpoint, you cannot also configure a bearer
token.

6. If your remote storage endpoint requires a bearer token for login, enter the bearer token that the
Prometheus instance uses to log in to your remote storage endpoint in Bearer token.

If you configure a bearer token for the Prometheus instance to use when logging in
to your remote storage endpoint, you cannot also configure a username and
password.

7. (Optional) To allow the Prometheus instance to communicate with the server for your remote
storage endpoint over TLS:

1. For Certificate and private key for TLS, provide a certificate and private key for the
Prometheus instance to use for TLS connections to your remote storage endpoint.

2. For CA certificate for TLS, provide the certificate for the CA that the server for your
remote storage endpoint uses to verify TLS certificates.

3. For Remote storage server name, enter the name of the server for your remote storage
endpoint as it appears on the server’s TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected, the
Prometheus instance does not verify the identity of the server for your remote storage
endpoint. This checkbox is unselected by default.

8. (Optional) To allow the Prometheus instance to make HTTP or HTTPS requests to your remote
storage endpoint through a proxy server, enter the URL for your proxy server in Proxy URL.

9. You can configure more granular settings for writing to your remote storage endpoint by specifying
additional parameters for the shards containing in-memory queues that read from the write-ahead
log in the Prometheus instance. To configure additional parameters for these shards:

1. For Queue capacity, enter how many samples your remote storage endpoint can queue in
memory per shard before the Prometheus instance blocks the queue from reading from the
write-ahead log.

2. For Minimum shards per queue, enter the minimum number of shards the Prometheus
instance can use for each remote write queue. This number is also the number of shards
the Prometheus instance uses when remote write begins after each deployment of the
Healthwatch tile.

3. For Maximum shards per queue, enter the maximum number of shards the Prometheus
instance can use for each remote write queue.

4. For Maximum samples per send, enter the maximum number of samples the Prometheus
instance can send to a shard at a time.

5. For Maximum batch wait time, enter in seconds the maximum amount of time the
Prometheus instance can wait before sending a batch of samples to a shard, whether that
shard has reached the limit configured in Maximum samples per send or not.

6. For Minimum backoff time, enter in milliseconds the minimum amount of time the
Prometheus instance can wait before retrying a failed request to your remote storage
endpoint.

7. For Maximum backoff time, enter in milliseconds the maximum amount of time the
Prometheus instance can wait before retrying a failed request to your remote storage
endpoint.
For more information about configuring these queue parameters, see the Prometheus
documentation.

10. Click Save.

(Optional) Configure TKGI Cluster Discovery


In the TKGI Cluster Discovery pane, you configure TKGI cluster discovery for Healthwatch. You need to
configure this pane only if you have Tanzu Operations Manager foundations with TKGI installed.

To configure TKGI cluster discovery, see Configuring TKGI Cluster Discovery.

(Optional) Configure Errands


Errands are scripts that Tanzu Operations Manager runs automatically when it installs or uninstalls a
product, such as a new version of Healthwatch. There are two types of errands: post-deploy errands run
after the product is installed, and pre-delete errands run before the product is uninstalled. However, there are
no pre-delete errands for Healthwatch.

By default, Tanzu Operations Manager always runs all errands.

In the Errands pane, you can select On to always run an errand or Off to never run it.

For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.

To configure the Errands pane:

1. Select Errands.

2. (Optional) Choose whether to always run or never run the following errands:

Smoke Test Errand: Verifies that the Grafana and Prometheus instances are running.

Update Grafana Admin Password: Updates the administrator password for the Grafana
UI.

3. Click Save.

(Optional) Configure Syslog


In the Syslog pane, you can configure system logging in Healthwatch to forward log messages from
Healthwatch component VMs to an external destination for troubleshooting, such as a remote server or
external syslog aggregation service.

To configure the Syslog pane:

1. Select Syslog.

2. Under Do you want to configure Syslog forwarding?, select one of the following options:

No, do not forward Syslog: Disallows syslog forwarding.

Yes: Allows syslog forwarding and allows you to edit the following configuration fields.

3. For Address, enter the IP address or DNS domain name of your external destination.

4. For Port, enter a port on which your external destination listens.

5. For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport
protocol Healthwatch uses to forward system logs to your external destination.

6. (Optional) To transmit logs over TLS:

1. Select the Enable TLS checkbox. This checkbox is unselected by default.

2. For Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.

3. For SSL Certificate, enter the TLS certificate for your external destination.

7. (Optional) For Environment identifier, enter an identifier for your environment. This identifier is
included in the log lines.

8. (Optional) For Queue Size, specify the number of log messages Healthwatch can hold in a buffer
at a time before sending them to your external destination. The default value is 100000.

9. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.

10. (Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in
RainerScript syntax. A minimal example follows this procedure. For more information about custom
syslog rules, see the Tanzu Platform for Cloud Foundry documentation. For more information about
RainerScript syntax, see the rsyslog documentation.

11. Click Save Syslog Settings.
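
As a minimal example of the Custom rsyslog configuration field mentioned in step 10, the following
RainerScript rule stops debug-level noise from being forwarded. It is only an illustration; any rule
you enter should reflect your own filtering needs:

if ($msg contains "DEBUG") then stop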

(Optional) Configure Resources


In the Resource Config pane, you can scale Healthwatch component VMs up or down according to the
needs of your deployment, and you can associate load balancers with a group of VMs. For example, you
can scale the persistent disk size of the Prometheus instance to allow longer data retention.

To configure the Resource Config pane:

1. Select Resource Config.

2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:

Instances: Configures the number of instances each job has.

VM Type: Configures the type of VM used in each instance.

Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job.

3. (Optional) To add a load balancer to a job:

1. Click the icon next to the job name.

2. For Load Balancers, enter the name of your load balancer.

3. Ensure that the Internet Connected checkbox is unselected. Activating this checkbox
gives VMs a public IP address that allows outbound Internet access.

4. Click Save.

(Optional) Configure for OpenTelemetry


Follow these steps to configure Healthwatch to receive data through the OpenTelemetry Collector. These
instructions apply to Tanzu Application Service 6.0 and Tanzu Platform for Cloud Foundry 10.x:

1. Go to the Healthwatch Exporter tile in the Tanzu Ops Manager Installation Dashboard.
1. Select Settings > Resource Config.
1. Set the TAS Counter Exporter and TAS Gauge Exporter instances to 0, because
these VMs are only used with the Loggregator Firehose.

2. Click Save.

2. Click the Credentials tab:

1. Click Link to Credential next to Healthwatch Exporter Client Mtls.

2. Save the certificate content. The certificate and key are used when configuring the
OpenTelemetry Collector.

2. Go to the Tanzu Ops Manager Installation Dashboard.


1. From the dropdown in the upper-right corner of the Tanzu Ops Manager Installation
Dashboard, click Settings.

2. Select Advanced Options.

3. Click DOWNLOAD ROOT CA CERT. The Tanzu Operations Manager root CA is used when
configuring the OpenTelemetry Collector.

3. Open the Tanzu Platform for Cloud Foundry (or VMware Tanzu Application Service for VMs) tile.

1. Click Settings and select System Logging.

1. You can disable the Enable V1 Firehose and Enable V2 Firehose settings
if there are no other dependencies on Firehose data. These fields do not exist in
Tanzu Platform for Cloud Foundry 10.x.

2. Scroll down to the OpenTelemetry Collector Metric Exporters (beta)
configuration.

3. Scroll to the bottom of the OpenTelemetry configuration and add an exporter for
Healthwatch.

4. For TAS for VMs 6.0, Healthwatch expects a prometheus OpenTelemetry exporter
that supports mTLS and serves data on port 65331. For example:

prometheus/healthwatch:
  endpoint: ":65331"
  add_metric_suffixes: false
  tls:
    ca_pem: "CA-CERT"
    cert_pem: "CERT_PEM"
    key_pem: "PRIVATE_KEY_PEM"

Where:

CA-CERT is the Tanzu Operations Manager root CA.

CERT_PEM is the cert_pem of the Healthwatch OTel Mtls credential.

PRIVATE_KEY_PEM is the private_key_pem of the Healthwatch OTel Mtls credential.

2. For Tanzu Platform for Cloud Foundry 10.0, you can configure certificates under
OpenTelemetry Collector Secrets and refer to them in the OTel configuration. For example:

exporters:
  prometheus/healthwatch:
    endpoint: ":65331"
    add_metric_suffixes: false
    tls:
      ca_pem: '{{ .healthwatch.ca }}'
      cert_pem: '{{ .healthwatch.cert }}'
      key_pem: '{{ .healthwatch.key }}'
service:
  pipelines:
    metrics:
      exporters:
      - prometheus/healthwatch

Add Secrets:

1. Click Add next to OpenTelemetry Collector Secrets.

2. In Name, enter healthwatch.

3. For Certificate Authority, enter the Tanzu Operations Manager root CA certificate.

4. For Client Certificate PEM, enter the cert_pem of the Healthwatch OTel Mtls credential.

5. For Client Certificate Private Key PEM, enter the private_key_pem of the Healthwatch
OTel Mtls credential.

6. Replace the literal newline characters (\n) in the certificates you copy with real line
breaks, for example: awk '{gsub(/\\n/,"\n")}1' <file_name> or printf -- "<CERT_DATA>"

7. Click Save.

4. If you made changes to the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile
configuration in Settings > TAS for VMs Metric Exporter VMs > Filter out custom application
metrics, deploy your changes to Healthwatch as explained in the next section.

Deploy Healthwatch
To complete your installation of the Healthwatch tile:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Click Review Pending Changes.

3. Click Apply Changes.

For more information, see the Tanzu Operations Manager documentation.

Next Steps
After you have successfully installed the Healthwatch tile, continue to one of the following topics to
configure and deploy the Healthwatch Exporter tiles for the Tanzu Operations Manager foundations you
want to monitor:

If you have Tanzu Platform for Cloud Foundry installed on a Tanzu Operations Manager foundation
you want to monitor, see Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry.

If you have TKGI installed on a Tanzu Operations Manager foundation you want to monitor, see
Configuring Healthwatch Exporter for TKGI.

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry
This topic describes how to manually configure and deploy the Healthwatch Exporter tile for Tanzu Platform
for Cloud Foundry.

To install, configure, and deploy Healthwatch Exporter for Tanzu Platform for Cloud Foundry through an
automated pipeline, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.

When installed on a Tanzu Operations Manager foundation you want to monitor, Healthwatch Exporter for
Tanzu Platform for Cloud Foundry deploys metric exporter VMs to generate each type of metric related to
the health of your Tanzu Platform for CF deployment. The metric exporter VMs collect metrics from the
Loggregator Firehose and expose them on Prometheus exposition endpoints. The Prometheus instance in
your metrics monitoring system then scrapes the exposition endpoints on the metric exporter VMs and
imports those metrics into your monitoring system. For more information about the architecture of the
Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile, see Healthwatch Exporter for Tanzu
Platform for Cloud Foundry in Healthwatch Architecture.

After installing Healthwatch Exporter for Tanzu Platform for Cloud Foundry, you configure the metric
exporter VMs deployed by Healthwatch Exporter for Tanzu Platform for Cloud Foundry through the tile UI.
You can also configure errands and system logging, and you can scale VM instances up or down and
configure load balancers for multiple VM instances.

If you want to quickly deploy the Healthwatch Exporter for Tanzu Platform for Cloud
Foundry tile to ensure that it deploys successfully before you fully configure it, you only
need to configure the Assign AZs and Networks and BOSH Health Metric Exporter VM
panes.

To configure and deploy the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile:

1. Go to the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile in the Tanzu Ops
Manager Installation Dashboard.

2. Assign jobs to your availability zones (AZs) and networks. For more information, see Assign AZs
and Networks.

3. (Optional) Configure the TAS for VMs Metric Exporter VMs pane. For more information, see
(Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs.

4. Configure the BOSH Health Metric Exporter VM pane. For more information, see Configure the
BOSH Health Metric Exporter VM.

5. (Optional) Configure the BOSH Deployment Metric Exporter VM pane. For more information, see
(Optional) Configure the BOSH Deployment Metric Exporter VM.

6. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.

7. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.

8. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.

9. From the Tanzu Ops Manager Installation Dashboard, deploy the Healthwatch Exporter for Tanzu
Platform for Cloud Foundry tile. For more information, see Deploy Healthwatch Exporter for Tanzu
Platform for Cloud Foundry.

10. After you have finished installing, configuring, and deploying Healthwatch Exporter for Tanzu
Platform for Cloud Foundry, configure a scrape job for Healthwatch Exporter for Tanzu Platform for
Cloud Foundry in the Prometheus VM in your monitoring system. For more information, see
Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry.

You don't need to configure a scrape job for installations of Healthwatch Exporter
for Tanzu Platform for Cloud Foundry that are on the same Tanzu Operations
Manager foundation as your Healthwatch for VMware Tanzu tile. The Prometheus
instance in the Healthwatch tile automatically discovers and scrapes Healthwatch
Exporter tiles that are installed on the same Tanzu Operations Manager foundation
as the Healthwatch tile.

Configure the Healthwatch Exporter for Tanzu Platform for Cloud Foundry Tile
To start configuring the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile.

Assign AZs and Networks


In the Assign AZs and Networks pane, you assign jobs to your AZs and networks.

To configure the Assign AZs and Networks pane:

1. Select Assign AZs and Networks.

2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.

3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.

4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile. For more information about Tanzu Platform for Cloud Foundry networks, see the
Tanzu Operations Manager documentation.

5. (Optional) If you want to assign jobs to a service network in addition to your runtime network, select
it from the Services Network dropdown. For more information about Tanzu Platform for Cloud
Foundry service networks, see the Tanzu Operations Manager documentation.

6. Click Save.

(Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs
In the TAS for VMs Metric Exporter VMs pane, you configure static IP addresses for the metric exporter
VMs that collect metrics from the Loggregator Firehose in Tanzu Platform for Cloud Foundry. There are two
metric exporter VMs that each collect a single metric type from the Loggregator Firehose: counter or gauge.
You can deploy one or both VMs. After collecting these metrics, the metric exporter VMs convert them to a
Prometheus exposition format and expose them on a secured endpoint.

You can also deploy two other VMs: the Tanzu Platform for Cloud Foundry service level indicator (SLI)
exporter VM and the certificate expiration metric exporter VM.

The IP addresses you configure in the TAS for VMs Metric Exporter VMs pane must not
be within the reserved IP ranges you configured in the BOSH Director tile.

To configure the TAS for VMs Metric Exporter VMs pane:

1. Select TAS for VMs Metric Exporter VMs.

2. (Optional) For Static IP address for counter metric exporter VM, enter a valid static IP address
that you want to reserve for the counter metric exporter VM.

3. (Optional) For Static IP address for gauge metric exporter VM, enter a valid static IP address
that you want to reserve for the gauge metric exporter VM.

4. (Optional) For Static IP address for TAS for VMs SLI exporter VM, enter a valid static IP address
that you want to reserve for the Tanzu Platform for Cloud Foundry SLI exporter VM. The Tanzu
Platform for Cloud Foundry SLI exporter VM generates SLIs that allow you to monitor whether the
core functions of the Cloud Foundry Command-Line Interface (cf CLI) are working as expected. The
cf CLI allows developers to create and manage apps through Tanzu Platform for Cloud Foundry. For
more information, see Tanzu Platform for Cloud Foundry SLI Exporter VM in Healthwatch Metrics.

5. (Optional) For Static IP address for certificate expiration metric exporter VM, enter a valid static
IP address that you want to reserve for the certificate expiration metric exporter VM. The certificate
expiration metric exporter VM collects metrics that show when certificates in your Tanzu
Operations Manager deployment are due to expire. For more information, see Certificate Expiration
Metric Exporter VM and Monitoring Certificate Expiration.

If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the certificate expiration metric exporter VM to zero instances in
the Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise, the two
certificate expiration metric exporter VMs create redundant sets of metrics.

6. (Optional) If your Tanzu Operations Manager deployment uses self-signed certificates, select the
Skip TLS certificate verification for certificate metric exporter VM checkbox. When this
checkbox is selected, the certificate expiration metric exporter VM does not verify the identity of
the Tanzu Operations Manager VM. This checkbox is unselected by default.

7. Under cf CLI version, select from the dropdown the version of the cf CLI that your Tanzu Platform
for Cloud Foundry deployment uses:

If you have TAS for VMs v4.0 or later, or Tanzu Platform for Cloud Foundry installed, select
CF CLI 8. This allows the SLI exporter VM to run SLI tests for cf CLI v8.

Tanzu Application Service versions 4.0 and later are supported. This includes
Tanzu Platform for Cloud Foundry 10.x+.

8. (Optional) If Metric Registrar is configured in your Tanzu Platform for Cloud Foundry tile, and
you do not want Healthwatch to scrape custom application metrics, select the Filter out custom
application metrics checkbox.

If you configured Healthwatch to receive data through the OpenTelemetry Collector
for Tanzu Application Service 6.0 and above, and you made changes to the Filter
out custom application metrics checkbox, you must apply the changes to the
Healthwatch tile. For detailed instructions, see Deploy Healthwatch.

9. Click Save.

Configure the BOSH Health Metric Exporter VM


In the BOSH Health Metric Exporter VM pane, you configure the AZ and VM type of the BOSH health
metric exporter VM. Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the BOSH health
metric exporter VM, which creates a BOSH deployment called bosh-health every ten minutes. The bosh-
health deployment deploys another VM, bosh-health-check, that runs a suite of SLI tests to validate the
functionality of the BOSH Director. After the SLI tests are complete, the BOSH health metric exporter VM
collects the metrics from the bosh-health-check VM, then deletes the bosh-health deployment and the
bosh-health-check VM. For more information, see BOSH Health Metric Exporter VM in Healthwatch
Metrics.

To configure the BOSH Health Metric Exporter VM pane:

1. Select BOSH Health Metric Exporter VM.

2. Under Availability zone, select the AZ on which you want Healthwatch Exporter for Tanzu Platform
for Cloud Foundry to deploy the BOSH health metric exporter VM.

3. Under VM type, select from the dropdown the type of VM you want Healthwatch Exporter for Tanzu
Platform for Cloud Foundry to deploy.

4. Click Save.

If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the BOSH health metric exporter VM to zero instances in the
Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise, the
two sets of BOSH health metric exporter VM metrics cause a 401 error in your
BOSH Director deployment, and one set of metrics reports that the BOSH Director
is down in the Grafana UI. For more information, see BOSH Health Metrics Cause
Errors When Two Healthwatch Exporter Tiles Are Installed in Troubleshooting
Healthwatch.

(Optional) Configure the BOSH Deployment Metric Exporter VM


In the BOSH Deployment Metric Exporter VM pane, you configure the authentication credentials and a
static IP address for the BOSH deployment metric exporter VM. This VM checks every 30 seconds
whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are
running. For more information, see BOSH Deployment Metric Exporter VM in Healthwatch Metrics.

To configure the BOSH Deployment Metric Exporter VM pane:

1. Select BOSH Deployment Metric Exporter VM.

2. (Optional) For UAA client credentials, enter the username and secret for the UAA client that the
BOSH deployment metric exporter VM uses to access the BOSH Director VM. For more
information, see Create a UAA Client for the BOSH Deployment Metric Exporter VM.

3. (Optional) For Static IP address for BOSH deployment metric exporter VM, enter a valid static
IP address that you want to reserve for the BOSH deployment metric exporter VM. This IP address
must not be within the reserved IP ranges you configured in the BOSH Director tile.

4. Click Save.

If you have both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for TKGI installed on the same Tanzu Operations Manager
foundation, scale the BOSH deployment metric exporter VM to zero instances in the
Resource Config pane in one of the Healthwatch Exporter tiles. Otherwise, the two BOSH
deployment metric exporter VMs create redundant sets of metrics.

Create a UAA Client for the BOSH Deployment Metric Exporter VM


To allow the BOSH deployment metric exporter VM to access the BOSH Director VM and view BOSH
deployments, you must create a new UAA client for the BOSH deployment metric exporter VM. The
procedure to create this UAA client differs depending on the authentication settings of your Tanzu
Operations Manager deployment.

To create a UAA client for the BOSH deployment metric exporter VM:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Record the IP address for the BOSH Director VM and the login and administrator credentials for the
BOSH Director UAA instance. For more information about internal authentication settings for your
Tanzu Operations Manager deployment, see the Tanzu Operations Manager documentation.

If your Tanzu Operations Manager deployment uses internal authentication:

1. Click the BOSH Director tile.

2. Select the Status tab.

3. Record the IP address in the IPs column of the BOSH Director row.

4. Select the Credentials tab.

5. In the Uaa Admin Client Credentials row of the BOSH Director section, click
Link to Credential.

6. Record the value of password. This value is the secret for Uaa Admin Client
Credentials.

7. Return to the Credentials tab.

8. In the Uaa Login Client Credentials row of the BOSH Director section, click Link
to Credential.

9. Record the value of password. This value is the secret for Uaa Login Client
Credentials.

If your Tanzu Operations Manager deployment uses SAML authentication:

1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.

2. Click Settings.

3. Select SAML Settings.

4. Select the Provision an Admin Client in the BOSH UAA checkbox.

5. Click Enable SAML Authentication.

6. Return to the Tanzu Ops Manager Installation Dashboard.

7. Click the BOSH Director tile.

8. Select the Status tab.

9. Record the IP address in the IPs column of the BOSH Director row.

10. Select the Credentials tab.

11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.

12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.

If your Tanzu Operations Manager deployment uses LDAP authentication:

1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.

2. Click Settings.

3. Select LDAP Settings.

4. Select the Provision an Admin Client in the BOSH UAA checkbox.

5. Click Enable LDAP Authentication.

6. Return to the Tanzu Ops Manager Installation Dashboard.

7. Click the BOSH Director tile.

8. Select the Status tab.

9. Record the IP address in the IPs column of the BOSH Director row.

10. Select the Credentials tab.

11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.

12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.

3. SSH into the Tanzu Operations Manager VM by following the procedure in the Tanzu Operations
Manager documentation.

4. Target the UAA instance for the BOSH Director by running:

uaac target https://BOSH-DIRECTOR-IP:8443 --skip-ssl-validation

Where BOSH-DIRECTOR-IP is the IP address for the BOSH Director VM that you recorded from the
Status tab in the BOSH Director tile in an earlier step.

5. Log in to the UAA instance:

If your Tanzu Operations Manager deployment uses internal authentication, log in to the
UAA instance by running:

uaac token owner get login -s UAA-LOGIN-CLIENT-SECRET

Where UAA-LOGIN-CLIENT-SECRET is the secret you recorded from the Uaa Login Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.

If your Tanzu Operations Manager deployment uses SAML or LDAP, log in to the UAA
instance by running:

uaac token client get bosh_admin_client -s BOSH-UAA-CLIENT-SECRET

Where BOSH-UAA-CLIENT-SECRET is the secret you recorded from the Uaa Bosh Client
Credentials row in the Credentials tab in the BOSH Director tile in a previous step.

6. When prompted, enter the UAA administrator client username admin and the secret you recorded
from the Uaa Admin Client Credentials row in the Credentials tab in the BOSH Director tile in a
previous step.

7. Create a UAA client for the BOSH deployment metric exporter VM by running:

uaac client add CLIENT-USERNAME \
  --secret CLIENT-SECRET \
  --authorized_grant_types client_credentials,refresh_token \
  --authorities bosh.read \
  --scope bosh.read


Where:

CLIENT-USERNAME is the username you want to set for the UAA client.

CLIENT-SECRET is the secret you want to set for the UAA client.

8. Return to the Tanzu Ops Manager Installation Dashboard.

9. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile.

10. Select BOSH Deployment Metric Exporter VM.

11. For UAA client credentials, enter the username and secret for the UAA client you just created.

(Optional) Configure Errands


Errands are scripts that Tanzu Operations Manager runs automatically when it installs or uninstalls a
product, such as a new version of Healthwatch Exporter for Tanzu Platform for Cloud Foundry. There are
two types of errands: post-deploy errands run after the product is installed, and pre-delete errands run
before the product is uninstalled.

By default, Tanzu Operations Manager always runs all errands.

In the Errands pane, you can select On to always run an errand or Off to never run it.

For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.

To configure the Errands pane:

1. Select Errands.

2. (Optional) Choose whether to always run or never run the following errands:

Smoke Tests: Verifies that the metric exporter VMs are running.

Cleanup: Deletes any existing BOSH deployments created by the BOSH health metric
exporter VM for running SLI tests.

Remove CF SLI User: Deletes the user account that the Tanzu Platform for Cloud Foundry
SLI exporter VM creates to run the Tanzu Platform for Cloud Foundry SLI test suite. For
more information, see Tanzu Platform for Cloud Foundry SLI Exporter VM.

3. Click Save.

(Optional) Configure Syslog


In the Syslog pane, you can configure system logging in Healthwatch Exporter for Tanzu Platform for Cloud
Foundry to forward log messages from tile component VMs to an external destination for troubleshooting,
such as a remote server or external syslog aggregation service.

To configure the Syslog pane:

1. Select Syslog.

2. (Optional) Under Do you want to configure Syslog forwarding?, select one of the following
options:

No, do not forward Syslog: Disallows syslog forwarding.


Yes: Allows syslog forwarding and allows you to edit the configuration fields described
below.

3. For Address, enter the IP address or DNS domain name of your external destination.

4. For Port, enter a port on which your external destination listens.

5. For Transport Protocol, select TCP or UDP from the dropdown menu. This determines which
transport protocol Healthwatch Exporter for Tanzu Platform for Cloud Foundry uses to forward
system logs to your external destination.

6. (Optional) To transmit logs over TLS:

1. Select the Enable TLS checkbox. This checkbox is unselected by default.

2. In Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.

3. In SSL Certificate, enter the TLS certificate for your external destination.

7. (Optional) In Queue Size, specify the number of log messages Healthwatch Exporter for Tanzu
Platform for Cloud Foundry can hold in a buffer at a time before sending them to your external
destination. The default value is 100000.

8. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.

9. (Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in
RainerScript syntax. For more information about custom syslog rules, see the Tanzu Platform for
Cloud Foundry documentation. For more information about RainerScript syntax, see the rsyslog
documentation.

10. Click Save Syslog Settings.

(Optional) Configure Resources


In the Resource Config pane, you can scale VMs in Healthwatch Exporter for Tanzu Platform for Cloud
Foundry up or down according to the needs of your deployment, and you can associate load balancers with
a group of VMs. For example, you can scale the persistent disk size of a metric exporter VM to allow longer
data retention.

To configure the Resource Config pane:

1. Select Resource Config.

2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:

Instances: Configures the number of instances each job has

VM Type: Configures the type of VM used in each instance

Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job

3. (Optional) To add a load balancer to a job:

1. Click the icon next to the job name.

2. In Load Balancers, enter the name of your load balancer.


3. Ensure that the Internet Connected checkbox is unselected. Selecting this checkbox
gives VMs a public IP address that allows outbound Internet access.

4. (Optional) The instance count for the SVM Forwarder VM is set to 0 by default. This VM emits
Healthwatch-generated super value metrics (SVMs) into the Loggregator Firehose. To deploy the
SVM Forwarder VM, increase the instance count by selecting from the Instances dropdown. You
do not need to deploy this VM unless you use a third-party nozzle that can export the SVMs to an
external system, such as a remote server or a syslog aggregation service. For more information
about the SVM Forwarder VM, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM
- Healthwatch Component Metrics in Healthwatch Metrics.

If you installed the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile
before installing the Healthwatch tile, you may need to re-deploy Healthwatch
Exporter after deploying the SVM Forwarder VM. For more information, see Deploy
Healthwatch Exporter for Tanzu Platform for Cloud Foundry.

5. (Optional) Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the counter and
gauge metric exporter VMs by default. If you do not want to collect both of these metric types, set
the instance count to 0 for the VMs associated with the metrics you do not want to collect.

6. Click Save.

Deploy Healthwatch Exporter for Tanzu Platform for Cloud Foundry
To complete your installation of the Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Click Review Pending Changes.

3. Click Apply Changes.

For more information, see the Tanzu Operations Manager documentation.

Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry
After you have successfully deployed Healthwatch Exporter for Tanzu Platform for Cloud Foundry, you must
configure a scrape job in the Prometheus instance that exists within your metrics monitoring system. Follow
the procedure in one of the following sections, depending on which monitoring system you use:

If you monitor metrics using the Healthwatch tile on a Tanzu Operations Manager foundation, see
Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in
Healthwatch.

You don't need to configure a scrape job for installations of Healthwatch Exporter
for Tanzu Platform for Cloud Foundry that are on the same Tanzu Operations
Manager foundation as your Healthwatch tile. The Prometheus instance in the
Healthwatch tile automatically discovers and scrapes Healthwatch Exporter tiles
that are installed on the same Tanzu Operations Manager foundation as the
Healthwatch tile.

If you monitor metrics using a service or database located outside your Tanzu Operations Manager
foundation, such as from an external TSDB, see Configure a Scrape Job for Healthwatch Exporter
for Tanzu Platform for Cloud Foundry in an External Monitoring System.

Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in Healthwatch
To configure a scrape job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in the Healthwatch
tile on your Tanzu Operations Manager foundation, see Configure Prometheus in Configuring Healthwatch.

Configure a Scrape Job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in an External Monitoring System
To configure a scrape job for Healthwatch Exporter for Tanzu Platform for Cloud Foundry in a service or
database that is located outside your Tanzu Operations Manager foundation:

1. Open network communication paths from your external service or database to the metric exporter
VMs in Healthwatch Exporter for Tanzu Platform for Cloud Foundry. The procedure to open these
network paths differs depending on your Tanzu Operations Manager foundation’s IaaS. For a list of
TCP ports used by each metric exporter VM, see Required Networking Rules for Healthwatch
Exporter for Tanzu Platform for Cloud Foundry in Healthwatch Architecture.

2. In the scrape_config section of the Prometheus configuration file, create a scrape job for your
Tanzu Operations Manager foundation. Under static_config, specify the TCP ports of each
metric exporter VM as static targets for the IP address of your external service or database. For
example:

job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:8443"
  - "1.2.3.4:25555"
  - "1.2.3.4:443"
  - "1.2.3.4:8082"

For more information about the scrape_config section of the Prometheus configuration file, see
the Prometheus documentation. For more information about the static_config section of the
Prometheus configuration file, see the Prometheus documentation.
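
Because the scrape job above uses scheme: https, your external Prometheus instance may also need a tls_config section, depending on how you secured the metric exporter endpoints. The following is a minimal sketch only: the file paths and server name are placeholders for whatever client certificate, private key, and CA certificate you use for this foundation, not values defined by Healthwatch.

job_name: foundation-1
metrics_path: /metrics
scheme: https
tls_config:
  # Placeholder paths: point these at the CA certificate, client certificate,
  # and private key that your external Prometheus instance uses for this foundation.
  ca_file: /etc/prometheus/certs/opsman-root-ca.pem
  cert_file: /etc/prometheus/certs/healthwatch-exporter-client.pem
  key_file: /etc/prometheus/certs/healthwatch-exporter-client.key
  # Placeholder: set this to the name on the exporter certificate if it does not
  # match the target address.
  server_name: healthwatch-exporter
static_configs:
- targets:
  - "1.2.3.4:8443"

For details about the tls_config options, see the Prometheus documentation.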

Configuring Healthwatch Exporter for TKGI


This topic describes how to manually configure and deploy the Healthwatch Exporter for VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) tile.

To install, configure, and deploy Healthwatch Exporter for TKGI through an automated pipeline, see
Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.


When installed on a Tanzu Operations Manager foundation you want to monitor, Healthwatch Exporter for
TKGI deploys metric exporter VMs to generate service level indicators (SLIs) related to the health of your
TKGI deployment. The Prometheus instance in your metrics monitoring system then scrapes the
Prometheus exposition endpoints on the metric exporter VMs and imports those metrics into your
monitoring system. For more information about the architecture of the Healthwatch Exporter for TKGI tile,
see Healthwatch Exporter for TKGI in Healthwatch Architecture.

After installing Healthwatch Exporter for TKGI, you configure the metric exporter VMs deployed by
Healthwatch Exporter for TKGI through the tile UI. You can also configure errands and system logging, and
you can scale VM instances up or down and configure load balancers for multiple VM instances.

If you want to quickly deploy the Healthwatch Exporter for TKGI tile to ensure that it
deploys successfully before you fully configure it, you only need to configure the Assign
AZs and Networks and BOSH Health Metric Exporter VM panes.

To configure and deploy the Healthwatch Exporter for TKGI tile:

1. Go to the Healthwatch Exporter for TKGI tile in the Tanzu Ops Manager Installation Dashboard.

2. Assign jobs to your availability zones (AZs) and networks. For more information, see Assign AZs
and Networks.

3. (Optional) Configure the TKGI Metric Exporter VMs pane. For more information, see (Optional)
Configure TKGI and Certificate Expiration Metric Exporter VMs.

4. (Optional) Configure the TKGI SLI Exporter VM pane. For more information, see (Optional)
Configure the TKGI SLI Exporter VM.

5. Configure the BOSH Health Metric Exporter VM pane. For more information, see Configure the
BOSH Health Metric Exporter VM.

6. (Optional) Configure the BOSH Deployment Metric Exporter VM pane. For more information, see
(Optional) Configure the BOSH Deployment Metric Exporter VM.

7. (Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands.

8. (Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog.

9. (Optional) Configure the Resource Config pane. For more information, see (Optional) Configure
Resources.

10. From the Tanzu Ops Manager Installation Dashboard, deploy the Healthwatch Exporter for TKGI
tile. For more information, see Deploy Healthwatch Exporter for TKGI.

11. After you have finished installing, configuring, and deploying Healthwatch Exporter for TKGI,
configure a scrape job for Healthwatch Exporter for TKGI in the Prometheus instance in your
monitoring system. For more information, see Configure a Scrape Job for Healthwatch Exporter for
TKGI.

You don't need to configure a scrape job for installations of Healthwatch Exporter
for TKGI that are on the same Tanzu Operations Manager foundation as your
Healthwatch for VMware Tanzu tile. The Prometheus instance in the Healthwatch
tile automatically discovers and scrapes Healthwatch Exporter tiles that are
installed on the same Tanzu Operations Manager foundation as the Healthwatch
tile.

Configure the Healthwatch Exporter for TKGI Tile


To start configuring the Healthwatch Exporter for TKGI tile:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch Exporter for Tanzu Kubernetes Grid - Integrated tile.

Assign AZs and Networks


In the Assign AZs and Networks pane, you assign jobs to your AZs and networks.

To configure the Assign AZs and Networks pane:

1. Select Assign AZs and Networks.

2. Under Place singleton jobs in, select the first AZ. Tanzu Operations Manager runs any job with a
single instance in this AZ.

3. Under Balance other jobs in, select one or more other AZs. Tanzu Operations Manager balances
instances of jobs with more than one instance across the AZs that you specify.

4. From the Network dropdown, select the runtime network that you created when configuring the
BOSH Director tile. For more information about TKGI networks, see the Tanzu Operations Manager
documentation.

5. (Optional) If you want to assign jobs to a service network in addition to your runtime network, select
it from the Services Network dropdown. For more information about TKGI service networks, see
the Tanzu Operations Manager documentation.

6. Click Save.

(Optional) Configure TKGI and Certificate Expiration Metric Exporter VMs
In the TKGI Metric Exporter VMs pane, you configure static IP addresses for the TKGI metric exporter and
certificate expiration metric exporter VMs. After generating these metrics, the metric exporter VMs expose
them in Prometheus exposition format on a secured endpoint.

The IP addresses you configure in the TKGI Metric Exporter VMs pane must not be within
the reserved IP ranges you configured in the BOSH Director tile.

To configure the TKGI Metric Exporter VMs pane:

1. Select TKGI Metric Exporter VMs.

2. (Optional) For Static IP address for TKGI metric exporter VM, enter a valid static IP address that
you want to reserve for the TKGI metric exporter VM. The TKGI metric exporter VM collects health
metrics from the BOSH Director. For more information, see TKGI Metric Exporter VM in
Healthwatch Metrics.


3. (Optional) For Static IP address for certificate expiration metric exporter VM, enter a valid static
IP address that you want to reserve for the certificate expiration metric exporter VM. The certificate
expiration metric exporter VM collects metrics that show when certificates in your Tanzu
Operations Manager deployment are due to expire. For more information, see Certificate Expiration
Metric Exporter VM and Monitoring Certificate Expiration.

If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for
Tanzu Platform for Cloud Foundry installed on the same Tanzu Operations
Manager foundation, scale the certificate expiration metric exporter VM to zero
instances in the Resource Config pane in one of the Healthwatch Exporter tiles.
Otherwise, the two certificate expiration metric exporter VMs create redundant
sets of metrics.

4. (Optional) If your Tanzu Operations Manager deployment uses self-signed certificates, select the
Skip TLS certificate verification checkbox. When this checkbox is selected, the certificate
expiration metric exporter VM does not verify the identity of the Tanzu Operations Manager VM.
This checkbox is unselected by default.

5. Click Save.

(Optional) Configure the TKGI SLI Exporter VM


In the TKGI SLI Exporter VM pane, you configure the TKGI SLI exporter VM. The TKGI SLI exporter VM
generates SLIs that allow you to monitor whether the core functions of the TKGI Command-Line Interface
(TKGI CLI) are working as expected. The TKGI CLI allows developers to create and manage Kubernetes
clusters through TKGI. For more information, see TKGI SLI Exporter VM in Healthwatch Metrics.

To configure the TKGI SLI Exporter VM pane:

1. Select TKGI SLI Exporter VM.

2. (Optional) For Static IP address for TKGI SLI exporter VM, enter a valid static IP address that
you want to reserve for the TKGI SLI exporter VM. This IP address must not be within the
reserved IP ranges you configured in the BOSH Director tile.

3. For SLI test frequency, enter in seconds how frequently you want the TKGI SLI exporter VM to run
SLI tests.

4. (Optional) To allow the TKGI SLI exporter VM to communicate with the TKGI API over TLS, configure
one of the following options:

To configure the TKGI SLI exporter VM to use a self-signed certificate authority (CA)
certificate, or a certificate that is signed by a self-signed CA certificate, when
communicating with the TKGI API over TLS:
1. For CA certificate for TLS, provide the CA certificate. If you provide a self-signed
CA certificate, it must be for the same CA that signs the certificate for the TKGI
API.

2. If you provide a self-signed CA certificate or a certificate that is signed by a
self-signed CA certificate, the Skip TLS certificate verification checkbox becomes
configurable. Deselect the Skip TLS certificate verification checkbox.


To configure the TKGI SLI exporter VM to skip TLS certificate verification when
communicating with the TKGI API over TLS, leave the CA certificate for TLS field blank.
The Skip TLS certificate verification checkbox is selected and not configurable by
default. When this checkbox is selected, the TKGI SLI exporter VM does not verify the
identity of the TKGI API. VMware does not recommend skipping TLS certificate verification
in a production environment.

5. Click Save.

Configure the BOSH Health Metric Exporter VM


In the BOSH Health Metric Exporter VM pane, you configure the AZ and VM type of the BOSH health
metric exporter VM. Healthwatch Exporter for TKGI deploys the BOSH health metric exporter VM, which
creates a BOSH deployment called bosh-health every ten minutes. The bosh-health deployment deploys
another VM, bosh-health-check, that runs a suite of SLI tests to validate the functionality of the BOSH
Director. After the SLI tests are complete, the BOSH health metric exporter VM collects the metrics from
the bosh-health-check VM, then deletes the bosh-health deployment and the bosh-health-check VM.
For more information, see BOSH Health Metric Exporter VM in Healthwatch Metrics.

To configure the BOSH Health Metric Exporter VM pane:

1. Select BOSH Health Metric Exporter VM.

2. Under Availability zone, select the AZ on which you want Healthwatch Exporter for TKGI to deploy
the BOSH health metric exporter VM.

3. Under VM type, select from the dropdown the type of VM you want Healthwatch Exporter for TKGI
to deploy.

4. Click Save.

If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for Tanzu
Platform for Cloud Foundry installed on the same Tanzu Operations Manager foundation,
scale the BOSH health metric exporter VM to zero instances in the Resource Config pane
in one of the Healthwatch Exporter tiles. Otherwise, the two sets of BOSH health metric
exporter VM metrics cause a 401 error in your BOSH Director deployment, and one set of
metrics reports that the BOSH Director is down in the Grafana UI. For more information,
see BOSH Health Metrics Cause Errors When Two Healthwatch Exporter Tiles Are
Installed in Troubleshooting Healthwatch.

(Optional) Configure the BOSH Deployment Metric Exporter VM


In the BOSH Deployment Metric Exporter VM pane, you configure the authentication credentials and a
static IP address for the BOSH deployment metric exporter VM. This VM checks every 30 seconds
whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are
running. For more information, see BOSH Deployment Metric Exporter VM in Healthwatch Metrics.

To configure the BOSH Deployment Metric Exporter VM pane:

1. Select BOSH Deployment Metric Exporter VM.

2. (Optional) For UAA client credentials, enter the username and secret for the UAA client that the
BOSH deployment metric exporter VM uses to access the BOSH Director VM. For more
information, see Create a UAA Client for the BOSH Deployment Metric Exporter VM.

3. (Optional) For Static IP address for BOSH deployment metric exporter VM, enter a valid static
IP address that you want to reserve for the BOSH deployment metric exporter VM. This IP address
must not be within the reserved IP ranges you configured in the BOSH Director tile.

4. Click Save.

If you have both Healthwatch Exporter for TKGI and Healthwatch Exporter for Tanzu
Platform for Cloud Foundry installed on the same Tanzu Operations Manager foundation,
scale the BOSH deployment metric exporter VM to zero instances in the Resource Config
pane in one of the Healthwatch Exporter tiles. Otherwise, the two BOSH deployment metric
exporter VMs create redundant sets of metrics.

Create a UAA Client for the BOSH Deployment Metric Exporter VM


To allow the BOSH deployment metric exporter VM to access the BOSH Director VM and view BOSH
deployments, you must create a new UAA client for the BOSH deployment metric exporter VM. The
procedure to create this UAA client differs depending on the authentication settings of your Tanzu
Operations Manager deployment.

To create a UAA client for the BOSH deployment metric exporter VM:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Record the IP address for the BOSH Director VM and the login and administrator credentials for the
BOSH Director UAA instance. For more information about internal authentication settings for your
Tanzu Operations Manager deployment, see the Tanzu Operations Manager documentation.

If your Tanzu Operations Manager deployment uses internal authentication:

1. Click the BOSH Director tile.

2. Select the Status tab.

3. Record the IP address in the IPs column of the BOSH Director row.

4. Select the Credentials tab.

5. In the Uaa Admin Client Credentials row of the BOSH Director section, click
Link to Credential.

6. Record the value of password. This value is the secret for Uaa Admin Client
Credentials.

7. Return to the Credentials tab.

8. In the Uaa Login Client Credentials row of the BOSH Director section, click Link
to Credential.

9. Record the value of password. This value is the secret for Uaa Login Client
Credentials.

If your Tanzu Operations Manager deployment uses SAML authentication:

1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.


2. Click Settings.

3. Select SAML Settings.

4. Select the Provision an Admin Client in the BOSH UAA checkbox.

5. Click Enable SAML Authentication.

6. Return to the Tanzu Ops Manager Installation Dashboard.

7. Click the BOSH Director tile.

8. Select the Status tab.

9. Record the IP address in the IPs column of the BOSH Director row.

10. Select the Credentials tab.

11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.

12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.

If your Tanzu Operations Manager deployment uses LDAP authentication:

1. Click the user account menu in the upper-right corner of the Tanzu Ops Manager
Installation Dashboard.

2. Click Settings.

3. Select LDAP Settings.

4. Select the Provision an Admin Client in the BOSH UAA checkbox.

5. Click Enable LDAP Authentication.

6. Return to the Tanzu Ops Manager Installation Dashboard.

7. Click the BOSH Director tile.

8. Select the Status tab.

9. Record the IP address in the IPs column of the BOSH Director row.

10. Select the Credentials tab.

11. In the Uaa Bosh Client Credentials row of the BOSH Director section, click Link
to Credential.

12. Record the value of password. This value is the secret for Uaa Bosh Client
Credentials.

3. SSH into the Tanzu Operations Manager VM by following the procedure in the Tanzu Operations
Manager documentation.

4. Target the UAA instance for the BOSH Director by running:

uaac target https://BOSH-DIRECTOR-IP:8443 --skip-ssl-validation

Where BOSH-DIRECTOR-IP is the IP address for the BOSH Director VM that you recorded from the
Status tab in the BOSH Director tile in an earlier step.


5. Log in to the UAA instance:

If your Tanzu Operations Manager deployment uses internal authentication, log in to the
UAA instance by running:

uaac token owner get login -s UAA-LOGIN-CLIENT-SECRET

Where UAA-LOGIN-CLIENT-SECRET is the secret you recorded from the Uaa Login Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.

If your Tanzu Operations Manager deployment uses SAML or LDAP, log in to the UAA
instance by running:

uaac token client get bosh_admin_client -s BOSH-UAA-CLIENT-SECRET

Where BOSH-UAA-CLIENT-SECRET is the secret you recorded from the Uaa Bosh Client
Credentials row in the Credentials tab in the BOSH Director tile in an earlier step.

6. When prompted, enter the UAA administrator client username admin and the secret you recorded
from the Uaa Admin Client Credentials row in the Credentials tab in the BOSH Director tile in an
earlier step.

7. Create a UAA client for the BOSH deployment metric exporter VM by running:

uaac client add CLIENT-USERNAME \
  --secret CLIENT-SECRET \
  --authorized_grant_types client_credentials,refresh_token \
  --authorities bosh.read \
  --scope bosh.read

Where:

CLIENT-USERNAME is the username you want to set for the UAA client.

CLIENT-SECRET is the secret you want to set for the UAA client.

8. Return to the Tanzu Ops Manager Installation Dashboard.

9. Click the Healthwatch Exporter for Tanzu Kubernetes Grid - Integrated tile.

10. Select BOSH Deployment Metric Exporter VM.

11. For UAA client credentials, enter the username and secret for the UAA client you just created.

(Optional) Configure Errands


Errands are scripts that Tanzu Operations Manager runs automatically when it installs or uninstalls a
product, such as a new version of Healthwatch Exporter for TKGI. There are two types of errands: post-
deploy errands run after the product is installed, and pre-delete errands run before the product is uninstalled.
However, there are no pre-delete errands for Healthwatch Exporter for TKGI.

By default, Tanzu Operations Manager always runs all errands.

In the Errands pane, you can select On to always run an errand or Off to never run it.

For more information about how Tanzu Operations Manager manages errands, see the Tanzu Operations
Manager documentation.


To configure the Errands pane:

1. Select Errands.

2. (Optional) This tile has only one errand: choose whether to always run or never run the Smoke
Tests errand. This errand verifies that the metric exporter VMs are running.

3. Click Save.

(Optional) Configure Syslog


In the Syslog pane, you can configure system logging in Healthwatch Exporter for TKGI to forward log
messages from tile component VMs to an external destination for troubleshooting, such as a remote server
or external syslog aggregation service.

To configure the Syslog pane:

1. Select Syslog.

2. Under Do you want to configure Syslog forwarding?, select one of the following options:

No, do not forward Syslog: Disallows syslog forwarding

Yes: Allows syslog forwarding and allows you to edit the configuration fields described below.

3. For Address, enter the IP address or DNS domain name of your external destination.

4. For Port, enter a port on which your external destination listens.

5. For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport
protocol Healthwatch Exporter for TKGI uses to forward system logs to your external destination.

6. (Optional) To transmit logs over TLS:

1. Select the Enable TLS checkbox. This checkbox is unselected by default.

2. In Permitted Peer, enter either the name or SHA1 fingerprint of the remote peer.

3. In SSL Certificate, paste in the TLS certificate for your external destination.

7. (Optional) For Queue Size, specify the number of log messages Healthwatch Exporter for TKGI
can hold in a buffer at a time before sending them to your external destination. The default value is
100000.

8. (Optional) To forward debug logs to your external destination, select the Forward Debug Logs
checkbox. This checkbox is unselected by default.

9. (Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in
RainerScript syntax. For more information about custom syslog rules, see the Tanzu Platform for
Cloud Foundry documentation. For more information about RainerScript syntax, see the rsyslog
documentation.

10. Click Save Syslog Settings.

(Optional) Configure Resources


In the Resource Config pane, you can scale VMs in Healthwatch Exporter for TKGI up or down
according to the needs of your deployment, and you can associate load balancers with a group of VMs. For
example, you can scale the persistent disk size of a metric exporter VM to allow longer data retention.


To configure the Resource Config pane:

1. Select Resource Config.

2. (Optional) To scale a job, select an option from the dropdown for the resource you want to modify:

Instances: Configures the number of instances each job has

VM Type: Configures the type of VM used in each instance

Persistent Disk Type: Configures the amount of persistent disk space to allocate to the
job

3. (Optional) To add a load balancer to a job:

1. Click the icon next to the job name.

2. In Load Balancers, enter the name of your load balancer.

3. Ensure that the Internet Connected checkbox is unselected. Selecting this checkbox
gives VMs a public IP address that allows outbound Internet access.

4. Click Save.

Deploy Healthwatch Exporter for TKGI


To complete your installation of the Healthwatch Exporter for TKGI tile:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Click Review Pending Changes.

3. Click Apply Changes.

For more information, see the Tanzu Operations Manager documentation.

Configure a Scrape Job for Healthwatch Exporter for TKGI


After you have successfully deployed Healthwatch Exporter for TKGI, you must configure a scrape job in
the Prometheus instance that exists within your metrics monitoring system, unless you installed
Healthwatch Exporter for TKGI on the same Tanzu Operations Manager foundation as the Healthwatch tile.
Follow the procedure in one of the following sections, depending on which monitoring system you use:

If you monitor metrics using the Healthwatch tile on a Tanzu Operations Manager foundation, see
Configure a Scrape Job for Healthwatch Exporter for TKGI in Healthwatch.

You don't need to configure a scrape job for installations of Healthwatch Exporter
for TKGI that are on the same Tanzu Operations Manager foundation as your
Healthwatch tile. The Prometheus instance in the Healthwatch tile automatically
discovers and scrapes Healthwatch Exporter tiles that are installed on the same
Tanzu Operations Manager foundation as the Healthwatch tile.

If you monitor metrics using a service or database located outside your Tanzu Operations Manager
foundation, such as from an external TSDB, see Configure a Scrape Job for Healthwatch Exporter
for TKGI in an External Monitoring System.


Configure a Scrape Job for Healthwatch Exporter for TKGI in Healthwatch
To configure a scrape job for Healthwatch Exporter for TKGI in the Healthwatch tile on your Tanzu
Operations Manager foundation, see Configure Prometheus in Configuring Healthwatch.

Configure a Scrape Job for Healthwatch Exporter for TKGI in an External Monitoring System
To configure a scrape job for Healthwatch Exporter for TKGI in a service or database that is located outside
your Tanzu Operations Manager foundation:

1. Open network communication paths from your external service or database to the metric exporter
VMs in Healthwatch Exporter for TKGI. The procedure to open these network paths differs
depending on your Tanzu Operations Manager foundation’s IaaS. For a list of TCP ports used by
each metric exporter VM, see Required Networking Rules for Healthwatch Exporter for TKGI in
Healthwatch Architecture.

2. In the scrape_config section of the Prometheus configuration file, create a scrape job for your
Tanzu Operations Manager foundation. Under static_config, specify the TCP ports of each
metric exporter VM as static targets for the IP address of your external service or database. For
example:

job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:8443"
  - "1.2.3.4:25555"
  - "1.2.3.4:443"
  - "1.2.3.4:25595"
  - "1.2.3.4:9021"

For more information about the scrape_config section of the Prometheus configuration file, see
the Prometheus documentation. For more information about the static_config section of the
Prometheus configuration file, see the Prometheus documentation.

Configuring Multi-Foundation Monitoring


This topic describes how to configure Healthwatch for VMware Tanzu to monitor multiple VMware Tanzu
Operations Manager foundations.

You can monitor several Tanzu Operations Manager foundations that have VMware Tanzu Platform for
Cloud Foundry or VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) installed from a Healthwatch tile
that you install on a separate Tanzu Operations Manager foundation.

There are two ways to monitor several Tanzu Operations Manager foundations from a single monitoring
Tanzu Operations Manager foundation:

Direct scraping: The Prometheus instance in the Healthwatch deployment on your monitoring
Tanzu Operations Manager foundation scrapes metrics directly from the metric exporter VMs
deployed by the Healthwatch Exporter tiles installed on the Tanzu Operations Manager foundation
you monitor. Direct scraping allows you to easily scrape metrics from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor and store them in a single
Prometheus instance. For more information, see Configuring Multi-Foundation Monitoring Through
Direct Scraping.

Federation: The Prometheus instance in the Healthwatch deployment on your monitoring Tanzu
Operations Manager foundation federates metrics from the Prometheus instances in the
Healthwatch deployments on the Tanzu Operations Manager foundations you monitor. Federation
allows you to monitor a subset of metrics from multiple Tanzu Operations Manager foundations
without storing all metrics from those Tanzu Operations Manager foundations in a single
Prometheus instance. For more information, see Configuring Multi-Foundation Monitoring Through
Federation.

With both methods, you can label the metrics with the name of the Tanzu Operations Manager foundation
from which they were collected. This allows you to see all metrics for a specific Tanzu Operations Manager
foundation or compare certain metrics across Tanzu Operations Manager foundations.
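
For example, one way to attach a foundation name is to add a labels section to the static_configs entry of a scrape job that you define yourself. This is an illustrative sketch only; the job name, target placeholder, and the foundation label name are not values required by Healthwatch.

job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "EXPORTER-VM-IP-ADDRESS:9090"
  # Illustrative label: choose whatever label name and value your dashboards
  # and alerting rules expect.
  labels:
    foundation: foundation-1

Every metric scraped by this job then carries the foundation label, so you can filter by a single foundation or compare foundations in queries and dashboards.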

Configuring Multi-Foundation Monitoring Through Direct Scraping
When you configure direct scraping for your multi-foundation Healthwatch deployment, the Prometheus
instance in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes metrics
directly from the metric exporter VMs deployed by the Healthwatch Exporter tiles installed on the Tanzu
Operations Manager foundation you monitor.

Direct scraping allows you to easily scrape the metrics you want to monitor from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor. If you want to monitor component metrics
and SLIs related to the health of your Tanzu Platform for CF or TKGI deployments, and you do not want to
monitor metrics for Kubernetes clusters for any TKGI deployments, VMware recommends configuring direct
scraping for your multi-foundation Healthwatch deployment.

However, the Prometheus instance in the Healthwatch deployment on your monitoring Tanzu Operations
Manager foundation cannot directly scrape metrics for Kubernetes clusters created through TKGI
deployments on other Tanzu Operations Manager foundations. If you want to also scrape metrics for
Kubernetes clusters for TKGI deployments on the Tanzu Operations Manager foundations you monitor, you
must monitor your multi-foundation Healthwatch deployment through federation instead. For more
information, see Configuring Multi-Foundation Monitoring Through Federation.

To configure direct scraping for your multi-foundation Healthwatch deployment, you must install the
Healthwatch tile on your monitoring Tanzu Operations Manager foundation and only the Healthwatch
Exporter tile for Tanzu Platform for Cloud Foundry or the Healthwatch Exporter tile for TKGI on the Tanzu
Operations Manager foundations you want to monitor.

To configure direct scraping for your multi-foundation Healthwatch deployment, see Configuring Direct
Scraping for Multi-Foundation Monitoring.

Configuring Multi-Foundation Monitoring Through Federation


When you configure federation for your multi-foundation Healthwatch deployment, the Prometheus instance
in the Healthwatch tile on a monitoring Tanzu Operations Manager foundation scrapes a subset of metrics
from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor.

Federation allows you to monitor a subset of metrics from multiple Tanzu Operations Manager
foundations without storing all metrics from those Tanzu Operations Manager foundations in a single
Prometheus instance. Because federation allows you to choose which metrics the Healthwatch deployment
on your monitoring Tanzu Operations Manager foundation receives, you can monitor a large number of
Tanzu Operations Manager foundations without overwhelming the Prometheus instance in the Healthwatch
deployment on your monitoring Tanzu Operations Manager foundation. If you want to monitor component
metrics, SLIs related to the health of your Tanzu Platform for CF or TKGI deployments, and metrics for
Kubernetes clusters for TKGI deployments, or if you want to monitor a large number of Tanzu Operations
Manager foundations, VMware recommends configuring federation for your multi-foundation Healthwatch
deployment.

Federating all metrics from a Tanzu Operations Manager foundation you monitor
negatively affects the performance of the Prometheus instance in the Healthwatch tile
installed on your monitoring Tanzu Operations Manager foundation, sometimes even
causing it to crash. To avoid this, VMware recommends limiting federation to only certain
metrics, such as service level indicator (SLI) metrics, from each Tanzu Operations
Manager foundation you monitor. For more information about the metrics you can collect,
see Healthwatch Metrics.

Federation also reduces firewall and network complexity for your multi-foundation Healthwatch deployment,
since the Prometheus instance in the Healthwatch tile on your monitoring Tanzu Operations Manager
foundation scrapes metrics only from the Prometheus instance on each of the Tanzu Operations Manager
foundations you monitor, rather than from each metric exporter VM deployed by the Healthwatch Exporter
tile on each of the Tanzu Operations Manager foundations you monitor.

To configure federation for your multi-foundation Healthwatch deployment, you must install the Healthwatch
tile on your monitoring Tanzu Operations Manager foundation and on each Tanzu Operations Manager
foundation you want to monitor, in addition to installing the Healthwatch Exporter tile on each Tanzu
Operations Manager foundation you want to monitor. Then, you must configure the Healthwatch tile on your
monitoring Tanzu Operations Manager foundation to federate metrics from the Prometheus instances installed on the
Tanzu Operations Manager foundations you want to monitor.

To configure federation for your multi-foundation Healthwatch deployment, see Configuring Federation for
Multi-Foundation Monitoring.

Configuring Direct Scraping for Multi-Foundation Monitoring
This topic describes how to configure direct scraping for your multi-foundation Healthwatch for VMware
Tanzu deployment. This is the recommended method to use if you want to monitor component metrics and
SLIs related to the health of your VMware Tanzu Platform for Cloud Foundry or VMware Tanzu Kubernetes
Grid Integrated Edition (TKGI) deployments, and you do not want to monitor metrics for Kubernetes clusters
for any TKGI deployments.


The Prometheus instance in the Healthwatch deployment on your monitoring Tanzu
Operations Manager foundation cannot directly scrape metrics for Kubernetes clusters
created through TKGI deployments on other Tanzu Operations Manager foundations. If you
want to also scrape metrics for Kubernetes clusters for TKGI deployments on the Tanzu
Operations Manager foundations you monitor, you must monitor your multi-foundation
Healthwatch deployment through federation instead. For more information, see Configuring
Multi-Foundation Monitoring Through Federation.

When you configure direct scraping for your multi-foundation Healthwatch deployment, the Prometheus
instance in the Healthwatch tile on a monitoring VMware Tanzu Operations Manager foundation scrapes
metrics directly from the metric exporter VMs deployed by the Healthwatch Exporter tiles installed on the
Tanzu Operations Manager foundation you monitor.

Direct scraping allows you to easily scrape the metrics you want to monitor from the Healthwatch Exporter
tiles on the Tanzu Operations Manager foundations you monitor.

To configure direct scraping for your multi-foundation Healthwatch deployment, you must install the
Healthwatch tile on your monitoring Tanzu Operations Manager foundation and only the Healthwatch
Exporter for Tanzu Platform for CF tile or Healthwatch Exporter for TKGI tile on the Tanzu Operations
Manager foundations you want to monitor.

Configure Direct Scraping


To configure direct scraping for your multi-foundation Healthwatch deployment:

1. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Configuring Healthwatch

2. Install and configure either Healthwatch Exporter for Tanzu Platform for Cloud Foundry or
Healthwatch Exporter for TKGI on each Tanzu Operations Manager foundation you want to monitor.
To install and configure a Healthwatch Exporter tile, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

Configuring Healthwatch Exporter for TKGI

3. For each Healthwatch Exporter tile you installed and configured, open the ports for the metric
exporter VMs that the Healthwatch Exporter tile deploys in the user console for your IaaS. For more
information about the ports you must open for each metric exporter VM, see either Networking
Rules for Healthwatch Exporter for Tanzu Platform for Cloud Foundry or Networking Rules for
Healthwatch Exporter for TKGI in Healthwatch Architecture.

4. Add a scrape job for each Healthwatch Exporter tile in the Prometheus pane of the Healthwatch
tile that you installed on your monitoring Tanzu Operations Manager foundation. To add a scrape job
for a Healthwatch Exporter tile:


1. Retrieve the Tanzu Operations Manager root certificate authority (CA) for the Tanzu
Operations Manager foundation you want to monitor. For more information, see the Tanzu
Operations Manager documentation.

2. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager
foundation you want to monitor.

3. Click the Healthwatch Exporter for Tanzu Platform for Cloud Foundry or Healthwatch
Exporter for Tanzu Kubernetes Grid - Integrated tile, depending on which Healthwatch
Exporter tile you installed on the Tanzu Operations Manager foundation you want to
monitor.

4. Select the Credentials tab.

5. In the row for Healthwatch Exporter Client Mtls, click Link to Credential.

6. Record the credentials for Healthwatch Exporter Client Mtls.

7. In a browser window, navigate to the user console for your Tanzu Operations Manager
deployment’s IaaS.

8. In the user console for your IaaS, record the public IP addresses of the metric exporter
VMs deployed by the Healthwatch Exporter tile you installed on the Tanzu Operations
Manager foundation you want to monitor, depending on which metrics you want to monitor
for that foundation:

For Healthwatch Exporter for Tanzu Platform for Cloud Foundry, record the public
IP addresses of any or all of the following metric exporter VMs:
pas-exporter-counter, the counter metric exporter VM

pas-exporter-gauge, the gauge metric exporter VM

pas-sli-exporter, the Tanzu Platform for CF SLI exporter VM

cert-expiration-exporter, the certificate expiration metric exporter VM

bosh-health-exporter, the BOSH health metric exporter VM

bosh-deployments-exporter, the BOSH deployment metric exporter VM

For Healthwatch Exporter for TKGI, record the public IP addresses of any or all of
the following metric exporter VMs:

pks-exporter, the TKGI metric exporter VM

cert-expiration-exporter, the certificate expiration metric exporter VM

pks-sli-exporter, the TKGI SLI exporter VM

bosh-health-exporter, the BOSH health metric exporter VM

bosh-deployments-exporter, the BOSH deployment metric exporter VM

Storing all metrics from multiple Tanzu Operations Manager foundations in a single
Prometheus instance on your monitoring Tanzu Operations Manager foundation
negatively affects the performance of that Prometheus instance. If you want to
monitor a large number of Tanzu Operations Manager foundations, or if some of the
foundations you want to monitor have particularly large Tanzu Platform for CF or
TKGI deployments, VMware recommends configuring the Prometheus instance in the
Healthwatch deployment on your monitoring foundation to scrape only from the
metric exporter VMs that you need to monitor the most. For more information about
the metrics that each metric exporter VM collects, see Healthwatch Metrics.

To find the public IP addresses of deployed VMs in the user console for your IaaS, see the
documentation for your IaaS:

AWS: To find the public IP address of a Linux instance, see the AWS
documentation for Linux instances of Amazon EC2. To find the public IP address
for a Windows instance, see the AWS documentation for Windows instances of
Amazon EC2.

Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.

GCP: To find the public IP address for a GCP VM, see the GCP documentation.

OpenStack: To associate a floating IP address to an OpenStack VM, see the
OpenStack documentation.

vSphere: To find the public IP address of a vSphere VM, see the vSphere
documentation.

By default, VMs deployed on one Tanzu Operations Manager foundation typically
cannot send or receive traffic from other Tanzu Operations Manager foundations
using their public IP addresses. You must configure the firewall for your IaaS to
allow ingress traffic to port 9090 on your monitoring Tanzu Operations Manager
foundation.

9. Go to the Tanzu Ops Manager Installation Dashboard for your monitoring Tanzu Operations
Manager foundation.

10. Click the Healthwatch tile.

11. Select Prometheus.

12. Under Additional scrape jobs, click Add.

13. In Scrape job configuration parameters, provide configuration parameters in YAML
format for the scrape job for the Healthwatch Exporter tile you installed on the Tanzu
Operations Manager foundation you want to monitor:

For Healthwatch Exporter for Tanzu Platform for Cloud Foundry, provide
configuration parameters similar to the following example:

job_name: FOUNDATION-NAME
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "COUNTER-EXPORTER-VM-IP-ADDRESS:9090"
  - "GAUGE-EXPORTER-VM-IP-ADDRESS:9090"
  - "SLI-EXPORTER-VM-IP-ADDRESS:9090"
  - "CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS:9090"

Where:

FOUNDATION-NAME is the name of the Tanzu Operations Manager foundation you want to monitor.

(Optional) COUNTER-EXPORTER-VM-IP-ADDRESS is the IP address of the counter metric exporter VM that you recorded in a previous step.

(Optional) GAUGE-EXPORTER-VM-IP-ADDRESS is the IP address of the gauge metric exporter VM that you recorded in a previous step.

(Optional) SLI-EXPORTER-VM-IP-ADDRESS is the IP address of the Tanzu Platform for CF SLI exporter VM that you recorded in a previous step.

(Optional) CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS is the IP address of the certificate expiration metric exporter VM that you recorded in a previous step.

(Optional) BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS is the IP address of the BOSH health metric exporter VM that you recorded in a previous step.

(Optional) BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS is the IP address of the BOSH deployment metric exporter VM that you recorded in a previous step.

For Healthwatch Exporter for TKGI, provide configuration parameters similar to the
following example:

job_name: FOUNDATION-NAME
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "TKGI-EXPORTER-VM-IP-ADDRESS:9090"
  - "CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS:9090"
  - "SLI-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS:9090"
  - "BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS:9090"

Where:

FOUNDATION-NAME is the name of the Tanzu Operations Manager foundation you want to monitor.

(Optional) TKGI-EXPORTER-VM-IP-ADDRESS is the IP address of the TKGI metric exporter VM that you recorded in a previous step.

(Optional) CERT-EXPIRATION-EXPORTER-VM-IP-ADDRESS is the IP address of the certificate expiration metric exporter VM that you recorded in a previous step.

(Optional) SLI-EXPORTER-VM-IP-ADDRESS is the IP address of the TKGI SLI exporter VM that you recorded in a previous step.

(Optional) BOSH-HEALTH-EXPORTER-VM-IP-ADDRESS is the IP address of the BOSH health metric exporter VM that you recorded in a previous step.

(Optional) BOSH-DEPLOYMENTS-EXPORTER-VM-IP-ADDRESS is the IP address of the BOSH deployment metric exporter VM that you recorded in a previous step.

14. In Certificate and private key for TLS, enter the certificate and private key from
Healthwatch Exporter Client Mtls that you recorded from the Credentials tab in the
Healthwatch Exporter tile in a previous step.

15. In CA certificate for TLS, enter the Tanzu Operations Manager root CA that you retrieved
in a previous step.

16. In Target server name, enter the custom hostname resolver to use when verifying the TLS
certificates. Enter the name of the server that facilitates TLS communication between the
Prometheus instance in the Healthwatch tile and the metric exporter VMs that the
Healthwatch Exporter tile deploys. If the CN or SAN on the TLS certificate does not match
the URL or IP of the target server, enter what is on the TLS certificate.

Configuring Federation for Multi-Foundation Monitoring


This topic describes how to configure federation for your multi-foundation Healthwatch for VMware Tanzu
deployment. This is the recommended method to use if you want to monitor metrics for Kubernetes clusters
for TKGI deployments.

If you want to set up multi-foundation monitoring for deployments without Kubernetes
clusters, use the direct scraping configuration. See Configuring Direct Scraping for
Multi-Foundation Monitoring for details.

When you configure your Healthwatch deployment to federate metrics, the Prometheus instance in the
Healthwatch tile on a monitoring VMware Tanzu Operations Manager foundation scrapes a subset of
metrics from the Prometheus instances in the Healthwatch tiles installed on the Tanzu Operations Manager
foundations you monitor. This is useful if you want to monitor a subset of metrics from multiple Tanzu
Operations Manager foundations without storing all metrics from those Tanzu Operations Manager
foundations in a single Prometheus instance. Because federation allows you to choose which metrics the
Healthwatch deployment on your monitoring Tanzu Operations Manager foundation receives, you can
monitor a large number of Tanzu Operations Manager foundations without overwhelming the Prometheus
instance in the Healthwatch deployment on your monitoring Tanzu Operations Manager foundation.

To configure federation for your Healthwatch deployment, you must install the Healthwatch tile on your
monitoring Tanzu Operations Manager foundation and on each Tanzu Operations Manager foundation you
want to monitor, in addition to installing the Healthwatch Exporter tile on each Tanzu Operations Manager
foundation you want to monitor. Then, you must configure the Healthwatch tile on your monitoring Tanzu
Operations Manager foundation to federate metrics from the Prometheus instances installed on the Tanzu Operations
Manager foundations you want to monitor. If you want to federate metrics from Tanzu Operations Manager
foundations with TKGI installed, you must also configure TKGI cluster discovery on the Tanzu Operations
Manager foundations you want to monitor.

To configure federation for your multi-foundation Healthwatch deployment:

1. Set up your multi-foundation deployment for federation by following the procedure in the section for
your runtime:

Set up your Multi-Foundation Tanzu Platform for Cloud Foundry Deployment

Set up your Multi-Foundation TKGI Deployment

2. Configure scrape jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu
Operations Manager foundations you want to monitor. See Configure Scrape Jobs.

3. Test your federation configuration. See Test your Federation Configuration.

If your multi-foundation Healthwatch deployment contains one or more highly available (HA)
Healthwatch deployments, see Federation for a Highly Available Healthwatch Deployment.

For more information about federation, see the Prometheus documentation.

Federating all metrics from a Tanzu Operations Manager foundation you monitor
negatively affects the performance of the Prometheus instance in the Healthwatch
tile installed on your monitoring Tanzu Operations Manager foundation, sometimes
even causing it to crash. To avoid this, VMware recommends federating only
certain metrics, such as service level indicator (SLI) metrics, from each Tanzu
Operations Manager foundation you monitor. For more information about the
metrics you can collect, see Healthwatch Metrics.
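
As an illustration of this approach, the following sketch shows what a hand-written federation scrape job limited to a subset of metrics might look like. The job name, metric matcher, and target address are placeholders, not values defined by Healthwatch, and the exact fields you set depend on how you configure scrape jobs in the Healthwatch tile; adjust the match[] expression to the metric names you actually want to federate.

job_name: foundation-1-federation
metrics_path: /federate
scheme: https
honor_labels: true
params:
  # Placeholder matcher: federate only metrics whose names contain "sli".
  'match[]':
  - '{__name__=~".*sli.*"}'
static_configs:
- targets:
  # Placeholder: address of the Prometheus instance in the Healthwatch tile
  # on the Tanzu Operations Manager foundation you monitor.
  - "MONITORED-FOUNDATION-PROMETHEUS-IP:9090"

Because honor_labels is set to true, labels attached by the monitored foundation's Prometheus instance, such as a foundation label, are preserved when the metrics are federated.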

Set up your Multi-Foundation Tanzu Platform for Cloud Foundry Deployment
To configure Tanzu Platform for CF deployments on multiple Tanzu Operations Manager foundations to
federate metrics to a single monitoring Tanzu Operations Manager foundation:

1. Install and configure the Healthwatch and Healthwatch Exporter tiles on each Tanzu Operations
Manager foundation you want to monitor. To install and configure the Healthwatch and Healthwatch
Exporter tiles, see the following topics:

Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Configuring Healthwatch

Configuring Healthwatch Exporter for Tanzu Platform for Cloud Foundry

2. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:

Installing a Tile Manually or


Installing, Configuring, and Deploying a Tile Through an Automated Pipeline


Configuring Healthwatch

3. In the Healthwatch tile on your monitoring Tanzu Operations Manager foundation, configure scrape
jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu Operations Manager
foundations you want to monitor. See Configure Scrape Jobs.

4. Test your federation configuration. See Test your Federation Configuration.

Set up your Multi-Foundation TKGI Deployment


When you install the Healthwatch tile on a Tanzu Operations Manager foundation that has TKGI installed,
you can configure the Prometheus instance to detect on-demand Kubernetes clusters created through the
TKGI API and create scrape jobs for them. However, the Prometheus instance in a Healthwatch deployment
can only detect Kubernetes clusters for TKGI deployments on the same Tanzu Operations Manager
foundation.

For a Healthwatch deployment on one Tanzu Operations Manager foundation to receive metrics for
Kubernetes clusters created through TKGI deployments on other Tanzu Operations Manager foundations,
you must configure the Healthwatch Exporter for TKGI deployment on those Tanzu Operations Manager
foundations to federate metrics to the Prometheus instance in the Healthwatch deployment on the Tanzu
Operations Manager foundation you use to monitor the other Tanzu Operations Manager foundations. If you
do not configure federation for TKGI deployments on the Tanzu Operations Manager foundations you want
to monitor, the Healthwatch Exporter for TKGI deployments on those Tanzu Operations Manager
foundations can only send component metrics and SLIs related to the health of those TKGI deployments.

To configure TKGI deployments on multiple Tanzu Operations Manager foundations to federate metrics to a
single monitoring Tanzu Operations Manager foundation:

1. Install and configure the Healthwatch and Healthwatch Exporter for TKGI tiles on each Tanzu
Operations Manager foundation you want to monitor. To install and configure the Healthwatch and
Healthwatch Exporter for TKGI tiles, see the following topics:

Installing a Tile Manually or


Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Configuring Healthwatch

Configuring Healthwatch Exporter for TKGI

2. Install and configure the Healthwatch tile on your monitoring Tanzu Operations Manager foundation.
To install and configure the Healthwatch tile, see the following topics:

Installing a Tile Manually or


Installing, Configuring, and Deploying a Tile Through an Automated Pipeline

Configuring Healthwatch

3. Configure TKGI cluster discovery in the Healthwatch tile on each Tanzu Operations Manager
foundation you want to monitor. Do not configure TKGI cluster discovery in the Healthwatch tile on
your monitoring foundation. To configure TKGI cluster discovery on the Tanzu Operations Manager
foundations you want to monitor, see Configuring TKGI Cluster Discovery.

4. In the Healthwatch tile on your monitoring Tanzu Operations Manager foundation, configure scrape
jobs for the Prometheus instances in the Healthwatch tiles on the Tanzu Operations Manager
foundations you want to monitor. To configure these scrape jobs, see Configure Scrape Jobs.


5. Test your federation configuration. See Test your Federation Configuration.

Configure Scrape Jobs


To configure the Prometheus instance in the Healthwatch tile on your monitoring Tanzu Operations Manager
foundation to scrape metrics from the Prometheus instances in the Healthwatch tiles on the Tanzu
Operations Manager foundations you want to monitor:

1. For each Tanzu Operations Manager foundation you want to monitor, open port 4450 for the
Prometheus instance in the Healthwatch tile in the user console for your IaaS. For more
information, see the documentation for your IaaS.

2. For each Tanzu Operations Manager foundation you want to monitor:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select the Credentials tab.

4. In the Promxy Client Mtls row of the TSDB section, click Link to Credential.

5. Record the values of private_key_pem and cert_pem. These values are the private key
and certificate for Promxy Client mTLS.

The values of private_key_pem and cert_pem are in JSON format and


contain several \n markers. Ensure that you convert all \n markers into
newlines before you use these values in an upcoming step.
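
For example, assuming you copied the JSON-escaped values into shell variables, one way to convert the \n markers into newlines is with printf, which interprets the escape sequences. The variable and file names below are placeholders:

# PROMXY_CLIENT_CERT_PEM and PROMXY_CLIENT_KEY_PEM hold the escaped values copied from the Credentials tab
printf '%b' "$PROMXY_CLIENT_CERT_PEM" > promxy-client-cert.pem
printf '%b' "$PROMXY_CLIENT_KEY_PEM" > promxy-client-key.pem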

6. Retrieve the certificate for the Tanzu Operations Manager root certificate authority (CA) of
the Tanzu Operations Manager foundation you want to monitor. For more information, see
the Tanzu Operations Manager documentation.

7. Go to the Tanzu Ops Manager Installation Dashboard for your monitoring Tanzu Operations
Manager foundation.

8. Click the Healthwatch tile.

9. Select Prometheus.

10. Under Additional scrape jobs, click Add.

11. For Scrape job configuration parameters, provide, in YAML format, the configuration
parameters for a scrape job for the Prometheus instance in the Healthwatch tile on the
Tanzu Operations Manager foundation you want to monitor. In the example below, the
scrape job federates all metrics with names that match the regular expression
^metric_name_regex.* from the Prometheus instance at the IP address listed under the
targets property:

job_name: example-job-name
scheme: https
metrics_path: '/federate'
params:
  'match[]':
    - '{__name__=~"^metric_name_regex.*"}'
static_configs:
- targets:
  - 'source-tsdb-1:4450'
  - 'source-tsdb-2:4450'

If you have configured a load balancer or DNS entry for the Prometheus
instance, include the IP address for your load balancer or DNS entry in
each target listed under the targets property instead of the IP address for
the Prometheus instance.

12. For Certificate and private key for TLS, enter the certificate and private key you recorded
from the Promxy Client mTLS row in the Credentials tab in the Healthwatch tile installed
on the Tanzu Operations Manager foundation you want to monitor in a previous step.

13. For CA certificate for TLS, enter the Tanzu Operations Manager root CA certificate for the
Tanzu Operations Manager foundation you want to monitor that you recorded in a previous
step.

14. For Target server name, enter promxy.

15. Click Save.

16. Test your federation configuration. See Test your Federation Configuration.

Automating the Healthwatch tile configuration


If you are using the om CLI to configure the Healthwatch tile, the example below shows how you would enter
the example configuration parameters described earlier in an automation script:

product-properties:
  .properties.scrape_configs:
    value:
    - ca: |
        -----BEGIN CERTIFICATE-----
        SECRET
        -----END CERTIFICATE-----
      scrape_job: |
        job_name: example-job-name
        scheme: https
        metrics_path: '/federate'
        params:
          'match[]':
          - '{__name__=~"^my_metric_name_regex.*"}'
        static_configs:
        - targets:
          - 'source-prometheus-1:4450'
      server_name: promxy
      tls_certificates:
        cert_pem: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        private_key_pem: |
          -----BEGIN RSA PRIVATE KEY-----
          SECRET
          -----END RSA PRIVATE KEY-----

For more information, see Configure and Deploy your Tile Using the om CLI in Installing, Configuring, and
Deploying a Tile Through an Automated Pipeline.

For more information about configuring scrape jobs, see:

Configure Prometheus

the Prometheus documentation

Test your Federation Configuration


To confirm that your federation configuration is working correctly:

1. In your web browser, navigate to the Grafana UI.

2. Log in to the Grafana UI.

3. On the left side of the Grafana UI homepage, click the Explore icon. An empty Explore tab
appears.

4. In the query field to the right of the Metrics browser menu tab, enter up.

5. Click Run query.

6. Under Table, review the query results. If your federation configuration is working, the job column
includes the job_name from the scrape jobs you configured for each Tanzu Operations Manager
foundation you monitor in Configure Federation.
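
If you want to check the results for a single Tanzu Operations Manager foundation, you can also scope the query to the job name you configured for that foundation; for example, using the job name from the earlier scrape job example:

up{job="example-job-name"}

A value of 1 for each target indicates that the federation scrape is succeeding.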

Federation for a Highly Available Healthwatch Deployment


In a highly available (HA) Healthwatch deployment, each VM in the Prometheus instance in the
Healthwatch tile scrapes the same data from the metric exporter VMs that the Healthwatch Exporter tiles
deploy.

There are two ways to handle this duplication when federating, each with its own pros and cons:

When federating metrics, you can configure the Prometheus instance in the Healthwatch tile on
your monitoring Tanzu Operations Manager foundation to scrape both copies of that data from the
Prometheus instance in the Healthwatch tile on each Tanzu Operations Manager foundation you
monitor.
To do this, include both VMs of each Prometheus instance on the Tanzu Operations Manager
foundations you want to monitor as targets in the scrape job configuration parameters. While including both
VMs creates duplicate sets of metrics, it also ensures that you do not lose metrics data if one of
the VMs goes down. However, doubling the number of metrics that the Prometheus instance
collects also negatively affects the performance of the Prometheus instance.

Alternatively, you can create load balancers or DNS entries in your IaaS user console for the
Prometheus instances on each Tanzu Operations Manager foundation you monitor, then include the
IP addresses for each load balancer or DNS entry in the targets listed under the targets property
in your scrape job configuration parameters. For more information, see Configure Scrape Jobs.


In both cases, VMware recommends configuring static IP addresses for both VMs in each of the
Prometheus instances. For more information about configuring static IP addresses for Prometheus
instances, see Configure Prometheus in Configuring Healthwatch.

Configuring TKGI Cluster Discovery


This topic describes how to configure VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) cluster
discovery in Healthwatch for VMware Tanzu.

Overview of TKGI Cluster Discovery


In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure the Prometheus instance in the
Healthwatch tile to detect on-demand Kubernetes clusters created through the TKGI API and create scrape
jobs for them. You only need to configure this pane if you have Tanzu Operations Manager foundations with
TKGI installed.

The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through
the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus
instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI
API, and the TKGI tile.

To configure TKGI cluster discovery:

1. Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see
Configure TKGI Cluster Discovery in Healthwatch below.

2. Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more
information, see Configure TKGI below.

If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see
Troubleshooting TKGI Cluster Discovery Failure below.

To collect additional BOSH system metrics related to TKGI and view them in the Grafana
UI, you must install and configure the Healthwatch Exporter for TKGI on your Tanzu
Operations Manager foundations with TKGI installed. To install the Healthwatch Exporter
for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for
TKGI tile, see Configuring Healthwatch Exporter for TKGI.

Configure TKGI Cluster Discovery in Healthwatch


In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure TKGI cluster discovery,
including the UAA client that the Prometheus instance uses to connect to the Kubernetes API through the
TKGI API.

To configure the TKGI Cluster Discovery pane:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select TKGI Cluster Discovery.


4. Under TKGI cluster discovery, select one of the following options:

On: This option allows TKGI cluster discovery and reveals the configuration fields
described in the steps below. TKGI cluster discovery is allowed by default when TKGI is
installed on your Tanzu Operations Manager foundation.

Off: This option disallows TKGI cluster discovery.

5. For Discovery interval, enter in seconds how frequently you want the Prometheus instance
to detect and scrape TKGI clusters. The minimum value is 60.

6. (Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS,
configure one of the following options:

To configure the Prometheus instance to use a self-signed CA certificate or a certificate
that is signed by a self-signed CA certificate when communicating with the TKGI API over
TLS, provide the certificate for the CA in CA certificate for TLS. If you provide a self-
signed CA certificate, it must be for the same CA that signs the certificate in the TKGI API.
If the Prometheus instance uses certificates signed by a trusted third-party CA or the Skip
TLS certificate verification checkbox is selected, do not configure this field.

If you do not provide a self-signed CA certificate or a certificate that is signed by a self-
signed CA certificate, you can select the Skip TLS certificate verification checkbox.
When this checkbox is selected, the Prometheus instance does not verify the identity of
the TKGI API. This checkbox is unselected by default. VMware does not recommend
skipping TLS certificate verification in a production environment.

7. Click Save.

Configure TKGI
After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the
Prometheus instance to scrape metrics from TKGI clusters.

To configure TKGI:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Click the Tanzu Kubernetes Grid Integrated Edition tile.

3. Select Host Monitoring.

4. Under Enable Telegraf Outputs?, select Yes.

5. Select the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging
metrics to Healthwatch.

6. Select the Include Kubernetes Controller Manager metrics checkbox to allow TKGI to send
Kubernetes Controller Manager metrics to Healthwatch.

7. If you are using TKGI v1.14.2 or later, select the Include Kubernetes Scheduler metrics
checkbox to allow TKGI to send Kubernetes Scheduler metrics to Healthwatch.

8. For Setup Telegraf Outputs, provide the following TOML configuration file:

[[outputs.prometheus_client]]
listen = ":10200"
metric_version = 2

You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf
metrics from your TKGI clusters. For more information about creating a configuration file in TKGI,
see the TKGI documentation.

If you are configuring TKGI v1.12 or earlier, remove metric_version = 2 from the
TOML configuration file. TKGI v1.12 and earlier are out of support. Consider
upgrading to at least v1.17, which is currently the oldest supported version.

9. Click Save.

10. For each plan you want to monitor:

1. Select the plan you want to monitor. For example, Plan 2.

2. For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create
the roles required to allow the Prometheus instance to scrape metrics from your TKGI
clusters:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthwatch
rules:
- resources:
  - pods/proxy
  - pods
  - nodes
  - nodes/proxy
  - namespace/pods
  - endpoints
  - services
  verbs:
  - get
  - watch
  - list
  apiGroups:
  - ""
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthwatch
roleRef:
  apiGroup: ""
  kind: ClusterRole
  name: healthwatch
subjects:
- apiGroup: ""
  kind: User
  name: healthwatch


If (Optional) Add-ons - Use with caution already contains other API resource definitions,
append the above YAML snippet to the end of the existing resource definitions, followed by
a newline character.

11. Click Save.

12. Select Errands.

13. Ensure that the Upgrade all clusters errand is set to run. Running this errand configures your TKGI
clusters with the roles you created in the (Optional) Add-ons - Use with caution field of the plans
you monitor in a previous step.

14. Click Save.
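
After the Upgrade all clusters errand runs, you can optionally spot-check that the healthwatch ClusterRole and ClusterRoleBinding exist on a cluster and grant the expected permissions. This sketch assumes you have the TKGI and kubectl CLIs installed and uses a placeholder cluster name:

tkgi get-credentials example-cluster
kubectl get clusterrole healthwatch
kubectl get clusterrolebinding healthwatch
# Confirm that the healthwatch user can read the resources that the Prometheus instance scrapes
kubectl auth can-i list pods --as healthwatch
kubectl auth can-i get /metrics --as healthwatch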

Troubleshooting TKGI Cluster Discovery Failure


TKGI cluster discovery can fail if the Prometheus instance fails to scrape metrics from your TKGI clusters.
To troubleshoot TKGI cluster discovery failure, see Troubleshooting Missing TKGI Cluster Metrics in
Troubleshooting Healthwatch.

Configuring DNS for the Grafana Instance


This topic describes how to configure a DNS entry for the Grafana instance in your Healthwatch for VMware
Tanzu installation.

Overview of DNS for the Grafana instance


Configuring a DNS entry for the Grafana instance allows users to access the Grafana UI more easily from
outside your BOSH network, including from links to the Grafana UI in alerts from Alertmanager.

On a Tanzu Operations Manager foundation that has VMware Tanzu Platform for Cloud Foundry installed,
Healthwatch can automatically configure a DNS record for the Grafana VM with the Gorouter in Tanzu
Platform for CF. For more information, see (Optional) Configure Grafana in Configuring Healthwatch.

On a Tanzu Operations Manager foundation that does not have Tanzu Platform for CF installed, you must
manually configure a DNS entry for either the public IP address of a single Grafana VM or the load balancer
in front of the Grafana instance.

Configure DNS for a Load Balancer


VMware recommends that you create a DNS entry that points to the load balancer associated with the
Grafana instance, especially if you want to make the Grafana instance highly available (HA).

To configure a DNS entry for the load balancer associated with the Grafana instance:

1. Ensure that you have associated a load balancer with the Grafana instance. For more information,
see Configure Resources in Configuring Healthwatch.

2. Find the public IP address for your load balancer. You may need to assign a public or elastic IP
address to your load balancer if it does not already have one. For more information, see the
documentation for your IaaS:


AWS: If your Tanzu Operations Manager deployment is on AWS, skip this step. You do not
need the public IP address of your load balancer for Grafana to configure a DNS entry in
your Amazon DNS.

Azure: For more information about finding the public IP address of your Azure load
balancer, see the Azure documentation.

GCP: For more information about finding the public IP address of your GCP load balancer,
see the GCP documentation.

OpenStack: For more information about assigning a floating IP address to your OpenStack
load balancer, see the OpenStack documentation.

vSphere: For more information about finding the public IP address of your vSphere load
balancer, see the vSphere documentation.

3. Create an A record in your DNS server named grafana that points to the public IP address of the
load balancer that you recorded in the previous step. For more information, see the documentation
for your IaaS:

AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.

Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.

GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.

OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.

vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.

4. Wait for your DNS server to update.

5. Ensure that the Grafana UI login page appears as expected by navigating to the URL that you
configured in your DNS entry in a web browser. When you see the Grafana UI login page, you have
successfully created a DNS entry.

Configure DNS for a Single Grafana VM


To configure a DNS entry for a single Grafana VM:

1. In the user console for your IaaS, find the public IP address for the Grafana VM. For more
information, see the documentation for your IaaS:

AWS: To find the public IP address of a Linux instance, see the AWS documentation for
Linux instances of Amazon EC2. To find the public IP address for a Windows instance, see
the AWS documentation for Windows instances of Amazon EC2.

Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.

GCP: To find the public IP address for a GCP VM, see the GCP documentation.


OpenStack: To associate a floating IP address to an OpenStack VM, see the OpenStack documentation.

vSphere: To find the public IP address of a vSphere VM, see the vSphere documentation.

2. Record the public IP address of the Grafana VM.

3. Create an A record in your DNS server named grafana that points to the public IP address of the
Grafana VM that you recorded in the previous step. For more information, see the documentation
for your IaaS:

AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.

Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.

GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.

OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.

vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.

4. Wait for your DNS server to update.

5. Ensure that the Grafana UI login page appears as expected by navigating to the URL that you
configured in your DNS entry in a web browser. When you see the Grafana UI login page, you have
successfully created a DNS entry.

Creating a Firewall Policy for the Grafana Instance


This topic describes how to create a firewall policy for the Grafana instance in your Healthwatch for VMware
Tanzu installation.

Overview of Firewall Policies for the Grafana Instance


In the Healthwatch tile, external access to individual VMs is disallowed by default. Creating a
firewall policy for the Grafana instance allows users to access the Grafana UI more easily from outside your
BOSH network, including from the links to the Grafana UI that Alertmanager provides in alert messages.

You create firewall policies in the console for your Tanzu Operations Manager deployment’s IaaS. To create
a firewall policy for the Grafana instance, see the section for your IaaS:

AWS: Create a Firewall Policy in AWS

Azure: Create a Firewall Policy in Azure

GCP: Create a Firewall Policy in GCP

vSphere: Create a Firewall Policy in vSphere NSX-V

Create a Firewall Policy in AWS


To create a firewall policy in AWS:

1. Log in to the Amazon EC2 dashboard.

2. Select Security Group.

3. Click Create Security Group.

4. For Security group name, enter the name you want to give the security group. For example,
grafana-port-access.

5. For Description, enter a description for your security group.

6. For VPC, select from the dropdown the VPC where the Grafana instance is deployed.

7. Select the Inbound tab.

8. To create the first rule:

1. Click Add rule.

2. For Type, select HTTPS from the dropdown.

3. For Protocol, select TCP from the dropdown.

4. For Port Range, enter 443.

5. For Source, enter 0.0.0.0/0.

6. Click Save rules.

9. To create the second rule:

1. Click Add rule.

2. For Type, select HTTP from the dropdown.

3. For Protocol, select TCP from the dropdown.

4. For Port Range, enter 80.

5. For Source, enter 0.0.0.0/0.

6. Click Save rules.

10. Click Create.

11. Select Instances.

12. Click the Grafana instance.

13. Click Actions.

14. Under Security, click Change security groups.

15. Select the checkbox next to the security group you created for the Grafana instance.

16. Click Add security group.

17. Click Save.
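
If you prefer to script these rules rather than use the Amazon EC2 console, you can add the same inbound rules to an existing security group with the AWS CLI. The security group ID below is a placeholder:

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 80 --cidr 0.0.0.0/0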

For more information about creating a firewall policy in AWS for a Linux instance, see the AWS
documentation for Linux instances of Amazon EC2. For more information about creating a firewall policy in
AWS for a Windows instance, see the AWS documentation for Windows instances of Amazon EC2.


Create a Firewall Policy in Azure


To create a firewall policy in Azure:

1. Log in to the Azure portal.

2. Select Resource groups.

3. Click Add.

4. Create a resource group for the Grafana instance. For more information, see the Azure
documentation.

5. Select the Network rule collection tab.

6. Click Add network rule collection.

7. For Name, enter the name you want to give the rule collection. For example, grafana-port-access.

8. For Priority, enter 1000.

9. For Action, select Allow.

10. Click Rules.

11. Under IP addresses, configure the following fields for your first rule:

1. For Name, enter a name for the first rule.

2. For Protocol, select TCP from the dropdown.

3. For Source type, select IP address from the dropdown.

4. For Source, enter (*).

5. For Destination type, select IP address from the dropdown.

6. For Destination address, enter the public IP address of the Grafana instance or the load
balancer for the Grafana instance.

7. For Destination Ports, enter 443.

12. Under IP addresses, configure the following fields for your second rule:

1. For Name, enter a name for the second rule.

2. For Protocol, select TCP from the dropdown.

3. For Source type, select IP address from the dropdown.

4. For Source, enter (*).

5. For Destination type, select IP address from the dropdown.

6. For Destination address, enter the public IP address of the Grafana instance or the load
balancer for the Grafana instance.

7. For Destination Ports, enter 80.

13. Click Add.

14. Click Review + create.


15. Click Save.

For more information about creating a firewall policy in Azure, see the Azure documentation.

Create a Firewall Policy in GCP


To create a firewall policy in GCP:

1. Log in to the Google Cloud console.

2. Under VPC, select Firewall.

3. To create the first rule:

1. Click Create firewall rule.

2. For Name, enter a name for the first rule.

3. For Network, select from the dropdown the network where the Grafana instance is
deployed.

4. For Priority, enter 1000.

5. For Target tags, enter grafana.

6. For Source IP ranges, enter 0.0.0.0/0.

7. Under Protocols and ports, select Specified protocols and ports.

8. Select the tcp checkbox.

9. For tcp, enter 443.

10. Click Create.

4. To create the second rule:

1. Click Create firewall rule.

2. For Name, enter a name for the second rule.

3. For Network, select from the dropdown the network where the Grafana instance is
deployed.

4. For Priority, enter 1000.

5. For Target tags, enter grafana.

6. For Source IP ranges, enter 0.0.0.0/0.

7. Under Protocols and ports, select Specified protocols and ports.

8. Select the tcp checkbox.

9. For tcp, enter 80.

10. Click Create.
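
If you prefer to script these rules rather than use the Google Cloud console, you can create equivalent rules with the gcloud CLI. The network name below is a placeholder:

gcloud compute firewall-rules create grafana-https --network=example-network \
  --allow=tcp:443 --source-ranges=0.0.0.0/0 --target-tags=grafana
gcloud compute firewall-rules create grafana-http --network=example-network \
  --allow=tcp:80 --source-ranges=0.0.0.0/0 --target-tags=grafana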

For more information about creating a firewall policy in GCP, see the GCP documentation.

Create a Firewall Policy in vSphere NSX-V


To create a firewall policy in vSphere NSX-V:

1. Log in to vSphere.

2. Click Networking & Security.

3. Select NSX Edges.

4. Double-click the Edge for your Tanzu Platform for CF deployment.

5. Select Manage.

6. Select Firewall.

7. To create the first rule:

1. Click the Add icon.

2. For Name, enter a name for the first rule.

3. For Source, select Any.

4. For Destination, enter the public IP address for the Grafana instance or the load balancer
for the Grafana instance.

5. For Service, select Any.

8. To create the second rule:

1. Click the Add icon.

2. For Name, enter a name for the second rule.

3. For Source, select Any.

4. For Destination, enter the public IP address for the Grafana instance or the load balancer
for the Grafana instance.

5. For Service, select Any.

9. Click Publish Changes.

For more information about adding an NSX Edge firewall rule, see the vSphere documentation.

Configuring Grafana Authentication


This topic describes how to configure different authentication methods for users to log in to the Grafana UI
when using it with Healthwatch for VMware Tanzu.

Overview of Grafana Authentication


In the Grafana Authentication pane, you configure how users log in to the Grafana UI.

By default, users log in to the Grafana UI using basic authentication. With basic authentication, all users log
in to the Grafana UI using the username admin and the administrator login credentials found in the
Credentials tab of the Healthwatch for VMware Tanzu tile.

However, you can configure the Grafana UI to use another authentication method alongside or instead of
basic authentication. This allows users to use their own credentials to log in to the Grafana UI.


The sections in this topic describe how to configure different authentication methods for users to log in to
the Grafana UI:

Basic authentication. For more information, see Configuring Basic Authentication below.

Generic OAuth, UAA, or LDAP. For more information, see Configuring Other Authentication
Methods below.

The values that you configure in the Grafana Authentication pane also configure their corresponding
properties in the Grafana configuration file. For more information, see Overview of Configuration Files in
Healthwatch in Configuration File Reference Guide, Grafana Authentication in Configuration File Reference
Guide, and the Grafana documentation.

Configuring Basic Authentication


When you configure basic authentication for the Grafana UI, all users log in to the Grafana UI using the
username admin and the login password found in the Credentials tab of the Healthwatch tile. If you disallow
basic authentication, users cannot log in with administrator credentials.

To allow or disallow basic authentication, see one of the sections below:

Allow Basic Authentication

Disallow Basic Authentication

Allow Basic Authentication


To configure basic authentication and obtain the login password for the Grafana UI:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select Grafana Authentication.

4. Under Basic authentication, select Allow.

5. Click Save.

6. Select the Credentials tab.

7. In the Admin Login Password row of the Grafana section, click Link to Credential.

8. Record the value of password. This value is the password that all users must use to log in to the
Grafana UI.

Disallow Basic Authentication


To disallow basic authentication:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select Grafana Authentication.

4. Under Basic authentication, select Do not allow.

5. Click Save.


Configuring Other Authentication Methods


You can configure the Grafana UI to use another authentication method alongside or instead of basic
authentication. This allows users to use their own credentials to log in to the Grafana UI.

For example, if you want to limit the number of users who can log in to the Grafana UI with the Grafana
administrator credentials found in the Credentials tab of the Healthwatch tile, but still allow non-
administrator users to log in to the Grafana UI, you can configure one of the following authentication
methods in addition to basic authentication:

Generic OAuth. For more information, see Configure Generic OAuth Authentication below.

User Account and Authentication (UAA). For more information, see Configure UAA Authentication
below.

LDAP. For more information, see Configure LDAP Authentication below.

If you want to only allow users with Grafana administrator credentials to log in to the Grafana UI, you can
configure the Grafana UI to use only basic authentication. For more information, see Configure Only Basic
Authentication below.

Configure Generic OAuth Authentication


When you configure generic OAuth authentication for the Grafana UI, users can log in to the Grafana UI
with their credentials for the Grafana-supported OAuth provider of your choice, including the UAA instance
for a runtime on a different Tanzu Operations Manager foundation. For more information about configuring
generic OAuth authentication, see the Grafana documentation. For a list of Grafana-supported
authentication integrations, see the Grafana documentation.

To configure generic OAuth authentication for the Grafana UI:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select Grafana Authentication.

4. Under Additional authentication methods, select Generic OAuth.

5. For Provider name, enter the name of your OAuth provider.

6. For Client ID, enter the client ID of your OAuth provider. The method to retrieve this client ID differs
depending on your OAuth provider. To find the client ID of your OAuth provider, see the Grafana
documentation.

7. For Client secret, enter the client secret of your OAuth provider. The method to retrieve this client
secret differs depending on your OAuth provider. To find the client secret of your OAuth provider,
see the Grafana documentation.

8. For Scopes, enter a comma-separated list of scopes that the OAuth provider adds to the user’s
token when they log in to the Grafana UI. These scopes differ depending on your OAuth provider.
To find the scopes for your OAuth provider, see the Grafana documentation.

9. For Authorization URL, enter the authorization URL of the server for your OAuth provider.

10. For Token URL, enter the token URL of the server for your OAuth provider.

11. For API URL, enter the API URL of the server for your OAuth provider.


12. For Logout URL, enter the URL to which users are redirected after logging out of the Grafana UI.

13. (Optional) For Email attribute, enter the attribute that contains the email address of the user. For
more information, see the Grafana documentation.

14. For Grafana domain, enter the domain of the Grafana instance. You must configure this field if
your OAuth provider requires a callback URL that uses the root URL of the Grafana instance. If
your OAuth provider does not require a callback URL, do not configure this field.

15. (Optional) To allow new users to create a new Grafana account when they log in with their existing
OAuth credentials for the first time, select the Allow new accounts with existing OAuth
credentials checkbox. This checkbox is selected by default. Unselecting this checkbox prevents
users without a pre-existing Grafana account from creating a new Grafana account or logging in to
the Grafana UI with their existing OAuth credentials.

16. (Optional) For Allowed domains, enter a comma-separated list of domains. Configuring this field
limits Grafana UI access to users who belong to one or more of the listed domains.

17. For Allowed teams, enter a comma-separated list of teams. Configuring this field limits Grafana UI
access to users who belong to one or more of the listed teams. Configure this field if your OAuth
provider allows you to separate users into teams. If your OAuth provider does not allow you to
separate users into teams, do not configure this field.

18. For Allowed organizations, enter a comma-separated list of organizations. Configuring this field
limits Grafana UI access to users who belong to one or more of the listed organizations. Configure
this field if your OAuth provider allows you to separate users into organizations. If your OAuth
provider does not allow you to separate users into organizations, do not configure this field.

19. For Allowed groups, enter a comma-separated list of groups. Configuring this field limits Grafana
UI access to users who belong to one or more of the listed groups. Configure this field if your
OAuth provider allows you to separate users into groups. If your OAuth provider does not allow you
to separate users into groups, do not configure this field.

20. (Optional) For Role attribute path, enter a JMESPath string that maps users to Grafana roles. For
example, contains(scope[*], 'healthwatch.admin') && 'Admin' || contains(scope[*],
'healthwatch.edit') && 'Editor' || contains(scope[*], 'healthwatch.read') &&
'Viewer'.

21. (Optional) To prevent users who are not mapped to a valid Grafana role from accessing the Grafana
UI, select the Deny access to users without Grafana roles checkbox. This checkbox is
unselected by default. Unselecting this checkbox assigns the Viewer role to users who cannot be
mapped to a valid Grafana role by the string configured in the Role attribute path field.

22. (Optional) To allow the Grafana instance to communicate with the server for your OAuth provider
over TLS:

1. For Certificate and private key for TLS, provide a certificate and private key for the
Grafana instance to use for TLS connections to the server for your OAuth provider.

2. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that the
server for your OAuth provider uses to verify TLS certificates.

3. If you do not provide a self-signed CA certificate or a certificate that is signed by a self-
signed CA certificate, select the Skip TLS certificate verification checkbox. When this
checkbox is selected, the Grafana instance does not verify the identity of the server for
your OAuth provider. This checkbox is unselected by default.

23. Click Save.

Configure UAA Authentication


When you configure UAA authentication for the Grafana UI, users can log in to the Grafana UI with their
UAA credentials.

Healthwatch can automatically configure authentication with the UAA instance of the runtime that is
installed on the same Tanzu Operations Manager foundation as the Healthwatch tile, either VMware Tanzu
Platform for Cloud Foundry or Tanzu Kubernetes Grid Integrated Edition (TKGI). If you want to configure
authentication with the UAA instance of Tanzu Platform for CF or TKGI installed on the same Tanzu
Operations Manager foundation, follow the procedure below. If you want to configure authentication with the
UAA instance of a runtime that is installed on a different Tanzu Operations Manager foundation, you must
manually configure it using the fields described in Configure Generic OAuth Authentication above.

To configure UAA authentication for the Grafana UI:

1. Under Additional authentication methods, select UAA.

If you are configuring authentication with the UAA instance for a TKGI deployment,
Healthwatch does not add the UAA administrator user account to the
healthwatch.admin group by default. If you want to log in to the Grafana UI using
the UAA administrator credentials, you must manually add the UAA administrator
user account to the healthwatch.admin group.

2. To assign Admin, Editor, or Viewer Grafana roles to a user:

1. Target the UAA server by running:

uaac target UAA-URL

Where UAA-URL is the URL of the UAA instance with which you want to configure
authentication. For UAA instances for Tanzu Platform for CF, this URL is usually
https://login.SYSTEM-DOMAIN, where SYSTEM-DOMAIN is the domain you configured in
the System domain field in the Domains pane of the Tanzu Platform for Cloud Foundry
tile. For TKGI, this URL is usually https://TKGI-API-URL:8443, where TKGI-API-URL is
the URL of the TKGI API.

2. Assign Admin, Editor, or Viewer Grafana roles to a user by running:

uaac member add GROUP USERNAME

Where:

GROUP is either healthwatch.admin, healthwatch.edit, or healthwatch.read.


These groups map to the Admin, Editor, and Viewer Grafana roles, respectively.
For more information about the level of access each role provides, see the Grafana
documentation.


USERNAME is the username of the user to which you want to assign a Grafana role.
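
For example, to assign the Admin Grafana role to a hypothetical user named alana:

uaac member add healthwatch.admin alana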

3. Click Save.

Configure LDAP Authentication


When you configure LDAP authentication for the Grafana UI, users can log in to the Grafana UI with their
LDAP credentials. You can also create mappings between LDAP group memberships and Grafana org user
roles. For more information about configuring LDAP authentication, see the Grafana documentation.

To configure LDAP authentication for the Grafana UI:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select Grafana Authentication.

4. Under Additional authentication methods, select LDAP.

5. For Host address, enter the network address of your LDAP server host.

6. (Optional) For Port, enter the port for your LDAP server host. The default port is 389 when the Use
TLS checkbox is unselected, or 636 when the Use TLS checkbox is selected.

7. (Optional) To allow new users to create a new Grafana account when they log in with their existing
LDAP credentials for the first time, select the Allow new accounts with existing LDAP
credentials checkbox. This checkbox is selected by default. Unselecting this checkbox prevents
users without a pre-existing Grafana account from creating a new Grafana account or logging in to
the Grafana UI with their existing LDAP credentials.

8. (Optional) To allow the LDAP server to communicate with the Grafana instance over TLS when
authenticating user credentials, select the Use TLS checkbox. This checkbox is unselected by
default.

9. (Optional) To allow the LDAP server to run the STARTTLS command when communicating with the
Grafana instance over TLS, select the Use STARTTLS checkbox. This checkbox is unselected by
default.

10. (Optional) To allow the Grafana instance to skip TLS certificate verification when communicating
with the LDAP server over TLS, select the Skip TLS certificate verification checkbox. This
checkbox is unselected by default.

11. (Optional) For Bind DN, enter the distinguished name (DN) for binding to the LDAP server. For
example, cn=admin,dc=grafana,dc=org.

12. (Optional) For Bind password, enter the password for binding to the LDAP server. For example,
grafana.

13. (Optional) For User search filter, enter a regex string that defines LDAP user search criteria. For
example, (cn=%s).

14. (Optional) For Search base DNs, enter an array of base DNs in the LDAP directory tree from which
any LDAP user search begins. The typical LDAP search base matches your domain name. For
example, dc=grafana,dc=org.


15. (Optional) For Group search filter, enter a regex string that defines LDAP group search criteria.
For example, (&(objectClass=posixGroup)(memberUid=%s)).

16. (Optional) For Group search base DNs, enter an array of base DNs in the LDAP directory tree from
which any LDAP group search begins. For example, ou=groups,dc=grafana,dc=org.

17. (Optional) For Group search filter user attribute, enter a value that defines which user attribute is
substituted for %s in the regex string you entered for Group search filter. You can use the value of
any property listed in Server attributes below. The default value is the value of the username
property.

18. (Optional) For Server attributes, enter in TOML format tables of the LDAP attributes that your
LDAP server uses. Each table must use the table name [servers.attributes].
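
For example, a typical set of attribute tables might look like the following. The attribute names on the right are assumptions about a common LDAP schema; replace them with the attributes your LDAP server actually uses:

[servers.attributes]
member_of = "memberOf"
email = "mail"
name = "givenName"
surname = "sn"
username = "cn"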

19. (Optional) For Server group mappings, enter in TOML format an array of tables of LDAP groups
mapped to Grafana orgs and roles. Each table must use the table name
[[servers.group_mappings]]. For more information, see the Grafana documentation.
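
For example, the following mappings would give members of a hypothetical admins group the Admin role and all other authenticated users the Viewer role; the group DN is a placeholder:

[[servers.group_mappings]]
group_dn = "cn=admins,ou=groups,dc=grafana,dc=org"
org_role = "Admin"

[[servers.group_mappings]]
group_dn = "*"
org_role = "Viewer"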

20. (Optional) To allow the Grafana instance to communicate with your LDAP server over TLS:

1. For Certificate and private key for TLS, provide a certificate and private key for the
Grafana instance to use for TLS connections to your LDAP server.

2. For CA certificate for TLS, provide a certificate for the CA that your LDAP server uses to
verify TLS certificates.

21. Click Save.

Optional Configuration
The topics in this section describe how to configure optional features in the Healthwatch for VMware Tanzu,
Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry, and Healthwatch Exporter for VMware
Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles:

Configuring Alerting

Monitoring Certificate Expiration

Configuring Authentication with a UAA Instance on a Different Tanzu Operations Manager Foundation

Configuring Alerting
This topic explains how to configure alerting in Healthwatch for VMware Tanzu.

Overview
In Healthwatch, you can configure the Prometheus instance to send alerts to Alertmanager according to
alerting rules you configure. Alertmanager then manages those alerts by removing duplicate alerts, grouping
alerts together, and routing those groups to alert receiver integrations such as email, PagerDuty, or Slack.
Alertmanager also silences and inhibits alerts according to the alerting rules you configure.

For more information, see the Prometheus documentation.


Configure Alerting
In the Alertmanager pane, you configure alerting rules, routing rules, and alert receivers for Alertmanager to
use.

The values that you configure in the Alertmanager pane also configure their corresponding properties in the
Alertmanager configuration file. For more information, see Overview of Configuration Files in Healthwatch in
Configuration File Reference Guide, Configuring the Alertmanager Configuration File in Configuration File
Reference Guide, and the Prometheus documentation.

To configure alerting through the Alertmanager pane:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select Alertmanager.

4. For Alerting rules, provide in YAML format the rule statements that define which alerts
Alertmanager sends to your alert receivers:

1. The following YAML files contain alerting rules for VMware Tanzu Platform for Cloud
Foundry and VMware Tanzu Kubernetes Grid Integrated Edition (TKGI). Choose the YAML
file below that corresponds to your runtime and replace OPS_MANAGER_URL with the fully-
qualified domain name (FQDN) of your Tanzu Operations Manager deployment:
Tanzu Platform for Cloud Foundry

TKGI

2. Modify the YAML file according to the observability requirements for your Tanzu Operations
Manager foundation.

3. Paste the contents of the YAML file into Alerting rules.

For more information about rule statements for Alertmanager, see the Prometheus
documentation.
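
As an illustration of the rule statement structure, a minimal alerting rule written in the standard Prometheus rule format might look like the following. The metric name, threshold, and labels are placeholders; the downloadable YAML files referenced above contain the rules VMware recommends for each runtime:

groups:
- name: example-alerts
  rules:
  - alert: ExampleDiskAlmostFull
    expr: disk_used_percent > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Persistent disk usage has been above 90% for 10 minutes"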

5. For Routing rules, provide in YAML format the route block that defines where Alertmanager sends
alerts, how frequently Alertmanager sends alerts, and how Alertmanager groups alerts together. The
following example shows a possible set of routing rules:

receiver: 'example-receiver'
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [cluster, alertname]

In new installations of Healthwatch v2.2, the Routing rules field is pre-configured
with a default set of routing rules. You can edit these routing rules according to the
needs of your deployment.

group_by gathers all alerts with the same label into a single alert. For example,
including cluster in the group_by property groups together all alerts from the same
cluster. You can see the labels for the metrics that Healthwatch collects, such as
cluster, index, deployment, and origin, within the braces at the end of each metric.

You must define all route configuration parameters. For more information about the parameters you
must provide, see the Prometheus documentation.
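
For example, a route block that sends most alerts to one receiver but escalates critical alerts to another might look like the following. The receiver names are placeholders and must match alert receivers that you configure in Configure Alert Receivers below:

receiver: 'default-email'
group_by: [alertname, deployment]
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
routes:
- match:
    severity: critical
  receiver: 'pagerduty-critical'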

6. (Optional) For Inhibit rules, provide in YAML format the rule statements that define which alerts
Alertmanager does not send to your alert receivers. For more information, see the Prometheus
documentation.
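
For example, the following inhibit rule suppresses warning-level alerts for a deployment while a critical alert with the same deployment label is firing. It assumes your alerting rules set severity and deployment labels:

- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['deployment']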

7. Configure the alert receivers that you specified in Routing rules in a previous step. For more
information, see Configure Alert Receivers below.

Configure Alert Receivers


You can configure email, PagerDuty, Slack, and webhook alert receivers in the Healthwatch tile. For more
information, see the Prometheus documentation.

You can also configure custom alert receiver integrations that are not natively supported by Alertmanager
through webhook receivers. For more information about configuring custom alert receiver integrations, see
the Prometheus documentation.

If you configure two or more alert receivers with the same name, Alertmanager merges them into a single
alert receiver. For more information, see Combining Alert Receivers below.

The following sections describe how to configure each type of alert receiver:

Configure an Email Alert Receiver

Configure a PagerDuty Alert Receiver

Configure a Slack Alert Receiver

Configure a Webhook Alert Receiver

If you want to provide authentication and TLS communication settings for your alert
receivers, you must provide them in the associated alert receiver configuration fields
described in the sections below. If the base configuration YAML for your alert receivers
includes fields for authentication and TLS communication settings, do not include them
when you provide the configuration YAML for your alert receivers in the Alert receiver
configuration parameters fields.

Configure an Email Alert Receiver


To configure an email alert receiver:

1. Under Email alert receivers, click Add.

2. For Alert receiver name, enter the name you want to give your email alert receiver. The name you
enter in this field must match the name you specified in the route block you entered in the Routing
rules field in Configure Alerting above.

3. For Alert receiver configuration parameters, provide the configuration parameters for your email
alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a
set of required configuration parameters:

to: 'operator1@example.org'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587

At minimum, your configuration parameters must include the to, from, and smarthost properties.
The other properties you must include depend on both the SMTP server for which you are
configuring an alert receiver and the needs of your Tanzu Operations Manager foundation. For more
information about the properties you can include in this configuration, see the Prometheus
documentation.

If you exclude the html and headers properties or leave them blank,
Healthwatch automatically populates them with a default template. To view
the default email template for Healthwatch, see email_template.yml on
GitHub.

Do not include the auth_password property, the auth_secret property, or
the <tls_config> section in the configuration parameters for your email
alert receiver. You can configure these properties in the next steps of this
procedure.

4. (Optional) To configure SMTP authentication between Alertmanager and your email alert receiver,
configure one of the following fields:

If your SMTP server uses basic authentication, enter the authentication password for your
SMTP server in SMTP server authentication password.

If your SMTP server uses CRAM_MD5 authentication, enter the authentication secret for
your SMTP server in SMTP server authentication secret.

5. (Optional) To allow Alertmanager to communicate with your email alert receiver over TLS, configure
the following fields:

1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to your SMTP server.

2. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that your
SMTP server uses to verify TLS certificates.

3. For SMTP server name, enter the name of the SMTP server as it appears on the server’s
TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of your SMTP server. This checkbox is
unselected by default.

For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.


Configure a PagerDuty Alert Receiver


To configure a PagerDuty alert receiver:

1. Under PagerDuty alert receivers, click Add.

2. For Alert receiver name, enter the name you want to give your PagerDuty alert receiver. The name
you enter in this field must match the name you specified in the route block you entered in the
Routing rules field in Configure Alerting above.

3. For Alert receiver configuration parameters, provide the configuration parameters for your
PagerDuty alert receiver in YAML format. Do not prefix the YAML with a dash. The following
example shows a possible set of configuration parameters:

url: https://api.pagerduty.com/api/v2/alerts
client: '{{ template "pagerduty.example.client" . }}'
client_url: '{{ template "pagerduty.example.clientURL" . }}'
description: '{{ template "pagerduty.example.description" .}}'
severity: 'error'

The properties you must include depend on both the PagerDuty instance for which you are
configuring an alert receiver and the needs of your Tanzu Operations Manager foundation. For more
information about the properties you can include in this configuration, see the Prometheus
documentation.

If you exclude the description, details, or links properties or leave
them blank, Healthwatch automatically populates them with a default
template. To view the default PagerDuty template for Healthwatch, see
pagerduty_template.yml on GitHub.

Do not include the routing_key property, the service_key property, the
<http_config> section, or the <tls_config> section in the configuration
parameters for your PagerDuty alert receiver. You can configure these
properties in the next steps of this procedure.

4. Enter your PagerDuty integration key in one of the following fields:

If you selected Events API v2 as your integration type in PagerDuty, enter your PagerDuty
integration key in Routing key.

If you selected Prometheus as your integration type in PagerDuty, enter your PagerDuty
integration key in Service key.

5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the PagerDuty
API, configure one of the following options:

To configure the HTTP client to authenticate the PagerDuty API using basic authentication,
enter the username and password associated with the HTTP client in Basic authentication
credentials.

To configure the HTTP client to authenticate the PagerDuty API using a bearer token, enter
the bearer token associated with the HTTP client in Bearer token.


For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.

6. (Optional) To allow Alertmanager to communicate with your PagerDuty alert receiver over TLS,
configure the following fields:

1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the PagerDuty API server.

2. For CA certificate for TLS, provide a certificate for the CA that the PagerDuty API server
uses to verify TLS certificates.

3. For PagerDuty server name, enter the name of the PagerDuty API server as it appears on
the server’s TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the PagerDuty API server. This checkbox is
unselected by default.

For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.

Configure a Slack Alert Receiver


To configure a Slack alert receiver:

1. Under Slack alert receivers, click Add.

2. For Alert receiver name, enter the name you want to give your Slack alert receiver. The name you
enter in this field must match the name you specified in the route block you entered in the Routing
rules field in Configure Alerting above.

3. (Optional) For Alert receiver configuration parameters, provide the configuration parameters for
your Slack alert receiver in YAML format. Do not prefix the YAML with a dash. The following
example shows a possible set of configuration parameters:

channel: '#operators'
username: 'Example Alerting Integration'

The properties you must include depend on both the Slack instance for which you are configuring an alert receiver and the needs of your Tanzu Operations Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.

If you exclude the title, title_link, or text properties or leave them blank, Healthwatch automatically populates them with a default template. To view the default Slack template for Healthwatch, see slack_template.yml on GitHub.

Do not include the api_url property, the api_url_file property, the <http_config> section, or the <tls_config> section in the configuration parameters for your Slack alert receiver. You can configure these properties in the next steps of this procedure.

4. For Slack API URL, enter the webhook URL for your Slack instance from your Slack app directory.

5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server for
your Slack instance, configure one of the following options:

To configure the HTTP client to authenticate the server for your Slack instance using basic
authentication, enter the username and password associated with the HTTP client in Basic
authentication credentials.

To configure the HTTP client to authenticate the server for your Slack instance using a
bearer token, enter the bearer token associated with the HTTP client in Bearer token.

For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.

6. (Optional) To allow Alertmanager to communicate with your Slack alert receiver over TLS, configure
the following fields:

1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the server for your Slack instance.

2. For CA certificate for TLS, provide a certificate for the CA that the server for your Slack
instance uses to verify TLS certificates.

3. For Slack server name, enter the name of the server for your Slack instance as it appears
on the server’s TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the server for your Slack instance. This
checkbox is unselected by default.

For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.

Configure a Webhook Alert Receiver


To configure a webhook alert receiver:

1. Under Webhook alert receivers, click Add.

2. For Alert receiver name, enter the name you want to give your webhook alert receiver. The name
you enter in this field must match the name you specified in the route block you entered in the
Routing rules field in Configure Alerting above.

3. For Alert receiver configuration parameters, provide the configuration parameters for your
webhook alert receiver in YAML format. Do not prefix the YAML with a dash. The following example
shows a possible set of configuration parameters:

url: https://example.com/data/12345
max_alerts: 0

The properties you must include depend on both the webhook for which you are configuring an alert
receiver and the needs of your Tanzu Operations Manager foundation. For more information about
the properties you can include in this configuration, see the Prometheus documentation.

Do not include the <http_config> section or the <tls_config> section in the configuration parameters for your webhook alert receiver. You can configure these properties in the next steps of this procedure.

You can also configure custom alert receiver integrations that are not natively supported by Alertmanager through webhook alert receivers. For more information about configuring custom alert receiver integrations, see the Prometheus documentation.

4. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server that
processes your webhook, configure one of the following options:

To configure the HTTP client to authenticate the server that processes your webhook using
basic authentication, enter the username and password associated with the HTTP client in
Basic authentication credentials.

To configure the HTTP client to authenticate the server that processes your webhook using
a bearer token, enter the bearer token associated with the HTTP client in Bearer token.

For more information about configuring an HTTP client for Alertmanager, see the
Prometheus documentation.

5. (Optional) To allow Alertmanager to communicate with your webhook alert receiver over TLS,
configure the following fields:

1. For Certificate and private key for TLS, provide a certificate and private key for
Alertmanager to use for TLS connections to the server that processes your webhook.

2. For CA certificate for TLS, provide a certificate for the CA that the server that processes
your webhook uses to verify TLS certificates.

3. For Webhook server name, enter the name of the server that processes your webhook as
it appears on the server’s TLS certificate.

4. If the certificate you provided in Certificate and private key for TLS is signed by a self-
signed CA certificate or a certificate that is signed by a self-signed CA certificate, select
the Skip TLS certificate verification checkbox. When this checkbox is selected,
Alertmanager does not verify the identity of the server that processes your webhook. This
checkbox is unselected by default.

For more information about configuring TLS communication for Alertmanager, see the
Prometheus documentation.

6. Click Save.

Combining Alert Receivers


If you configure two or more alert receivers with the same name, Alertmanager merges them into a single
alert receiver. For example, if you configure:

Two email receivers named “Foundation” with distinct email addresses

One PagerDuty receiver named “Foundation”

One email receiver named “Clusters”

Then Alertmanager merges them into the following alert receivers:

One alert receiver named “Foundation” containing two email configurations and a PagerDuty
configuration

One alert receiver named “Clusters” containing one email configuration

The example below shows how Alertmanager combines the alert receivers described above in its
configuration file:

receivers:
- name: 'Foundation'
  email_configs:
  - to: 'operator1@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."
  - to: 'operator2@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."
  pagerduty_configs:
  - url: https://api.pagerduty.com/api/v2/alerts
    client: '{{ template "pagerduty.example.client" . }}'
    client_url: '{{ template "pagerduty.example.clientURL" . }}'
    description: '{{ template "pagerduty.example.description" .}}'
    severity: 'error'

- name: 'Clusters'
  email_configs:
  - to: 'operator1@example.org'
    from: example.healthwatch.foundation.com
    smarthost: smtp.example.org:587
    headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
    html: '{{ template "email.example.html" . }}'
    text: "This is an alert."

Silence Alerts
Alertmanager includes a command-line tool called amtool. You can use amtool to temporarily silence
Alertmanager alerts without modifying your alerting rules. For more information about how to use amtool,
see the Alertmanager documentation on GitHub.


You can also use the Alertmanager UI to view and silence alerts. To access the Alertmanager UI, see
Viewing the Alertmanager UI in Troubleshooting Healthwatch.

To silence alerts using amtool:

1. SSH into one of the Prometheus VMs deployed by the Healthwatch tile. Alertmanager replicates
any changes you make in one Prometheus VM to all other Prometheus VMs. To SSH into one of
the Prometheus VMs, see the Tanzu Operations Manager documentation.

2. Go to the amtool directory by running:

cd /var/vcap/jobs/alertmanager/packages/alertmanager/bin

3. View all of your currently running alerts by running:

amtool -o extended alert --alertmanager.url http://localhost:10401

This command returns a list of all currently running alerts that includes detailed information about
each alert, including the name of the alert and the Prometheus instance on which it runs.

You can also query the list of alerts by name and instance to view specific alerts.

To query alerts by name, run:

amtool -o extended alert query alertname="ALERT-NAME" --alertmanager.url


http://localhost:10401

Where ALERT-NAME is the name of the alert you want to silence. You can query the exact
name of the alert, or you can query a partial name and include the regular expression .* to
see all alerts that include the partial name, such as in the following example:

amtool -o extended alert query alertname=~"Test.*" --alertmanager.url htt


p://localhost:10401

To query alerts by instance, run:

amtool -o extended alert query instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401

Where INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence alerts.

To query alerts by name and instance, run:

amtool -o extended alert query alertname=~"ALERT-NAME" instance=~".+INSTA


NCE-NUMBER" --alertmanager.url http://localhost:10401

Where:

ALERT-NAME is the name of the alert you want to silence.

INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence an alert.

4. Run one of the following commands to silence either a specific alert or all alerts:


To silence a specific alert for a specified amount of time, run:

amtool silence add alertname=ALERT-NAME instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401

Where:

ALERT-NAME is the name of the alert you want to silence.

INSTANCE-NUMBER is the number of the Prometheus instance for which you want to
silence an alert.

To silence all alerts for a specified amount of time, run:

amtool silence add 'alertname=~.+' -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401

Where:

TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence alerts. For example, 30m or 4h.

COMMENT is any notes about this silence you want to add.

~.+ is a regular expression that includes all alerts in the silence you set.

To silence an alert indefinitely, run:

amtool silence add alertname=ALERT-NAME -c 'COMMENT' --alertmanager.url http://localhost:10401

Where:

ALERT-NAME is the name of the alert you want to silence.

COMMENT is a note about why the alert is being silenced.

5. Record the ID string from the output. You can use this ID to unmute the alert.

To unmute the alert, run:

amtool silence --alertmanager.url http://localhost:10401 expire SILENCED-ID

Where SILENCED-ID is the recorded ID string.

6. To see which alerts are currently silenced, run:

amtool silence query

7. For more information, run amtool --help or see the Alertmanager documentation on GitHub.
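
For example, the following commands silence a hypothetical alert named TestAlert for two hours and later expire that silence. The alert name, duration, and comment are placeholders, and SILENCE-ID is the ID string returned by the first command:

amtool silence add alertname="TestAlert" -d 2h -c 'Planned maintenance' --alertmanager.url http://localhost:10401
amtool silence --alertmanager.url http://localhost:10401 expire SILENCE-ID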

Monitoring Certificate Expiration


This topic describes how to monitor the expiration of VMware Tanzu Operations Manager certificates using
metrics collected by the Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles.

Overview of Certificate Expiration Monitoring


The metrics in the Certificate Expiration dashboard in the Grafana UI show when Tanzu Operations
Manager certificates are due to expire. These certificates include the Tanzu Operations Manager root
certificate authority (CA) and CredHub-managed leaf certificates for product tiles and BOSH deployments.
For more information about these certificates, see the Tanzu Operations Manager documentation.

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy the
certificate expiration metric exporter VM, cert-expiration-exporter. The certificate expiration metric
exporter VM uses the om CLI to send a GET request with the query parameter ?expires_within=1y to the
/api/v0/deployed/certificates Tanzu Operations Manager API endpoint. The Tanzu Operations
Manager API then returns the expiration dates of all certificates that are due to expire within the next year.
The Prometheus instance in your Healthwatch deployment scrapes the certificate expiration metrics from
the certificate expiration metric exporter VM and sends them to Grafana. For more information about the
/api/v0/deployed/certificates endpoint, see the Tanzu Operations Manager API documentation.
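
If you want to see the same data that the certificate expiration metric exporter VM retrieves, you can query the endpoint yourself with the om CLI. The following command is a sketch that assumes om is already targeted at and authenticated with your Tanzu Operations Manager foundation:

om curl --path "/api/v0/deployed/certificates?expires_within=1y"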

You cannot configure the certificate expiration metric exporter VM to specify a different
time period when it sends a GET request to the /api/v0/deployed/certificates
endpoint.

If your BOSH Director deployment uses custom CAs, you can configure them in the Trusted Certificates
field in the Security pane of the BOSH Director tile. Configuring custom CAs in the Trusted Certificates
field allows all BOSH-deployed components in your deployment to trust custom root certificates. For more
information about this field, see the Tanzu Operations Manager documentation.

If any CAs or leaf certificates for your Tanzu Operations Manager foundation are due to expire soon, rotate
them before they expire to avoid downtime for your foundation. To rotate CAs and leaf certificates, see the
Tanzu Operations Manager documentation.

Reserving a Static IP Address for the Certificate Expiration Metric Exporter VM

You do not need to configure the certificate expiration metric exporter VM for it to collect certificate
expiration metrics. However, you can reserve a static IP address for the certificate expiration metric
exporter VM.

To configure a static IP address for the certificate expiration metric exporter VM, see the configuration topic
for your Healthwatch Exporter tile:

(Optional) Configure Tanzu Platform for Cloud Foundry Metric Exporter VMs.

(Optional) Configure TKGI and Certificate Expiration Metric Exporter VMs.

Configuring Authentication with a UAA Instance on a Different Tanzu Operations Manager Foundation


This topic describes how to configure authentication with a User Account and Authentication (UAA) instance
on a different VMware Tanzu Operations Manager foundation for users to log in to the Grafana UI. This
configuration is in the context of Healthwatch for VMware Tanzu.

Overview of UAA Authentication with the Grafana UI


Healthwatch can automatically configure authentication with the UAA instance of the runtime that is
installed on the same Tanzu Operations Manager foundation as the Healthwatch tile, either VMware Tanzu
Platform for Cloud Foundry or VMware Tanzu Kubernetes Grid Integrated Edition (TKGI). When you select
UAA as your Grafana UI authentication method in the Grafana Authentication pane of the Healthwatch tile,
Healthwatch automatically configures authentication with the UAA instances in Tanzu Platform for Cloud
Foundry and TKGI for the Grafana UI.

If you want to configure authentication with the UAA instance of a runtime that is installed on a different
Tanzu Operations Manager foundation, you must select Generic OAuth and configure it manually through
the Grafana Authentication pane.

Create a UAA Client for the Grafana Instance


To authenticate with the UAA instance of a runtime that is installed on a different Tanzu Operations Manager foundation, the Grafana instance must access the UAA instance through a UAA client.

To create a UAA client for the Grafana instance:

1. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager foundation
with the UAA instance with which you want to configure authentication for the Grafana UI.

2. Click the VMware Tanzu Platform for Cloud Foundry or Tanzu Kubernetes Grid Integrated
Edition tile, depending on which runtime is installed on this Tanzu Operations Manager foundation.

3. Select the Credentials tab.

4. View and record the credentials to log in to the UAA instance for the runtime installed on this Tanzu
Operations Manager foundation:

For Tanzu Platform for Cloud Foundry:

1. In the Admin Client Credentials row of the UAA section, click Link to
Credential.

2. Record the value of password. This value is the secret for Admin Client
Credentials.

For TKGI:

1. In the Pks Uaa Management Admin Client row, click Link to Credential.

2. Record the value of secret. This value is the secret for Pks Uaa Management
Admin Client.

5. Target the server for the UAA instance for the runtime installed on this Tanzu Operations Manager
foundation using the User Account and Authentication Command Line Interface (UAAC). Run:

uaac target UAA-URL


Where UAA-URL is the URL of the UAA instance with which you want to configure authentication.
For UAA instances for Tanzu Platform for CF, this URL is usually https://login.SYSTEM-DOMAIN,
where SYSTEM-DOMAIN is the domain you configured in the System domain field in the Domains
pane of the Tanzu Platform for Cloud Foundry tile. For TKGI, this URL is usually https://TKGI-
API-URL:8443, where TKGI-API-URL is the URL of the TKGI API.

For more information about the UAAC, see the Tanzu Platform for Cloud Foundry documentation.

6. Log in to the UAA instance by running:

uaac token client get admin -s UAA-ADMIN-CLIENT-SECRET

Where UAA-ADMIN-CLIENT-SECRET is the UAA administrator client secret you recorded from the
Credentials tab in the runtime tile in a previous step.

7. Create a UAA client for the Grafana instance by running:

uaac client add grafana \
  --scope openid,healthwatch.admin,healthwatch.edit,healthwatch.read \
  --secret CLIENT-SECRET \
  --authorities uaa.resource,refresh_token \
  --authorized_grant_types authorization_code \
  --redirect_uri PROTOCOL://GRAFANA-ROOT-URL/login/generic_oauth

Where:

CLIENT-SECRET is the secret you want to set for the UAA client.

PROTOCOL is either http or https, depending on the protocol you configured the Grafana
instance to use in the Grafana pane of the Healthwatch tile.

GRAFANA-ROOT-URL is the root URL for the Grafana instance that you use to access the
Grafana UI.

8. If you are using TKGI, you must manually create UAA user groups to map to administrator, editor,
and viewer permissions for Grafana. Run:

uaac group add healthwatch.admin
uaac group add healthwatch.edit
uaac group add healthwatch.read

If you are using Tanzu Platform for CF, you added the UAA client to UAA user groups mapped to
administrator, editor, and viewer permissions for Grafana in the previous step. Continue to the next
step.

9. Create a user account for the UAA client you created in a previous step to log in to the Grafana
instance. Run:

uaac user add USERNAME -p SECRET --emails EMAIL

Where:

USERNAME is the username you want to set for the user account.

SECRET is the secret you want to set for the user account.


EMAIL is the email address you want to associate with the user account.

10. Assign user permissions to the user account you created in the previous step by running:

uaac member add GROUP USERNAME

Where:

GROUP is either healthwatch.admin, healthwatch.edit, or healthwatch.read. These groups map to the Admin, Editor, and Viewer Grafana roles, respectively. For more information about the level of access each role provides, see the Grafana documentation.

USERNAME is the username you set for the user account you created in the previous step.
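
For example, to grant a hypothetical user account named grafana-admin the Admin role in Grafana, you might run:

uaac member add healthwatch.admin grafana-admin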

Configure the Grafana UI


To configure the Grafana UI to authenticate users with a UAA instance on a different Tanzu Operations
Manager foundation:

1. Go to the Tanzu Ops Manager Installation Dashboard for the Tanzu Operations Manager foundation
with the Grafana instance for which you want to configure UAA authentication.

2. Click the Healthwatch tile.

3. Select Grafana Authentication.

4. Under Additional authentication methods, select Generic OAuth.

5. For Provider name, enter a name that identifies the UAA instance with which you want to configure
authentication. For example, UAA.

6. For Client ID, enter the client ID of the UAA client that was created for the UAA instance with which
you want to configure authentication in Create a UAA Client for the Grafana Instance above.

7. For Client secret, enter the client secret of the UAA client that was created for the UAA instance
with which you want to configure authentication in Create a UAA Client for the Grafana Instance
above.

8. For Scopes, enter openid,healthwatch.admin,healthwatch.edit,healthwatch.read.

9. For Authorization URL, enter the authorization URL for your runtime:

For Tanzu Platform for CF, enter https://login.sys.DOMAIN/oauth/authorize, where DOMAIN is the system domain of your Tanzu Operations Manager deployment.

For TKGI, enter https://api.pks.DOMAIN:8443/oauth/authorize, where DOMAIN is the system domain of your Tanzu Operations Manager deployment.

10. For Token URL, enter the token URL for your runtime:

For Tanzu Platform for CF, enter https://login.sys.DOMAIN/oauth/token, where DOMAIN is the system domain of your Tanzu Operations Manager deployment.

For TKGI, enter https://api.pks.DOMAIN:8443/oauth/token, where DOMAIN is the system domain of your Tanzu Operations Manager deployment.

11. For API URL, enter http://localhost:3002/userinfo. This is the URL of a local proxy server
that Healthwatch can use to translate the UAA token into a format that is compatible with Grafana.


12. To allow new users to create a new Grafana account when they log in with their existing UAA
credentials for the first time, select the Allow new accounts with existing OAuth credentials
checkbox. This checkbox is selected by default. Unselecting this checkbox prevents users without
a pre-existing Grafana account from creating a new Grafana account or logging in to the Grafana UI
with their existing UAA credentials.

13. For Role attribute path, enter the following JMESPath string to map users to Grafana roles:
contains(scope[*], 'healthwatch.admin') && 'Admin' || contains(scope[*],
'healthwatch.edit') && 'Editor' || contains(scope[*], 'healthwatch.read') &&
'Viewer'.

14. (Optional) To prevent users who are not mapped to a valid Grafana role from accessing the Grafana
UI, select the Deny access to users without Grafana roles checkbox. This checkbox is
unselected by default. Unselecting this checkbox assigns the Viewer role to users who cannot be
mapped to a valid Grafana role by the string configured in the Role attribute path field.

15. (Optional) To allow the Grafana instance to communicate with the server for your OAuth provider
over TLS:

1. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that the
UAA instance with which you want to configure authentication uses to verify TLS
certificates. You must configure this field if the UAA instance with which you want to
configure authentication uses a TLS certificate that is signed by an untrusted authority.

2. If you do not provide a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, select the Skip TLS certificate verification checkbox. When this checkbox is selected, the Grafana instance does not verify the identity of the UAA instance with which you want to configure authentication. This checkbox is unselected by default. VMware does not recommend skipping TLS certificate verification for a UAA instance for a runtime on a different Tanzu Operations Manager foundation.

16. Click Save.

Configuring TKGI Cluster Discovery


This topic describes how to configure VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) cluster
discovery in Healthwatch for VMware Tanzu.

Overview of TKGI Cluster Discovery


In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure the Prometheus instance in the
Healthwatch tile to detect on-demand Kubernetes clusters created through the TKGI API and create scrape
jobs for them. You only need to configure this pane if you have Tanzu Operations Manager foundations with
TKGI installed.

The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through
the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus
instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI
API, and the TKGI tile.

To configure TKGI cluster discovery:


1. Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see
Configure TKGI Cluster Discovery in Healthwatch below.

2. Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more
information, see Configure TKGI below.

If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see
Troubleshooting TKGI Cluster Discovery Failure below.

To collect additional BOSH system metrics related to TKGI and view them in the Grafana
UI, you must install and configure the Healthwatch Exporter for TKGI on your Tanzu
Operations Manager foundations with TKGI installed. To install the Healthwatch Exporter
for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for
TKGI tile, see Configuring Healthwatch Exporter for TKGI.

Configure TKGI Cluster Discovery in Healthwatch


In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure TKGI cluster discovery,
including the UAA client that the Prometheus instance uses to connect to the Kubernetes API through the
TKGI API.

To configure the TKGI Cluster Discovery pane:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select TKGI Cluster Discovery.

4. Under TKGI cluster discovery, select one of the following options:

On: This option allows TKGI cluster discovery and reveals the configuration fields
described in the steps below. TKGI cluster discovery is allowed by default when TKGI is
installed on your Tanzu Operations Manager foundation.

Off: This option disallows TKGI cluster discovery.

5. For Discovery interval, enter in seconds how frequently you want the Prometheus instance
to detect and scrape TKGI clusters. The minimum value is 60.

6. (Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS,
configure one of the following options:

To configure the Prometheus instance to use a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate when communicating with the TKGI API over TLS, provide the certificate for the CA in CA certificate for TLS. If you provide a self-signed CA certificate, it must be for the same CA that signs the certificate in the TKGI API. If the Prometheus instance uses certificates signed by a trusted third-party CA or the Skip TLS certificate verification checkbox is selected, do not configure this field.

If you do not provide a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, you can select the Skip TLS certificate verification checkbox. When this checkbox is selected, the Prometheus instance does not verify the identity of the TKGI API. This checkbox is unselected by default. VMware does not recommend skipping TLS certificate verification in a production environment.

7. Click Save.

Configure TKGI
After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the
Prometheus instance to scrape metrics from TKGI clusters.

To configure TKGI:

1. Return to the Tanzu Ops Manager Installation Dashboard.

2. Click the Tanzu Kubernetes Grid Integrated Edition tile.

3. Select Host Monitoring.

4. Under Enable Telegraf Outputs?, select Yes.

5. Select the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging
metrics to Healthwatch.

6. Select the Include Kubernetes Controller Manager metrics checkbox to allow TKGI to send
Kubernetes Controller Manager metrics to Healthwatch.

7. If you are using TKGI v1.14.2 or later, select the Include Kubernetes Scheduler metrics
checkbox to allow TKGI to send Kubernetes Scheduler metrics to Healthwatch.

8. For Setup Telegraf Outputs, provide the following TOML configuration file:

[[outputs.prometheus_client]]
listen = ":10200"
metric_version = 2

You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf
metrics from your TKGI clusters. For more information about creating a configuration file in TKGI,
see the TKGI documentation.

If you are configuring TKGI v1.12 or earlier, remove metric_version = 2 from the
TOML configuration file. TKGI v1.12 and earlier are out of support. Consider
upgrading to at least v1.17, which is currently the oldest supported version.

9. Click Save.

10. For each plan you want to monitor:

1. Select the plan you want to monitor. For example, Plan 2.

2. For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create
the roles required to allow the Prometheus instance to scrape metrics from your TKGI
clusters:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthwatch
rules:
- resources:
  - pods/proxy
  - pods
  - nodes
  - nodes/proxy
  - namespace/pods
  - endpoints
  - services
  verbs:
  - get
  - watch
  - list
  apiGroups:
  - ""
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthwatch
roleRef:
  apiGroup: ""
  kind: ClusterRole
  name: healthwatch
subjects:
- apiGroup: ""
  kind: User
  name: healthwatch

If (Optional) Add-ons - Use with caution already contains other API resource definitions,
append the above YAML snippet to the end of the existing resource definitions, followed by
a newline character.

11. Click Save.

12. Select Errands.

13. Ensure that the Upgrade all clusters errand is running. Running this errand configures your TKGI
clusters with the roles you created in the (Optional) Add-ons - Use with caution field of the plans
you monitor in a previous step.

14. Click Save.

Troubleshooting TKGI Cluster Discovery Failure


TKGI cluster discovery can fail if the Prometheus instance fails to scrape metrics from your TKGI clusters.
To troubleshoot TKGI cluster discovery failure, see Troubleshooting Missing TKGI Cluster Metrics in
Troubleshooting Healthwatch.

Configuration File Reference Guide


This topic describes the Healthwatch for VMware Tanzu component configuration file properties that you
configure through the Healthwatch tile.

Overview of Configuration Files in Healthwatch


The Prometheus, Alertmanager, and Grafana instances rely on configuration files. These configuration files
contain lists of properties that define their configuration settings. When you configure certain options in the
Healthwatch tile, those options configure their corresponding properties in the configuration files for those
Healthwatch components.

Through the Healthwatch tile, you configure properties in the configuration files through the following panes:

Prometheus. For more information, see Configuring the Prometheus Configuration File below.

Alertmanager. For more information, see Configuring the Alertmanager Configuration File below.

Grafana. For more information, see Configuring the Grafana Configuration File below.

Configuring the Prometheus Configuration File


You configure sections of the Prometheus configuration file through the following panes in the Healthwatch
tile:

Prometheus. For more information, see Prometheus below.

Remote Write. For more information, see Remote Write below.

Prometheus Configuration Options


The following table lists which configuration options in the Prometheus pane in the Healthwatch tile
configure properties in the Prometheus configuration file:

Healthwatch configuration option | Configuration file property | Configuration file section
Scrape interval | scrape_interval | scrape_config
Scrape job configuration parameters | See Configure Prometheus in Configuring Healthwatch and the Prometheus documentation. | scrape_config
Certificate and private key for TLS | cert_file, key_file | scrape_config - tls_config
CA certificate for TLS | ca_file | scrape_config - tls_config
Target server name | server_name | scrape_config - tls_config
Skip TLS certificate verification | insecure_skip_verify | scrape_config - tls_config

For more information about configuring these properties, see Configure Prometheus in Configuring
Healthwatch and the Prometheus documentation.
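
For illustration, the following sketch shows how these options might appear in the resulting Prometheus configuration file. The job name, target, and file paths are placeholders rather than values that Healthwatch necessarily generates:

scrape_configs:
- job_name: example-scrape-job          # Scrape job configuration parameters
  scrape_interval: 15s                  # Scrape interval
  static_configs:
  - targets: ['10.0.0.5:8443']
  tls_config:
    cert_file: /path/to/cert.pem        # Certificate and private key for TLS
    key_file: /path/to/key.pem
    ca_file: /path/to/ca.pem            # CA certificate for TLS
    server_name: example.com            # Target server name
    insecure_skip_verify: false         # Skip TLS certificate verification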

Remote Write


The following table lists which configuration options in the Remote Write pane in the Healthwatch tile
configure properties in the Prometheus configuration file:

Healthwatch configuration option | Configuration file property | Configuration file section
Remote storage URL | url | remote_write
Remote timeout | remote_timeout | remote_write
Remote storage username | username | remote_write - basic_auth
Remote storage password | password | remote_write - basic_auth
Certificate and private key for TLS | cert_file, key_file | remote_write - tls_config
CA certificate for TLS | ca_file | remote_write - tls_config
Remote storage server name | server_name | remote_write - tls_config
Skip TLS certificate verification | insecure_skip_verify | remote_write - tls_config
Proxy URL | proxy_url | remote_write
Queue capacity | capacity | remote_write - queue_config
Minimum shards per queue | min_shards | remote_write - queue_config
Maximum shards per queue | max_shards | remote_write - queue_config
Maximum samples per send | max_samples_per_send | remote_write - queue_config
Maximum batch wait time | batch_send_deadline | remote_write - queue_config
Minimum backoff time | min_backoff | remote_write - queue_config
Maximum backoff time | max_backoff | remote_write - queue_config

For more information about configuring these properties, see Configure Remote Write in Configuring
Healthwatch and the Prometheus documentation.
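
Similarly, the following sketch shows how the Remote Write options might appear in the remote_write section of the Prometheus configuration file. The URL, credentials, and tuning values are placeholders:

remote_write:
- url: https://metrics.example.com/api/v1/write    # Remote storage URL
  remote_timeout: 30s                              # Remote timeout
  basic_auth:
    username: example-user                         # Remote storage username
    password: example-password                     # Remote storage password
  tls_config:
    ca_file: /path/to/ca.pem                       # CA certificate for TLS
    server_name: metrics.example.com               # Remote storage server name
  queue_config:
    capacity: 2500                                 # Queue capacity
    min_shards: 1                                  # Minimum shards per queue
    max_shards: 200                                # Maximum shards per queue
    max_samples_per_send: 500                      # Maximum samples per send
    batch_send_deadline: 5s                        # Maximum batch wait time
    min_backoff: 30ms                              # Minimum backoff time
    max_backoff: 5s                                # Maximum backoff time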

Configuring the Alertmanager Configuration File


The following table lists which configuration options in the Alertmanager pane in the Healthwatch tile
configure properties in the Alertmanager configuration file:

Healthwatch configuration option | Configuration file property | Configuration file section
Routing rules | See Configure Alerting in Configuring Alerting and the Prometheus documentation. | route
Inhibit rules | See Configure Alerting in Configuring Alerting and the Prometheus documentation. | inhibit_rules

Email alert receivers

Alert receiver name | name | receivers
Alert receiver configuration parameters | See Configure an Email Alert Receiver in Configuring Alerting and the Prometheus documentation. | receivers - email_configs
SMTP server authentication password | auth_password | receivers - email_configs
SMTP server authentication secret | auth_secret | receivers - email_configs
Certificate and private key for TLS | cert_file, key_file | receivers - email_configs - tls_config
CA certificate for TLS | ca_file | receivers - email_configs - tls_config
SMTP server name | server_name | receivers - email_configs - tls_config
Skip TLS certificate verification | insecure_skip_verify | receivers - email_configs - tls_config

PagerDuty alert receivers

Alert receiver name | name | receivers
Alert receiver configuration parameters | See Configure a PagerDuty Alert Receiver in Configuring Alerting and the Prometheus documentation. | receivers - pagerduty_configs
Routing key | routing_key | receivers - pagerduty_configs
Service key | service_key | receivers - pagerduty_configs
Basic authentication credentials | username, password | receivers - pagerduty_configs - http_config - basic_auth
Bearer token | credentials | receivers - pagerduty_configs - http_config - authorization
Certificate and private key for TLS | cert_file, key_file | receivers - pagerduty_configs - http_config - tls_config
CA certificate for TLS | ca_file | receivers - pagerduty_configs - http_config - tls_config
PagerDuty server name | server_name | receivers - pagerduty_configs - http_config - tls_config
Skip TLS certificate verification | insecure_skip_verify | receivers - pagerduty_configs - http_config - tls_config

Slack alert receivers

Alert receiver name | name | receivers
Alert receiver configuration parameters | See Configure a Slack Alert Receiver in Configuring Alerting and the Prometheus documentation. | receivers - slack_configs
Slack API URL | api_url | receivers - slack_configs
Basic authentication credentials | username, password | receivers - slack_configs - http_config - basic_auth
Bearer token | credentials | receivers - slack_configs - http_config - authorization
Certificate and private key for TLS | cert_file, key_file | receivers - slack_configs - http_config - tls_config
CA certificate for TLS | ca_file | receivers - slack_configs - http_config - tls_config
Slack server name | server_name | receivers - slack_configs - http_config - tls_config
Skip TLS certificate verification | insecure_skip_verify | receivers - slack_configs - http_config - tls_config

Webhook alert receivers

Alert receiver name | name | receivers
Alert receiver configuration parameters | See Configure a Webhook Alert Receiver in Configuring Alerting and the Prometheus documentation. | receivers - webhook_configs
Basic authentication credentials | username, password | receivers - webhook_configs - http_config - basic_auth
Bearer token | credentials | receivers - webhook_configs - http_config - authorization
Certificate and private key for TLS | cert_file, key_file | receivers - webhook_configs - http_config - tls_config
CA certificate for TLS | ca_file | receivers - webhook_configs - http_config - tls_config
Webhook server name | server_name | receivers - webhook_configs - http_config - tls_config
Skip TLS certificate verification | insecure_skip_verify | receivers - webhook_configs - http_config - tls_config

For more information about configuring these properties, see Configuring Alerting and the Prometheus
documentation.
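
For illustration, the following sketch shows how an email alert receiver with the TLS options above might appear in the receivers section of the Alertmanager configuration file. The receiver name, address, and file paths are placeholders:

receivers:
- name: 'example-email-receiver'        # Alert receiver name
  email_configs:
  - to: 'operator@example.org'          # Alert receiver configuration parameters
    auth_password: 'example-password'   # SMTP server authentication password
    tls_config:
      ca_file: /path/to/ca.pem          # CA certificate for TLS
      server_name: smtp.example.org     # SMTP server name
      insecure_skip_verify: false       # Skip TLS certificate verification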

Configuring the Grafana Configuration File


The following table lists which configuration options in the Grafana pane in the Healthwatch tile configure
properties in the Grafana configuration file:

Healthwatch configuration option | Configuration file property | Configuration file section
Grafana root URL | root_url | [server]
Basic authentication | enabled | [auth.basic]

Generic OAuth

Provider name | name | [auth.generic_oauth]
Client ID | client_id | [auth.generic_oauth]
Client secret | client_secret | [auth.generic_oauth]
Scopes | scopes | [auth.generic_oauth]
Authorization URL | auth_url | [auth.generic_oauth]
Token URL | token_url | [auth.generic_oauth]
API URL | api_url | [auth.generic_oauth]
Email attribute | email_attribute_name | [auth.generic_oauth]
Grafana domain | domain | [server]
Allow new accounts with existing OAuth credentials | allow_sign_up | [auth.generic_oauth]
Allowed domains | allowed_domains | [auth.generic_oauth]
Allowed teams | teams_ids | [auth.generic_oauth]
Allowed organizations | allowed_organization | [auth.generic_oauth]
Allowed groups | allowed_group | [auth.generic_oauth]
Certificate and private key for TLS | tls_client_cert, tls_client_key | [auth.generic_oauth]
CA certificate for TLS | tls_client_ca | [auth.generic_oauth]
Skip TLS certificate verification | tls_skip_verify_insecure | [auth.generic_oauth]

UAA

Client ID | client_id | [auth.generic_oauth]
Client secret | client_secret | [auth.generic_oauth]
UAA server root URL | auth_url, token_url | [auth.generic_oauth]
CA certificate for TLS | tls_client_ca | [auth.generic_oauth]
Skip TLS certificate verification | tls_skip_verify_insecure | [auth.generic_oauth]

LDAP

Host address | host | [auth.ldap]
Port | port | [auth.ldap]
Allow new accounts with existing LDAP credentials | allow_sign_up | [auth.ldap]
Use SSL | use_ssl | [auth.ldap]
Use STARTTLS | start_tls | [auth.ldap]
Skip TLS certificate verification | tls_skip_verify_insecure | [auth.ldap]
Bind DN | bind_dn | [auth.ldap]
Bind password | bind_password | [auth.ldap]
User search filter | search_filter | [auth.ldap]
Search base DNs | search_base_dns | [auth.ldap]
Group search filter | group_search_filter | [auth.ldap]
Group search base DNs | group_search_base_dns | [auth.ldap]
Group search filter user attribute | group_search_filter_user_attribute | [auth.ldap]
Server attributes | [servers.attributes] | [auth.ldap]
Server group mappings | [[servers.group_mappings]] | [auth.ldap]
Certificate and private key for TLS | client_cert, client_key | [auth.ldap]
CA certificate for TLS | root_ca_cert | [auth.ldap]

Email alerts

Configure / Do not configure | enabled | [smtp]
SMTP server host name | host | [smtp]
SMTP server port | port | [smtp]
SMTP server username | user | [smtp]
SMTP server password | password | [smtp]
Skip TLS certificate verification | skip_verify | [smtp]
From address | from_address | [smtp]
From name | from_name | [smtp]
EHLO client ID | ehlo_identity | [smtp]
Certificate and private key for TLS | cert_file, key_file | [smtp]

For more information about configuring these properties, see Configure Grafana in Configuring Healthwatch,
Configuring Grafana Authentication, and the Grafana documentation.
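
For illustration, the following sketch shows how a few of these options might appear in the Grafana configuration file. The URLs, domain, and client details are placeholders:

[server]
root_url = https://grafana.example.com    ; Grafana root URL
domain = grafana.example.com              ; Grafana domain

[auth.generic_oauth]
name = UAA                                ; Provider name
client_id = grafana                       ; Client ID
scopes = openid,healthwatch.admin,healthwatch.edit,healthwatch.read
auth_url = https://login.sys.example.com/oauth/authorize
token_url = https://login.sys.example.com/oauth/token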


Using Healthwatch Dashboards in the Grafana UI

This topic describes how to access and use your Healthwatch for VMware Tanzu dashboards in the Grafana
user interface (UI).

Healthwatch uses Grafana to visualize metrics data in charts and graphs. Once you have installed and
configured the Healthwatch and Healthwatch Exporter tiles, you can log in to the Grafana UI to view
dashboards that show how your Tanzu Operations Manager foundations are performing.

Each dashboard contains charts and graphs called panels. Each dashboard and panel contains detailed
descriptions of the metrics they display and how to troubleshoot your Tanzu Operations Manager
foundations based on those metrics.

Using Dashboards in the Grafana UI


The diagram below shows a dashboard in the Grafana UI:

The following list describes each labeled section of the dashboard:

A - Filters: Use these dropdowns to filter the metrics that the dashboard displays. Not all
dashboards have filters.

B - Dashboard Header: Click this header to see a description of the metrics that the dashboard
displays, how to use those metrics for troubleshooting, and links to further documentation.

C - Information Icon: Hover over this icon to see a description of the metrics that the panel
displays and how to use those metrics for troubleshooting.


You can edit a dashboard by copying the dashboard and editing the dashboard copy. To further customize
your dashboards, see the Grafana documentation.

View Your Healthwatch Dashboards


To view your Healthwatch dashboards:

1. In your web browser, navigate to the URL that you configured in the DNS entry for the Grafana
instance. For more information, see Configuring DNS for the Grafana Instance.

2. Follow one of the procedures below to log in to the Grafana UI, according to the authentication
method you configured in the Healthwatch tile:

Basic authentication: To log in to the Grafana UI using basic authentication:


1. Enter the following login credentials:
For Email or username, enter admin.

For Password, enter the password for the Grafana UI that you find in the
Healthwatch tile. To find the password for the Grafana UI:
1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch tile.

3. Select the Credentials tab.

4. In the Admin Login Password row of the Grafana section, click Link to Credential.

5. Record the value of password. This value is the password that all
users must use to log in to the Grafana UI.

2. Click Log in.

Generic OAuth: On the Grafana UI login page, click OAuth to log in with your OAuth
credentials.

UAA: On the Grafana UI login page, click UAA to log in with your UAA credentials.

LDAP: On the Grafana UI login page, click LDAP to log in with your LDAP credentials.

For more information about configuring an authentication method for the Grafana UI, see
Configuring Grafana Authentication.

3. On the left side of the Grafana UI homepage, hover over the Dashboards icon on the menu bar. A
navigation menu appears.

4. Click Manage. A list of folders appears.

5. View the dashboards in each folder in one of the following ways:

Click either the name or the expansion arrow on a folder to expand a list of the dashboards
it contains.

Click Go to folder next to a folder name to navigate to the folder page.

Click the list icon to view a single list of all dashboards contained in all folders. The folder
icon is selected by default and groups your dashboards by folder in expandable and
collapsible lists.


6. Click a dashboard name to open it.

For detailed descriptions of each default dashboard in the Grafana UI, see Default Dashboards in the
Grafana UI below.

Default Dashboards in the Grafana UI


There are 33 total dashboards that Healthwatch creates in the Grafana UI by default. The subsets of these
dashboards that appear in the Grafana UI depend on which tiles in your Tanzu Operations Manager
foundations you have installed and are monitoring.

The following list describes the default Healthwatch dashboards you may see in the Grafana UI:

Foundation: Includes the following dashboards:

All Jobs: Contains metrics related to the percentage of healthy VMs in your BOSH
Director, runtime, Healthwatch, and Healthwatch Exporter tile deployments. You can filter
these metrics by deployment.

BOSH Director Health: Contains metrics related to the health of the BOSH Director.

Canary App Health: Contains metrics related to the availability of the canary app that the
Blackbox Exporter uses to run canary tests. You can filter these metrics by canary URL.

Certificate Expiration: Contains metrics related to when certificates for your Tanzu
Operations Manager deployment are due to expire. For more information, see Monitoring
Certificate Expiration.

Job Details: Contains metrics related to the functionality of each component and VM in
your runtime, Healthwatch, and Healthwatch Exporter tile deployments. You can filter these
metrics by health status, deployment, job instance, and VM. If your Tanzu Operations
Manager foundation has neither Tanzu Platform for Cloud Foundry nor VMware Tanzu
Kubernetes Grid Integrated Edition (TKGI) installed, you see this dashboard upon logging in
to the Grafana UI.

Ops Manager Health: Contains metrics related to the availability of your Tanzu Operations
Manager deployment.

System at a Glance: Contains an overview of metrics related to the health of your Tanzu
Operations Manager deployment. You see this dashboard upon logging in to the Grafana
UI.

Healthwatch: Includes the following dashboards:

Healthwatch - Exporter Troubleshooting: Contains metrics related to the functionality of the metric exporter VMs that Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry deploys.

Healthwatch SLOs: Contains metrics related to the availability of runtime, Healthwatch, and Healthwatch Exporter tile components. You can filter these metrics by uptime service level objective (SLO) target.

MySQL: Includes the following dashboard:


This folder only appears if you configure Healthwatch to monitor your Tanzu SQL
for VMs tile. For more information, see (Optional) Configure Grafana Dashboards in
Configuring Healthwatch.

MySQL Overview: Contains metrics related to the percentage of healthy VMware Tanzu
SQL with MySQL for VMs (Tanzu SQL for VMs) clusters and nodes in your deployment.

RabbitMQ: Includes the following dashboards:

This folder only appears if you configure Healthwatch to monitor your Tanzu
RabbitMQ tile. For more information, see (Optional) Configure Grafana Dashboards
in Configuring Healthwatch.

Erlang-Distribution: Contains metrics related to the functionality of distribution links between VMware Tanzu RabbitMQ for VMs (Tanzu RabbitMQ) nodes and peers. You can filter these metrics by namespace, Tanzu RabbitMQ cluster, and process type.

Erlang-Distributions-Compare: Contains metrics related to comparing the functionality of distribution links between Tanzu RabbitMQ nodes and peers. You can filter these metrics by namespace, Tanzu RabbitMQ cluster, PerfTest instance, percentile, host, and container.

Erlang-Memory-Allocators: Contains metrics related to the functionality of your Tanzu RabbitMQ memory allocators. You can filter these metrics by namespace, Tanzu RabbitMQ cluster, Tanzu RabbitMQ node, and Erlang memory allocator.

RabbitMQ Clusters Overview: Contains metrics related to the percentage of healthy nodes across all Tanzu RabbitMQ clusters.

RabbitMQ-Overview: Contains metrics related to the functionality of your Tanzu RabbitMQ cluster nodes. You can filter these metrics by namespace and Tanzu RabbitMQ cluster.

RabbitMQ-PerfTest: Contains metrics related to the functionality of the PerfTest instances in your Tanzu RabbitMQ clusters. You can filter these metrics by PerfTest instance and percentile.

RabbitMQ-Quorum-Queues-Raft: Contains metrics related to the functionality of the quorum queues in your Tanzu RabbitMQ clusters. You can filter these metrics by namespace and Tanzu RabbitMQ cluster.

Tanzu Platform for Cloud Foundry: Includes the following dashboards:

This folder only appears if you install and configure Healthwatch Exporter for
Tanzu Platform for Cloud Foundry on one or more of your Tanzu Operations
Manager foundations. For more information about Healthwatch Exporter, see
Healthwatch Exporter for Tanzu Platform for Cloud Foundry Architecture in
Healthwatch Architecture and Configuring Healthwatch Exporter for Tanzu Platform
for Cloud Foundry.

App Instances: Contains metrics related to the number and functionality of apps and tasks
running on your Tanzu Platform for CF deployments.


CLI Health: Contains metrics related to the functionality of the Cloud Foundry Command-
Line Interface (cf CLI).

Diego/Capacity: Contains metrics related to the capacity of all Diego Cells in your Tanzu
Platform for CF deployments. You can filter these metrics by memory chunk size and disk
size.

Logging and Metrics Pipeline: Contains metrics related to the health and functionality of
the Firehose in your Tanzu Platform for CF deployments.

Router: Contains metrics related to the health and functionality of the Gorouters in your
Tanzu Platform for CF deployments.

Tanzu Platform for Cloud Foundry SLOs: Contains metrics related to the availability of
the cf push command and the canary apps for your Tanzu Platform for CF deployments.
You can filter these metrics by cf push uptime SLO target and canary app uptime SLO
target.

TAS MySQL Health: Contains metrics related to the health of the MySQL databases for
your Tanzu Platform for CF deployments.

UAA: Contains metrics related to the health and functionality of the UAA instances in your
Tanzu Platform for CF deployments.

Usage Service: Contains metrics related to the health and functionality of the Usage
Service in your Tanzu Platform for CF deployments.

If you do not see any data in the Usage Service dashboard, make sure
that you have configured the Metric Registrar in the Tanzu Platform for
Cloud Foundry tile. If you do not configure the Metric Registrar, the Usage
Service API cannot emit metrics to the Loggregator Firehose, and
Healthwatch Exporter for Tanzu Platform for Cloud Foundry cannot collect
them. For more information, see the Tanzu Platform for Cloud Foundry
documentation.

Tanzu Kubernetes Grid Integrated: Includes the following dashboards:

This folder appears only if you installed and configured Healthwatch Exporter for
Tanzu Kubernetes Grid Integrated Edition (TKGI) on one or more of your Tanzu
Operations Manager foundations. For more information about Healthwatch Exporter
for TKGI, see Healthwatch Exporter for TKGI Architecture in Healthwatch
Architecture and Configuring Healthwatch Exporter for TKGI.

Kubernetes API Server: Contains metrics related to the functionality of the Kubernetes
API Server instances in your TKGI clusters. You can filter these metrics by cluster and
instance.

Kubernetes Cluster Detail: Contains metrics related to the functionality of the nodes in
your TKGI clusters. You can filter these metrics by cluster. If your Tanzu Operations
Manager foundation has only TKGI installed, you see this dashboard upon logging in to the
Grafana UI.


Kubernetes Clusters Overview: Contains metrics related to the percentage of healthy control plane nodes across all TKGI clusters.

Kubernetes Controller Manager: Contains metrics related to the functionality of the Kubernetes Controller Manager instances in your TKGI clusters. You can filter these metrics by cluster and instance.

Kubernetes Etcd: Contains metrics related to the functionality of the etcd instances in
your TKGI clusters. You can filter these metrics by cluster and instance.

Kubernetes Nodes: Contains metrics related to the functionality of your TKGI cluster
nodes. You can filter these metrics by cluster and instance.

Kubernetes Scheduler: Contains metrics related to the functionality of the Kubernetes Scheduler instances in your TKGI clusters. You can filter these metrics by cluster and instance.

TKGI Control Plane: Contains metrics related to the functionality of the TKGI Control
Plane.


Healthwatch Metrics

This topic describes the metrics that the Healthwatch Exporter for VMware Tanzu Platform for Cloud Foundry and
Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles generate.

Overview of Healthwatch Metrics


Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy
metric exporter VMs to generate component metrics and Service Level Indicators (SLIs) related to the
health of your Tanzu Platform for CF and TKGI deployments:

BOSH SLIs

Platform Metrics

Healthwatch Component Metrics

Each metric exporter VM exposes these metrics and SLIs on a Prometheus exposition endpoint, /metrics.

The Prometheus instance that exists within your metrics monitoring system then scrapes each /metrics
endpoint on the metric exporter VMs and imports those metrics into your monitoring system. You can
configure the frequency at which the Prometheus instance scrapes the /metrics endpoints in the
Prometheus pane of the Healthwatch for VMware Tanzu tile. To configure the scrape interval for the
Prometheus instance, see Configure Prometheus in Configuring Healthwatch.

The name of each metric is in PromQL format. For more information, see the Prometheus documentation.

BOSH SLIs
In a VMware Tanzu Operations Manager foundation, the BOSH Director manages the VMs that each tile
deploys. If the BOSH Director fails or is not responsive, the VMs that the BOSH Director manages also fail.

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy two
VMs that continuously test the functionality of the BOSH Director: the BOSH health metric exporter VM and
the BOSH deployment metric exporter VM:

BOSH Health Metric Exporter VM

BOSH Deployment Metric Exporter VM

BOSH Health Metric Exporter VM


The BOSH health metric exporter VM, bosh-health-exporter, creates a BOSH deployment called bosh-
health every ten minutes. This BOSH deployment deploys another VM, bosh-health-check, that runs a
suite of SLI tests to validate the functionality of the BOSH Director. After the SLI tests are complete, the
BOSH health metric exporter VM collects the metrics from the bosh-health-check VM, then deletes the
bosh-health deployment and the bosh-health-check VM.

The following table describes each metric the BOSH health metric exporter VM generates:

bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}: The number of seconds the BOSH health SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH health SLI test suite duration metrics.

bosh_sli_duration_seconds_count{exported_job="bosh-health-exporter"}: The total number of duration metrics across all BOSH health SLI test suite duration metric buckets.

bosh_sli_duration_seconds_sum{exported_job="bosh-health-exporter"}: The total value of the duration metrics across all BOSH health SLI test suite duration metric buckets.

bosh_sli_exporter_status{exported_job="bosh-health-exporter"}: The health status of the BOSH health metric exporter VM. A value of 1 indicates that the BOSH health metric exporter VM is running and healthy.

bosh_sli_failures_total{exported_job="bosh-health-exporter"}: The total number of times the BOSH health SLI test suite fails. A failed test suite is one in which any number of tests within the test suite fail.

bosh_sli_run_duration_seconds{exported_job="bosh-health-exporter"}: The number of seconds a single BOSH health SLI test suite takes to run.

bosh_sli_runs_total{exported_job="bosh-health-exporter"}: The total number of times the BOSH health SLI test suite runs. To see the failure rate, divide the value of bosh_sli_failures_total{exported_job="bosh-health-exporter"} by the value of bosh_sli_runs_total{exported_job="bosh-health-exporter"}.

bosh_sli_task_duration_seconds_bucket{exported_job="bosh-health-exporter"}: The number of seconds it takes a task within the BOSH health SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.

bosh_sli_task_duration_seconds_count{exported_job="bosh-health-exporter"}: The total number of duration metrics across all task duration metric buckets.

bosh_sli_task_duration_seconds_sum{exported_job="bosh-health-exporter"}: The total value of the duration metrics across all task duration metric buckets.

bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="delete"}: The number of seconds it takes the bosh delete-deployment command test to run.

bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deploy"}: The number of seconds it takes the bosh deploy command test to run.

bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deployments"}: The number of seconds it takes the bosh deployments command test to run.

bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}: The total number of times a task runs. To see the failure rate, divide the value of bosh_sli_task_failures_total{exported_job="bosh-health-exporter"} by the value of bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}.

bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="delete"}: The total number of times the bosh delete-deployment command fails.

bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deploy"}: The total number of times the bosh deploy command fails.

bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deployments"}: The total number of times the bosh deployments command fails.

BOSH Deployment Metric Exporter VM


The BOSH deployment metric exporter VM, bosh-deployments-exporter, checks every 30 seconds
whether any BOSH deployments other than the bosh-health deployment created by the BOSH health
metric exporter VM are running.

The following table describes each metric the BOSH deployment metric exporter VM generates:

bosh_deployments_status: Whether any BOSH deployments other than bosh-health are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director.

bosh_sli_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}: The number of seconds the BOSH deployment check takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH deployment check duration metrics.

bosh_sli_duration_seconds_count{exported_job="bosh-deployments-exporter"}: The total number of duration metrics across all BOSH deployment check duration metric buckets.

bosh_sli_duration_seconds_sum{exported_job="bosh-deployments-exporter"}: The total value of the duration metrics across all BOSH deployment check duration metric buckets.

bosh_sli_exporter_status{exported_job="bosh-deployments-exporter"}: The health status of the BOSH deployment metric exporter VM. A value of 1 indicates that the BOSH deployment metric exporter VM is running and healthy.

bosh_sli_failures_total{exported_job="bosh-deployments-exporter"}: The total number of times the BOSH deployment check fails.

bosh_sli_run_duration_seconds{exported_job="bosh-deployments-exporter"}: The number of seconds a single BOSH deployment check takes to run.

bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}: The total number of times the BOSH deployment check runs. To see the failure rate, divide the value of bosh_sli_failures_total{exported_job="bosh-deployments-exporter"} by the value of bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}.

bosh_sli_task_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}: The number of seconds it takes a task within the BOSH deployment check to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.

bosh_sli_task_duration_seconds_count{exported_job="bosh-deployments-exporter"}: The total number of duration metrics across all task duration metric buckets.

bosh_sli_task_duration_seconds_sum{exported_job="bosh-deployments-exporter"}: The total value of the duration metrics across all task duration metric buckets.

bosh_sli_task_run_duration_seconds{exported_job="bosh-deployments-exporter",task="tasks"}: The number of seconds it takes the bosh tasks command test to run.

bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}: The total number of times a task runs. To see the failure rate, divide the value of bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter"} by the value of bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}.

bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter",task="tasks"}: The total number of times the bosh tasks command fails.

Platform Metrics
Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy VMs
that generate metrics regarding the health of several Tanzu Operations Manager and runtime components.

You can use the platform metrics generated by the following VMs to calculate percent availability and error budgets:

Tanzu Platform for Cloud Foundry SLI Exporter VM

TKGI SLI Exporter VM

Certificate Expiration Metric Exporter VM

Prometheus VM

SVM Forwarder VM - Platform Metrics

Tanzu Platform for Cloud Foundry SLI Exporter VM


Developers create and manage apps on Tanzu Platform for Cloud Foundry using the Cloud Foundry
Command Line Interface (cf CLI). Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the
Tanzu Platform for Cloud Foundry SLI exporter VM, pas-sli-exporter, which continuously tests the
functionality of the cf CLI.

The following table describes each metric the Tanzu Platform for Cloud Foundry SLI exporter VM generates:

tas_sli_duration_seconds_bucket: The number of seconds the Tanzu Platform for CF SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of Tanzu Platform for CF SLI test suite duration metrics.

tas_sli_duration_seconds_count: The total number of duration metrics across all Tanzu Platform for CF SLI test suite duration metric buckets.

tas_sli_duration_seconds_sum: The total value of the duration metrics across all Tanzu Platform for CF SLI test suite duration metric buckets.

tas_sli_exporter_status: The health status of the Tanzu Platform for CF SLI exporter VM. A value of 1 indicates that the Tanzu Platform for CF SLI exporter VM is running and healthy.

tas_sli_failures_total: The total number of times the Tanzu Platform for CF SLI test suite fails.

tas_sli_run_duration_seconds: The number of seconds the Tanzu Platform for CF SLI test suite takes to run.

tas_sli_runs_total: The total number of times the Tanzu Platform for CF SLI test suite runs. To see the failure rate, divide the value of tas_sli_failures_total by the value of tas_sli_runs_total.

tas_sli_task_duration_seconds_bucket: The number of seconds it takes a task within the Tanzu Platform for CF SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.

tas_sli_task_duration_seconds_count: The total number of duration metrics across all task duration metric buckets.

tas_sli_task_duration_seconds_sum: The total value of the duration metrics across all task duration metric buckets.

tas_sli_task_run_duration_seconds{task="delete"}: The number of seconds it takes the cf delete command test to run.

tas_sli_task_run_duration_seconds{task="login"}: The number of seconds it takes the cf login command test to run.

tas_sli_task_run_duration_seconds{task="logs"}: The number of seconds it takes the cf logs command test to run.

tas_sli_task_run_duration_seconds{task="push"}: The number of seconds it takes the cf push command test to run.

tas_sli_task_run_duration_seconds{task="setEnv"}: The number of seconds it takes the cf set-env command test to run.

tas_sli_task_run_duration_seconds{task="start"}: The number of seconds it takes the cf start command test to run.

tas_sli_task_run_duration_seconds{task="stop"}: The number of seconds it takes the cf stop command test to run.

tas_sli_task_runs_total: The total number of times a task runs. To see the failure rate, divide the value of tas_sli_task_failures_total by the value of tas_sli_task_runs_total.

tas_sli_task_failures_total{task="delete"}: The total number of times the cf delete command fails.

tas_sli_task_failures_total{task="login"}: The total number of times the cf login command fails.

tas_sli_task_failures_total{task="logs"}: The total number of times the cf logs command fails.

tas_sli_task_failures_total{task="push"}: The total number of times the cf push command fails.

tas_sli_task_failures_total{task="setEnv"}: The total number of times the cf set-env command fails.

tas_sli_task_failures_total{task="start"}: The total number of times the cf start command fails.

tas_sli_task_failures_total{task="stop"}: The total number of times the cf stop command fails.

TKGI SLI Exporter VM


Operators create and manage Kubernetes clusters using the TKGI Command Line Interface (TKGI CLI).
Healthwatch Exporter for TKGI deploys the TKGI SLI exporter VM, pks-sli-exporter, which continuously
tests the functionality of the TKGI CLI.

The following table describes each metric the TKGI SLI exporter VM generates:

tkgi_sli_duration_seconds_bucket: The number of seconds the TKGI SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TKGI SLI test suite duration metrics.

tkgi_sli_duration_seconds_count: The total number of duration metrics across all TKGI SLI test suite duration metric buckets.

tkgi_sli_duration_seconds_sum: The total value of the duration metrics across all TKGI SLI test suite duration metric buckets.

tkgi_sli_exporter_status: The health status of the TKGI SLI exporter VM. A value of 1 indicates that the TKGI SLI exporter VM is running and healthy.

tkgi_sli_failures_total: The total number of times the TKGI SLI test suite fails.

tkgi_sli_run_duration_seconds: The number of seconds the TKGI SLI test suite takes to run.

tkgi_sli_runs_total: The total number of times the TKGI SLI test suite runs. To see the failure rate, divide the value of tkgi_sli_failures_total by the value of tkgi_sli_runs_total.

tkgi_sli_task_duration_seconds_bucket: The number of seconds it takes a task within the TKGI SLI test suite to run, grouped by duration. This metric is also called a bucket of task duration metrics.

tkgi_sli_task_duration_seconds_count: The total number of duration metrics across all task duration metric buckets.

tkgi_sli_task_duration_seconds_sum: The total value of the duration metrics across all task duration metric buckets.

tkgi_sli_task_run_duration_seconds{task="clusters"}: The number of seconds it takes the tkgi clusters command test to run.

tkgi_sli_task_run_duration_seconds{task="get-credentials"}: The number of seconds it takes the tkgi get-credentials command test to run.

tkgi_sli_task_run_duration_seconds{task="login"}: The number of seconds it takes the tkgi login command test to run.

tkgi_sli_task_run_duration_seconds{task="plans"}: The number of seconds it takes the tkgi plans command test to run.

tkgi_sli_task_runs_total: The total number of times a task runs. To see the failure rate, divide the value of tkgi_sli_task_failures_total by the value of tkgi_sli_task_runs_total.

tkgi_sli_task_failures_total{task="clusters"}: The total number of times the tkgi clusters command fails.

tkgi_sli_task_failures_total{task="get-credentials"}: The total number of times the tkgi get-credentials command fails.

tkgi_sli_task_failures_total{task="login"}: The total number of times the tkgi login command fails.

tkgi_sli_task_failures_total{task="plans"}: The total number of times the tkgi plans command fails.

Certificate Expiration Metric Exporter VM


Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy the
certificate expiration metric exporter VM, cert-expiration-exporter, which collects metrics that show
when Tanzu Operations Manager certificates are due to expire. For more information, see Monitoring
Certificate Expiration.

The following table describes the metric the certificate expiration metric exporter VM generates:

ssl_certificate_expiry_seconds{exported_instance=~"CERTIFICATE"}: The time in seconds until a certificate expires, where CERTIFICATE is the name of the certificate.

Prometheus VM
In the Canary URLs pane of the Healthwatch tile, you configure target URLs to which the Blackbox Exporter in the Prometheus instance sends canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.

On the Prometheus VM, tsdb, the Blackbox Exporter job, blackbox-exporter, generates canary test
metrics.

The following table describes each metric the Blackbox Exporter in the Prometheus instance generates:

probe_dns_additional_rrs: The number of entries in the additional resource record list of the DNS server for the canary target URL.

probe_dns_answer_rrs: The number of entries in the answer resource record list of the DNS server for the canary target URL.

probe_dns_authority_rrs: The number of entries in the authority resource record list of the DNS server for the canary target URL.

probe_dns_duration_seconds: The duration of the canary test DNS request by phase.

probe_dns_lookup_time_seconds: The number of seconds the canary test DNS lookup takes to complete.

probe_dns_serial: The serial number of the DNS zone for your canary target URL.

probe_duration_seconds: The number of seconds the canary test takes to complete.

probe_failed_due_to_regex: Whether the canary test failed due to a regex error in the canary test configuration. A value of 0 indicates that the canary test did not fail due to a regex error. A value of 1 indicates that the canary test did fail due to a regex error.

probe_http_content_length: The length of the HTTP content response from the canary target URL.

probe_http_duration_seconds: The duration of the canary test HTTP request by phase, summed over all redirects.

probe_http_last_modified_timestamp_seconds: The last-modified timestamp for the HTTP response header in Unix time.

probe_http_redirects: The number of redirects the canary test goes through to reach the canary target URL.

probe_http_ssl: Whether the canary test used TLS for the final redirect. A value of 0 indicates that the canary test did not use TLS for the final redirect. A value of 1 indicates that the canary test did use TLS for the final redirect.

probe_http_status_code: The status code of the HTTP response from the canary target URL.

probe_http_uncompressed_body_length: The length of the uncompressed response body.

probe_http_version: The version of HTTP the canary test HTTP response uses.

probe_icmp_duration_seconds: The duration of the canary test ICMP request by phase.

probe_icmp_reply_hop_limit: If the canary test protocol is IPv6, the replied packet hop limit. If the canary test protocol is IPv4, the time-to-live count.

probe_ip_addr_hash: The hash of the IP address of the canary target URL.

probe_ip_protocol: Whether the IP protocol of the canary test is IPv4 or IPv6.

probe_ssl_earliest_cert_expiry: The earliest TLS certificate expiration for the canary test URL in Unix time.

probe_ssl_last_chain_expiry_timestamp_seconds: The last TLS chain expiration for the canary test URL in Unix time.

probe_ssl_last_chain_info: Information about the TLS leaf certificate for the canary test URL.

probe_success: Whether the canary test succeeded or failed. A value of 0 indicates that the canary test failed. A value of 1 indicates that the canary test succeeded.

probe_tls_version_info: The TLS version the canary test uses, or NaN when unknown.

bosh_deployments_status: Whether any BOSH deployments other than bosh-health are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director.
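
For example, you can approximate the availability of a canary target over the past day by averaging probe_success. This is a sketch only; add an instance or job selector if you configured more than one canary target URL:

avg_over_time(probe_success[24h])

A value of 1 means that every canary test in the window succeeded, and values below 1 indicate the fraction of canary tests that succeeded.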

SVM Forwarder VM - Platform Metrics


Super value metrics (SVMs) are composite metrics that the Prometheus instance in Healthwatch v2.2+
generates. The SVM Forwarder VM, svm-forwarder, then sends these metrics back into the Loggregator
Firehose so third-party nozzles can send them to external destinations, such as a remote server or external
aggregation service.

The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to
the Loggregator Firehose. For more information about SVMs related to Healthwatch component metrics, see
SVM Forwarder VM - Healthwatch Component Metrics below.

The following table describes each platform metric the SVM Forwarder VM sends to the Loggregator
Firehose:


Diego_AppsDomainSynced: Whether Cloud Controller and Diego are in sync. A value of 0 indicates that Cloud Controller and Diego are not in sync. A value of 1 indicates that Cloud Controller and Diego are in sync.

Diego_AvailableFreeChunksDisk: The available free chunks of disk across all Diego Cells.

Diego_AvailableFreeChunks: The available free chunks of memory across all Diego Cells.

Diego_LRPsAdded_1H: The rate of change in running app instances over a one-hour period.

Diego_TotalAvailableDiskCapacity_5M: The remaining Diego Cell disk available across all Diego Cells over a five-minute period.

Diego_TotalAvailableMemoryCapacity_5M: The remaining Diego Cell memory available across all Diego Cells over a five-minute period.

Diego_TotalPercentageAvailableContainerCapacity_5M: The percentage of total available container capacity across all Diego Cells over a five-minute period.

Diego_TotalPercentageAvailableDiskCapacity_5M: The percentage of total available disk across all Diego Cells over a five-minute period.

Diego_TotalPercentageAvailableMemoryCapacity_5M: The percentage of total available memory across all Diego Cells over a five-minute period.

Doppler_MessagesAverage_1M: The average Doppler message rate over a one-minute period.

Firehose_LossRate_1H: The log transport loss rate over a one-hour period.

Firehose_LossRate_1M: The log transport loss rate over a one-minute period.

SyslogAgent_LossRate_1M: The Syslog Agent loss rate over a one-minute period.

SyslogDrain_RLP_LossRate_1M: The Reverse Log Proxy loss rate over a one-minute period.

bosh_deployment: Represents bosh_deployments_status from the BOSH deployment metric exporter VM, which indicates whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director.

health_check_bosh_director_success: Whether the BOSH SLI test suite that the BOSH health metric exporter VM ran succeeded or failed. A value of 0 indicates that the BOSH SLI test suite failed. A value of 1 indicates that the BOSH SLI test suite succeeded.

health_check_CanaryApp_available: Whether the canary app is available. A value of 0 indicates that the canary app is unavailable. A value of 1 indicates that the canary app is available.

health_check_CanaryApp_responseTime: The response time of the canary app in seconds.

health_check_cliCommand_delete: Whether the cf delete command succeeds or fails. A value of 0 indicates that the cf delete command failed. A value of 1 indicates that the cf delete command succeeded.

health_check_cliCommand_login: Whether the cf login command succeeds or fails. A value of 0 indicates that the cf login command failed. A value of 1 indicates that the cf login command succeeded.

health_check_cliCommand_logs: Whether the cf logs command succeeds or fails. A value of 0 indicates that the cf logs command failed. A value of 1 indicates that the cf logs command succeeded.

health_check_cliCommand_probe_count: The number of cf CLI health checks that Healthwatch completes in the measured time period.

health_check_cliCommand_pushTime: The amount of time it takes the cf CLI to push an app.

health_check_cliCommand_push: Whether the cf push command succeeds or fails. A value of 0 indicates that the cf push command failed. A value of 1 indicates that the cf push command succeeded.

health_check_cliCommand_start: Whether the cf start command succeeds or fails. A value of 0 indicates that the cf start command failed. A value of 1 indicates that the cf start command succeeded.

health_check_cliCommand_stop: Whether the cf stop command succeeds or fails. A value of 0 indicates that the cf stop command failed. A value of 1 indicates that the cf stop command succeeded.

health_check_cliCommand_success: The overall success of the SLI tests that Healthwatch runs on the cf CLI.

uaa_throughput_rate: The lifetime number of requests completed by the UAA VM, emitted per UAA instance in Tanzu Platform for Cloud Foundry. This number includes health checks.

Healthwatch Component Metrics


The following metrics exist for the purpose of monitoring the Healthwatch components:

TKGI Metric Exporter VM

Healthwatch Exporter for Tanzu Platform for Cloud Foundry Metric Exporter VMs

Prometheus Exposition Endpoint

SVM Forwarder VM - Healthwatch Component Metrics


TKGI Metric Exporter VM


Healthwatch Exporter for TKGI deploys a TKGI metric exporter VM, pks-exporter, that collects BOSH
system metrics for TKGI and converts them to a Prometheus exposition format.

The following table describes each metric the TKGI metric exporter VM collects and converts:

healthwatch_boshExporter_ingressLatency_seconds_bucket: The number of seconds the TKGI metric exporter VM takes to process a batch of Loggregator envelopes, grouped by latency. This metric is also called a bucket of ingress latency metrics.

healthwatch_boshExporter_ingressLatency_seconds_count: The total number of metrics across all ingress latency metric buckets.

healthwatch_boshExporter_ingressLatency_seconds_sum: The total value of the metrics across all ingress latency metric buckets.

healthwatch_boshExporter_ingress_envelopes: The number of Loggregator envelopes the observability metrics agent on the TKGI metric exporter VM receives.

healthwatch_boshExporter_metricConversion_seconds_bucket: The number of seconds the TKGI metric exporter VM takes to convert a BOSH metric to a Prometheus gauge, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of gauge conversion duration metrics.

healthwatch_boshExporter_metricConversion_seconds_count: The total number of metrics across all gauge conversion duration metric buckets.

healthwatch_boshExporter_metricConversion_seconds_sum: The total value of the metrics across all gauge conversion duration metric buckets.

healthwatch_boshExporter_status: The health status of the TKGI metric exporter VM. A value of 0 indicates that the TKGI metric exporter VM is not responding. A value of 1 indicates that the TKGI metric exporter VM is running and healthy.
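
Because the ingress latency metric is a Prometheus histogram, you can estimate latency percentiles from its buckets with the histogram_quantile function. The following PromQL query is a sketch only; adjust the quantile and range to suit your needs:

histogram_quantile(0.95, sum by (le) (rate(healthwatch_boshExporter_ingressLatency_seconds_bucket[5m])))

This estimates the 95th percentile of the time the TKGI metric exporter VM takes to process a batch of Loggregator envelopes over the last five minutes.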

Healthwatch Exporter for Tanzu Platform for Cloud Foundry Metric Exporter VMs

Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys metric exporter VMs that collect metrics from the Loggregator Firehose and convert them into a Prometheus exposition format.

Each of the following metric exporter VMs collects and converts a single metric type from the Loggregator
Firehose. The names of the metric exporter VMs correspond to the types of metrics they collect and
convert:

Counter Metric Exporter VM

Gauge Metric Exporter VM


The counter metric exporter VM, pas-exporter-counter, collects counter metrics from the Loggregator
Firehose and converts them into a Prometheus exposition format.

The following table describes each metric the counter metric exporter VM collects and converts:

healthwatch_pasExporter_counterConversion_seconds: The number of seconds the counter metric exporter VM takes to convert a Loggregator counter envelope to a Prometheus counter.

healthwatch_pasExporter_ingressLatency_seconds: The number of seconds the counter metric exporter VM takes to process a batch of Loggregator counter envelopes.

healthwatch_pasExporter_ingress_envelopes: The number of Loggregator counter envelopes the observability metrics agent on the counter metric exporter VM receives.

healthwatch_pasExporter_status: The health status of the counter metric exporter VM. A value of 0 indicates that the counter metric exporter VM is not responding. A value of 1 indicates that the counter metric exporter VM is running and healthy.

The gauge metric exporter VM, pas-exporter-gauge, collects gauge metrics from the Loggregator
Firehose and converts them into a Prometheus exposition format.

The following table describes each metric the gauge metric exporter VM collects and converts:

healthwatch_pasExporter_gaugeConversion_seconds: The number of seconds the gauge metric exporter VM takes to convert a Loggregator gauge envelope to a Prometheus gauge.

healthwatch_pasExporter_ingressLatency_seconds: The number of seconds the gauge metric exporter VM takes to process a batch of Loggregator gauge envelopes.

healthwatch_pasExporter_ingress_envelopes: The number of Loggregator gauge envelopes the observability metrics agent on the gauge metric exporter VM receives.

healthwatch_pasExporter_status: The health status of the gauge metric exporter VM. A value of 0 indicates that the gauge metric exporter VM is not responding. A value of 1 indicates that the gauge metric exporter VM is running and healthy.

Prometheus Exposition Endpoint


Most of the metric exporter VMs generate metrics concerning how the Prometheus instance interacts with
the /metrics endpoint on each metric exporter VM.

The following table describes each metric the /metrics endpoint on each metric exporter VM generates:


healthwatch_prometheusExpositionLatency_seconds: The number of seconds the metric exporter VM takes to render a Prometheus scrape page.

healthwatch_prometheusExposition_histogramMapConversion: The number of seconds the metric exporter VM takes to convert histogram collection to a map.

healthwatch_prometheusExposition_metricMapConversion: The number of seconds the metric exporter VM takes to convert metrics collection to a map.

healthwatch_prometheusExposition_metricSorting: The number of seconds the metric exporter VM takes to sort metrics when rendering a Prometheus scrape page.

SVM Forwarder VM - Healthwatch Component Metrics


SVMs are composite metrics that the Prometheus instance in Healthwatch v2.2+ generates. The SVM
Forwarder VM, svm-forwarder, then sends these metrics back into the Loggregator Firehose so third-party
nozzles can send them to external destinations, such as a remote server or external aggregation service.

The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to
the Loggregator Firehose. For more information about SVMs related to platform metrics, see SVM
Forwarder VM - Platform Metrics above.

The following table describes each Healthwatch component metric the SVM Forwarder VM sends to the
Loggregator Firehose:

failed_scrapes_total: The total number of failed scrapes for the target source_id.

last_total_attempted_scrapes: The total number of attempted scrapes during the most recent round of scraping.

last_total_failed_scrapes: The total number of failed scrapes during the most recent round of scraping.

last_total_scrape_duration: The time in milliseconds to scrape all targets during the most recent round of scraping.

scrape_targets_total: The total number of scrape targets identified from the configuration file for the Prometheus VM.


Troubleshooting Healthwatch

This topic describes how to troubleshoot problems and known issues that may arise when deploying or
operating Healthwatch for VMware Tanzu, Healthwatch Exporter for VMware Tanzu Platform for Cloud
Foundry, and Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI).

Accessing VM UIs for Troubleshooting


The sections below describe how to access the user interfaces (UIs) of the Prometheus and Alertmanager
VMs for troubleshooting.

Access the Prometheus UI


The Prometheus UI allows you to view various processes on the VMs in the Prometheus instance that the
Healthwatch tile deploys, including alerts that are currently running and the health status of scrape targets.
Because the Prometheus UI is not secure, the Healthwatch tile does not include it. However, you can
access the Prometheus UI to troubleshoot the Prometheus instance.

To access the Prometheus UI:

1. Run:

bosh deployments

This command returns a list of all BOSH deployments that are currently running.

2. Record the name of your Healthwatch deployment.

3. Run:

bosh -d DEPLOYMENT-NAME ssh tsdb/0 --opts='-L 9090:localhost:9090'

Where DEPLOYMENT-NAME is the name of your Healthwatch deployment that you recorded in the
previous step.

4. Go to the Tanzu Ops Manager Installation Dashboard.

5. Click the Healthwatch tile.

6. Select the Credentials tab.

7. In the Tsdb Client Mtls row, click Link to Credential.

8. Record the certificate and private key for Tsdb Client Mtls.


9. Add the certificate and private key for Tsdb Client Mtls that you recorded in the previous step to
the keystore for your operating system.

To store the Tsdb Client Mtls certificate and key on macOS:

1. Create a cert.pem file containing the Tsdb Client Mtls certificate.

2. Create a cert.key file containing the Tsdb Client Mtls private key.

3. Change the access permissions on the certificate and private key files to 0600.

For example:

chmod 0600 ~/Downloads/cert.key
chmod 0600 ~/Downloads/cert.pem

4. To import the Tsdb Client Mtls private key into the macOS keychain:

security import KEY-PATH -k ~/Library/Keychains/login.keychain-db

Where KEY-PATH is the path to the cert.key file. For example, ~/Downloads/cert.key.

5. To import the Tsdb Client Mtls certificate into the macOS keychain:

1. To open the macOS Keychain Access app:

1. Press CMD-SPACE and type keychain.

2. Select the Keychain Access app from the displayed list.

2. Select File -> Import Items.

3. Select the Tsdb Client Mtls certificate cert.pem file.

10. In a web browser, navigate to localhost:9090. If your browser prompts you to specify which
certificate to use for mTLS, select the certificate you added to your operating system keystore.

On macOS:

1. Open a browser to https://localhost:9090.

2. If you are challenged by a security warning, select Advanced -> Proceed Anyway. Alternatively, type the letters thisisunsafe into the webpage.

3. In the displayed dialog, select the Tsdb Client Mtls certificate you added to the
macOS keychain and click OK.

4. When prompted, provide the keychain access password, and select Always
Allow.

The Prometheus UI should display in your browser.

Access the Alertmanager UI


The Alertmanager UI allows you to view which alerts are currently running. Because the Alertmanager UI is
not secure, the Healthwatch tile does not include it. However, you can access the Alertmanager UI to
troubleshoot or silence alerts.


To access the Alertmanager UI:

1. Run:

bosh deployments

This command returns a list of all BOSH deployments that are currently running.

2. Record the name of your Healthwatch deployment.

3. Run:

bosh -d DEPLOYMENT-NAME ssh tsdb/0 --opts='-L 8080:localhost:10401'

Where DEPLOYMENT-NAME is the name of your Healthwatch deployment that you recorded in the
previous step.

4. In a web browser, navigate to localhost:8080. The Alertmanager UI appears.

Troubleshooting Known Issues


The sections below describe how to troubleshoot known issues in Healthwatch and Healthwatch Exporter
for TKGI.

Smoke Tests Errand Fails When Deploying Healthwatch


When you deploy Healthwatch, the Smoke Tests errand fails with the following error message:

querying for grafana up should be greater than 0

The Smoke Tests errand fails because the Prometheus instance fails to scrape metrics from the Grafana
instance. Potential causes of this failure include:

There is a network issue between the Prometheus instance and Grafana instance.

The Grafana instance uses a certificate that does not match the certificate authority (CA) you
configured in the Grafana pane in the Healthwatch tile. This could occur because the CA you
configured in the Grafana pane is either a self-signed certificate or a different CA from the one that
generated the certificate. As a result, the Prometheus instance does not trust the certificate that
the Grafana instance uses. For more information about configuring a CA for the Grafana instance,
see (Optional) Configure Grafana in Configuring Healthwatch.

To find out why the Prometheus instance fails to scrape metrics from the Grafana instance:

1. Log in to one of the VMs in the Prometheus instance by following the procedure in the Tanzu
Operations Manager documentation.

2. View information about the Grafana instance scrape target by running:

curl -vk https://localhost:9090/api/v1/targets --cacert /var/vcap/jobs/prometheus/config/certs/prometheus_ca.pem --cert /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.pem --key /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.key | /var/vcap/packages/prometheus_backup_jq/bin/jq '.data.activeTargets[] | select(.scrapePool == "grafana")'

The lastError field in the command output describes the reason for the Prometheus instance
failing to scrape the Grafana instance.

TKGI Metric Exporter VM Fails to Connect to the BOSH Director


When the TKGI metric exporter VM attempts to connect to the BOSH Director, you see the following error:

ERROR [context.UaaContext [ForkJoinPool-1-worker-3]] javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
ERROR [ingress.TokenCallCredentials [ForkJoinPool-1-worker-3]] Caught error retrieving UAA token: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
INFO [ingress.EventStreamObserver [ForkJoinPool-1-worker-3]] io.grpc.StatusRuntimeException: UNAUTHENTICATED

This error appears when the TKGI metric exporter VM cannot verify that the certificate chain of the UAA
server for the BOSH Director is valid. To allow the TKGI metric exporter VM to connect to the BOSH
Director, you must correct any certificate chain errors.

To check for certificate chain errors in the UAA server for the BOSH Director:

1. Log in to the TKGI metric exporter VM by following the procedure in the Tanzu Operations Manager
documentation.

2. View the certificate that the UAA server uses by running:

openssl s_client -connect 10.0.0.5:8443

3. Save the certificate as a cert.pem file.

4. Run:

openssl verify cert.pem

If the command returns an OK message, the certificate is trusted and has a valid certificate chain. If
the command returns any other message, see the OpenSSL documentation to troubleshoot.

BOSH Health Metrics Cause Errors When Two Healthwatch Exporter Tiles Are Installed

When you install both Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter
for TKGI on the same Tanzu Operations Manager foundation, the BOSH Director Status panel in the BOSH
Director Health dashboard in the Grafana UI shows “Not Running”, and your BOSH Director deployment
returns the following error:

Director responded with non-successful status code '401' response '{"code":600000,"des


cription":"Require one of the scopes: bosh.admin, bosh.750587e9-eae5-494f-99c4-5ca429b

140
Healthwatch for VMware Tanzu

13959.admin, bosh.teams.p-healthwatch2-pas-exporter-b3a337d7ec4cca94f166.admin"}'

This occurs because both Healthwatch Exporter tiles deploy a BOSH health metric exporter VM, and both
BOSH health metric exporter VMs are named bosh-health-exporter. This causes the two sets of metrics
to conflict with each other.

To address this, you must scale the BOSH health metric exporter VM down to zero instances in one of the
Healthwatch Exporter tiles.

To scale the BOSH health metric exporter VM down to zero instances in one of the Healthwatch Exporter
tiles:

1. Go to the Tanzu Ops Manager Installation Dashboard.

2. Click the Healthwatch Exporter for Tanzu Kubernetes Grid - Integrated tile or Healthwatch Exporter
for Tanzu Platform for Cloud Foundry tile.

3. Select Resource Config.

4. In the Bosh Health Exporter row, select 0 from the Instances dropdown.

5. Click Save.

6. Return to the Tanzu Ops Manager Installation Dashboard.

7. Click Review Pending Changes.

8. Click Apply Changes.

Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts

If you run SLI tests for TKGI through Healthwatch Exporter for TKGI, and you do not have an OpenID
Connect (OIDC) provider for your Kubernetes clusters configured for TKGI, the TKGI SLI exporter VM does
not automatically clean up the service accounts that it creates while running the TKGI SLI test suite.

To fix this issue, either upgrade to Healthwatch v2.2.1 or configure an OIDC provider as the identity provider
for your Kubernetes clusters in the TKGI tile. This cleans up the service accounts that the TKGI SLI
exporter VM creates in future TKGI SLI tests, but does not clean up existing service accounts from
previous TKGI SLI tests. For more information about configuring an OIDC provider in TKGI, see the TKGI
documentation.

VMware recommends that you manually delete existing service accounts from previous TKGI SLI tests if
running the tkgi get-credentials command returns an error similar to the following example:

Error: Status: 500; ErrorMessage: nil; Description: Create Binding: Timed out waiting
for secrets; ResponseError: nil

Manually deleting service accounts also deletes the secrets and Clusterrolebindings associated with
those service accounts.

To manually delete a service account:

1. In a terminal window, run:


kubectl delete serviceaccount -n NAMESPACE SERVICE-ACCOUNT

Where:

NAMESPACE is the namespace that contains the service account you want to delete.

SERVICE-ACCOUNT is the service account you want to delete.

BBR Backup Snapshots Fill Disk Space on Prometheus VMs


In Healthwatch v2.2.0, the backup scripts for Prometheus VMs do not clean up the intermediary snapshots
created by BOSH Backup and Restore (BBR). This results in the disk space on Prometheus VMs filling up.

To fix this issue, either upgrade to Healthwatch v2.2.1 or manually clean up the snapshots. To manually
clean up the snapshots:

1. Log in to the Prometheus VM you want to clean up by following the procedure in the Tanzu
Operations Manager documentation.

2. Run:

sudo -i

3. Empty the snapshots folder for the Prometheus VM by running:

rm -rf /var/vcap/store/prometheus/snapshots/*

4. Change into the snapshots folder by running:

cd /var/vcap/store/prometheus/snapshots

5. Verify that the /var/vcap/store/prometheus/snapshots directory is empty by running:

ls

Troubleshooting Missing TKGI Cluster Metrics


The sections below describe how to troubleshoot missing TKGI cluster metrics in the Grafana UI.

To find out why the Prometheus instance fails to scrape metrics from your TKGI clusters, see Diagnose
Prometheus Scrape Job Failure below.

Potential causes of this failure include:

You are using TKGI v1.10.0 or v1.10.1. For more information, see No Data on Kubernetes Nodes
Dashboard for TKGI v1.10 below.

You are using TKGI v1.12. For more information, see No Data on Kubernetes Nodes Dashboard for
TKGI v1.12 in Healthwatch Release Notes.


You are using TKGI to monitor Windows clusters. For more information, see No Data on
Kubernetes Nodes Dashboard for Windows Clusters in Healthwatch Release Notes.

The Prometheus instance in the Healthwatch tile could not detect or create scrape jobs for the
clusters, causing TKGI cluster discovery to fail. For more information, see Configure DNS for Your TKGI Clusters below.

Diagnose Prometheus Scrape Job Failure


When the Kubernetes Nodes dashboard in the Grafana UI does not show metrics data, the Prometheus
instance in the Healthwatch tile has failed to scrape metrics from on-demand Kubernetes clusters created
through the TKGI API.

To find out why the Prometheus instance fails to scrape metrics from your TKGI clusters:

1. Log in to one of the VMs in the Prometheus instance by following the procedure in the Tanzu
Operations Manager documentation.

2. View information about the Prometheus instance scrape targets by running:

curl -vk https://localhost:9090/api/v1/targets --cacert /var/vcap/jobs/prometheus/config/certs/prometheus_ca.pem --cert /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.pem --key /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.key | /var/vcap/packages/prometheus_backup_jq/bin/jq .

3. Find the scrape jobs for your TKGI clusters. The lastError field describes the reason for the
Prometheus instance failing to scrape your TKGI clusters.

No Data on Kubernetes Nodes Dashboard for TKGI v1.10


If you are using TKGI v1.10.0 or v1.10.1, the Kubernetes Nodes dashboard in the Grafana UI might not
show data for individual pods. This is due to a known issue in Kubernetes v1.19.6 and earlier and
Kubernetes v1.20.1 and earlier.

To fix this issue, upgrade to TKGI v1.10.2 or later. For more information about upgrading to TKGI v1.10.2 or
later, see the TKGI documentation.

Configure DNS for Your TKGI Clusters


When TKGI cluster discovery fails, you see the following error:

2020-05-20 19:24:02 ERROR k8s.K8sClient [parallel-1] Failed to make request
java.net.UnknownHostException: CLUSTER-NAME.ENVIRONMENT-DOMAIN

Where:

CLUSTER-NAME is the name of your TKGI cluster.

ENVIRONMENT-DOMAIN is the domain of your TKGI foundation.


This occurs because the TKGI API cannot access your TKGI clusters from the Internet. To resolve this
issue, you must configure a DNS entry for the control plane of each of your TKGI clusters in the console for
your IaaS.

To configure DNS entries for the control planes of your TKGI clusters:

1. Find the IP addresses and hostnames of the control plane of each of your TKGI clusters. For more
information, see the TKGI documentation.

2. Record the Kubernetes Master IP(s) and Kubernetes Master Host from the output you viewed in
the previous step. For more information, see the TKGI documentation.

3. In a web browser, log in to the user console for your IaaS.

4. For each TKGI cluster, find the public IP address of the VM that has an internal IP address
matching the Kubernetes Master IP(s) you recorded in a previous step. For more information, see
the documentation for your IaaS:

AWS: To find the public IP address of a Linux instance, see the AWS documentation for
Linux instances of Amazon EC2. To find the public IP address for a Windows instance, see
the AWS documentation for Windows instances of Amazon EC2.

Azure: To create or view the public IP address for an Azure VM, see the Azure
documentation.

GCP: To find the public IP address for a GCP VM, see the GCP documentation.

OpenStack: To associate a floating IP address to an OpenStack VM, see the OpenStack


documentation.

vSphere: To find the public IP address of a vSphere VM, see the vSphere documentation.

5. For each TKGI cluster, create an A record in your DNS server that points to the public IP address of
the control plane of the TKGI cluster that you recorded in the previous step. For more information,
see the documentation for your IaaS:

AWS: For more information about configuring a DNS entry in the Amazon VPC console,
see the AWS documentation.

Azure: For more information about configuring an A record in Azure DNS, see the Azure
documentation.

GCP: For more information about adding an A record to Cloud DNS, see the GCP
documentation.

OpenStack: For more information about configuring a DNS entry in the OpenStack internal
DNS, see the OpenStack documentation.

vSphere: For more information about configuring a DNS entry in the vCenter Server
Appliance, see the vSphere documentation.

6. Wait for your DNS server to update.
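
After the DNS change propagates, you can confirm that the cluster hostname resolves to the public IP address you configured. The following command is a sketch only; run it from a machine on the same network as the Healthwatch deployment:

# Substitute the cluster name and foundation domain from the error message above.
nslookup CLUSTER-NAME.ENVIRONMENT-DOMAIN

The command should return the public IP address of the TKGI cluster control plane. If it does not, TKGI cluster discovery continues to fail.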


Troubleshooting Healthwatch Exporter Tiles Using Grafana UI Dashboards

By default, the Grafana UI includes dashboards for Healthwatch Exporter tiles under the Healthwatch
folder.

Viewing Healthwatch Exporter Tile Metrics


The Healthwatch - SLOs dashboard in the Grafana UI displays a row for each metric exporter VM you
select from the corresponding metric exporter instance dropdown at the top of the page. Each row contains
four panels:

Up: The current health of the Prometheus endpoint on the metric exporter VM. A value of 1
indicates that the Prometheus endpoint is healthy. A value of 0 or missing data indicates that either
the Prometheus endpoint is unresponsive or the Prometheus instance failed to scrape the
Prometheus endpoint. For more information, see the Prometheus documentation.

Exporter SLO: The percentage of time that the Healthwatch Exporter tile was up and running over
the selected time period.

Error Budget Remaining: How many minutes are left in the error budget before exceeding the
selected Uptime SLO Target over the selected time period.

Minutes of Downtime: How many minutes the Healthwatch Exporter tiles were down during the
selected time period.
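
The Up panel reflects Prometheus scrape target health, which Prometheus records in the up metric for each target. If you want to approximate an exporter's uptime percentage outside the dashboard, you can use a PromQL sketch like the following; the job selector is a placeholder and must match the scrape job name used in your deployment:

avg_over_time(up{job="EXPORTER-JOB-NAME"}[30d]) * 100

Where EXPORTER-JOB-NAME is the name of the Prometheus scrape job for the metric exporter VM you want to measure.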

Troubleshooting Healthwatch Exporter for Tanzu Platform for Cloud Foundry

The Healthwatch - Exporter Troubleshooting dashboard in the Grafana UI displays metrics that allow you
to monitor the performance of each Healthwatch Exporter for Tanzu Platform for Cloud Foundry tile installed
on your Tanzu Operations Manager foundations. You can use these metrics to troubleshoot when you see
inconsistent graphs for a particular metric type, or if a Healthwatch Exporter tile is not behaving as
expected.

This dashboard contains the following panels:

Exporter Info: A listing of the healthwatch_pasExporter_status metric, showing runtime information for Healthwatch Exporter for Tanzu Platform for Cloud Foundry.

Exporter JVM Memory: A graph of the jvm_memory_bytes_used, jvm_memory_bytes_committed, and jvm_memory_bytes_init metrics, showing the number of used, committed, and initial bytes in a given Java virtual machine (JVM) memory area over the selected time period. You can use this graph to check for memory leaks.

Ephemeral Disk Usage: A gauge of the system_disk_ephemeral_percent metric, showing the percentage of the ephemeral disk used. You can use this gauge to determine whether the disk is reaching capacity.

Rate of Garbage Collection: A graph of the jvm_gc_collection_seconds_sum metric, showing the rate of JVM garbage collection over the selected time period. You can use this graph to determine whether the JVM garbage collection is functional.

Rate of Envelope Ingress: A graph of the healthwatch_pasExporter_ingress_envelopes metric, showing the rate of Loggregator envelope ingress over the selected time period. You can use this graph to check for spikes in the number of Loggregator envelopes that the metric exporter VMs receive.

CPU Usage: A graph of the cpu_usage_user metric, showing the percentage of CPU used over the selected time period. You can use this graph to determine whether the amount of CPU used by Healthwatch Exporter for Tanzu Platform for Cloud Foundry is reaching capacity.

Exporter VM Threads: A graph of the jvm_threads_current and jvm_threads_peak metrics, showing the current and peak thread counts of a given JVM over the selected time period. You can use this graph to check whether Healthwatch Exporter for Tanzu Platform for Cloud Foundry is leaking threads.


Healthwatch Components and Resource Requirements

This topic describes the Healthwatch for VMware Tanzu tile components and the resource requirements for
the Healthwatch tile.

For information about the metric exporter VMs that the Healthwatch Exporter for VMware Tanzu Platform for
Cloud Foundry and Healthwatch Exporter for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) tiles
deploy, see Healthwatch Metrics.

Overview of Healthwatch Components


The main components of the Healthwatch tile are Prometheus, Grafana, MySQL, and MySQL Proxy:

Prometheus: The Prometheus instance scrapes and stores metrics from the Healthwatch Exporter
tiles, allows you to configure alerts with Alertmanager, and sends canary tests to target URLs with
Blackbox Exporter.

Grafana: The Healthwatch tile exports collected metrics to dashboards in the Grafana UI, allowing
you to visualize the data with charts and graphs and create customized dashboards for long-term
monitoring and troubleshooting.

MySQL: MySQL is used only to store your Grafana settings and does not store any time series
data.

MySQL Proxy: MySQL Proxy routes client connections to healthy MySQL nodes.

The Healthwatch tile automatically selects the instance type that is best suited for each job based on the
available resources for your deployment.

For more information about Healthwatch components, see Healthwatch Component VMs and Resource
Requirements for the Healthwatch Tile below.

By default, the Healthwatch tile deploys two Prometheus VMs and only one each of the Grafana, MySQL,
and MySQL Proxy VMs. For information about scaling these resources, see Scale Healthwatch below.

Healthwatch Component VMs


The table below explains each Healthwatch tile component and which VM deploys it:

Prometheus (VM name: tsdb):
Collects metrics related to the functionality of platform-level and runtime-level components
Stores metrics for up to six weeks
Can write to remote storage in addition to its local time-series database (TSDB)
Manages and sends alerts through Alertmanager
Conducts canary tests through the Blackbox Exporter

Grafana (VM name: grafana):
Deploys the Grafana UI
Authenticates user login credentials
Organizes metrics data in charts and graphs

MySQL (VM name: pxc):
Stores the Grafana settings you configure

MySQL Proxy (VM name: pxc-proxy):
Routes client connections to healthy MySQL nodes and away from unhealthy MySQL nodes

Resource Requirements for the Healthwatch Tile


The following table provides the default resource and IP requirements for installing the Healthwatch tile:

Resource Instances CPUs RAM Ephemeral Disk Persistent Disk

Prometheus 2 4 16 GB 5 GB 512 GB

Grafana 1 1 4 GB 5 GB 5 GB

MySQL 1 1 4 GB 5 GB 10 GB

MySQL Proxy 1 1 4 GB 5 GB 5 GB

Scale Healthwatch
By default, the Healthwatch tile deploys two Prometheus VMs, one Grafana VM, one MySQL VM, and one
MySQL Proxy VM.

To scale Healthwatch, see:

Scale Healthwatch Component VMs

Configure Healthwatch for High Availability

Remove Grafana from Healthwatch

Scale Healthwatch Component VMs


By default, the Healthwatch tile deploys two Prometheus VMs, one Grafana VM, one MySQL VM, and one
MySQL Proxy VM.


To scale your Healthwatch component resources:

1. Open the Resource Config pane of the Healthwatch tile.

2. Scale Healthwatch tile resources. You can scale Healthwatch tile resources either vertically or
horizontally:

Prometheus: You can scale the Prometheus instance vertically only. Do not scale
Prometheus horizontally.

Grafana: Healthwatch deploys a single Grafana VM by default. If you need Grafana to have high availability (HA), you can scale the Grafana instance horizontally. If you make Grafana HA, VMware recommends you also scale your MySQL instances.

MySQL: If you make Grafana HA, VMware recommends scaling your MySQL instance to three VMs.

MySQL Proxy: If you make Grafana HA, VMware recommends scaling your MySQL Proxy instance to two VMs.

To remove Grafana from your Healthwatch deployment, set the number of Grafana, MySQL, and MySQL Proxy instances to 0. For more information, see Remove Grafana from Healthwatch below.

For more information about vertical and horizontal scaling, see Scaling platform capacity in the Tanzu
Platform for Cloud Foundry documentation.

Configure Healthwatch for High Availability


If you want to make your Healthwatch deployment HA, you must deploy a redundant number of Healthwatch
tile component instances. This increases the capacity and availability of those components, which
decreases the chances of downtime.

To configure Healthwatch for HA:

1. Complete the steps in Scale Healthwatch Component VMs with the following HA configurations:

Prometheus: Healthwatch deploys two Prometheus VMs by default. With two VMs in the
Prometheus instance, Prometheus and Alertmanager are HA by default.

Grafana: Scale the Grafana instance horizontally.

MySQL: Scale your MySQL instance to three VMs.

MySQL Proxy: Scale your MySQL Proxy instance to two VMs.

Remove Grafana from Healthwatch


If you do not want to use any Grafana instances in your Healthwatch deployment, you can remove Grafana.
For example, you might want to remove Grafana from your Healthwatch deployment after configuring the
Prometheus instance to send metrics to an external Grafana instance.


If you remove Grafana from your Healthwatch deployment, scale Grafana, MySQL, and
MySQL Proxy to 0 at the same time. MySQL is used only to store Grafana settings and
MySQL Proxy is used only to route client connections to healthy MySQL nodes in an HA
Grafana deployment. Neither component is necessary if you have not deployed Grafana.

To remove Grafana from your Healthwatch deployment:

1. Open the Resource Config pane of the Healthwatch tile.

2. Set the number of Grafana, MySQL, and MySQL Proxy instances to 0.
