Learn about the VMware Tanzu Application Service Integration.

This page provides an overview of what you can do with the VMware Tanzu Application Service integration. Only a limited number of integrations have their setup steps and instructions on a documentation page. If you do not see setup steps here, navigate to the Operations for Applications GUI. The detailed instructions for setting up and configuring all integrations, including the VMware Tanzu Application Service integration, are on the Setup tab of each integration.

  1. Log in to your Operations for Applications instance.
  2. Click Integrations on the toolbar, then search for and click the VMware Tanzu Application Service tile.
  3. Click the Setup tab to see the most recent instructions.

Tanzu Application Service Integration

Tanzu Application Service is a popular platform for building cloud-native applications. This integration uses the VMware Tanzu Observability by Wavefront Nozzle v4 and offers TAS data egress, system dashboards, and alerts. A TAS environment with many foundations can be difficult to debug, and Tanzu Observability can help. With an easy setup, TAS-specific SLIs, the ability to monitor across your foundations, and out-of-the-box dashboards and alerts, you’ll be up and running in no time.

Supported Versions: TAS v2.11 and later.

Benefits

  • Multi-foundation capabilities: Explore the data from all your foundations by using dashboard variables.
  • Opinionated monitoring: Collection of critical metrics that allow you to monitor and visualize the health of Tanzu Application Service (TAS).
  • PPS (points per second) management: Customize the nozzle in Ops Manager, for example, update the scrape interval.

How It Works

The Tanzu Observability Nozzle for TAS has three major components.

  1. The Healthwatch exporters are VMs that stream metrics from the TAS Firehose.
  2. Telegraf scrapes these metrics, converts them to the Wavefront data format, and streams them to the Wavefront proxy (see the sketch after this list).
  3. The Wavefront proxy sends the metrics to the Wavefront service.
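
A line in the Wavefront data format is a metric name, a value, an optional timestamp, a source, and optional point tags. The following minimal Python sketch sends one such line directly to the proxy over its default plaintext port 2878. The host name, metric name, and tags are illustrative placeholders; in this integration, Telegraf and the proxy handle this step for you.

    import socket
    import time

    PROXY_HOST = "wavefront-proxy.example.com"  # placeholder; use your proxy address
    PROXY_PORT = 2878                           # default Wavefront proxy plaintext metrics port

    # Wavefront data format: <metric> <value> [<timestamp>] source=<source> [<tagKey>="<tagValue>" ...]
    point = 'tas.example.heartbeat 1 {} source=demo foundation="sandbox"\n'.format(int(time.time()))

    with socket.create_connection((PROXY_HOST, PROXY_PORT)) as sock:
        sock.sendall(point.encode("utf-8"))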

Step 1: Download the Tanzu Observability Nozzle Tile

The Tanzu Observability Nozzle is the code that sends data from Tanzu Application Service to Operations for Applications.

Download nozzle

Step 2: Upload, Configure, and Deploy the Nozzle Tile in Ops Manager

  1. Log in to Ops Manager.
  2. Upload the TAS Integration for Tanzu Observability.
  3. Configure the nozzle. At a minimum, specify the following (see the sketch after this list):
    • Wavefront Proxy Configuration:
      • Wavefront instance URL
      • Authentication Configuration
      • User-friendly host name
    • Telegraf Agent Configuration:
      • Foundation name
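
The sketch below illustrates what each of these minimum settings contains, with placeholder values. The key names are descriptive only, not the tile's actual property names; enter the real values in the Ops Manager UI.

    # Placeholder values only; configure the real values in Ops Manager.
    nozzle_settings = {
        "wavefront_proxy": {
            "instance_url": "https://example.wavefront.com",  # your Operations for Applications instance
            "authentication": "<API token or OAuth app credentials>",  # see Authentication Configuration below
            "hostname": "tas-proxy-sandbox",  # user-friendly name that identifies this proxy
        },
        "telegraf_agent": {
            "foundation_name": "sandbox",  # identifies this foundation in dashboards and alerts
        },
    }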

Authentication Configuration

Starting with version 4.3.0, the Nozzle supports two types of authentication:

  • If your Operations for Applications service is onboarded to VMware Cloud services, use server-to-server OAuth app credentials, that is, an app ID and app secret. The server-to-server app must have the Proxies service role and must belong to the VMware Cloud organization running the Operations for Applications service instance.
  • If your Operations for Applications service is not onboarded to VMware Cloud services, you can still use a valid Operations for Applications API token.

For information about the original and onboarded subscriptions and how they differ, see Subscription Types.

See the documentation for details and FAQs.
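
As an illustration of the two credential types, the following sketch obtains a Bearer token either way, outside the nozzle itself. It assumes the standard VMware Cloud services client-credentials token endpoint; the app ID, app secret, and API token values are placeholders, and you should confirm the endpoint and parameters against the documentation linked above.

    import base64

    import requests

    CSP_TOKEN_URL = "https://console.cloud.vmware.com/csp/gateway/am/api/auth/authorize"

    def csp_access_token(app_id: str, app_secret: str) -> str:
        # Exchange server-to-server OAuth app credentials for a CSP access token.
        basic = base64.b64encode(f"{app_id}:{app_secret}".encode()).decode()
        response = requests.post(
            CSP_TOKEN_URL,
            headers={"Authorization": f"Basic {basic}"},
            data={"grant_type": "client_credentials"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["access_token"]

    # Onboarded to VMware Cloud services: exchange the OAuth app credentials for a token.
    # token = csp_access_token("<APP-ID>", "<APP-SECRET>")
    # Not onboarded: use the Operations for Applications API token directly.
    # token = "<API-TOKEN>"
    # Either value is then sent as an HTTP header: Authorization: Bearer <token>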

Step 3: Confirm that Metrics are Flowing

After the tile is deployed in Ops Manager, metrics are sent to Tanzu Observability. To confirm, click the Metrics tab.
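
You can also spot-check from a script by querying one of the tas.* metrics through the REST API. A minimal sketch, assuming the v2 chart API and a token obtained as described in Authentication Configuration; the instance URL is a placeholder.

    import time

    import requests

    INSTANCE = "https://example.wavefront.com"  # placeholder instance URL
    TOKEN = "<API token or CSP access token>"

    response = requests.get(
        f"{INSTANCE}/api/v2/chart/api",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={
            "q": 'ts("tas.bbs.LRPsRunning")',  # any tas.* metric listed later on this page
            "s": int(time.time()) - 600,       # start time: the last 10 minutes
            "g": "m",                          # minute granularity
        },
        timeout=30,
    )
    response.raise_for_status()
    print("Metrics flowing:", bool(response.json().get("timeseries")))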

Step 4: Explore TAS Metrics with the System Dashboards

Examine your foundation data with the out-of-the-box dashboards on the Dashboards tab.

Tanzu Application Service Metrics Overview

This page describes the most important metrics exposed by the Tanzu Application Service (TAS) integration. This list of metrics is not comprehensive. The full list of unique metrics varies from environment to environment based on factors such as the TAS version, the optional packages installed, and the custom application metrics.

The Key Performance Indicators listed below are the most important metrics for the TAS platform. These are the metrics that you should monitor to track the overall platform health.

Key Performance Indicators

For more details about the metrics listed below, see Key Performance Indicators.

Metric Name Description
tas.auctioneer.AuctioneerFetchStatesDuration Time in ns that the Auctioneer took to fetch state from all the Diego Cells when running its auction.
tas.auctioneer.AuctioneerLRPAuctionsFailed The number of Long Running Process (LRP) instances that the Auctioneer failed to place on Diego Cells.
tas.auctioneer.AuctioneerLRPAuctionsStarted The number of LRP instances that the Auctioneer successfully placed on Diego Cells.
tas.auctioneer.AuctioneerTaskAuctionsFailed The number of Tasks that the Auctioneer failed to place on Diego Cells.
tas.auctioneer.LockHeld Indicates whether an Auctioneer instance holds the expected Auctioneer lock (in Locket). 1 means the active Auctioneer holds the lock, 0 means the lock was lost.
tas.bbs.BBSMasterElected Indicates when there is a BBS master election. A BBS master election takes place when a BBS instance has taken over as the active instance. A value of 1 is emitted when the election takes place.
tas.bbs.ConvergenceLRPDuration Time in ns that the BBS took to run its LRP convergence pass.
tas.bbs.CrashedActualLRPs Total number of LRP instances that have crashed.
tas.bbs.Domain_cf_apps Indicates if the cf-apps Domain is up-to-date, meaning that the TAS for VMs app requests from the Cloud Controller are synchronized to bbs.LRPsDesired (Diego-desired AIs) for execution.
tas.bbs.LockHeld Indicates whether a BBS instance holds the expected BBS lock (in Locket). 1 means the active BBS server holds the lock, 0 means the lock was lost.
tas.bbs.LRPsExtra Total number of LRP instances that are no longer desired but still have a BBS record. When Diego wants to add more apps, the BBS sends a request to the Auctioneer to spin up additional LRPs.
tas.bbs.LRPsMissing Total number of LRP instances that are desired but have no record in the BBS.
tas.bbs.LRPsRunning The total number of LRP instances that are running on Diego Cells.
tas.bbs.RequestLatency The maximum observed latency time over the past 60 seconds that the BBS took to handle requests across all its API endpoints.
tas.capi.info Exposes the TAS version as tags.
tas.capi.servicedbindings Exposes information about services and bound apps as tags.
tas.gorouter.backend_exhausted_conns The lifetime number of requests that have been rejected by the Gorouter VM due to the Max Connections Per Backend limit being reached across all tried back ends.
tas.gorouter.bad_gateways The lifetime number of bad gateways, or 502 responses, from the Gorouter itself, emitted per Gorouter instance.
tas.gorouter.file_descriptors The number of file descriptors currently used by the Gorouter job.
tas.gorouter.latency The time in milliseconds that represents the length of a request from the Gorouter’s point of view.
tas.gorouter.latency_uaa Time in milliseconds that UAA took to process a request that the Gorouter sent to UAA endpoints.
tas.gorouter.ms_since_last_registry_update Time in milliseconds since the last route register was received, emitted per Gorouter instance.
tas.gorouter.responses_5xx The lifetime number of requests completed by the Gorouter VM for HTTP status family 5xx, server errors, emitted per Gorouter instance.
tas.gorouter.total_requests The lifetime number of requests completed by the Gorouter VM, emitted per Gorouter instance.
tas.gorouter.total_routes The current total number of routes registered with the Gorouter, emitted per Gorouter instance.
tas.locket.ActiveLocks Total count of how many locks the system components are holding.
tas.locket.ActivePresences Total count of active presences. Presences are defined as the registration records that the Diego Cells maintain to advertise themselves to the platform.
tas.p-mysql._mysql_available Indicates whether the MySQL service is available and responding to requests.
tas.p-mysql._mysql_galera_wsrep_cluster_size The number of cluster nodes with which each node is communicating normally.
tas.p-mysql._mysql_galera_wsrep_cluster_status Shows the primary status of the cluster component that the node is in.
tas.p-mysql._mysql_galera_wsrep_ready Shows whether each cluster node can accept queries. Returns only 0 or 1.
tas.p-mysql._mysql_net_max_used_connections The maximum number of connections that have been in use simultaneously since the server started.
tas.p-mysql._mysql_performance_cpu_utilization_percent CPU time being consumed by the MySQL service.
tas.p-mysql._mysql_performance_queries_delta The number of statements executed by the server over the last 30 seconds.
tas.p-mysql._mysql_system_ephemeral_disk_used_percent The percentage of disk used on the ephemeral file system.
tas.p-mysql._mysql_system_persistent_disk_used_percent The percentage of disk used on the persistent file system.
tas.p-mysql._mysql_variables_max_connections The maximum permitted number of simultaneous client connections.
tas.rep.CapacityRemainingDisk Remaining amount of disk in MiB available for this Diego Cell to allocate to containers.
tas.rep.CapacityRemainingMemory Remaining amount of memory in MiB available for this Diego Cell to allocate to containers.
tas.rep.GardenHealthCheckFailed The Diego Cell periodically checks its health against the Garden backend. For Diego Cells, 0 means healthy and 1 means unhealthy.
tas.rep.RepBulkSyncDuration Time in ns that the Diego Cell Rep took to sync the ActualLRPs that it claimed with its actual Garden containers.
tas.route_emitter.RouteEmitterSyncDuration Time in ns that the active Route Emitter took to perform its synchronization pass.
tas.bosh-system-metrics-forwarder.system_cpu_user CPU utilization - The percentage of CPU spent in user processes.
tas.bosh-system-metrics-forwarder.system_disk_ephemeral_percent Ephemeral disk - Percentage of the ephemeral disk used on the VM.
tas.bosh-system-metrics-forwarder.system_disk_persistent_percent Persistent disk - Percentage of the persistent disk used on the VM.
tas.bosh-system-metrics-forwarder.system_disk_system_percent System disk - Percentage of the system disk used on the VM.
tas.bosh-system-metrics-forwarder.system_healthy 1 means the system is healthy and 0 means the system is not healthy.
tas.bosh-system-metrics-forwarder.system_mem_percent System memory - Percentage of memory used on the VM.
tas.uaa.requests_global_completed_count The lifetime number of requests completed by the UAA VM, emitted per UAA instance. This number includes health checks.
tas.uaa.server_inflight_count The number of requests UAA is currently processing (in-flight requests), emitted per UAA instance.

Alerts

  • TAS Apps Manager Availability: A result code of a poll to the Apps Manager URL. A result code of zero indicates a successful poll. See here for details.
  • TAS Auctioneer Fetch State Duration Taking Too Long: Time that the Auctioneer took to fetch state from all the Diego Cells when running its auction. See here for details.
  • TAS Auctioneer LRP Auctions Failed: The number of Long Running Process (LRP) instances that the Auctioneer failed to place on Diego Cells. See here for details.
  • TAS Auctioneer Task Auctions Failed: The number of Tasks that the Auctioneer failed to place on Diego Cells. See here for details.
  • TAS BBS Fewer App Instances Than Expected: Total number of LRP instances that are desired but have no record in the BBS. See here for details.
  • TAS BBS More App Instances Than Expected: Total number of LRP instances that are no longer desired but still have a BBS record. See here for details.
  • TAS BBS Time to Handle Requests: The maximum observed latency time over the past 60 seconds that the BBS took to handle requests across all its API endpoints. See here for details.
  • TAS BBS Time to Run LRP Convergence: Time that the BBS took to run its LRP convergence pass. See here for details.
  • TAS BOSH Director Health: The total number of times the BOSH health SLI test suite has failed. See here for details.
  • TAS BOSH VM Disk Used: Percentage of the system disk used on the VM. See here for details.
  • TAS BOSH VM Ephemeral Disk Used: Percentage of the ephemeral disk used on the VM. See here for details.
  • TAS BOSH VM Persistent Disk Used: Percentage of the persistent disk used on the VM. See here for details.
  • TAS CLI Command Status: The total number of times the App SLI Smoke Test Suite has failed. See here for details.
  • TAS Certificate Expiration: The number of seconds until expiration of each certificate in the foundation. See here for details.
  • TAS Cloud Controller and Diego Not in Sync: Indicates if the cf-apps Domain is up-to-date, meaning that TAS app requests from Cloud Controller are synchronized to tas.bbs.LRPsDesired (Diego-desired AIs) for execution. See here for details.
  • TAS Diego Cell Container Capacity: Percentage of remaining container capacity for a given Diego Cell. See here for details.
  • TAS Diego Cell Memory Chunks Available: Indicates the available Diego Cell memory, by number of available full chunks for a given deployment. See here for details.
  • TAS Diego Cell Replication Bulk Sync Duration: Time that the Diego Cell Rep took to sync the ActualLRPs that it claimed with its actual Garden containers. See here for details.
  • TAS Diego Cell Route Emitter Sync Duration: Time the active Route Emitter took to perform its synchronization pass. See here for details.
  • TAS Garden Health Check Failed: The Diego Cell periodically checks its health against the Garden backend. For Diego Cells, 0 means healthy, and 1 means unhealthy. See here for details.
  • TAS Gorouter File Descriptors: The number of file descriptors currently used by the Gorouter job. See here for details.
  • TAS Gorouter High CPU Utilization: The Gorouter is experiencing average CPU utilization above 65% (warn) or 70% (severe). See here for details.
  • TAS Gorouter Time Since Last Route Register Received: Time since the last route register was received, emitted per Gorouter instance. Indicates if routes are not being registered to apps correctly. See here for details.
  • TAS Gorouter VM Health: Measures the state of the processes running on the Gorouter VM. See here for details.
  • TAS Locks Held by Auctioneer: Whether an Auctioneer instance holds the expected Auctioneer lock (in Locket). See here for details.
  • TAS Locks Held by BBS: Whether a BBS instance holds the expected BBS lock (in Locket). See here for details.
  • TAS Low Proxy Points Received: Measures the total PPS received by the Wavefront proxy. See here for details.
  • TAS MySQL Status: This alert uses the metric from the SQL database used by system components, not from the MySQL service used by applications. See here for details.
  • TAS Ops Manager Availability: A result code of a poll to the Ops Manager URL. A result code of zero indicates a successful poll. See here for details.
  • TAS Percentage of Diego Cells with Enough Disk to Stage Apps: Percentage of Diego Cells with at least one chunk of disk space available to stage an application. See here for details.
  • TAS Telegraf Gatherer and Exporter Scrape Errors: Increase in the number of errors during scraping in the last 5 minutes. See here for details.
  • TAS UAA Latency is Elevated: Time in milliseconds that UAA took to process a request that the Gorouter sent to UAA endpoints. See here for details.
  • TAS UAA VM Health: Measures the state of the processes running on the UAA VM. See here for details.