Learn about the Wavefront Pivotal Cloud Foundry Integration.

Pivotal Cloud Foundry Integration

Pivotal Cloud Foundry (PCF) is a popular platform for building cloud-native applications. The PCF integration is a full-featured, fully configurable implementation that offers predefined dashboards and alert conditions.

Dashboards

The PCF integration includes a set of dashboards that give an overview of your PCF deployment and of specific PCF components:

  • PCF: Summary - overall health of PCF deployment.
  • PCF: Cloud Controller - detailed Cloud Controller metrics.
  • PCF: GoRouter - detailed GoRouter metrics.
  • PCF: Container - health of containers within PCF.
  • PCF: User Account and Authentication (UAA) - detailed UAA server metrics.
  • PCF: Diego Auctioneer - detailed Diego Auctioneer metrics.
  • PCF: Diego BBS - detailed Diego Bulletin Board System (BBS) metrics.
  • PCF: Diego Cell - health of Diego Cells.
  • PCF: Metron Agent - health of Metron Agents.
  • PCF: MySQL - real-time visibility into the PCF MySQL status.
  • PCF: Redis - real-time visibility into the PCF Redis status.
  • PCF: RabbitMQ - real-time visibility into the PCF RabbitMQ status.
  • PCF: Wavefront Nozzle - health and performance of your Pivotal Platform deployment and apps.

Alerts

PCF alerts are also available for you to install and use. Descriptions of the alerts are available in Pivotal Cloud Foundry Alerts.

Here’s a preview of the Cloud Controller dashboard: images/cloud_controller_dashboard.png

Pivotal Cloud Foundry Setup

Supported Version(s): PCF v2.2 and above.

Install Wavefront by VMware Nozzle for PCF tile

This integration uses the Wavefront by VMware Nozzle for PCF tile distributed through the Pivotal network.

See the documentation for info on installing and configuring the tile within your PCF deployment.
Use the following Wavefront instance URL and API token for configuring the Wavefront proxy:
Wavefront Instance URL: https://YOUR_CLUSTER.wavefront.com/api
Wavefront API Token: YOUR_API_TOKEN
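
As a minimal sketch of how the two values above fit together, the following Python helper builds the instance URL from a cluster name and rejects the unfilled placeholders before you paste the values into the tile. The function name `wavefront_proxy_settings` and the returned dictionary keys are hypothetical, chosen only for illustration; the tile itself defines the actual field names.

```python
import re


def wavefront_proxy_settings(cluster, api_token):
    """Return the instance URL and token, refusing unfilled placeholders.

    Hypothetical helper: the key names below are illustrative and are not
    the tile's own field names.
    """
    # The cluster name becomes a DNS label, so allow only letters, digits, hyphens.
    if cluster == "YOUR_CLUSTER" or not re.fullmatch(r"[A-Za-z0-9-]+", cluster):
        raise ValueError("replace YOUR_CLUSTER with your Wavefront cluster name")
    if api_token in ("", "YOUR_API_TOKEN"):
        raise ValueError("replace YOUR_API_TOKEN with a real API token")
    return {
        "instance_url": f"https://{cluster}.wavefront.com/api",
        "api_token": api_token,
    }


if __name__ == "__main__":
    # "example" and "abc123" are made-up values for demonstration only.
    print(wavefront_proxy_settings("example", "abc123"))
```

A check like this catches the most common copy-and-paste mistake, which is leaving `YOUR_CLUSTER` or `YOUR_API_TOKEN` in place.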

Send App Metrics

See the documentation for info on sending metrics to Wavefront from your apps running within PCF.
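
For orientation only, here is a rough Python sketch of what sending a single metric can look like, assuming your app can reach a Wavefront proxy over TCP. It formats a point in the Wavefront data format (`<metricName> <metricValue> [<timestamp>] source=<source> [pointTags]`) and writes it to the proxy. The host `localhost`, the port `2878` (the usual proxy default), the metric name, and the function names are all assumptions for illustration; the linked documentation describes the supported way to obtain the proxy address inside PCF.

```python
import socket
import time


def format_point(name, value, source, tags=None, timestamp=None):
    """Format one metric as a line in the Wavefront data format."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    line = f"{name} {value} {ts} source={source}"
    # Point tags are optional key="value" pairs; sorted for stable output.
    for key, val in sorted((tags or {}).items()):
        line += f' {key}="{val}"'
    return line + "\n"


def send_point(line, host="localhost", port=2878):
    """Write one formatted line to a proxy (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metric name and tags, printed rather than sent.
    print(format_point("pcf.demo.requests", 42, "app-0", {"env": "dev"}), end="")
```

Keeping formatting separate from sending makes the line easy to inspect or log before any network call is made.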

Alerts

  • PAS Active Locks: Total count of how many locks the system components are holding. See here for details.
  • PAS Auctioneer Fetch State Duration Taking Too Long: App stage requests for Diego may be failing. Actions: Consult your Pivotal expert.
  • PAS Auctioneer LRP Auctions Failed: The number of Long Running Process (LRP) instances that the Auctioneer failed to place on Diego Cells. See here for details.
  • PAS Auctioneer Task Auctions Failed: The number of Tasks that the Auctioneer failed to place on Diego Cells. See here for details.
  • PAS Auctioneer Time to Fetch Diego Cell State: Time, in nanoseconds, that the Auctioneer took to fetch state from all the Diego Cells when running its auction. See here for details.
  • PAS BBS Crashed App Instances: Total number of LRP instances that have crashed. See here for details.
  • PAS BBS Fewer App Instances Than Expected: Total number of LRP instances that are desired but have no record in the BBS. See here for details.
  • PAS BBS Master Elected: Indicates when there is a BBS master election. See here for details.
  • PAS BBS More App Instances Than Expected: Total number of LRP instances that are no longer desired but still have a BBS record. See here for details.
  • PAS BBS Running App Instances Rate of Change: DYNAMIC ALERT: NEGATIVE 10 is a placeholder. Rate of change in the average number of app instances being started or stopped on the platform. See here for details.
  • PAS BBS Task Count is Elevated: This elevated BBS task metric is a KPI tracked by the internal Pivotal Web Services team.
  • PAS BBS Time to Handle Requests: The maximum observed latency over the past 60 seconds that the BBS took to handle requests across all its API endpoints. See here for details.
  • PAS BBS Time to Run LRP Convergence: Time that the BBS took to run its LRP convergence pass. See here for details.
  • PAS BOSH VM CPU Used: CPU utilization - Percentage of CPU spent in user processes. Set an alert and investigate further if the CPU utilization is too high for a job.
  • PAS BOSH VM Disk Used: System disk - Percentage of the system disk used on the VM.
  • PAS BOSH VM Ephemeral Disk Used: Ephemeral disk - Percentage of the ephemeral disk used on the VM.
  • PAS BOSH VM Health: 1 means the system is healthy, and 0 means the system is not healthy.
  • PAS BOSH VM Memory Used: System memory - Percentage of memory used on the VM.
  • PAS BOSH VM Persistent Disk Used: Persistent disk - Percentage of the persistent disk used on the VM. Set an alert and investigate if the persistent disk usage is too high for a job over an extended period.
  • PAS Cloud Controller and Diego Not in Sync: Indicates if the cf-apps Domain is up-to-date, meaning that PAS app requests from Cloud Controller are synchronized to the BBS. See here for details.
  • PAS Diego Cell Container Capacity: Percentage of remaining container capacity for a given Diego Cell.
  • PAS Diego Cell Disk Capacity: Percentage of remaining disk capacity for a given Diego Cell.
  • PAS Diego Cell Memory Capacity: Percentage of remaining memory capacity for a given Diego Cell.
  • PAS Diego Cell Replication Bulk Sync Duration: Time that the Diego Cell Rep took to sync the ActualLRPs that it claimed with its actual Garden containers.
  • PAS Diego Cell Route Emitter Sync Duration: Time that the active Route Emitter took to perform its synchronization pass.
  • PAS Garden Health Check Failed: The Diego Cell periodically checks its health against the Garden back end. For Diego Cells, 0 means healthy, and 1 means unhealthy.
  • PAS Gorouter 502 Bad Gateway: The number of bad gateways, or 502 responses, from the Gorouter itself, emitted per Gorouter instance. See here for details.
  • PAS Gorouter File Descriptors: The number of file descriptors currently used by the Gorouter job. A high value can indicate an impending issue with the Gorouter. See here for details.
  • PAS Gorouter Handling Latency: The amount of time a Gorouter takes to handle requests to backend endpoints, including apps, CC, and UAA. See here for details.
  • PAS Gorouter Server Error: The number of requests completed by the Gorouter VM for HTTP status family 5xx, server errors, emitted per Gorouter instance.
  • PAS Gorouter Throughput: The number of requests completed by the Gorouter VM, emitted per Gorouter instance. See here for details.
  • PAS Gorouter Time Since Last Route Register Received: Time since the last route register was received, emitted per Gorouter instance. Indicates if routes are not being registered to apps correctly.
  • PAS Locks Held by Auctioneer: Whether an Auctioneer instance holds the expected Auctioneer lock (in Locket). See here for details.
  • PAS Locks Held by BBS: Whether a BBS instance holds the expected BBS lock (in Locket). See here for details.
  • PAS UAA Latency is Elevated: A quick way to confirm user-impacting behavior is to try login.run.pivotal.io and see if you receive a delayed response. See here for details.
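
Two of the alerts above report health as 0 or 1 but with opposite conventions, and one duration is reported in nanoseconds. The following Python sketch, which is only an illustration and not part of the integration, shows one way to normalize those readings when working with the raw values. The helper names `is_healthy` and `ns_to_seconds` and the lookup table are hypothetical.

```python
# For each alert, record whether a value of 1 means "healthy".
# These conventions come from the alert descriptions above.
ONE_MEANS_HEALTHY = {
    "PAS BOSH VM Health": True,               # 1 = healthy, 0 = not healthy
    "PAS Garden Health Check Failed": False,  # 0 = healthy, 1 = unhealthy
}


def is_healthy(alert_name, value):
    """Interpret a 0/1 health reading according to the alert's convention."""
    if value not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {value!r}")
    return (value == 1) == ONE_MEANS_HEALTHY[alert_name]


def ns_to_seconds(nanoseconds):
    """Convert a duration in nanoseconds to seconds."""
    return nanoseconds / 1_000_000_000


if __name__ == "__main__":
    print(is_healthy("PAS BOSH VM Health", 1))
    print(is_healthy("PAS Garden Health Check Failed", 1))
    print(ns_to_seconds(2_500_000_000))
```

Keeping the convention in one table avoids misreading a Garden health value of 1 as healthy.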