Monitoring

Monitor and observe the operation of Kyverno using metrics.

Introduction

As a cluster administrator, you may benefit from monitoring both the state and the execution of the Kyverno policies applied to your cluster. This includes monitoring changes made to policies, activity associated with incoming requests, and the results those requests produce. When enabled, monitoring lets you visualize and alert on applied policies, and it is critical to overall cluster observability and compliance.

In addition, you can scope your monitoring targets to the rule, policy, or cluster level, which lets you extract more granular insights from the collected metrics.

Installation and Setup

When you install Kyverno via Helm, additional services are created inside the kyverno Namespace which expose metrics on port 8000.

# values.yaml

admissionController:
  metricsService:
    create: true
  # ...

backgroundController:
  metricsService:
    create: true
  # ...

cleanupController:
  metricsService:
    create: true
  # ...

reportsController:
  metricsService:
    create: true
  # ...

By default, the Service type is ClusterIP, meaning that metrics can only be scraped by a Prometheus server running inside the cluster.
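For an in-cluster Prometheus server, a static scrape job along the following lines could pick up these metrics. This is a minimal sketch rather than anything the chart generates: the job name and scrape interval are arbitrary, and the target assumes a Service named kyverno-svc-metrics in the kyverno Namespace. Each controller has its own metrics Service, so check the actual names in your cluster (for example with kubectl get svc -n kyverno) before using it.

```yaml
# prometheus.yml (fragment) - hypothetical scrape job for Kyverno metrics
scrape_configs:
  - job_name: kyverno            # arbitrary name chosen for this example
    metrics_path: /metrics       # path served on the metrics port
    scrape_interval: 30s         # example value, tune to your needs
    static_configs:
      - targets:
          # Assumed Service DNS name; adjust to the Services in your cluster.
          - kyverno-svc-metrics.kyverno.svc.cluster.local:8000
```

Adding one target per controller Service keeps the metrics from the admission, background, cleanup, and reports controllers in the same job.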

In some cases, the Prometheus server may sit outside your workload cluster as a shared service. In these scenarios, you will want the kyverno-svc-metrics Service to be publicly exposed so that your external Prometheus server can reach the metrics on port 8000.

Services can be exposed to external clients via an Ingress, or using LoadBalancer or NodePort Service types.

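To expose your kyverno-svc-metrics Service publicly as a NodePort on port 8000 of each node, you can configure your values.yaml before Helm installation as shown below. Note that 8000 lies outside the default Kubernetes NodePort range (30000-32767) and that a given node port can be claimed by only one Service at a time, so adjust the nodePort values to what your cluster allows.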

admissionController:
  metricsService:
    create: true
    type: NodePort
    port: 8000
    nodePort: 8000
  # ...

backgroundController:
  metricsService:
    create: true
    type: NodePort
    port: 8000
    nodePort: 8000
  # ...

cleanupController:
  metricsService:
    create: true
    type: NodePort
    port: 8000
    nodePort: 8000
  # ...

reportsController:
  metricsService:
    create: true
    type: NodePort
    port: 8000
    nodePort: 8000
  # ...

To expose the kyverno-svc-metrics service using a LoadBalancer type, you can configure your values.yaml before Helm installation as follows:

admissionController:
  metricsService:
    create: true
    type: LoadBalancer
    port: 8000
    nodePort:
  # ...

backgroundController:
  metricsService:
    create: true
    type: LoadBalancer
    port: 8000
    nodePort:
  # ...

cleanupController:
  metricsService:
    create: true
    type: LoadBalancer
    port: 8000
    nodePort:
  # ...

reportsController:
  metricsService:
    create: true
    type: LoadBalancer
    port: 8000
    nodePort:
  # ...

Configuring the metrics

While installing Kyverno via Helm, you can also configure which metrics to expose. This configuration is stored in the kyverno-metrics ConfigMap.

When configuring your Helm chart, you can choose which Namespaces to include in or exclude from metric export. This is useful when you want to keep Kyverno metrics for certain Namespaces, such as test or experimental ones, from being exposed. Likewise, you can include only a set of critical Namespaces if those are the only ones whose Kyverno-related activity you want to monitor. Exporting metrics for just the Namespaces you need, rather than for all of them, can substantially reduce the memory footprint of Kyverno’s metrics exporter.

You can also configure the exposure of individual metrics, either disabling them completely or dropping some of their label dimensions. For Histograms, you can change the default bucket boundaries globally or override them for a specific metric.

# ...
metricsConfig:
  # 'namespaces.include': list of namespaces to capture metrics for. Default: all namespaces included.
  # 'namespaces.exclude': list of namespaces to NOT capture metrics for. Default: [], none of the namespaces excluded.
  # `exclude` takes precedence over `include` when a Namespace is listed under both.
  namespaces:
    include: []
    exclude: []

  # Configures the bucket boundaries for all Histogram metrics; the value below is the default.
  bucketBoundaries: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 20, 25, 30]

  # Per-metric configuration: allows disabling metrics, dropping labels, and changing the bucket boundaries.
  metricsExposure:
    # Counter disabled
    kyverno_policy_rule_info_total:
      enabled: false
    # Histogram disabled
    kyverno_admission_review_duration_seconds:
      enabled: false
    # Counter with customized dimensions
    kyverno_admission_requests:
      disabledLabelDimensions: ["resource_namespace", "resource_kind", "resource_request_operation"]
    # Histogram with custom boundaries and dimensions
    kyverno_policy_execution_duration_seconds:
      disabledLabelDimensions: ["resource_kind", "resource_namespace", "resource_request_operation"]
      bucketBoundaries: [0.005, 0.01, 0.025]
# ...
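To make the Namespace filtering concrete, the fragment below restricts metric collection to a few Namespaces. It is only an illustration: the Namespace names are placeholders, and sandbox is deliberately listed under both keys to show that exclude wins when the two overlap.

```yaml
# values.yaml (fragment) - Namespace names are placeholders for this example
metricsConfig:
  namespaces:
    # Capture metrics only for these Namespaces.
    include: ["payments", "checkout", "sandbox"]
    # "sandbox" also appears under include, but exclude takes precedence,
    # so no metrics are captured for it.
    exclude: ["sandbox"]
```

With this configuration, metrics would be captured for payments and checkout only.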

Disabling metrics

Some metrics may generate an excessive amount of data, which is undesirable when it incurs additional cost. Some monitoring products and solutions can selectively stop certain metrics from being sent to collectors while leaving others enabled.

Kyverno configuration side

As described above, Kyverno allows disabling metrics, dropping labels, and changing the bucket boundaries through the kyverno-metrics ConfigMap. Refer to the example under Configuring the metrics.

DataDog OpenMetrics side

Disabling select metrics with DataDog OpenMetrics can be done by annotating the Kyverno Pod(s) as shown below.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    ad.datadoghq.com/kyverno.checks: |
      {
        "openmetrics": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8000/metrics",
              "namespace": "kyverno",
              "metrics": [
                {"kyverno_policy_rule_info_total": "policy_rule_info"},
                {"kyverno_admission_requests": "admission_requests"},
                {"kyverno_policy_changes": "policy_changes"}
              ],
              "exclude_labels": [
                "resource_namespace"
              ]
            },
            {
              "openmetrics_endpoint": "http://%%host%%:8000/metrics",
              "namespace": "kyverno",
              "metrics": [
                {"kyverno_policy_results": "policy_results"}
              ]
            }
          ]
        }
      }

The Kyverno Helm chart supports adding extra Pod annotations through the values file, as shown in the example below.

podAnnotations:
  # https://github.com/DataDog/integrations-core/blob/master/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example
  # Note: To collect counter metrics with names ending in `_total`, specify the metric name without the `_total`
  ad.datadoghq.com/kyverno.checks: |
    {
      "openmetrics": {
        "init_config": {},
        "instances": [
          {
            "openmetrics_endpoint": "http://%%host%%:8000/metrics",
            "namespace": "kyverno",
            "metrics": [
              {"kyverno_policy_rule_info_total": "policy_rule_info"},
              {"kyverno_admission_requests": "admission_requests"},
              {"kyverno_policy_changes": "policy_changes"}
            ],
            "exclude_labels": [
              "resource_namespace"
            ]
          },
          {
            "openmetrics_endpoint": "http://%%host%%:8000/metrics",
            "namespace": "kyverno",
            "metrics": [
              {"kyverno_policy_results": "policy_results"}
            ]
          }
        ]
      }
    }

Metrics and Dashboard


Policies and Rules Count

This metric can be used to track the number of policies and rules present in the cluster, both those that are currently active and those that were created in the past but are no longer active.

Policy and Rule Execution

This metric can be used to track the results associated with the rules executing as a part of incoming resource requests and even background scans. This metric can be further aggregated to track policy-level results as well.
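As an illustration of aggregating rule results up to the policy level, a Prometheus recording rule could look like the sketch below. It rests on assumptions not stated on this page: that the counter is exposed to Prometheus as kyverno_policy_results_total, and that it carries policy_name and rule_result labels. The file name, group name, and recorded series name are made up for the example, so verify the metric and label names against your own /metrics output first.

```yaml
# kyverno-rules.yaml - hypothetical Prometheus rule file
groups:
  - name: kyverno-policy-results
    rules:
      # Per-policy rate of failed rule executions over the last 5 minutes.
      # Metric and label names are assumptions; confirm them on /metrics.
      - record: kyverno:policy_results_fail:rate5m
        expr: |
          sum by (policy_name) (
            rate(kyverno_policy_results_total{rule_result="fail"}[5m])
          )
```

Summing by policy_name discards the per-rule dimension, which is what turns rule-level results into a policy-level view.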

HTTP Requests Latency

This metric can be used to track the latencies associated with HTTP requests.

Policy Rule Execution Latency

This metric can be used to track the latencies associated with the execution/processing of the individual rules whenever they evaluate incoming resource requests or execute background scans. This metric can be further aggregated to present latencies at the policy-level.

Admission Review Latency

This metric can be used to track the end-to-end latency of an individual admission review, that is, an incoming resource request together with all of the policies and rules it triggers.
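One way to act on this latency is an alert on its 95th percentile, sketched below as a Prometheus alerting rule. It assumes the histogram kyverno_admission_review_duration_seconds has not been disabled in metricsConfig and that Prometheus scrapes its _bucket series. The alert name, the 1 second threshold, the 10 minute hold time, and the severity label are arbitrary example values.

```yaml
# kyverno-alerts.yaml - hypothetical Prometheus alerting rule
groups:
  - name: kyverno-admission-latency
    rules:
      - alert: KyvernoAdmissionReviewSlow   # example alert name
        # p95 of end-to-end admission review latency over 5 minutes.
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(kyverno_admission_review_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m                             # example hold time
        labels:
          severity: warning
        annotations:
          summary: Kyverno p95 admission review latency is above 1s
```

The quantile is interpolated from the configured bucket boundaries, so a threshold is only as precise as the buckets around it.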

Admission Requests Count

This metric can be used to track the number of admission requests handled by Kyverno.

Cleanup Controller Deleted Objects

This metric can be used to track the number of objects deleted by the cleanup controller.

Cleanup TTL Controller Deleted Objects

This metric can be used to track the number of objects deleted by the cleanup TTL controller.

Cleanup Controller Errors Count

This metric can be used to track the number of errors encountered by the cleanup controller while trying to delete objects.

Cleanup TTL Controller Errors Count

This metric can be used to track the number of errors encountered by the cleanup TTL controller while trying to delete objects.

Controller Drops Count

This metric can be used to track the number of times a controller drops elements. Dropping usually indicates an unrecoverable error: the controller retried processing an item several times and, after every attempt failed, dropped it.

Controller Reconciliations Count

This metric can be used to track the number of reconciliations performed by various Kyverno controllers.

Controller Requeues Count

This metric can be used to track the number of times a controller requeues elements to be processed. Requeueing usually indicates that an error occurred and that the controller enqueued the same item to retry processing it a bit later.

HTTP Requests Count

This metric can be used to track the number of HTTP requests handled by Kyverno.

Policy Changes Count

This metric can be used to track the history of all Kyverno policy-related changes such as policy creations, updates, and deletions.

Client Queries

This metric can be used to track the number of queries per second (QPS) from Kyverno.

Grafana Dashboard

A ready-to-use dashboard for Kyverno metrics.

OpenTelemetry

OpenTelemetry integration in Kyverno.


Last modified September 26, 2024 at 7:14 PM PST: feat: update on metrics exposure config (#1363) (25dbcd7)