GitOps Agent Metrics

info

All new metrics are available only from gitops-agent version 0.91.0 and upwards. Please ensure that you are up-to-date if you are unable to find or use GitOps Agent Metrics.

Overview

The GitOps Agent exposes internal metrics on port 2112 at the /metrics endpoint. These metrics are useful for monitoring the performance and health of the agent, and can be scraped using tools like Prometheus.

Metrics are enabled by default in the GitOps agent and exposed through a Kubernetes Service:-

Metrics are served from the agent’s container on port 2112 at /metrics.
If these metrics are not exposed by default, you must explicitly expose this port.
All metrics reset when the GitOps agent pod restarts.

Exposing Metrics for Scraping

This is already done in the default agent installation but here are the configs that control the exposing of metrics for agent.

Values overrides for Helm-based Agent Installation

Enable metrics service by setting the following value in your Helm overrides file:

agent:
  metrics:
    enabled: true

To enable Prometheus ServiceMonitor (if you're using Prometheus Operator):

agent:
  metrics:
    serviceMonitor:
      enabled: true

note

Avoid using a LoadBalancer type service if you're running multiple agent pods. In such cases, using a ServiceMonitor ensures metrics from all replicas are correctly scraped. For more details, refer to the Prometheus Operator guide on running exporters.

Alternatively, you can pass the values directly during the Helm installation:

helm install harness-agent <chart-path> \
  --set agent.metrics.enabled=true \
  --set agent.metrics.serviceMonitor.enabled=true

For Agent installed through Kubernetes manifests directly

To expose the metrics port (2112) manually, apply the following Kubernetes Service manifest in the same namespace as your GitOps agent:

apiVersion: v1
kind: Service
metadata:
  name: gitops-agent-metrics
  namespace: <your-agent-namespace>
  labels:
    app.kubernetes.io/name: harness-gitops-agent-metrics
    app.kubernetes.io/component: gitops-agent
    app.kubernetes.io/part-of: harness-gitops
spec:
  type: ClusterIP
  ports:
    - name: http-metrics
      protocol: TCP
      port: 2112
      targetPort: metrics
  selector:
    app.kubernetes.io/name: harness-gitops-agent

Once the service is created, you can test locally by port-forwarding:

kubectl port-forward svc/gitops-agent-metrics 2112:2112 -n <your-agent-namespace>

Visit http://localhost:2112/metrics to confirm the metrics endpoint is accessible.

Available Metrics

Metric Name	Type	Description	Min. Req. Agent Version
`entities_count`	Gauge	Number of entities being managed by the GitOps agent, grouped by label "entity_type".	v0.91.0
`task_complete_processing_time`	Histogram	Time a task spends in the agent from fetch to response.	v0.91.0
`http_request_duration`	Histogram	Time taken to complete HTTP requests to Harness.	v0.91.0
`http_request_payload_size`	Histogram	Size (in bytes) of payloads sent in HTTP requests.	v0.91.0
`reconciler_duration` (ADVANCED)	Histogram	Time taken for one reconciliation cycle (in milliseconds).	v0.91.0
`task_receiver_channel_blocked_time` (ADVANCED)	Histogram	Wait time before task is picked up by a processor.	v0.91.0
`task_execution_time` (ADVANCED)	Histogram	Time taken to execute a task (includes Argo CD work).	v0.91.0
`task_response_sender_channel_blocked_time` (ADVANCED)	Histogram	Time a completed task waits before sending response.	v0.91.0

info

The "ADVANCED" metrics will be useful once you run into non-obvious scaling issues and will be helpful in identifying which moving component is the bottleneck and scale workloads accordingly. Most basic scale issues can be tackled with a combination of either HA(high availability) mode, HPA and spec increases.

Useful information about task metrics

The gitops agent has 3 concurrent worker pools controlled by the env vars, GITOPS_AGENT_NUM_FETCHERS, GITOPS_AGENT_NUM_PROCESSORS, GITOPS_AGENT_NUM_RESPONDERS. The FETCHER fetches a task from harness, hands it to a PROCESSOR and then the result is picked up by a RESPONDER and sent back to harness.

task_receiver_channel_blocked_time observes the time tasks spend within the agent waiting for a PROCESSOR worker to pick them up.

task_response_sender_channel_blocked_time observes the time a completed task spends within the agent waiting for a RESPONDER worker to send its response back to harness.

task_execution_time is the time a PROCESSOR worker actually spent on the task, i.e. it is the time in between the above two channels once a PROCESSOR picks up a task.

task_complete_processing_time is the overall time spent by the task within the agent. This is the sum of the above 3 metrics:

task_receiver_channel_blocked_time + task_execution_time + task_response_sender_channel_blocked_time

This sum usually is equal to task_complete_processing_time for each task.

This information can be used to debug instances where the obvious reason why the gitops agent is taking too long is not clear.

For example if there are sporadic issues with tasks not completing properly and there is a spike in task_response_sender_channel_blocked_time you can increase GITOPS_AGENT_NUM_RESPONDERS so that there's more RESPONDER workers and there is no blocking of task results being sent to gitops.

Understanding Histogram Metrics

Most metrics are histograms and include these vectors below corresponding to each metric:

_count: Number of samples.

_sum: Sum of all samples.

_bucket: Buckets representing value ranges.

To calculate averages for example:

rate(task_complete_processing_time_sum[5m]) / rate(task_complete_processing_time_count[5m])

Example Prometheus Queries (PromQL)

Based on the above explanations you may also set alerts as required here.

Average HTTP Request Duration

rate(http_request_duration_sum[5m]) / rate(http_request_duration_count[5m])

95th Percentile HTTP Payload Size

histogram_quantile(0.95, rate(http_request_payload_size_bucket[5m]))

95th Percentile Task Execution Time

histogram_quantile(0.95, rate(task_execution_time_bucket[5m]))

[Advanced] 95th Percentile Receiver Channel Block Time

histogram_quantile(0.95, rate(task_receiver_channel_blocked_time_bucket[5m]))

Tip: If this increases under load, scale GITOPS_AGENT_NUM_PROCESSORS. Recommended max: 16 for agents with 1 vCPU.

[Advanced] 95th Percentile Responder Channel Block

histogram_quantile(0.95, rate(task_response_sender_channel_blocked_time_bucket[5m]))

Tip: If this increases under load, scale GITOPS_AGENT_NUM_RESPONDERS.

Overview​

Exposing Metrics for Scraping​

Values overrides for Helm-based Agent Installation​

For Agent installed through Kubernetes manifests directly​

Available Metrics​

Useful information about task metrics​

Understanding Histogram Metrics​

Example Prometheus Queries (PromQL)​

Average HTTP Request Duration​

95th Percentile HTTP Payload Size​

95th Percentile Task Execution Time​

[Advanced] 95th Percentile Receiver Channel Block Time​

[Advanced] 95th Percentile Responder Channel Block​