Recording rules


Recording rules are a type of aggregation rule used primarily to improve the performance of frequent queries. Recording rules let you compute frequently used or expensive queries ahead of time, and save the results as a new set of time series. Instead of running the expensive or complex query, Chronosphere Observability Platform queries for the time series generated by the recording rule.

This concept is similar to using a precomputed lookup table to avoid complex calculations and find data faster. Recording rules let you alias repeated expressions used in dashboards, and also alias other recording rules. For example, a recording rule might compute the total usage percentage for a resource, which can be referenced in other recording rules and dashboards.
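As a rough illustration of aliasing one recording rule from another, the following sketch defines a rule that records a per-instance CPU usage percentage, then a second rule that aggregates the recorded series. The rule names, slugs, output metric names, and the `node_cpu_seconds_total` source metric are assumptions chosen for illustration, not values taken from your environment.

```yaml
api_version: v1/config
kind: RecordingRule
spec:
  # Hypothetical rule: percentage of non-idle CPU time per instance.
  name: instance-cpu-usage-percent
  slug: instance-cpu-usage-percent
  metric_name: instance:cpu_usage:percent
  prometheus_expr: 100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))
  interval_secs: 60
---
api_version: v1/config
kind: RecordingRule
spec:
  # Hypothetical rule: reads the series recorded by the rule above
  # instead of recomputing it from the raw metric.
  name: overall-cpu-usage-percent
  slug: overall-cpu-usage-percent
  metric_name: overall:cpu_usage:percent_avg
  prometheus_expr: avg(instance:cpu_usage:percent)
  interval_secs: 60
```

Dashboards can then query `instance:cpu_usage:percent` or `overall:cpu_usage:percent_avg` directly rather than repeating the full expression.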

Recording rules run on a fixed interval. Raw metric data is ingested into the database first. The rule then reads that data, generates the aggregated and downsampled result, and stores it in the database as a new time series.

Limitations

Although recording rules are powerful, they have the following limitations:

Recording rules might be delayed because they run in a batch format.
Recording rules apply only to individual metrics, not broad aggregations.
Recording rules don’t have the ability to discard raw data after aggregation.
Recording rules don’t capture late-arriving data.
If a recording rule fails to run, there’s no way to backfill the data.

To avoid these limitations, use rollup rules or derived telemetry instead. For example, if you have issues with late-arriving data, consider using rollup rules instead of recording rules.

Attributes

Recording rules support only Prometheus metrics.

Configure a recording rule with a PromQL statement executed against the metrics data with the result stored in a new time series with a unique metric name. PromQL statements in recording rules can include any PromQL function.

Rule fields:

prometheus_expr: The PromQL expression to evaluate.
interval_secs: How often to evaluate the rule. Default is 60s.
label_policy: Specify label names to add to the output metric. If you attempt to add an existing label, the label isn’t added. For example, add (instance123:instance).
execution_group: The execution group this rule is assigned to. Rules in the same execution group run at intervals. All rules in a group must complete before the rules in that group run again.

Creating too many rules in an execution group can delay the next iteration of that group. Chronosphere recommends limiting each execution group to a maximum of 200 to 300 rules.
name: The name of the rule. If metric_name is set, this is the human-readable name. Otherwise, it’s the time series to output to.
metric_name: The time series to output to.
slug: The slug for the rule. This can’t change after rule creation.

See the Recording rule API for a full definition.
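The fields above might fit together as in the following sketch. The execution group name, rule name, slug, output metric name, and label value are hypothetical placeholders, and the expression reuses the `scrape_duration_seconds` metric from the Terraform example on this page. Check the Recording rule API for the authoritative schema.

```yaml
api_version: v1/config
kind: RecordingRule
spec:
  # Human-readable name, because metric_name is set below.
  name: scrape-duration-max
  # The slug can't change after the rule is created.
  slug: scrape-duration-max
  # Time series that receives the results.
  metric_name: scrape_duration_seconds:max
  # PromQL expression to evaluate.
  prometheus_expr: max(scrape_duration_seconds)
  # Evaluate every 60 seconds.
  interval_secs: 60
  # Hypothetical execution group name.
  execution_group: infra-rules
  # Label added to the output metric.
  label_policy:
    add:
      owner: infra
```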

Best practices

Recording rules support adding labels to the resulting aggregated metrics, which rollup rules don't support directly. To accomplish the same goal, a rollup rule must be combined with either a Prometheus relabel rule or a derived metric that uses the label_replace function.

Due to architectural differences between Observability Platform and Prometheus, defining recording rules is sometimes different, especially for expensive recording rules that span many metrics. For example, Prometheus evaluates the recording rules in a rule group sequentially, whereas recording rules in Observability Platform aren't guaranteed to run sequentially.

Observability Platform uses a single data store. To enhance performance, use the following recommendations:

Break up the recording rules to scope to different clusters, or another label that scopes your metrics.
Use the metric_name field so that all of the scoped rules write their results to the same metric name.
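The two recommendations above can be sketched as a pair of rules that each cover one cluster and share an output metric name. The `cluster` label, its values, and the rule names and slugs are assumptions for illustration. The expression keeps `cluster` in the `by` clause so the series written by each rule stay distinct.

```yaml
api_version: v1/config
kind: RecordingRule
spec:
  # Hypothetical rule scoped to one cluster.
  name: cpu-usage-sum-rate-1m-us-east
  slug: cpu-usage-sum-rate-1m-us-east
  metric_name: cluster_instance:cpu_usage_seconds:sum_rate1m
  prometheus_expr: sum(rate(container_cpu_usage_seconds_total{cluster="us-east"}[1m])) by (cluster, instance)
  interval_secs: 60
---
api_version: v1/config
kind: RecordingRule
spec:
  # Same expression scoped to a second cluster, written to the same metric name.
  name: cpu-usage-sum-rate-1m-us-west
  slug: cpu-usage-sum-rate-1m-us-west
  metric_name: cluster_instance:cpu_usage_seconds:sum_rate1m
  prometheus_expr: sum(rate(container_cpu_usage_seconds_total{cluster="us-west"}[1m])) by (cluster, instance)
  interval_secs: 60
```

Each rule reads a smaller slice of data per evaluation, while dashboards still query a single metric name and filter by `cluster` as needed.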

With a Prometheus or Thanos setup, Chronosphere recommends scoping the rules to the local Prometheus server to avoid cross-Prometheus queries.

View recording rules

Select from the following methods to view your recording rules.

In the navigation menu, click Go to Admin and then select Control > Recording Rules.

The recording rules page is searchable by rule Name or Execution group.

The following fields display:

Name: The rule name.
Execution Group: The execution group this rule is assigned to. Rules in the same execution group run at intervals. The entire group must complete an execution before the rules in that group will run again.
Metric Name: The time series to output to.
Interval: How often the rule evaluates.
Labels: Label names added to the output metric.
Query: Click the <> to display the query used for this rule.

Create or update recording rules

Select from the following methods to create or update recording rules.

Users can modify Terraform-managed resources only by using Terraform. Learn more.

To create a recording rule with Chronoctl, define the rule in a YAML file and apply it.

If you don’t already have a YAML configuration file, use the scaffold Chronoctl parameter to generate a template for a specific resource type:

```
chronoctl recording-rules scaffold
```

You can redirect the results (using the redirection operator >) to a file for editing.

Create or edit a YAML configuration file to configure the recording rule. See the Chronoctl example for a complete configuration file.
Apply the recording rule:
```
chronoctl apply -f FILE_NAME.yaml
```

Replace FILE_NAME with the name of the YAML configuration file.

Chronoctl example

The following YAML example includes three recording rules that calculate the average per-second rate of increase, measured over one minute, for series that contain a value for node. Each rule aggregates the results by instance and container.

This example uses the metric_name field to specify the output name of the time series, and the name field to specify the human-readable name. For backwards compatibility, if metric_name isn't specified, the name field is used as the name of the output time series, as in the third rule.

```
api_version: v1/config
kind: RecordingRule
spec:
  name: cpu-usage-seconds-sum-rate-1m
  slug: instance-container-cpu-usage-seconds-sum-rate1m
  # node!="" selects only series that have a value for the node label.
  prometheus_expr: sum(rate(container_cpu_usage_seconds_total{node!=""}[1m])) by (instance,
    container)
  metric_name: instance_container:cpu_usage_seconds:sum_rate1m
  interval_secs: 60
  label_policy:
    add:
      resource: cpu
---
api_version: v1/config
kind: RecordingRule
spec:
  name: network-receive-bytes-sum-rate-1m
  slug: instance-container-network-receive-bytes-sum-rate1m
  metric_name: instance_container:network_receive_bytes:sum_rate1m
  prometheus_expr: sum(rate(container_network_receive_bytes_total{node!=""}[1m])) by (instance,
    container)
  interval_secs: 60
  label_policy:
    add:
      resource: network-receive
---
api_version: v1/config
kind: RecordingRule
spec:
  # No metric_name is set, so name is used as the output time series.
  name: instance_container:network_transmit_bytes:sum_rate1m
  slug: instance-container-network-transmit-bytes-sum-rate1m
  prometheus_expr: sum(rate(container_network_transmit_bytes_total{node!=""}[1m])) by (instance,
    container)
  interval_secs: 60
  label_policy:
    add:
      resource: network-transmit
```

Terraform example

The following code creates a recording rule that Terraform refers to as scrape_duration_recording_rule with the name 60s rule, and defines the other data needed to create a recording rule.

```
resource "chronosphere_recording_rule" "scrape_duration_recording_rule" {
  # Name for the rule.
  # This value is used as the output metric name if metric_name is not specified.
  name        = "60s rule"
  metric_name = "scrape_duration_seconds:max_60s"

  # Arbitrary labels to attach to the rule.
  # These labels become part of the output metric and take precedence
  # over any labels that the expression would have created.
  # For example, if the expression produces series with "foo=bar" but this
  # block sets "foo=test", the generated metric has "foo=test".
  labels = {
    "owner" = "infra"
  }

  # Interval at which to evaluate the rule
  interval = "60s"

  # The PromQL expression to evaluate
  expr = "max(scrape_duration_seconds)"
}
```

Delete recording rules

Select from the following methods to delete a recording rule.

To delete a recording rule using Chronoctl, use the chronoctl recording-rules delete command:

```
chronoctl recording-rules delete SLUG
```

Replace SLUG with the slug of the rule you want to delete.
