Grafana Loki Deep Dive: LogQL, Label Cardinality, and Chunk Storage Tuning

Loki gets sold as “Prometheus for logs,” and that one-liner causes most of the production incidents I get called into. Loki does not index your log content. It indexes a small set of labels and stores everything else as compressed chunks in object storage, then brute-forces the text at query time. That design is why Loki is an order of magnitude cheaper than Elasticsearch — and why one badly chosen label can detonate your index, your ingesters, and your bill in an afternoon. This is the working architecture: how the components fit, how to design labels that stay cheap, the LogQL you actually run, and how to tune chunks, caching, and retention so the thing pays for itself.

1. The architecture you must internalize

Loki splits cleanly into a write path and a read path, with object storage in the middle and a TSDB index describing what lives where.

   WRITE PATH                               READ PATH

   logs in                                  queries in
      |                                          |
 [ distributor ]  validate, rate-limit  [ query-frontend ]  split + cache
      |  hash ring, by stream                    |
      v                                          v
 [ ingester ]  build chunks  <------------- [ querier ]  recent data from ingesters
      |  flush                                   |
      v                                          |
 [ object storage: chunks + TSDB index ] <-------+  older data via the index
      ^
 [ compactor ]  dedupe index, apply retention

A stream is the atomic unit of Loki: a unique set of label key-value pairs. {app="api", env="prod", pod="api-7d9f-x"} is one stream; change any value and it is a different stream. Everything about cost and performance follows from how many streams you create.
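To make the "change any value and it is a different stream" rule concrete, here is a minimal Python model that treats a stream's identity as the unordered set of its label pairs. It illustrates the concept only; Loki hashes label sets internally in its own way, and `stream_key` is a hypothetical helper.

```python
def stream_key(labels: dict[str, str]) -> frozenset[tuple[str, str]]:
    """Model a stream's identity as its full set of label pairs (order-independent)."""
    return frozenset(labels.items())


a = stream_key({"app": "api", "env": "prod", "pod": "api-7d9f-x"})
b = stream_key({"pod": "api-7d9f-x", "env": "prod", "app": "api"})  # same pairs, new order
c = stream_key({"app": "api", "env": "prod", "pod": "api-7d9f-y"})  # one value changed

print(a == b)  # True: label order does not matter
print(a == c)  # False: a new pod name means a new stream
```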

2. Label design: the one decision that determines your bill

Loki’s index size is governed by the number of unique streams, which is bounded above by the product of the cardinalities of your labels. Two labels with 1,000 values each allow up to 1,000,000 streams. This is the cardinality bomb, and it is almost always a pod, request_id, user_id, trace_id, or path label that lights the fuse.
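The arithmetic is worth seeing once. This Python sketch contrasts the worst-case product of label cardinalities with the streams a dataset actually produces, and shows what a single unbounded label does to the count. The label names and volumes are invented for illustration.

```python
def max_streams(cardinalities: dict[str, int]) -> int:
    """Worst case: every combination of label values shows up as its own stream."""
    total = 1
    for n in cardinalities.values():
        total *= n
    return total


def observed_streams(entries: list[dict[str, str]]) -> int:
    """Streams actually created: the distinct label sets present in the data."""
    return len({frozenset(e.items()) for e in entries})


bounded = {"app": 4, "env": 3, "level": 4}   # hypothetical bounded labels
print(max_streams(bounded))                   # 48

# Traffic across 4 apps x 3 envs, with request_id (unique per line) promoted to a label
entries = [
    {"app": f"app{i % 4}", "env": f"env{i % 3}", "request_id": f"req-{i}"}
    for i in range(10_000)
]
without_ids = [{k: v for k, v in e.items() if k != "request_id"} for e in entries]
print(observed_streams(without_ids))          # 12: every app/env pair occurs
print(observed_streams(entries))              # 10000: one stream per log line
```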

The rule that prevents 90% of Loki disasters:

Labels are for routing and selecting streams, not for storing data you want to search. If a value is high-cardinality or unbounded, it belongs inside the log line, extracted at query time with a parser — never as a label.

Good labels are bounded, predictable, and small in count: cluster, namespace, app, env, level, component. Everything else — IDs, IPs, paths, user agents, durations — stays in the line.

Configure hard guardrails so a bad pipeline can’t take down the cluster:

# loki config: limits_config
limits_config:
  # Reject streams with too many labels (a runaway label-extraction pipeline)
  max_label_names_per_series: 15
  max_label_value_length: 2048
  max_label_name_length: 1024

  # Cap active streams per tenant; protects ingester memory
  max_global_streams_per_user: 50000
  max_streams_per_user: 0          # 0 = use global limit only

  # Per-stream ingest rate (bytes/sec) and burst, the per-stream throttle
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 20MB

  # Per-tenant ingestion ceiling
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  ingestion_rate_strategy: local   # per-distributor; use 'global' for shared budget
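One way to reason about per_stream_rate_limit and per_stream_rate_limit_burst is as a token bucket: tokens refill at the sustained rate and the bucket holds at most the burst. The Python sketch below models that assumption with a hypothetical `TokenBucket` class; it is not Loki's limiter, which is written in Go and differs in detail.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified token bucket: refills `rate` bytes/sec, holds at most `burst` bytes."""
    rate: float
    burst: float
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start full, so an idle stream can burst immediately

    def allow(self, now: float, nbytes: int) -> bool:
        """Refill for the elapsed time, then admit the push only if enough tokens remain."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # Loki would reject this push for the stream (HTTP 429)


MB = 1024 * 1024
stream = TokenBucket(rate=5 * MB, burst=20 * MB)  # per_stream_rate_limit / _burst above

print(stream.allow(0.0, 15 * MB))  # True: fits in the initial 20MB burst
print(stream.allow(0.0, 10 * MB))  # False: only 5MB left and no time to refill
print(stream.allow(1.0, 10 * MB))  # True: 1s refilled 5MB, so 10MB is available
```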

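Find your worst offenders before they hurt you. logcli can report per-label cardinality for a selector, and the index stats endpoint reports how many streams, chunks, entries, and bytes a selector covers:

# Per-label cardinality (unique values per label name) for prod over the last hour
logcli series '{namespace="prod"}' --since=1h --analyze-labels

# Distinct values of one suspect label (here: pod)
logcli series '{namespace="prod"}' --since=1h | \
  grep -o 'pod="[^"]*"' | sort -u | wc -l

# Or ask Loki's index stats endpoint how many streams/chunks/entries/bytes the selector covers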
curl -s -G "http://loki:3100/loki/api/v1/index/stats" \
  --data-urlencode 'query={namespace="prod"}' \
  --data-urlencode "start=$(date -u -d '-1 hour' +%s)000000000" \
  --data-urlencode "end=$(date -u +%s)000000000" | jq
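If you save `logcli series` output (for example, to diff cardinality between deploys), a few lines of Python produce a per-label count similar to what logcli's `--analyze-labels` flag reports. The sketch assumes one series per line in the `{name="value", ...}` form with no escaped quotes inside values; `label_cardinality` is a hypothetical helper, not part of logcli.

```python
import re
from collections import defaultdict

# name="value" pairs inside one series line; assumes no escaped quotes in values
PAIR = re.compile(r'(\w+)="([^"]*)"')


def label_cardinality(series_lines: list[str]) -> dict[str, int]:
    """Count distinct values per label name, highest-cardinality labels first."""
    values: dict[str, set[str]] = defaultdict(set)
    for line in series_lines:
        for name, value in PAIR.findall(line):
            values[name].add(value)
    counts = ((name, len(vals)) for name, vals in values.items())
    return dict(sorted(counts, key=lambda kv: -kv[1]))


lines = [
    '{app="api", namespace="prod", pod="api-7d9f-a"}',
    '{app="api", namespace="prod", pod="api-7d9f-b"}',
    '{app="web", namespace="prod", pod="web-5c2e-a"}',
]
print(label_cardinality(lines))  # {'pod': 3, 'app': 2, 'namespace': 1}
```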

When you genuinely need to slice by a high-cardinality field, do it with a parser at query time (next section), not a label. The index stays tiny; the work happens only on the bytes a query actually touches.

3. LogQL filtering: stream selector, line filters, and parsers

Every LogQL query starts with a stream selector in braces — that is the part the index resolves, and it should be as specific as possible so Loki opens the fewest chunks:

{app="api", env="prod"}

After the selector come line filters, which scan the raw text. They run before any parsing and are cheap (plain substring or regex checks on each line, with no structure to build), so put your cheapest, most selective filter first:

# |= contains, != not contains, |~ regex match, !~ regex not match
{app="api", env="prod"} |= "error" != "healthcheck" |~ `status=5\d\d`
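Semantically, a chain of line filters is a conjunction evaluated left to right, and a line is dropped at the first filter that rejects it. That is why the cheapest, most selective test belongs first. A Python model of the four operators follows; it uses Python's `re` where Loki uses Go's RE2 syntax, and the helper names are hypothetical.

```python
import re
from typing import Callable

Filter = Callable[[str], bool]


def contains(s: str) -> Filter:        # |=
    return lambda line: s in line


def not_contains(s: str) -> Filter:    # !=
    return lambda line: s not in line


def matches(pattern: str) -> Filter:   # |~  (unanchored search)
    rx = re.compile(pattern)
    return lambda line: rx.search(line) is not None


def not_matches(pattern: str) -> Filter:  # !~
    rx = re.compile(pattern)
    return lambda line: rx.search(line) is None


def run_filters(lines: list[str], filters: list[Filter]) -> list[str]:
    """Apply filters left to right; all() stops at the first filter that rejects a line."""
    return [line for line in lines if all(f(line) for f in filters)]


pipeline = [contains("error"), not_contains("healthcheck"), matches(r"status=5\d\d")]
lines = [
    "level=error path=/pay status=503",
    "level=error path=/healthcheck status=500",
    "level=info path=/pay status=200",
    "level=error path=/pay status=404",
]
print(run_filters(lines, pipeline))  # ['level=error path=/pay status=503']
```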

Then parsers turn the line into labels you can filter and format on. Pick the parser that matches your format — they differ massively in cost:

# logfmt: key=value pairs. Cheap and predictable.
{app="api"} | logfmt | level="error" | duration > 500ms

# json: parses JSON into labels (nested keys flattened with _). Pricier than logfmt.
{app="api"} | json | status_code >= 500 | line_format "{{.method}} {{.path}} {{.status_code}}"

# pattern: explicit positional extraction, typically much cheaper than regexp for
# fixed-shape lines like nginx access logs. <_> discards a field; the trailing <_>
# keeps <size> from swallowing the rest of the line.
{app="nginx"} | pattern `<ip> - - <_> "<method> <path> <_>" <status> <size> <_>` | status="500"
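The rule behind `pattern` is simple enough to model: each capture runs from the end of the previous literal to the start of the next literal, or to the end of the line if it is the last token. The Python sketch below implements just that rule and skips Loki's other validation checks; `compile_pattern` and `match` are hypothetical names, and the pattern carries a trailing `<_>` so `<size>` stops at the next space.

```python
import re
from typing import Optional

CAPTURE = re.compile(r"<(\w+)>")


def compile_pattern(pattern: str) -> list[tuple[str, str]]:
    """Split a pattern into ('lit', text) and ('cap', name) tokens."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    for m in CAPTURE.finditer(pattern):
        if m.start() > pos:
            tokens.append(("lit", pattern[pos:m.start()]))
        elif tokens and tokens[-1][0] == "cap":
            raise ValueError("consecutive captures need a literal between them")
        tokens.append(("cap", m.group(1)))
        pos = m.end()
    if pos < len(pattern):
        tokens.append(("lit", pattern[pos:]))
    return tokens


def match(tokens: list[tuple[str, str]], line: str) -> Optional[dict[str, str]]:
    """Each capture spans from the current position to the next literal or end of line."""
    fields: dict[str, str] = {}
    pos = 0
    for i, (kind, value) in enumerate(tokens):
        if kind == "lit":
            if not line.startswith(value, pos):
                return None
            pos += len(value)
            continue
        next_lit = tokens[i + 1][1] if i + 1 < len(tokens) else None
        end = len(line) if next_lit is None else line.find(next_lit, pos)
        if end == -1:
            return None
        if value != "_":  # <_> matches but is not extracted
            fields[value] = line[pos:end]
        pos = end
    return fields


nginx = compile_pattern('<ip> - - <_> "<method> <path> <_>" <status> <size> <_>')
line = '10.0.0.1 - - [08/Nov/2022:10:00:00 +0000] "GET /api/pay HTTP/1.1" 500 512 "-" "curl/8.0"'
fields = match(nginx, line)
print(fields)
# {'ip': '10.0.0.1', 'method': 'GET', 'path': '/api/pay', 'status': '500', 'size': '512'}
```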

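Two formatting tools matter for both readability and downstream metric queries: line_format rewrites the displayed line from extracted labels (as in the json example above), and label_format renames or derives labels, for example label_format route=path, so later stages and aggregations can group on them.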

The order is load-bearing for performance. This is the canonical efficient shape, and getting it wrong is the most common reason “Loki is slow”:

Stream selector (index) -> line filters (raw bytes) -> parser (structured) -> label filters (post-parse) -> formatting. Filter on raw text with |= before you | json, so the JSON parser only runs on the lines that survive.

# GOOD: line filter prunes most lines before the expensive json parse
{app="api", env="prod"} |= "error" | json | status_code >= 500

# BAD: parses every single line, then throws most away
{app="api", env="prod"} | json | status_code >= 500 |= "error"
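To see why the ordering matters, count parser invocations. In this Python sketch `json.loads` stands in for `| json` and a counter records how often it runs; both pipelines return the same rows, but the filter-first version parses only the survivors. The data and helper names are made up for illustration.

```python
import json


class CountingParser:
    """Stand-in for `| json` that records how many lines it had to parse."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, line: str) -> dict:
        self.calls += 1
        return json.loads(line)


def good(lines: list[str], parse: CountingParser) -> list[dict]:
    out = []
    for line in lines:
        if "error" not in line:          # cheap raw-text filter first
            continue
        rec = parse(line)                # parse only the survivors
        if rec["status_code"] >= 500:
            out.append(rec)
    return out


def bad(lines: list[str], parse: CountingParser) -> list[dict]:
    out = []
    for line in lines:
        rec = parse(line)                # parse every line...
        if rec["status_code"] >= 500 and "error" in line:  # ...then discard most
            out.append(rec)
    return out


lines = [json.dumps({"level": "info", "status_code": 200})] * 990 + \
        [json.dumps({"level": "error", "status_code": 503})] * 10

p = CountingParser()
print(len(good(lines, p)), p.calls)   # 10 10
p = CountingParser()
print(len(bad(lines, p)), p.calls)    # 10 1000
```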

4. LogQL metric queries: turning logs into SLO signals

LogQL has two query types. Everything above is a log query (returns lines). Wrap a log query in a range aggregation and you get a metric query (returns a time series) — this is how you alert on logs and build SLOs without a separate metrics pipeline.

The core range-vector functions over a log stream:

# Lines per second matching the selector+filters (log-range, counts lines)
rate({app="api"} |= "error" [5m])

# Total matching lines over the window
count_over_time({app="api"} |= "error" [5m])

# Bytes per second ingested for a stream (capacity planning)
bytes_rate({app="api"}[5m])
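For log-range aggregations the math is plain counting. This Python sketch shows how count_over_time, rate, and bytes_rate relate at one evaluation timestamp. It assumes a window of (t - range, t] and measures bytes as UTF-8 line length; Loki's exact edge handling and byte accounting may differ.

```python
def count_over_time(timestamps: list[float], t: float, window: float) -> int:
    """Entries with a timestamp in the window (t - window, t]."""
    return sum(1 for ts in timestamps if t - window < ts <= t)


def rate(timestamps: list[float], t: float, window: float) -> float:
    """Log-range rate(): matching lines per second over the window."""
    return count_over_time(timestamps, t, window) / window


def bytes_rate(entries: list[tuple[float, str]], t: float, window: float) -> float:
    """bytes_rate(): bytes of matching lines per second over the window."""
    total = sum(len(line.encode()) for ts, line in entries if t - window < ts <= t)
    return total / window


T, WINDOW = 1000.0, 300.0               # evaluate at t=1000s over a 5m window
ts = [T - 5 * i for i in range(60)]     # one "error" line every 5s: 1000, 995, ..., 705
entries = [(stamp, "error") for stamp in ts]

print(count_over_time(ts, T, WINDOW))   # 60
print(rate(ts, T, WINDOW))              # 0.2 lines/sec
print(bytes_rate(entries, T, WINDOW))   # 1.0 bytes/sec (60 lines x 5 bytes / 300s)
```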

For numeric SLOs you need unwrap, which pulls a numeric value out of an extracted label and lets you aggregate it. This is how you compute latency percentiles or error ratios straight from logs:

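# p99 request latency per route, from a logfmt 'duration' field such as "250ms";
# duration_seconds() converts it to seconds, and __error__="" drops lines that fail to convert
quantile_over_time(0.99,
  {app="api"} | logfmt | unwrap duration_seconds(duration) | __error__="" [5m]
) by (route)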

# Error ratio as an SLI: 5xx lines divided by all lines, over 5m
sum(rate({app="api"} | logfmt | status >= 500 [5m]))
/
sum(rate({app="api"} | logfmt [5m]))

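unwrap converts human-readable values when you use its conversion functions: unwrap duration_seconds(field) (short form duration(field)) turns Go-style durations such as 250ms into seconds, and unwrap bytes(field) turns sizes such as 5 MiB into bytes, so you do not have to pre-convert units. Wrap the rate and ratio queries in a sum by (...) and you have a recording-rule-ready SLI that lives next to your Prometheus burn-rate alerts.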
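Here is what the two SLI queries compute, modeled in Python on a handful of logfmt lines from a single route. `duration_seconds` mirrors the Go-style duration conversion that unwrap's helper performs, and `quantile` uses Prometheus-style linear interpolation, which is an assumption about how quantile_over_time ranks samples. The logfmt parser is deliberately minimal (no quoted values), and all helper names are hypothetical.

```python
import re

UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def duration_seconds(value: str) -> float:
    """Go-style duration string ('250ms', '1m30s') -> seconds."""
    parts = PART.findall(value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"not a duration: {value!r}")
    return sum(float(n) * UNITS[u] for n, u in parts)


def quantile(q: float, values: list[float]) -> float:
    """Linear interpolation between the two nearest ranks (Prometheus-style)."""
    xs = sorted(values)
    rank = q * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def logfmt(line: str) -> dict[str, str]:
    """Minimal logfmt: space-separated key=value pairs, no quoted values."""
    return dict(pair.split("=", 1) for pair in line.split() if "=" in pair)


lines = [
    "route=/pay status=200 duration=120ms",
    "route=/pay status=200 duration=80ms",
    "route=/pay status=503 duration=1.5s",
    "route=/pay status=200 duration=200ms",
    "route=/pay status=500 duration=2s",
]
records = [logfmt(line) for line in lines]

# quantile_over_time(0.99, ... | unwrap duration_seconds(duration) [5m])
p99 = quantile(0.99, [duration_seconds(r["duration"]) for r in records])

# sum(rate(5xx[5m])) / sum(rate(all[5m])): both rates share the window, so it cancels
ratio = sum(1 for r in records if int(r["status"]) >= 500) / len(records)

print(round(p99, 3), ratio)  # 1.98 0.4
```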

A word on cost: metric queries decompress and scan every chunk in the range. A 30-day rate() over a chatty stream is a lot of object-storage reads. Keep alerting queries on short windows and let the query-frontend cache and the compactor’s per-day index do the heavy lifting for dashboards.

5. Chunk storage, caching, and query splitting

A flushed chunk is a compressed blob of log lines for a single stream over a time window, stored as an object in S3/GCS/Azure Blob. The TSDB index maps (labels, time) -> chunk references. Tune the chunk lifecycle and the caches, and you control both ingester memory and query latency.

# Modern Loki (TSDB shipper) storage + chunk lifecycle
schema_config:
  configs:
    - from: 2024-04-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  aws:
    s3: s3://us-east-1/loki-chunks
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache

ingester:
  chunk_target_size: 1572864      # 1.5MB compressed; the sweet spot for read efficiency
  chunk_idle_period: 30m          # flush a stream after 30m of no new lines
  max_chunk_age: 2h               # force-flush even an active chunk after 2h
  chunk_encoding: snappy          # snappy = fast; zstd = smaller but more CPU
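The three lifecycle settings translate into three independent reasons to close a chunk. A simplified Python model follows, using the values from the config above; Loki evaluates these on a periodic flush loop, so treat this as a sketch of the conditions, not the ingester's code. Note the consequence for high-cardinality streams: a stream that trickles a few lines never reaches the target size, so it leaves memory through the idle or max-age path as a tiny chunk.

```python
from dataclasses import dataclass
from typing import Optional

CHUNK_TARGET_SIZE = 1_572_864   # chunk_target_size, compressed bytes
CHUNK_IDLE_PERIOD = 30 * 60     # chunk_idle_period, seconds
MAX_CHUNK_AGE = 2 * 60 * 60     # max_chunk_age, seconds


@dataclass
class Chunk:
    first_entry: float      # timestamp of the oldest line in the chunk (seconds)
    last_append: float      # when the most recent line was appended
    compressed_bytes: int


def flush_reason(chunk: Chunk, now: float) -> Optional[str]:
    """Return why this chunk should be cut/flushed now, or None to keep filling it."""
    if chunk.compressed_bytes >= CHUNK_TARGET_SIZE:
        return "full"        # reached the target size
    if now - chunk.last_append >= CHUNK_IDLE_PERIOD:
        return "idle"        # the stream went quiet
    if now - chunk.first_entry >= MAX_CHUNK_AGE:
        return "max_age"     # active but slow stream: force a flush anyway
    return None


print(flush_reason(Chunk(0, 100, 1_600_000), now=200))      # full
print(flush_reason(Chunk(0, 100, 10_000), now=1_900))       # idle
print(flush_reason(Chunk(0, 7_190, 10_000), now=7_200))     # max_age
print(flush_reason(Chunk(0, 100, 10_000), now=600))         # None
```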

The three caches that move the needle, in order of impact:

query_range:
  align_queries_with_step: true
  cache_results: true             # 1) results cache: identical sub-queries are free
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached-results:11211

chunk_store_config:
  chunk_cache_config:             # 2) chunk cache: chunks fetched from object storage are reused instead of re-downloaded
    memcached_client:
      addresses: dns+memcached-chunks:11211

# 3) index/TSDB cache is handled by the shipper's cache_location above

Query splitting and parallelism are what make a “last 7 days” query return in seconds instead of timing out. The query-frontend chops the range into intervals and fans them across queriers:

limits_config:
  split_queries_by_interval: 30m  # each sub-query covers 30m, run in parallel
  max_query_parallelism: 32       # max sub-queries dispatched concurrently
  tsdb_max_query_parallelism: 128 # TSDB can shard much harder than the old index
  max_query_series: 500           # max unique series a metric query may return; keeps one query from OOMing the read path
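The splitting step can be sketched in Python as cutting the range at aligned multiples of split_queries_by_interval, then estimating how many rounds the sub-queries need at max_query_parallelism. Aligning boundaries to the interval is an assumption of this sketch (it helps identical sub-ranges get reused by the results cache), and the rounds figure ignores TSDB sharding and the frontend's queueing, so it is only a back-of-envelope number.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_by_interval(start: datetime, end: datetime,
                      interval: timedelta) -> list[tuple[datetime, datetime]]:
    """Cut [start, end) into sub-ranges whose inner boundaries fall on multiples of interval."""
    parts = []
    cur = start
    while cur < end:
        boundary = EPOCH + ((cur - EPOCH) // interval + 1) * interval  # next aligned boundary
        nxt = min(boundary, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


def dispatch_rounds(n_subqueries: int, max_parallelism: int) -> int:
    """Rounds needed if at most max_parallelism sub-queries run at once."""
    return -(-n_subqueries // max_parallelism)  # ceiling division


start = datetime(2024, 6, 1, tzinfo=timezone.utc)
week = split_by_interval(start, start + timedelta(days=7), timedelta(minutes=30))
print(len(week), dispatch_rounds(len(week), 32))   # 336 11

odd = split_by_interval(start + timedelta(minutes=10), start + timedelta(minutes=100),
                        timedelta(minutes=30))
print([(a.strftime("%H:%M"), b.strftime("%H:%M")) for a, b in odd])
# [('00:10', '00:30'), ('00:30', '01:00'), ('01:00', '01:30'), ('01:30', '01:40')]
```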

With TSDB, Loki can also shard a single query across queriers dynamically, sizing the shards from index statistics, so a heavy count_over_time is split both by time and by stream shard. This is why moving from BoltDB-shipper to TSDB is often the single biggest performance upgrade a cluster can make.

6. Compactor, retention, and per-tenant limits

Retention is not a storage-bucket lifecycle rule in Loki — it is the compactor’s job, and you must turn it on explicitly or chunks accumulate forever.

compactor:
  working_directory: /loki/compactor
  retention_enabled: true             # REQUIRED; off by default
  delete_request_store: s3            # where deletion markers live
  compaction_interval: 10m
  retention_delete_delay: 2h          # grace period before physical delete

limits_config:
  retention_period: 744h              # global default: 31 days

  # Per-stream overrides: keep audit logs longer, drop debug noise fast
  retention_stream:
    - selector: '{namespace="audit"}'
      priority: 10
      period: 8760h                   # 365 days
    - selector: '{level="debug"}'
      priority: 5
      period: 72h                     # 3 days
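Resolution for retention_stream can be modeled as "among the matching rules, highest priority wins; otherwise fall back to retention_period". The Python sketch below uses equality-only selectors and keeps the first-listed rule on a priority tie; Loki documents its own tie-break and supports full LogQL matchers, so both are simplifications. `RetentionRule` and `retention_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetentionRule:
    selector: dict[str, str]   # simplified: equality matchers only
    priority: int
    period_hours: int


def retention_for(labels: dict[str, str], rules: list[RetentionRule], default_hours: int) -> int:
    """Pick the highest-priority matching rule, else fall back to retention_period."""
    best: Optional[RetentionRule] = None
    for rule in rules:
        if all(labels.get(k) == v for k, v in rule.selector.items()):
            if best is None or rule.priority > best.priority:  # tie: first-listed rule stays
                best = rule
    return best.period_hours if best is not None else default_hours


rules = [
    RetentionRule({"namespace": "audit"}, priority=10, period_hours=8760),
    RetentionRule({"level": "debug"}, priority=5, period_hours=72),
]
print(retention_for({"namespace": "audit", "level": "debug"}, rules, 744))  # 8760
print(retention_for({"namespace": "api", "level": "debug"}, rules, 744))    # 72
print(retention_for({"namespace": "api", "level": "info"}, rules, 744))     # 744
```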

For multi-tenant clusters, set retention and rate limits per tenant in an overrides file rather than the global block. This is how a platform team gives each product team its own budget without separate Loki deployments:

# runtime overrides file, hot-reloaded; one block per X-Scope-OrgID tenant
overrides:
  team-payments:
    retention_period: 2160h           # 90 days
    per_stream_rate_limit: 10MB
    max_global_streams_per_user: 100000
    ingestion_rate_mb: 25
  team-batch:
    retention_period: 168h            # 7 days
    ingestion_rate_mb: 5

Loki enforces tenancy through the X-Scope-OrgID header — the distributor reads it on write, the querier on read. Run a gateway (auth proxy) in front that injects this header from authenticated identity, and never expose Loki’s HTTP port directly. With auth_enabled: true, the header is mandatory and Loki rejects requests without it.
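The core of such a gateway is small: authenticate, map the identity to a tenant, and overwrite whatever tenant header the client sent. This Python sketch covers only the header logic; the authentication step (OIDC, mTLS, or similar) is assumed to have already produced `subject`, and the subject-to-tenant table is hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical mapping from an authenticated subject to a Loki tenant ID.
TENANT_FOR_SUBJECT = {"svc-payments": "team-payments", "svc-batch": "team-batch"}


def upstream_headers(client_headers: Mapping[str, str], subject: Optional[str]) -> dict[str, str]:
    """Build the headers a gateway forwards to Loki for an already-authenticated caller."""
    tenant = TENANT_FOR_SUBJECT.get(subject) if subject else None
    if tenant is None:
        raise PermissionError("unauthenticated or unmapped caller")  # gateway answers 401/403
    # Header names are case-insensitive: strip ANY client-supplied tenant header first.
    headers = {k: v for k, v in client_headers.items() if k.lower() != "x-scope-orgid"}
    headers["X-Scope-OrgID"] = tenant  # tenant comes from identity, never from the client
    return headers


spoofed = {"x-scope-orgid": "team-payments", "Accept": "application/json"}
print(upstream_headers(spoofed, "svc-batch"))
# {'Accept': 'application/json', 'X-Scope-OrgID': 'team-batch'}
```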

7. Correlating logs to traces with derived fields

Logs become an order of magnitude more useful when a log line links straight to the trace that produced it. The mechanism is derived fields in the Grafana Loki data source: a regex extracts a value (a trace_id) from the log line, and Grafana renders it as a clickable link into your Tempo data source.

{
  "name": "Loki",
  "type": "loki",
  "jsonData": {
    "derivedFields": [
      {
        "name": "TraceID",
        "matcherType": "label",
        "matcherRegex": "trace_id",
        "url": "${__value.raw}",
        "datasourceUid": "tempo-uid",
        "urlDisplayLabel": "View Trace"
      }
    ]
  }
}

matcherType: "label" keys off an extracted label (use this when your logs are JSON/logfmt and you parse trace_id out); the older default matches a regex against the raw line body. Either way the prerequisite is that your application logs the trace_id in the first place — inject it from the active span context in your logging middleware. Once that link exists, the loop closes both ways: a metric exemplar jumps to a trace, a trace span jumps to its logs via tracesToLogsV2, and a log line jumps back to the trace. Three pillars, one investigation.
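With the regex matcher type, the regex's capture group becomes the derived field's value, which Grafana then substitutes into the link's `${__value.raw}`. A Python model of that extraction follows; it uses Python's `re` rather than Grafana's engine, and `derive_field` and `render` are hypothetical helpers.

```python
import re
from typing import Optional


def derive_field(line: str, matcher_regex: str) -> Optional[str]:
    """Regex matcher type: the regex's capture group becomes the field value."""
    m = re.search(matcher_regex, line)
    return m.group(1) if m else None


def render(url_template: str, value: str) -> str:
    """Substitute the derived value into the link template (here, the Tempo query)."""
    return url_template.replace("${__value.raw}", value)


line = 'level=error msg="charge failed" trace_id=4bf92f3577b34da6a3ce929d0e0e4736'
trace_id = derive_field(line, r"trace_id=(\w+)")
print(render("${__value.raw}", trace_id))  # 4bf92f3577b34da6a3ce929d0e0e4736
```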

8. Capacity and cost versus an Elasticsearch/OpenSearch baseline

The reason to run Loki at all is the cost model, so make the comparison explicit. The fundamental difference: Elasticsearch builds an inverted index over every term in every document; Loki indexes only labels and stores the rest as compressed object-storage chunks.

Dimension              | Elasticsearch / OpenSearch                            | Grafana Loki
-----------------------+-------------------------------------------------------+---------------------------------------------------------
What is indexed        | Every field/term (full-text)                          | Labels only
Storage tier           | Hot SSD on data nodes                                 | Object storage (S3/GCS/Azure)
Storage cost           | High (replicated SSD + index overhead, often >1x raw) | Low (compressed chunks, ~0.1-0.3x raw)
Arbitrary-field search | Fast (indexed)                                        | Brute-force scan at query time
Stream/label search    | n/a                                                   | Fast (TSDB index)
Scaling pain point     | Shard/heap management, index lifecycle                | Stream cardinality, query fan-out
Best fit               | Search-heavy, ad-hoc field queries                    | High-volume logs, known label dimensions, cost-sensitive

Loki wins decisively on ingest and storage cost and on operational simplicity (stateless components plus object storage). It loses when your access pattern is genuinely “search any field across everything,” because that becomes a full scan. The honest framing for a platform team: Loki is cheaper to store and pricier to search broadly; Elasticsearch is the inverse. Most production logging is “I know the service and roughly when, now show me the errors,” which is exactly Loki’s strength — provided your labels are designed for it.
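To turn the table's ratios into a footprint, multiply daily ingest by retained days and by the storage-to-raw ratio. The Python below does that for a hypothetical 500 GB/day at the 744h (31-day) retention used earlier. It compares stored gigabytes only, not prices, and uses 1.0x as a floor for Elasticsearch since the table says "often >1x".

```python
def retained_gb(raw_gb_per_day: float, retention_days: int, ratio: float) -> float:
    """Stored footprint = raw ingest per day x days retained x storage-to-raw ratio."""
    return raw_gb_per_day * retention_days * ratio


RAW_GB_PER_DAY = 500   # hypothetical daily ingest
DAYS = 31              # 744h retention_period

es_floor = retained_gb(RAW_GB_PER_DAY, DAYS, 1.0)    # "often >1x raw", so 1.0x is a floor
loki_low = retained_gb(RAW_GB_PER_DAY, DAYS, 0.1)    # "~0.1-0.3x raw"
loki_high = retained_gb(RAW_GB_PER_DAY, DAYS, 0.3)

print(f"Elasticsearch/OpenSearch: >= {es_floor:,.0f} GB on replicated SSD")
print(f"Loki: ~{loki_low:,.0f} to {loki_high:,.0f} GB in object storage")
```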

Sizing rule of thumb for the write path: ingesters are memory-bound by active streams, not by raw bytes. Budget roughly tens of thousands of active streams per ingester and scale on stream count, not log volume. The read path scales on query-frontend parallelism and cache hit rate. Keep both healthy and Loki is the cheapest pillar you run.
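Because each stream lives on replication_factor ingesters (Loki's default is 3), a per-ingester stream budget translates into an ingester count roughly as below. The 30,000-streams-per-ingester figure is an assumed point inside the "tens of thousands" range, not a Loki default; measure your own ingesters' memory per stream before relying on it.

```python
def ingesters_needed(active_streams: int, streams_per_ingester: int,
                     replication_factor: int = 3) -> int:
    """Every stream is held by replication_factor ingesters; round up to whole ingesters."""
    replicas = active_streams * replication_factor
    return -(-replicas // streams_per_ingester)  # integer ceiling division


BUDGET = 30_000  # assumed streams per ingester, somewhere in "tens of thousands"
print(ingesters_needed(28_000, BUDGET))      # 3
print(ingesters_needed(2_000_000, BUDGET))   # 200
```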

Enterprise scenario

A fintech platform team migrated ~40 microservices from a self-managed Elasticsearch cluster to Loki to cut a six-figure annual storage bill. Within two weeks of the cutover, ingesters started OOMing every few hours, ingestion lagged, and per_stream_rate_limit rejections flooded the distributors. The Elasticsearch bill went away; a stability fire replaced it.

The constraint was self-inflicted. Their Promtail/Alloy relabel config had been written to mirror Elasticsearch’s field-level searchability — they had promoted request_id, user_id, pod, and the full request path to labels, reasoning that “if it was searchable before, it should be a label now.” With ~40 services, dynamic pod names, and unbounded request IDs, active streams had blown past 2 million. The TSDB index was enormous and every ingester was trying to hold tens of thousands of tiny, never-filling chunks in memory.

The fix was to demote everything high-cardinality back into the log line and search it at query time. The relabel config kept only bounded labels, and the queries moved the slicing into a parser:

// Grafana Alloy: keep ONLY low-cardinality labels; drop the cardinality bombs
// (assumes __meta_kubernetes_* labels are still present here; many pipelines map them in discovery.relabel)
loki.relabel "trim" {
  forward_to = [loki.write.default.receiver]

  // map bounded Kubernetes metadata to labels
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app"]
    target_label  = "app"
  }

  // DROP pod, request_id, trace_id, path as labels - they live in the line instead
  rule {
    regex  = "pod|request_id|trace_id|path"
    action = "labeldrop"
  }
}
# What used to be a label match {request_id="abc123"} becomes a query-time parse.
# The index resolves {app,env}; the parser does the rest on a small chunk set.
{app="payments", env="prod"} |= "request_id=abc123" | logfmt | status >= 500
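The labeldrop rule works on label names, and Prometheus-style relabel regexes are fully anchored, so `path` drops a label named exactly `path` but not `__path__`. This Python sketch applies the same regex to synthetic entries and counts streams before and after; the data is invented and mirrors the scenario's shape, not its numbers.

```python
import re


def labeldrop(labels: dict[str, str], regex: str) -> dict[str, str]:
    """Drop labels whose NAME fully matches regex (relabel regexes are anchored)."""
    rx = re.compile(regex)
    return {k: v for k, v in labels.items() if not rx.fullmatch(k)}


DROP = "pod|request_id|trace_id|path"
entries = [
    {"app": "payments", "env": "prod", "pod": f"payments-{i % 5}", "request_id": f"req-{i}"}
    for i in range(1_000)
]
before = len({frozenset(e.items()) for e in entries})
after = len({frozenset(labeldrop(e, DROP).items()) for e in entries})
print(before, after)  # 1000 1
```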

Active streams dropped from ~2,000,000 to ~28,000. Ingester memory fell by roughly 85%, the OOMs stopped, the rate-limit rejections vanished, and request_id lookups — now line filters against a tiny set of chunks resolved by {app,env} — returned in under a second. The lesson they wrote into their onboarding doc: in Loki, a label is a routing key, not a search field, and the cost of forgetting that is paid in ingester RAM.

Verify

Confirm the pipeline end to end before you trust it:

# 1) Loki is ready and ingesters are healthy in the ring
curl -s http://loki:3100/ready
curl -s http://loki:3100/ring | grep -c ACTIVE

# 2) Streams are being created at a sane cardinality (NOT millions)
logcli series '{}' --since=15m | wc -l

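# 3) A line filter beats a parser-first query: run both forms and compare execution time
#    in --stats (bytes processed may look similar, since both decompress the same chunks;
#    the saving is parse work on the lines the filter drops)
logcli query '{app="api"} |= "error" | json' --since=10m --stats
logcli query '{app="api"} | json |= "error"' --since=10m --stats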

# 4) A metric query returns a time series (SLI is computable)
curl -s -G "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query=sum(rate({app="api"} |= "error" [5m]))' \
  --data-urlencode "start=$(date -u -d '-1 hour' +%s)" \
  --data-urlencode "end=$(date -u +%s)" \
  --data-urlencode 'step=60' | jq '.data.result | length'

# 5) Compactor is actually applying retention (look for retention activity)
curl -s http://loki:3100/metrics | grep loki_compactor_apply_retention

Next steps

Move your highest-traffic SLIs from ad-hoc LogQL into Loki recording rules (the ruler component) so burn-rate alerts read a pre-aggregated series instead of scanning chunks on every evaluation. Stand up Bloom filters (the accelerated structured-metadata path) if your access pattern is dominated by needle-in-haystack ID lookups, to skip chunks that can’t contain a value. And formalize a cardinality budget per service the same way you budget Prometheus labels — review it at design time, because the cheapest stream is the one you never created.
