What you’ll learn
- The mental model behind Loki (labels, streams, chunks) and why it’s cost-effective
- How to structure logs at the source (JSON, stable fields)
- How to ingest logs with Promtail and parse them
- How to query with LogQL to find, aggregate, and turn logs into metrics
- How to build Grafana dashboards, link traces, and alert on log-derived metrics
Why Loki (and how it works)
Traditional log stacks index the full text of every line. Loki indexes only the labels you choose and stores the raw log lines compressed in chunks. This makes it cheaper to run and scale.
Key concepts:
- Log stream: a set of logs that share the exact same label set (e.g., {app="payments", env="prod", pod="p-123"})
- Labels: key/value pairs used for indexing and filtering; keep these low-cardinality
- Chunks: compressed blocks of log lines stored in object storage; queries scan only the relevant chunks
- Queries: first filter by labels, then parse/filter lines, then optionally aggregate
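Those three steps map directly onto LogQL. A minimal sketch that walks them in order (a label filter, then a | json parse with a field filter, then an aggregation); the label values are illustrative:
sum by (level) (
  count_over_time({app="payments", env="prod"} | json | duration_ms > 250 [5m])
)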
Quickstart: Loki, Promtail, Grafana (Docker Compose)
Create the three files below side by side and run docker compose up -d.
docker-compose.yml:
version: "3.8"
services:
  loki:
    image: grafana/loki:2.9.4
    command: -config.file=/etc/loki/config.yml
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/config.yml
      - ./loki-data:/loki
  promtail:
    image: grafana/promtail:2.9.4
    command: -config.file=/etc/promtail/config.yml
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    depends_on:
      - loki
  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - loki
loki-config.yml (single-node, local storage):
auth_enabled: false
server:
  http_listen_port: 3100
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
promtail-config.yml (scrape local files and Docker logs):
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  # Example: your app JSON logs
  - job_name: app-logs
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          app: payments
          env: dev
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            ts:
            level:
            msg:
            req_id:
            user_id:
            duration_ms:
      # Label only stable, low-cardinality fields
      - labels:
          level:
      # Drop noisy lines (example: health checks)
      - match:
          selector: '{app="payments"}'
          stages:
            - drop:
                source: msg
                expression: 'healthcheck'
  # Example: nginx access log
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          env: dev
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+)'
      # Label only small-cardinality fields
      - labels:
          status:
          method:
Start the stack:
docker compose up -d
Then open Grafana at http://localhost:3000 (user: admin, pass: admin) and add a Loki data source pointing to http://loki:3100.
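Optionally, skip the UI step by provisioning the data source at startup. A minimal sketch (the file name and mount path below are assumptions; Grafana reads any YAML in its provisioning/datasources directory):
# mount in docker-compose.yml under the grafana service:
#   volumes:
#     - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true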
Structure your logs at the source (so queries are easy)
- Prefer JSON logs. Consistent keys beat regex.
- Include stable fields as labels via Promtail (env, app, service, region). Avoid labeling high-cardinality values such as user_id, req_id, or paths that embed IDs.
- Record durations, status, and identifiers in JSON fields (not labels), and parse and filter them at query time, as shown below.
Example JSON log line (file under /var/log/app/app.log):
{"ts":"2026-04-08T12:34:56Z","level":"error","app":"payments","msg":"charge failed","req_id":"abc-123","user_id":"u-42","order_id":"o-77","duration_ms":352,"err":"insufficient funds"}
LogQL by example
Open Explore in Grafana, pick your Loki data source, and try these.
- Find recent errors for the payments app
{app="payments", env="dev"} | json | level="error"
- Count errors over 5 minutes (per app)
sum by (app) (
count_over_time({app="payments", env="dev"} | json | level="error" [5m])
)
- Error rate (lines/sec) over 5 minutes
sum by (app) (
rate({app="payments", env="dev"} | json | level="error" [5m])
)
- Show only the fields you care about
{app="payments"} | json | level="error" |
line_format "{{.ts}} {{.req_id}} {{.user_id}} {{.msg}}"
- Top 5 error messages in the last 10 minutes
topk(5, sum by (msg) (
  count_over_time({app="payments"} | json | level="error" [10m])
))
- Numeric aggregations from logs (unwrap a number field)
- Average latency over 5 minutes:
avg_over_time({app="payments"} | json | unwrap duration_ms [5m])
- P99 latency over 10 minutes:
quantile_over_time(0.99, {app="payments"} | json | unwrap duration_ms [10m])
- Nginx: count 5xx by path. If you didn’t parse in Promtail, parse at query time with a pattern:
sum by (path) (
  count_over_time(
    {job="nginx"}
      | pattern `<ip> - - [<ts>] "<method> <path> <proto>" <status> <_>`
      | status =~ "5.." [10m]
  )
)
- Nginx: 5xx errors as a fraction of all requests (multiply by 100 for a percentage)
(
  sum by (job) (rate({job="nginx"} | pattern `<ip> - - [<ts>] "<method> <path> <proto>" <status> <_>` | status =~ "5.." [5m]))
)
/
(
  sum by (job) (rate({job="nginx"} [5m]))
)
Tips:
- Start with the tightest label filter you can, then parse. Labels shrink the search space.
- Prefer | pattern or | json over heavy regex.
- unwrap converts a parsed field into a sample for numeric functions.
From logs to dashboards, links, and alerts in Grafana
- Dashboards
  - Use a Logs panel to show parsed fields. Add Stat/Time series panels with queries that aggregate logs (e.g., error rate, P99 from unwrap).
- Derived fields (click-to-trace)
  - Settings > Data sources > Loki > Derived fields. Example:
    - Name: trace_id
    - Regex: "trace_id":"([a-f0-9-]+)"
    - URL: link to your tracing system (e.g., Tempo). Now clicking a log with a trace_id opens the trace.
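  - Provisioning alternative: derived fields can also live in the data source YAML from the quickstart. A sketch (the Tempo datasourceUid is an assumption for your environment; note the $$ needed to escape $ in provisioning files):
jsonData:
  derivedFields:
    - name: trace_id
      matcherRegex: '"trace_id":"([a-f0-9-]+)"'
      url: '$${__value.raw}'
      datasourceUid: tempo  # assumption: a Tempo data source with UID "tempo"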
- Alerts on logs
  - Create a rule (Grafana Alerting or the Loki ruler) based on a metrics-style LogQL expression. Example: high error ratio for payments:
groups:
  - name: payments
    rules:
      - alert: HighErrorRate
        expr: |
          (
            sum(rate({app="payments"} | json | level="error" [5m]))
          )
          /
          (
            sum(rate({app="payments"} [5m]))
          ) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Payments error rate >5% for 10m"
Best practices (save money, gain signal)
- Labels: only stable, bounded-cardinality values. Good: env, app, job, namespace, pod, instance. Avoid: user_id, req_id, raw path with IDs.
- Prefer structured logs (JSON) from the app. If not possible, use Promtail pipelines to parse.
- Drop noise at the edge. Promtail drop and match stages can remove health checks and debug spam.
- Storage: for production, use object storage (S3/GCS) with boltdb-shipper. Set retention per tenant (see the sketch after this list).
- Query efficiency: narrow time ranges, filter by labels first, then parse, then aggregate. Avoid unbounded regex.
- Governance: scrub PII, set sensible log levels, rotate and retain based on compliance needs.
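A sketch of retention wiring for the 2.9 quickstart’s loki-config.yml (the period is an assumption; pick one per policy):
limits_config:
  retention_period: 744h  # 31 days
compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  retention_enabled: true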
Troubleshooting
- No logs in Grafana Explore:
  - Check Promtail targets at http://localhost:9080/targets and the positions file.
  - Verify clients.url points to Loki and that Loki is reachable.
  - Ensure your time range covers when logs were written.
- Slow queries or 429s:
  - Tighten label selectors and time ranges.
  - Reduce cardinality (check the label browser in Explore).
  - Prefer json/pattern over complex regex.
- Unexpectedly high cardinality:
  - Audit the Promtail labels stage. Remove dynamic fields.
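Two quick checks from the host (assuming the quickstart port mappings):
# Loki readiness
curl -s http://localhost:3100/ready
# Labels Loki has ingested; an empty list usually means Promtail isn't shipping
curl -s http://localhost:3100/loki/api/v1/labels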
Cheat sheet
- Filter by labels and substring:
{app="api", env="prod"} |= "timeout"
- Parse JSON and filter on a field:
{job="app"} | json | level="warn"
- Count lines over a window:
count_over_time({app="api"} |= "ERROR" [10m])
- Rate (lines/sec):
rate({app="api"} [5m])
- Numeric from field:
max_over_time({app="api"} | json | unwrap duration_ms [15m])
With the right labels, structured logs, and a handful of LogQL patterns, Loki turns raw text into actionable dashboards and alerts without breaking the bank.