Anomaly Detection

Most detections answer a yes/no question: did SSH reach a production host? did this source hit more than 50 destinations? They need you to know the threshold in advance. Anomaly detection answers a different question — is this behaviour unusual for this host, compared to how it normally behaves? — and it works even when you don’t know what “normal” is, because obserae learns it for you.

This page explains the two tools obserae gives you:

Statistical operators in NFQL — functions like ENTROPY, STDDEV, MAD, SKEWNESS you can put in a query today to measure the shape of traffic. You read the numbers yourself.
The Anomaly alert condition — a rule type that learns a baseline per entity and fires automatically when a value drifts too far from it. No threshold to pick.

They compose: the operators let you explore and confirm a signal by hand; the anomaly condition turns a chosen signal into a self-tuning alert.

Prerequisite. Both build on STATS … BY … — grouping rows and aggregating per group. If STATS is new to you, read the Aggregation section of the NFQL guide first; everything below assumes you can write STATS n = COUNT(*) BY client_ip.

Part 1 — Statistical operators

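These are aggregate functions: they go in a STATS clause and collapse the rows of each group into one number, which you can filter in HAVING. The dispersion and shape functions (STDDEV, MAD, SKEWNESS, KURTOSIS, MEDIAN, PERCENTILE) take a numeric column. COUNT_DISTINCT and ENTROPY also accept non-numeric values such as IP addresses, as the examples below show. ENTROPY, STDDEV, MAD, SKEWNESS and KURTOSIS return a decimal (DOUBLE).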

Operator	What it measures, in one sentence	Reach for it when…
`COUNT_DISTINCT(col)`	How many different values appear in the group.	“how many distinct ports / peers / users?”
`ENTROPY(col)`	How spread out and unpredictable the values are (in bits).	scanning, spraying, fan-out — one source touching many different things
`STDDEV(col)`	How much the values vary around their average.	regularity vs. burstiness — steady beacons vs. bursty humans
`MAD(col)`	Like `STDDEV`, but ignores outliers.	the same, when a few huge values would distort `STDDEV`
`SKEWNESS(col)`	Whether the values lean left or right (asymmetry).	“are the gaps between events suspiciously symmetric?” (beaconing)
`KURTOSIS(col)`	Whether the values are tightly clustered with rare extremes.	pinpointing a few outliers hidden in an otherwise flat series
`MEDIAN`, `PERCENTILE(col, n)`	The middle value / the n‑th percentile.	robust “typical” and “tail” values

The rest of this part builds intuition for the three most useful behavioural ones. Copy any example into the Query page.

`COUNT_DISTINCT` — how many different things

The simplest behavioural signal. A workstation normally talks to a handful of internal services; one reaching hundreds of distinct internal IPs in a minute is spreading.

FROM sessions | LAST 60
  | STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
  | HAVING peers > 100
  | SORT peers DESC

COUNT_DISTINCT counts how many different peers. It says nothing about how the traffic is distributed across them — that’s what ENTROPY adds.

`ENTROPY` — how spread out and unpredictable

Entropy measures uncertainty. Think of it as: if I picked one row (one flow or session) at random from this group, how hard would it be to guess its value?

All traffic to one port → entropy 0 bits (no uncertainty: it’s always port 443).
Traffic evenly spread over many ports → high entropy (you can’t guess; a scan sweeping ports looks exactly like this).

Entropy is measured in bits. n equally likely values give log2(n) bits (2 values = 1 bit, 4 = 2 bits, 256 = 8 bits, 1024 = 10 bits), and an uneven spread over the same n values gives less. A rough reading for port entropy per source:

Entropy (bits)	Interpretation
`~0`	one service — normal client
`1 – 3`	a handful of services — normal server or busy client
`> 4`	many ports, fairly evenly — port scan / fan-out

# ports-per-source scan signature
FROM flows | LAST 3600
  | STATS ports_entropy  = ENTROPY(dst_port),
          distinct_ports = COUNT_DISTINCT(dst_port) BY src_addr
  | HAVING ports_entropy > 3.0
  | SORT ports_entropy DESC

Why entropy and not just COUNT_DISTINCT? A host that makes 1000 connections to port 443 and one to port 22 has distinct_ports = 2 but entropy near 0 — clearly a normal client. A host hitting 200 ports once each has high entropy — clearly a scan. Entropy captures how evenly spread the connections are, which COUNT_DISTINCT cannot. Use them together: high entropy and high distinct-count is a strong scan signal.
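To make the two hosts in that comparison concrete, here is a minimal Python sketch of Shannon entropy in bits over the values in one group. It assumes ENTROPY computes the standard frequency-weighted Shannon entropy with a base-2 logarithm, which matches the bit values quoted above; `shannon_entropy` is a hypothetical helper, not an obserae function.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy in bits of the value distribution (assumed ENTROPY semantics)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over each distinct value's share of the rows.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One source: 1000 connections to port 443 and a single one to port 22.
client_ports = [443] * 1000 + [22]
# Another source: 200 different ports, one connection each.
scan_ports = list(range(1, 201))

print(f"client: distinct={len(set(client_ports))} entropy={shannon_entropy(client_ports):.3f} bits")
print(f"scan:   distinct={len(set(scan_ports))} entropy={shannon_entropy(scan_ports):.3f} bits")
```

The client prints 2 distinct ports and about 0.01 bits; the scan prints 200 distinct ports and log2(200), about 7.64 bits.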

The same idea catches lateral movement if you point it at destinations instead of ports:

FROM sessions | LAST 300
  | STATS dst_entropy = ENTROPY(server_ip) BY client_ip
  | HAVING dst_entropy > 4.0

`STDDEV` and `MAD` — how much things vary

STDDEV (standard deviation) measures spread around the average. Small = values hug the mean (regular); large = values are all over the place (bursty). It’s the backbone of beaconing detection: malware that phones home every 60 seconds produces gaps between connections with a tiny standard deviation, whereas a human browsing produces wildly varying gaps.

# candidate beacons: pairs whose per-bucket connection count barely varies
FROM sessions | LAST 86400
  | WHERE server_ip == "internet4"
  | STATS n = COUNT(*) BY client_ip, server_ip, BUCKET(opened_at, 300)
  | STATS buckets = COUNT(*), avg = AVG(n), sd = STDDEV(n) BY client_ip, server_ip
  | HAVING buckets >= 24 AND avg < 3 AND sd < 0.5
  | SORT sd ASC

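This reads as: bucket every pair’s traffic into 5-minute slots over a day. A pair that appears in at least 24 slots, with few connections in each (avg < 3) and almost no variation between them (sd < 0.5), is contacting the outside on a metronome: the beaconing signature. Note that avg < 3 per 5-minute slot only keeps beacons slower than about one call every 100 seconds; a 60-second beacon makes about 5 connections per slot, so loosen that bound to catch faster ones.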
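The two-stage STATS above can be mirrored in Python to see why a metronome passes and a burst does not. This is a sketch under assumptions: it buckets the connection timestamps (in seconds) of one client/server pair, uses the sample standard deviation (whether NFQL's STDDEV is the sample or population form is not stated here), and hard-codes the query's thresholds. `bucket_stats` and `is_beacon_candidate` are hypothetical helpers.

```python
from collections import Counter
from statistics import mean, stdev


def bucket_stats(timestamps: list[float], bucket_s: int = 300) -> tuple[int, float, float]:
    """Return (buckets, avg, sd) of per-bucket connection counts, like the two STATS stages."""
    # Stage 1: n = COUNT(*) BY BUCKET(opened_at, 300); only buckets with traffic appear.
    counts = list(Counter(int(t // bucket_s) for t in timestamps).values())
    # Stage 2: buckets = COUNT(*), avg = AVG(n), sd = STDDEV(n).
    sd = stdev(counts) if len(counts) > 1 else 0.0
    return len(counts), float(mean(counts)), sd


def is_beacon_candidate(timestamps: list[float]) -> bool:
    buckets, avg, sd = bucket_stats(timestamps)
    return buckets >= 24 and avg < 3 and sd < 0.5


day = 86_400
# Malware-like: one call every 150 s all day, so exactly 2 calls land in every 5-minute slot.
beacon = [float(t) for t in range(0, day, 150)]
# Human-like: a burst at the top of each hour, with a burst size that changes hour to hour.
human = [h * 3600.0 + j for h in range(24) for j in range((h % 5 + 1) * 3)]

print(bucket_stats(beacon), is_beacon_candidate(beacon))
print(bucket_stats(human), is_beacon_candidate(human))
```

The beacon yields 288 slots with exactly 2 calls each (sd 0) and passes; the hourly bursts average 8.75 calls per slot and fail the avg < 3 test.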

MAD (median absolute deviation) measures the same “how much do values vary” but is robust to outliers. One freak value inflates STDDEV; MAD shrugs it off because it is built from medians, not averages. Prefer MAD when a single spike would hide a pattern that is otherwise steady.
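A quick Python comparison shows the difference on a steady series with one freak value. It uses the raw median absolute deviation with no scaling constant; whether NFQL's MAD applies a scaling factor, and whether its STDDEV is the sample or population form, are assumptions this page does not settle.

```python
from statistics import median, stdev


def mad(values: list[float]) -> float:
    """Median absolute deviation: the median of each value's distance from the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


steady = [10, 11, 9, 10, 12, 10, 9, 11, 10]
with_spike = steady + [500]

print(f"steady:     stdev={stdev(steady):.2f}  mad={mad(steady):.2f}")
print(f"with spike: stdev={stdev(with_spike):.2f}  mad={mad(with_spike):.2f}")
```

Both series have a MAD of 1, while the single 500 pushes the standard deviation from under 1 to about 155.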

`SKEWNESS` and `KURTOSIS` — the finer shape

These describe subtler features and are mostly for advanced beaconing / outlier work:

SKEWNESS — asymmetry. Zero means the values are symmetric around the mean; the inter-arrival gaps of a rigid beacon are very symmetric, human traffic is not.
KURTOSIS — “tailedness”. High kurtosis means most values are tightly clustered with a few extreme outliers — useful to surface the rare big transfer hiding in an otherwise flat series.

You rarely need these two alone; they refine a STDDEV/ENTROPY-based score when you’re building a bespoke behavioural detector.
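For reference, the textbook population moment formulas look like this in Python. Whether obserae's SKEWNESS and KURTOSIS apply small-sample corrections, or report excess kurtosis (kurtosis minus 3), is not stated on this page, so treat these as the plain definitions rather than the exact NFQL output.

```python
from statistics import fmean


def central_moment(values: list[float], order: int) -> float:
    mu = fmean(values)
    return fmean([(v - mu) ** order for v in values])


def skewness(values: list[float]) -> float:
    """Population skewness m3 / m2^1.5; zero for a symmetric series (values must not all be equal)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values: list[float]) -> float:
    """Population (non-excess) kurtosis m4 / m2^2; about 3 for bell-shaped data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


even_gaps = [58, 59, 60, 61, 62]   # symmetric inter-arrival gaps, in seconds
transfers = [100] * 19 + [5000]    # a flat series with one big transfer
spread_out = list(range(20))       # values spread evenly, no outlier

print(f"skewness(even_gaps)  = {skewness(even_gaps):.3f}")
print(f"kurtosis(transfers)  = {kurtosis(transfers):.2f}")
print(f"kurtosis(spread_out) = {kurtosis(spread_out):.2f}")
```

The symmetric gaps give a skewness of exactly 0; the single big transfer lifts kurtosis to about 18, against about 1.8 for the evenly spread series.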

All of these are decimals. Because NFQL has decimal literals, every statistical result is filterable with a fractional threshold in HAVING — HAVING sd < 0.5, HAVING ports_entropy > 3.0. That’s what makes them usable as detections rather than just display columns.

Part 2 — The `Anomaly` alert condition

The operators above make you pick the threshold (> 3.0, < 0.5). But “normal” differs per host: a backup server legitimately sends gigabytes; a workstation sending 100 MB is alarming. One global number can’t fit both. The Anomaly condition solves this by learning a separate baseline for every entity and firing when a value strays from its own normal.

How it works, conceptually

You give an anomaly rule a query that produces one numeric value per entity per run, for example outbound bytes per host:

FROM sessions | LAST 900
  | WHERE server_ip == "internet4" OR server_ip == "internet6"
  | STATS out = SUM(client_to_server_bytes) BY client_ip

Then, on the rule, you set:

Metric = out (the numeric column to watch),
Group by = client_ip (the entity — one baseline per host),
condition = Anomaly.

client_to_server_bytes is the true client→server (upload) direction. Sessions are stored internally in a canonical low-IP-first order, but NFQL exposes them reoriented to client/server via the inferred server. The rule therefore credits the requester’s upload rather than a large download from the Internet, and never attributes the bytes to the wrong endpoint.

For each host, obserae keeps a running average and a running variance, updated a little on every run; the host’s spread is the square root of that variance, its standard deviation. When a new value arrives it computes a z-score: how many standard deviations away from this host’s normal is it?

z = (value − average) / spread

It fires when |z| exceeds your sensitivity k — in either direction (a spike or an unusual drop). A host that normally sends a few MB and suddenly ships a gigabyte has a huge positive z and fires, even though it never approaches the fixed threshold a busy server needs.

The moving baseline (EWMA)

The average and spread are exponentially weighted (EWMA): recent runs count more than old ones, so the baseline follows genuine trends (a host that grows busier over weeks) without needing to store any history. Per host it’s just three numbers — an average, a spread, and a sample count — so it stays cheap and bounded no matter how long it runs.
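The per-host loop can be sketched in Python for a single entity. This is an illustration under assumptions, not obserae's implementation: it uses the common incremental EWMA update for mean and variance, scores each value against the baseline before folding it in, and adds the warm-up and freeze-on-fire behaviour this page describes. `EwmaBaseline` and `observe` are hypothetical names; the defaults mirror the k, α and warm-up defaults in the settings table.

```python
import math
from dataclasses import dataclass


@dataclass
class EwmaBaseline:
    """Per-entity state: three numbers (mean, variance, count) plus the rule settings."""
    k: float = 3.0
    alpha: float = 0.1
    warmup: int = 10
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def zscore(self, x: float) -> float:
        sd = math.sqrt(self.var)
        if sd == 0.0:
            # A perfectly flat baseline: any change counts as infinitely many deviations.
            return 0.0 if x == self.mean else math.inf
        return (x - self.mean) / sd

    def _fold(self, x: float) -> None:
        if self.count == 0:
            self.mean, self.var = x, 0.0
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1

    def observe(self, x: float) -> bool:
        """Return True if x fires; learn silently during warm-up; never learn from a fire."""
        if self.count < self.warmup:
            self._fold(x)
            return False
        fired = abs(self.zscore(x)) > self.k  # two-sided: spikes and drops
        if not fired:
            self._fold(x)  # freeze on fire: an anomalous value is not folded in
        return fired


host = EwmaBaseline()
# Twenty runs of ordinary upload volume in bytes, alternating 4 MB and 6 MB.
normal = [4e6 if i % 2 == 0 else 6e6 for i in range(20)]
print([host.observe(v) for v in normal].count(True), "fires during normal traffic")
print("1 GB upload fires:", host.observe(1e9))
```

Run as is, the twenty ordinary runs produce no fires (the first ten only learn), and the 1 GB run fires without moving the stored mean.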

The settings

Setting	What it does	Default	Turn it…
k (sensitivity)	How many standard deviations count as anomalous.	`3.0`	up for fewer, stronger alerts; down to catch subtler drift
α (smoothing), EWMA and Seasonal only	How fast the baseline adapts to new values (0–1).	`0.1`	up to react quickly to trend; down for a longer, steadier memory
Window N — median+MAD only	How many recent values the median/MAD is computed over (8–256).	`32`	up for a steadier baseline; down to adapt faster
Baseline method	EWMA (smoothed mean), Median + MAD (robust), or Seasonal (per hour × weekday).	`EWMA`	pick per the Choosing a baseline method guide below
Warm-up samples	How many values it learns per entity before it may fire.	`10`	up if early data is noisy; keep ≥ a few so a cold start doesn’t false-alarm

k = 3 corresponds to the classic “3-sigma” rule: for roughly bell-shaped data, ~99.7% of normal values fall within 3 standard deviations, so k = 3 alerts on the ~0.3% that don’t.
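The 99.7% and 0.3% figures come from the normal distribution: the chance that a value lands more than k standard deviations from the mean, in either direction, is erfc(k/√2). A short Python check, assuming truly Gaussian data (which real traffic rarely is), reproduces them for a few values of k:

```python
import math


def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z: the share of normal values a k-sigma rule flags."""
    return math.erfc(k / math.sqrt(2))


for k in (2.0, 3.0, 4.0):
    print(f"k = {k}: {two_sided_tail(k):.5%} of normal values exceed the threshold")
```

For k = 3 this prints about 0.27%, the ~0.3% quoted above; k = 2 flags about 4.6% and k = 4 well under 0.01%.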

Tuning: how each parameter behaves

Every knob trades one thing for another. Understanding the direction of each trade-off is most of what you need to tune a rule.

k — sensitivity (the alert threshold). A value fires when its z-score |value − center| / spread exceeds k. Higher k → fewer, stronger alerts (fewer false positives, but you may miss subtle drift); lower k → catches smaller deviations (more sensitive, more false positives). It’s the knob you reach for first when there’s too much or too little noise. k works identically for all three methods.
α (smoothing; EWMA and Seasonal only), 0 < α < 1. How much weight the newest value gets. Higher α (e.g. 0.3) → the baseline adapts fast to a new regime, but it also forgets fast: a slow, creeping anomaly can be absorbed into “normal” before it ever crosses k. Lower α (e.g. 0.05) → a long, stable memory that a slow attack has a much harder time dragging along, but it follows a legitimate change of regime more slowly and keeps alerting for longer after one. Rule of thumb: start at 0.1; lower it if slow drift is being learned away, raise it if the baseline lags a real, permanent change.
Window N — Median + MAD only (8–256). How many recent values the median and MAD are computed over. Larger N → a steadier, more robust baseline that tolerates more outliers but adapts slowly; smaller N → more reactive but noisier. Constraint: warm-up ≤ N (you can’t require more samples than the window holds).
Warm-up samples — the anti-flood guard. The rule learns but never fires until an entity has been seen this many times. Keep it at a few so a brand-new host or a fresh deployment doesn’t false-alarm before “normal” is established; raise it if the first few observations are typically noisy. Combined with freeze-on-fire (below), it keeps the baseline anchored to genuinely normal behaviour.
Max keys — the memory bound. The maximum number of distinct entities a rule tracks. Past it, new keys are ignored and a single meta-alert warns you to narrow the query. It caps RAM and priming cost; the default is lowered for Seasonal (168× the state per entity). Raise it only when you deliberately track many entities and have the memory for it.
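One way to get a feel for α: in an exponentially weighted average, a value's weight shrinks by a factor of (1 − α) on every later run, so its influence halves after ln 0.5 / ln(1 − α) runs. The Python sketch below computes that half-life for the α values mentioned above; the 15-minute cadence used to convert runs into wall-clock time is only an example, not an obserae default.

```python
import math


def half_life_runs(alpha: float) -> float:
    """Runs after which a value's weight in an EWMA has halved."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - alpha)


cadence_minutes = 15  # example rule cadence, an assumption for illustration only
for alpha in (0.05, 0.1, 0.3):
    runs = half_life_runs(alpha)
    print(f"alpha={alpha}: half-life {runs:.1f} runs = {runs * cadence_minutes:.0f} min "
          f"at a {cadence_minutes}-min cadence")
```

At α = 0.1 a value's weight halves after about 6.6 runs; at 0.3 after under 2 runs, which is how a high α can learn a slow drift away before it crosses k.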

Symptom → fix. When a rule is misbehaving, work from the symptom:

Symptom	Likely cause	Try
Too many alerts	k too low, or a naturally-constant metric with ~0 spread	Raise k; or pick a metric with natural variation
Alerts on ordinary bursts	a legitimate spike inflates the baseline / spread	Switch to Median + MAD
Misses a real anomaly hiding behind a past spike	one big past value masks the next (EWMA)	Switch to Median + MAD
Slow, creeping anomaly never fires	α too high — the drift was learned as normal	Lower α
Baseline lags a real, permanent change of regime	α too low / window N too large	Raise α, or lower N
Fires every morning / every Monday	a legitimate time-of-week pattern	Switch to Seasonal
Never leaves warm-up / learns nothing	too few observations, cadence too slow, or entities > Max keys	Widen the query window / cadence; narrow the entity set or raise Max keys

A perfectly flat baseline is sensitive by design. If a metric has never varied its spread is ~0, so any change is “many standard deviations” and fires. For “bytes out” or “peer count” that’s usually what you want; if it’s too twitchy for a naturally-constant metric, raise k or choose a metric with some natural variation.

Choosing a baseline method

The Baseline method setting picks how “normal” is measured. All three are two-sided (spike and dip), silently warm up, and freeze on fire (below). They differ only in what “normal” means and what pulls it off course. You can switch a rule’s method at any time from the page — it resets the learned baseline so the new method relearns from scratch.

EWMA — the default, for smoothly-drifting metrics

Normal is an exponentially-weighted moving average and variance: recent runs weigh more than old ones, so the baseline follows a genuine trend without storing any history (three numbers per entity — mean, spread, count).

Strengths. Cheap, bounded, adapts smoothly, tuned with a single knob (α). Great default for volumes and counts that grow or shrink gradually.
Weakness. It has no memory of the shape of past values, only their weighted average — so a single very large past value pulls the mean up, which can mask a later spike (the average already looks high). It also has no idea of time-of-day: a nightly backup makes it alert every night.
Reach for it when the metric drifts slowly and doesn’t have legitimate recurring spikes: bytes out per host, number of peers, sessions per minute on a steady server.

NDR example. A workstation that normally sends ~50 MB/day to the Internet and slowly climbs to ~120 MB over months: EWMA follows the climb (no false alarm on the trend) but fires the day it suddenly ships 4 GB.

Median + MAD — robust, for metrics with legitimate spikes

Normal is the median of the last N values, and the spread is the median absolute deviation (MAD, scaled by 1.4826 to match a standard deviation). Medians ignore outliers, so a freak value neither raises the baseline nor hides the next anomaly.
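The robust z-score can be sketched in Python as follows. Assumptions: the window is a `collections.deque` holding the last N values, the 1.4826 factor is applied to the MAD as described above, and the comparison uses a plain mean and population standard deviation over the same window purely to show the masking effect (the EWMA method itself is not reproduced). `robust_z` and `plain_z` are hypothetical helpers.

```python
import math
from collections import deque
from statistics import fmean, median, pstdev

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def robust_z(window: deque, x: float) -> float:
    """(x - median) / (1.4826 * MAD) over the values currently in the window."""
    center = median(window)
    spread = MAD_SCALE * median(abs(v - center) for v in window)
    if spread == 0.0:
        return 0.0 if x == center else math.inf
    return (x - center) / spread


def plain_z(window: deque, x: float) -> float:
    """Mean/stddev z-score over the same window, for comparison only."""
    return (x - fmean(window)) / pstdev(window)


# Nightly backup sizes in GB: a steady ~2 GB plus one legitimate 30 GB quarterly job.
window = deque([2.0, 2.1, 1.9, 2.2, 1.8] * 6 + [30.0], maxlen=32)  # N = 32
suspicious = 6.0  # a 6 GB night that deserves a look

print(f"robust z = {robust_z(window, suspicious):.1f}")
print(f"plain z  = {plain_z(window, suspicious):.1f}")
```

The single 30 GB night drags the plain mean up and inflates the standard deviation, so a 6 GB night scores under 1; the median and MAD barely notice it, so the same night scores about 27 robust deviations and would fire at k = 3.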

Strengths. Immune to the “one big value masks the next spike” trap that bites EWMA. Exactly what you want when the metric is occasionally spiky by design — a weekly backup, a nightly batch, a monthly report job.
Weakness. Keeps a bounded window (N values) instead of three scalars, so it costs a little more memory and reacts a step more slowly. It still has no notion of time-of-day.
Reach for it when EWMA either misses spikes hiding behind an earlier outlier or alerts too often on ordinary bursts. Tuned with the window N (8–256, default 32) instead of α — see the tuning guide above.

NDR example. A backup server whose nightly transfer is normally ~2 GB but once a quarter legitimately hits ~30 GB. Under EWMA that quarterly spike inflates the mean for weeks and can mask a real exfil in between; under Median + MAD the 30 GB is one outlier the median shrugs off, so a genuine anomaly still stands out.

Seasonal — for metrics with a daily/weekly rhythm

Normal is a separate EWMA baseline per time-of-week slot — 24 hours × 7 days = 168 little baselines per entity. An observation is compared only to the same hour and weekday, so a load that is high every weekday morning (backups, market open, the 9-to-5 office pattern) is learned as normal for that slot and no longer alerts every time — while the same value at 3 a.m. Sunday still does.
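A per-slot baseline can be sketched in Python by keeping one small EWMA state per (weekday, hour) slot. Assumptions not confirmed by this page: the slot is weekday × 24 + hour with Monday = 0 (whether obserae uses UTC or local time, and which day starts its week, is not specified); each slot uses the common incremental EWMA update, warms up on its own, and freezes on fire. `SeasonalBaseline` and `slot_of` are hypothetical names.

```python
import math
from datetime import datetime, timedelta


def slot_of(ts: datetime) -> int:
    """Time-of-week slot 0..167: weekday (Monday = 0) times 24 plus hour of day."""
    return ts.weekday() * 24 + ts.hour


class SeasonalBaseline:
    def __init__(self, k: float = 3.0, alpha: float = 0.1, warmup: int = 10) -> None:
        self.k, self.alpha, self.warmup = k, alpha, warmup
        self.slots: dict[int, list] = {}  # slot -> [mean, var, count]

    def observe(self, ts: datetime, x: float) -> bool:
        state = self.slots.setdefault(slot_of(ts), [0.0, 0.0, 0])
        mean, var, count = state
        if count >= self.warmup:  # only a primed slot may fire
            sd = math.sqrt(var)
            z = (0.0 if x == mean else math.inf) if sd == 0.0 else (x - mean) / sd
            if abs(z) > self.k:
                return True  # freeze on fire: leave this slot's baseline untouched
        if count == 0:
            state[:] = [x, 0.0, 1]
        else:
            diff = x - mean
            incr = self.alpha * diff
            state[:] = [mean + incr, (1 - self.alpha) * (var + diff * incr), count + 1]
        return False


dns = SeasonalBaseline()
monday_9am = datetime(2024, 1, 1, 9)  # 2024-01-01 was a Monday
sunday_3am = datetime(2024, 1, 7, 3)
for week in range(12):
    step = timedelta(weeks=week)
    dns.observe(monday_9am + step, 900.0 if week % 2 == 0 else 1100.0)  # busy office hour
    dns.observe(sunday_3am + step, 9.0 if week % 2 == 0 else 11.0)      # quiet night

later = timedelta(weeks=12)
print("1000 queries, Monday 09:00:", dns.observe(monday_9am + later, 1000.0))
print("1000 queries, Sunday 03:00:", dns.observe(sunday_3am + later, 1000.0))
```

The same 1000 queries are normal for the Monday 09:00 slot and fire in the Sunday 03:00 slot. Note that each slot here receives one observation per week, so even a 10-sample warm-up takes ten weeks in this toy run; a rule running several times per hour fills each slot faster.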

Strengths. The only method that understands “high, but normal for this hour”. Kills the recurring-pattern false positives EWMA and MAD can’t.
Weaknesses. Needs several weeks of data to prime all 168 slots (each warms up independently — expect a long ramp-up), and holds ~168× more state per entity. obserae therefore lowers the default Max keys for seasonal rules; raise it deliberately if you track many entities. Tuned with α and k like EWMA.
Reach for it when the metric has a strong, legitimate daily or weekly shape you keep alerting on by mistake.

NDR example. DNS query volume from an internal resolver: it’s high every weekday 9-18h and near-zero at night. EWMA fires every morning; seasonal learns the weekday-daytime peak as normal and only fires on a midnight surge — the classic beaconing / after-hours-exfil hour.

Quick pick

Your metric’s profile	Method	Main knob	Memory / ramp-up
Drifts slowly, no recurring spikes	EWMA	α	3 numbers · seconds
Has legitimate occasional spikes (backups, batch)	Median + MAD	window N	N values · minutes
Has a strong daily/weekly rhythm	Seasonal	α	~168 slots · weeks

Rule of thumb: start with EWMA; move a noisy metric to Median + MAD if a legitimate spike is masking real anomalies or generating noise; switch to Seasonal only when a genuine time-of-week pattern is the source of the false positives (and you can afford the multi-week priming).

Two guarantees that keep it honest

Silent warm-up. Until a host has been seen Warm-up samples times, the rule learns but never fires. A brand-new deployment won’t flood you while it’s still figuring out what normal looks like — the same “learn first” behaviour as First seen and Heartbeat.
Freeze on fire. When a value fires, it is reported but not folded into the baseline it broke from. One big spike therefore can’t quietly drag “normal” upward and hide the next one — the baseline stays anchored to genuinely normal behaviour.

And like every grouped rule, the per-entity state is capped by Max keys and pruned by retention, so it can’t grow without bound.

Where the baseline lives (and why it stays light)

The learned baselines are held in memory while obserae runs — that is the source of truth the evaluator reads and updates on each cycle, with no database round-trip. Two things keep memory bounded: Max keys caps how many entities a rule tracks, and each entity’s baseline is a fixed, tiny amount of state (a few numbers), never a growing history.

For durability, obserae snapshots the baselines to compact Parquet files (one per rule) on a periodic cadence and again at a clean shutdown, and reloads them at boot. This deliberately keeps the baselines out of the main database file, so learning that grows with your host count never bloats the DB or its checkpoints, and it matches how obserae already stores flows and sessions. A crash between snapshots loses at most the learning since the last snapshot (the baseline simply relearns it); alert cooldowns are stored transactionally, so you never get a duplicate alert after a restart.

Retention prunes an entity’s baseline once it has been silent longer than the rule’s retention window (the same window that prunes its per-key state), and the on-disk size of the baseline store is shown on the Storage page alongside flows, sessions and backups.

Reference recipes

These nine ship as the std.anomaly rule set: install it from the Rule Sets page and all nine land ready to run (see below). The recipes below are the manual equivalent and a starting point to adapt. They all key on the inferred client_ip (the requester), alone or paired with server_ip or server_port, and the volume recipe sums the directional client column client_to_server_bytes. Sessions are stored internally in a canonical low-IP-first order, but that ordering is not exposed to NFQL and does not track who the client is, so the recipes always orient on client_ip / server_ip. They reference only built-in address keywords (internet4/6, internal4/6), so they run on a fresh install with no cartography setup.

Adaptive exfiltration — client upload volume to the Internet, no fixed cap (MITRE T1041):

Query: FROM sessions | LAST 900 | WHERE server_ip == "internet4" OR server_ip == "internet6" | STATS out = SUM(client_to_server_bytes) BY client_ip
Rule: Anomaly, metric out, group by client_ip, k 3, α 0.1, warm-up 12.

Egress fan-out — a host reaching unusually many distinct Internet servers, a C2 / beaconing breadth signal (MITRE T1071):

Query: FROM sessions | LAST 900 | WHERE server_ip == "internet4" OR server_ip == "internet6" | STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
Rule: Anomaly, metric peers, group by client_ip, k 3.

DNS exfiltration — abnormal DNS session count per host, DNS tunnelling (MITRE T1048.003):

Query: FROM sessions | LAST 900 | WHERE server_port == 53 | STATS lookups = COUNT(*) BY client_ip
Rule: Anomaly, metric lookups, group by client_ip, k 3.

Lateral spread — a workstation reaching unusually many internal peers (MITRE T1021):

Query: FROM sessions | LAST 900 | WHERE server_ip == "internal4" OR server_ip == "internal6" | STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
Rule: Anomaly, metric peers, group by client_ip, k 3.

Lateral admin surge — a host reaching unusually many internal servers on admin ports (SSH/SMB/RDP/WinRM) (MITRE T1021):

Query: FROM sessions | LAST 900 | WHERE (server_ip == "internal4" OR server_ip == "internal6") AND server_port IN (22, 445, 3389, 5985, 5986) | STATS admin_peers = COUNT_DISTINCT(server_ip) BY client_ip
Rule: Anomaly, metric admin_peers, group by client_ip, k 3.

Port scan (vertical) — a source hitting many distinct ports on one host (MITRE T1046):

Query: FROM sessions | LAST 300 | STATS ports = COUNT_DISTINCT(server_port) BY client_ip, server_ip
Rule: Anomaly, metric ports, group by client_ip, server_ip, k 3, max keys 20000.

Host sweep (horizontal) — a source hitting many distinct hosts on one port (MITRE T1046/T1018):

Query: FROM sessions | LAST 300 | STATS hosts = COUNT_DISTINCT(server_ip) BY client_ip, server_port
Rule: Anomaly, metric hosts, group by client_ip, server_port, k 3, max keys 20000.

Half-open surge — a source producing many no-reply TCP connections, SYN scanning (MITRE T1595.001):

Query: FROM sessions | LAST 300 | WHERE close_reason == "no_reply" AND protocol == TCP | STATS dead = COUNT_DISTINCT(server_ip) BY client_ip
Rule: Anomaly, metric dead, group by client_ip, k 3.

Auth brute force — abnormal session count to auth/admin ports per source→server (MITRE T1110):

Query: FROM sessions | LAST 300 | WHERE server_port IN (22, 3389, 445, 5985, 5986, 389, 636, 1433, 3306, 5432) AND protocol == TCP | STATS attempts = COUNT(*) BY client_ip, server_ip
Rule: Anomaly, metric attempts, group by client_ip, server_ip, k 3, max keys 20000.

The Anomaly Detection page

Analysis → Anomaly Detection is where you turn the engine on and watch it work. This page is only about statistical (anomaly) rules — deterministic rules live on the Rules page, and the two lists never mix. A fresh install ships no anomaly rules; the two fastest ways to start are (1) install the std.anomaly rule set from the Rule Sets page, which lands nine ready detectors covering exfiltration, C2, lateral movement, scanning and brute force, or (2) click New anomaly rule to author your own.

Environment overview

A compact bar summarises the whole engine at a glance: how many anomaly rules exist and how many are active (learning has primed), learning (enabled, still warming up) or off; the total number of entities tracked across every baseline; and how many anomalies fired in the last 24 hours. Two small charts sit at the right — a 7-day timeline of fires and a by-severity breakdown — so a burst of anomalies is obvious without opening a single rule. (The activity charts reflect the most recent 5000 alerts.) The Rules page carries the same style of compact bar for deterministic rules.

The std.anomaly rule set

Rather than a per-page catalog, the starter detectors ship as a rule set you install from Rule Sets (like std.community/std.enterprise). Installing std.anomaly creates all nine as pack-owned rules that appear here immediately and start learning; you can enable/disable each, but editing a pack-owned rule is done by duplicating it (the pack file is the source of truth). Every detector is client/server-correct — it groups by the inferred client_ip and sums the true client direction — and is mapped to MITRE ATT&CK and the detection obligations of NIS2 (Art. 21), DORA (Art. 10), CIS Controls v8 (Control 13, Network Monitoring & Defense) and SOC 2 (CC7.2 monitoring, CC6.x logical access). The nine are:

Detector	Detects	MITRE	Compliance
Adaptive exfiltration	client upload to the Internet far above its own norm	T1041 / T1030	NIS2 · DORA · CIS 13.3 · SOC2 CC7.2
Egress fan-out	a host reaching unusually many distinct Internet servers (C2 breadth)	T1071 / T1571	NIS2 · DORA · CIS 13 · SOC2 CC7.2
DNS exfiltration	abnormal DNS session volume per host (DNS tunnelling)	T1048.003 / T1071.004	NIS2 · CIS 13.9 · SOC2 CC7.2
Lateral spread	a host reaching unusually many internal peers	T1021 / T1210	NIS2 · DORA · CIS 13.4 · SOC2 CC7.2
Lateral admin surge	a host reaching many internal admin services (SSH/SMB/RDP/WinRM)	T1021	NIS2 · CIS 13.4 · SOC2 CC6.1
Port scan (vertical)	many distinct ports probed on one host	T1046	NIS2 · CIS 13 · SOC2 CC7.2
Host sweep (horizontal)	many distinct hosts probed on one port	T1046 / T1018	NIS2 · CIS 13 · SOC2 CC7.2
Half-open surge	many no-reply TCP connections (SYN scan)	T1595.001	NIS2 · CIS 13 · SOC2 CC7.2
Auth brute force	abnormal session count to auth/admin ports	T1110	NIS2 · DORA · CIS 13.6 · SOC2 CC6.1/CC6.6

They reference only built-in address keywords, so they compile and run on a fresh install with no cartography setup. Each learns silently for its warm-up window before it can fire, so expect no day-one alerts.

Migrating from the old in-page detector catalog. Earlier builds let you enable these detectors from a per-page catalog, which created them as ordinary in-place anomaly rules (named Adaptive exfiltration, Lateral spread, Scan surge). That catalog is gone — the detectors now ship as the std.anomaly pack. If an instance still carries the old in-place rules, install std.anomaly from Rule Sets and delete the three in-place duplicates (otherwise each detector runs twice, with separate baselines). If you deploy through a config bundle, do the same in the file: declare std.anomaly under rule_sets.packs and remove the three anomaly rules — together with their paired … (detector) saved queries — from the alerting section. Leaving a rule that references a query you removed makes the whole bundle fail to import. Pack rules relearn their baselines from cold.

Creating and editing rules in place

New anomaly rule opens a modal on this page (no redirect): pick a saved query with the searchable picker, choose the numeric metric and the group-by entity, set k / α (or the window) and the baseline method, and save. Edit rule in a rule’s drawer opens the same modal. A filter box narrows the list by name, metric, method or group-by key — the same search the Rules page offers.

Rules, baselines and per-entity visualisations

Every anomaly rule is listed with its baseline method, how many entities it tracks, whether it has started learning, and when it last fired — with an on/off switch on each row. A drawer opens when you select a rule:

A baseline-method switch (EWMA / Median + MAD / Seasonal) lets you retune the estimator without leaving the page. Switching resets the rule’s learned baseline (an EWMA mean is meaningless once the rule is seasonal), so it asks for confirmation and the rule then relearns from cold.
An Activity heatmap button opens a large modal (entities × time, coloured by deviation) surfacing which hosts strayed from their own normal and when — see The rule activity heatmap below.
Baselines — one row per entity, showing the learned normal (EWMA: mean ± stddev; Median + MAD: median · window length; Seasonal: primed slots / 168), the sample count, and whether it is still in warm-up or active. This is where you watch the engine learn: a freshly enabled rule shows entities in warm-up flipping to active as data arrives. (Very large rules show the 500 most-recently-active entities.)
Recent fires — the anomalies that fired in the last 7 days, with the observed value and the entity — so you can confirm the rule catches what you expect before it pages anyone.

The charts open large, in modals. Both visualisations below open in a full-size modal (they use the whole screen) and are drawn with real, zoomable axes, hover tooltips and a colour-scale legend — a drawer-sized thumbnail can’t convey a metric that spans orders of magnitude. Press Esc or click outside to close.

The rule activity heatmap

The drawer’s Activity heatmap button opens a heatmap of one row per entity (the most anomalous first, up to 50), one column per time bucket over the last 7 days, each cell coloured by how far that entity strayed from its own normal at that moment (its z-score — blue = below normal, dark = normal, red = above). Cells where the rule fired are outlined, and a colour-scale legend sits under the grid. It answers which host deviated, and when, at a glance — a red streak across one row is an entity spiking; a single outlined cell is a one-off fire. Hover a cell for its exact z and timestamp, and click a cell to jump straight into that entity’s chart. (A perfectly flat baseline produces a huge z, so the colour is clamped to a readable range — the tooltip still shows the true value.)

The heatmap is reconstructed on demand by replaying the rule’s query bucketed over the window (see below) — it needs no stored history. A rule whose query can’t be replayed this way shows no heatmap button.

Reading the per-entity chart

Click an entity in the Baselines table (or a cell in the activity heatmap) to open its chart. The observed-vs-envelope timeline is the default for every method — EWMA, Median + MAD and Seasonal:

The white line is the observed metric over time; the shaded band is the moving normal envelope, centre ± k·spread recomputed at every bucket (mean ± k·σ for EWMA; median ± k·1.4826·MAD for Median + MAD; the per-time-of-week mean ± k·σ for Seasonal); and the dashed line is the centre. Where the line rides inside the band, behaviour is normal; where it punches out, and by how much, is the anomaly.
Deviations are graded by severity. Every bucket whose value left the band is marked, coloured amber → orange → red by how far past the threshold it sits (a superset of what actually fired — it shows what the algorithm considers abnormal). The buckets that truly paged get an extra ring on the line.
The warm-up span — while the rule is still learning and never fires — is shaded and labelled learning, so an empty band up front reads correctly.
Hover any point for a card with the observed value, the expected centre, the band edges, the z-score (in σ) and a plain-language status.
12 h / 24 h / 7 d buttons switch the display window (the caption shows the bucket span and point count); a Linear / Log toggle switches the value axis (log makes a few-kB baseline and a multi-MB spike both readable); drag to zoom into a window.
Seasonal rules add a Weekly grid toggle: a 24 × 7 heatmap (hour of day × weekday) whose cell colour encodes the learned mean for each time-of-week slot — the entity’s weekly rhythm made visible; a slot never observed stays empty.

If the window genuinely has no data the chart shows a short empty-state note (widen the window, or send traffic). If the query can’t be replayed at all, the chart falls back to the current (frozen) envelope with the fires on it.

Editing a rule’s full parameters still happens in the rule editor (the Edit rule link, or the Rules page).

How the series and heatmap are reconstructed (and its limits)

obserae stores only each entity’s current baseline and its fires, never a per-tick history of observed values. Both the per-entity series and the rule heatmap are therefore rebuilt on the fly. obserae replays the rule’s query with a time BUCKET added to its STATS … BY, so one query returns the observed value per entity per bucket over the display window you pick (12 h / 24 h / 7 d). That display window overrides the rule’s own live window, so a rule scanning LAST 900 still charts a full day. obserae then re-folds the same estimator the engine uses to redraw the envelope that would have applied at each bucket. Nothing new is stored, and the scan is bounded by the display window, a bucket width chosen to stay well under a few hundred points, and a top-50 cap on the heatmap’s rows.
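The replay-and-refold step can be sketched in Python for one entity's bucketed series under the EWMA method. Assumptions: the input is the list of per-bucket values the replayed query returned, the estimator is the common incremental EWMA mean/variance update, the envelope reported for a bucket is the one that applied before that bucket was folded in, and out-of-band buckets are not folded (mirroring freeze on fire). `reconstruct_envelope` is a hypothetical name, not an obserae API.

```python
import math


def reconstruct_envelope(series: list[float], k: float = 3.0, alpha: float = 0.1,
                         warmup: int = 10) -> list[dict]:
    """For each bucket, return the observed value and the centre +/- k*sigma band that applied."""
    rows: list[dict] = []
    mean, var, count = 0.0, 0.0, 0
    for x in series:
        row = {"value": x, "center": None, "lo": None, "hi": None, "outside": False}
        if count >= warmup:  # before that, the chart shades the bucket as "learning"
            sd = math.sqrt(var)
            row.update(center=mean, lo=mean - k * sd, hi=mean + k * sd)
            row["outside"] = not (row["lo"] <= x <= row["hi"])
        rows.append(row)
        if row["outside"]:
            continue  # do not fold an out-of-band bucket into the baseline
        if count == 0:
            mean = x
        else:
            diff = x - mean
            incr = alpha * diff
            mean += incr
            var = (1 - alpha) * (var + diff * incr)
        count += 1
    return rows


buckets = [10.0, 12.0] * 8 + [100.0]  # one entity's metric per display bucket
for i, r in enumerate(reconstruct_envelope(buckets)):
    band = "learning" if r["center"] is None else f"{r['lo']:.1f}..{r['hi']:.1f}"
    print(f"bucket {i:2d}: value={r['value']:6.1f} band={band} {'OUT' if r['outside'] else ''}")
```

The first ten buckets report no band (learning); the alternating 10/12 values then stay inside it, and the final 100 lands outside.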

Two honest caveats follow from this:

It is a reconstruction at the chosen display bucket, not a replay of the historical ticks: the moving envelope is what the estimator would have learned from these buckets, which closely tracks — but need not exactly equal — the live baseline the rule built at its own cadence. The fires shown are the authoritative ones from the journal.
Only rules whose query ends in a plain STATS <metric> = <agg>(…) BY <entity> can be replayed (a pivot cascade, a trailing SORT/HAVING, or a source with no time column cannot). Those rules keep the frozen-envelope band and skip the heatmap — everything else on the page is unchanged.

Which do I use?

Explore / confirm by hand → statistical operators in a query. You see the numbers and choose a threshold. Great for one-off hunts and for understanding a signal before you automate it.
Alert on a fixed, known limit → a Threshold rule (see Alerting). Best when the limit is the same for everyone (e.g. “no host should ever hit > 500 distinct ports”).
Alert on “unusual for this entity” → an Anomaly rule. Best when normal differs per host and you’d rather not maintain a threshold per host.

A common workflow: hunt with the operators, find a signal, then wrap it in an anomaly rule so obserae watches it per-entity from then on.

See also: the NFQL guide for the full query syntax, and Alerting for cadence, cooldown, per-entity grouping and how alerts are routed to your outputs.