Anomaly Detection
Most detections answer a yes/no question: did SSH reach a production host? did this source hit more than 50 destinations? They need you to know the threshold in advance. Anomaly detection answers a different question — is this behaviour unusual for this host, compared to how it normally behaves? — and it works even when you don’t know what “normal” is, because obserae learns it for you.
This page explains the two tools obserae gives you:
- Statistical operators in NFQL: functions like ENTROPY, STDDEV, MAD, SKEWNESS you can put in a query today to measure the shape of traffic. You read the numbers yourself.
- The Anomaly alert condition: a rule type that learns a baseline per entity and fires automatically when a value drifts too far from it. No threshold to pick.
They compose: the operators let you explore and confirm a signal by hand; the anomaly condition turns a chosen signal into a self-tuning alert.
Prerequisite. Both build on STATS … BY …: grouping rows and aggregating per group. If STATS is new to you, read the Aggregation section of the NFQL guide first; everything below assumes you can write STATS n = COUNT(*) BY client_ip.
Part 1 — Statistical operators
These are aggregate functions: they go in a STATS clause and collapse
the rows of each group into one number. All of them take a numeric
column and return a decimal (DOUBLE) you can filter in HAVING.
| Operator | What it measures, in one sentence | Reach for it when… |
|---|---|---|
| COUNT_DISTINCT(col) | How many different values appear in the group. | “how many distinct ports / peers / users?” |
| ENTROPY(col) | How spread out and unpredictable the values are (in bits). | scanning, spraying, fan-out: one source touching many different things |
| STDDEV(col) | How much the values vary around their average. | regularity vs. burstiness: steady beacons vs. bursty humans |
| MAD(col) | Like STDDEV, but ignores outliers. | the same, when a few huge values would distort STDDEV |
| SKEWNESS(col) | Whether the values lean left or right (asymmetry). | “are the gaps between events suspiciously even?” (beaconing) |
| KURTOSIS(col) | Whether the values are tightly clustered with rare extremes. | pinpointing a few outliers hidden in an otherwise flat series |
| MEDIAN, PERCENTILE(col, n) | The middle value / the n‑th percentile. | robust “typical” and “tail” values |
The rest of this part builds intuition for the most useful behavioural ones. Copy any example into the Query page.
COUNT_DISTINCT — how many different things
The simplest behavioural signal. A workstation normally talks to a handful of internal services; one reaching hundreds of distinct internal IPs in a minute is spreading.
FROM sessions | LAST 60
| STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
| HAVING peers > 100
| SORT peers DESC
COUNT_DISTINCT counts how many different peers. It says nothing about
how the traffic is distributed across them — that’s what ENTROPY
adds.
ENTROPY — how spread out and unpredictable
Entropy measures uncertainty. Think of it as: if I picked one packet at random from this group, how hard would it be to guess its value?
- All traffic to one port → entropy 0 bits (no uncertainty: it’s always port 443).
- Traffic evenly spread over many ports → high entropy (you can’t guess; a scan sweeping ports looks exactly like this).
Entropy is measured in bits: n equally likely values give log2(n)
bits, so 2 values = 1 bit, 4 = 2 bits, 256 = 8 bits, 1024 = 10 bits. A
rough reading for port entropy per source:
| Entropy (bits) | Interpretation |
|---|---|
| ~0 | one service: normal client |
| 1 – 3 | a handful of services: normal server or busy client |
| > 4 | many ports, fairly evenly: port scan / fan-out |
# ports-per-source scan signature
FROM flows | LAST 3600
| STATS ports_entropy = ENTROPY(dst_port),
distinct_ports = COUNT_DISTINCT(dst_port) BY src_addr
| HAVING ports_entropy > 3.0
| SORT ports_entropy DESC
Why entropy and not just COUNT_DISTINCT? A host that makes 1000
connections to port 443 and one to port 22 has distinct_ports = 2
but entropy near 0 — clearly a normal client. A host hitting 200 ports
once each has high entropy — clearly a scan. Entropy captures how evenly
spread the connections are, which COUNT_DISTINCT cannot. Use them
together: high entropy and high distinct-count is a strong scan
signal.
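The comparison above can be checked by hand. The following is a minimal Python sketch of Shannon entropy over observed values; it illustrates the arithmetic only and is not obserae's implementation (the helper name `entropy_bits` and the sample data are made up for the example).

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sample data matching the two hosts described in the text.
normal_client = [443] * 1000 + [22]   # 1000 connections to 443, one to 22
scanner = list(range(1, 201))         # 200 ports, one connection each

for name, ports in (("normal client", normal_client), ("scanner", scanner)):
    print(name, "distinct =", len(set(ports)),
          "entropy =", round(entropy_bits(ports), 3))
```

The normal client has two distinct ports but an entropy close to zero, while the scanner's entropy equals log2(200), a little under 8 bits.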
The same idea catches lateral movement if you point it at destinations instead of ports:
FROM sessions | LAST 300
| STATS dst_entropy = ENTROPY(server_ip) BY client_ip
| HAVING dst_entropy > 4.0
STDDEV and MAD — how much things vary
STDDEV (standard deviation) measures spread around the average.
Small = values hug the mean (regular); large = values are all over the
place (bursty). It’s the backbone of beaconing detection: malware that
phones home every 60 seconds produces gaps between connections with a
tiny standard deviation, whereas a human browsing produces wildly
varying gaps.
# candidate beacons: pairs whose per-bucket connection count barely varies
FROM sessions | LAST 86400
| WHERE server_ip == "internet4"
| STATS n = COUNT(*) BY client_ip, server_ip, BUCKET(opened_at, 300)
| STATS buckets = COUNT(*), avg = AVG(n), sd = STDDEV(n) BY client_ip, server_ip
| HAVING buckets >= 24 AND avg < 3 AND sd < 0.5
| SORT sd ASC
This reads as: bucket every pair’s traffic into 5-minute slots over a
day; a pair that appears in ≥ 24 slots, with few connections each
(avg < 3) and almost no variation (sd < 0.5), is contacting the
outside on a metronome — the beaconing signature.
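To see why the three HAVING conditions isolate a metronome, here is a small Python sketch of the same bucketing logic applied to one client/server pair. It is an illustration under assumptions: the function names and timestamps are invented, and it uses the population standard deviation, which may differ from what NFQL's STDDEV computes. Like the query, it only sees buckets that contain at least one connection.

```python
import statistics
from collections import Counter

def bucket_profile(timestamps, width=300):
    """Return (non-empty buckets, mean count per bucket, stddev of counts)."""
    counts = Counter(int(t) // width for t in timestamps)
    per_bucket = list(counts.values())
    return len(per_bucket), statistics.mean(per_bucket), statistics.pstdev(per_bucket)

def looks_like_beacon(timestamps):
    """Same three conditions as the HAVING clause in the query."""
    buckets, avg, sd = bucket_profile(timestamps)
    return buckets >= 24 and avg < 3 and sd < 0.5

# Hypothetical data: one connection every 5 minutes for a day,
# versus a few bursts of human activity.
beacon = list(range(0, 86400, 300))
human = [0, 5, 9, 14, 20, 26, 31, 4000, 9000, 9003, 9010, 9011, 20000]

print("beacon:", bucket_profile(beacon), looks_like_beacon(beacon))
print("human: ", bucket_profile(human), looks_like_beacon(human))
```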
MAD (median absolute deviation) measures the same “how much do values
vary” but is robust to outliers. One freak value inflates STDDEV;
MAD shrugs it off because it’s built from medians, not averages. Prefer
MAD when a single spike would otherwise mask an otherwise-steady
pattern.
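A short Python sketch makes the difference concrete. It assumes the textbook definition of MAD (the median of absolute deviations from the median); the helper name `mad` and the sample gaps are illustrative, and NFQL's exact definitions may differ in detail.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    values = list(values)
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

# Hypothetical inter-arrival gaps in seconds: a steady pattern, then one freak value.
steady = [60, 61, 59, 60, 60, 61, 59, 60]
with_spike = steady + [900]

for name, gaps in (("steady", steady), ("with spike", with_spike)):
    print(name, "stddev =", round(statistics.stdev(gaps), 2), "mad =", mad(gaps))
```

The single 900-second gap pushes the standard deviation into the hundreds, while the MAD stays at about one second.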
SKEWNESS and KURTOSIS — the finer shape
These describe subtler features and are mostly for advanced beaconing / outlier work:
- SKEWNESS: asymmetry. Zero means the values are symmetric around the mean; the inter-arrival gaps of a rigid beacon are very symmetric, human traffic is not.
- KURTOSIS: “tailedness”. High kurtosis means most values are tightly clustered with a few extreme outliers, which is useful to surface the rare big transfer hiding in an otherwise flat series.
You rarely need these two alone; they refine a STDDEV/ENTROPY-based
score when you’re building a bespoke behavioural detector.
All of these are decimals. Because NFQL has decimal literals, every statistical result is filterable with a fractional threshold in HAVING: HAVING sd < 0.5, HAVING ports_entropy > 3.0. That’s what makes them usable as detections rather than just display columns.
Part 2 — The Anomaly alert condition
The operators above make you pick the threshold (> 3.0, < 0.5). But
“normal” differs per host: a backup server legitimately sends gigabytes;
a workstation sending 100 MB is alarming. One global number can’t fit
both. The Anomaly condition solves this by learning a separate
baseline for every entity and firing when a value strays from its own
normal.
How it works, conceptually
You give an anomaly rule a query that produces one numeric value per entity per run, for example outbound bytes per host:
FROM sessions | LAST 900
| WHERE server_ip == "internet4" OR server_ip == "internet6"
| STATS out = SUM(client_to_server_bytes) BY client_ip
Then, on the rule, you set:
- Metric = out (the numeric column to watch),
- Group by = client_ip (the entity: one baseline per host),
- Condition = Anomaly.
client_to_server_bytes is the true client→server (upload) direction. Sessions
are stored internally in a canonical low-IP-first order, but NFQL exposes them
reoriented to client/server via the inferred server — so the rule credits the
requester’s upload, not a big download from the Internet, and never the wrong
endpoint.
For each host, obserae keeps a running average and a running measure of spread (variance), updated a little on every run. When a new value arrives it computes a z-score — how many standard deviations away from this host’s normal is it?
z = (value − average) / spread
It fires when |z| exceeds your sensitivity k — in either
direction (a spike or an unusual drop). A host that normally sends a
few MB and suddenly ships a gigabyte has a huge positive z and fires,
even though it never approaches the fixed threshold a busy server needs.
The moving baseline (EWMA)
The average and spread are exponentially weighted (EWMA): recent runs count more than old ones, so the baseline follows genuine trends (a host that grows busier over weeks) without needing to store any history. Per host it’s just three numbers — an average, a spread, and a sample count — so it stays cheap and bounded no matter how long it runs.
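The running average, running spread and z-score can be sketched in a few lines of Python. This is a minimal illustration using a standard incremental EWMA mean/variance update; the class name, the update formula and the sample values are assumptions for the example, not obserae's actual code.

```python
import math

class EwmaBaseline:
    """Three numbers per entity: mean, variance, sample count."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def score(self, value):
        """z = (value - average) / spread, against the current baseline."""
        if self.count < 2:
            return 0.0  # not enough data to talk about spread yet
        spread = math.sqrt(self.var)
        if spread == 0.0:
            # Perfectly flat baseline: any change is infinitely many sigmas.
            return 0.0 if value == self.mean else math.copysign(math.inf, value - self.mean)
        return (value - self.mean) / spread

    def update(self, value):
        """Fold one observation in; recent values weigh more (factor alpha)."""
        if self.count == 0:
            self.mean = value
        else:
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.count += 1

# Hypothetical outbound MB per run for one host.
baseline = EwmaBaseline(alpha=0.1)
for mb in [5, 6, 4, 5, 7, 5, 4, 6, 5, 5, 6, 4]:
    baseline.update(mb)

print("z for 5 MB:   ", round(baseline.score(5.0), 2))
print("z for 1000 MB:", round(baseline.score(1000.0), 2))
```

An ordinary value scores well inside ±3, while the gigabyte-sized run scores in the hundreds or more, far past any reasonable k.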
The settings
| Setting | What it does | Default | Turn it… |
|---|---|---|---|
| k (sensitivity) | How many standard deviations count as anomalous. | 3.0 | up for fewer, stronger alerts; down to catch subtler drift |
| α (smoothing), EWMA and Seasonal only | How fast the baseline adapts to new values (0–1). | 0.1 | up to react quickly to trend; down for a longer, steadier memory |
| Window N, Median + MAD only | How many recent values the median/MAD is computed over (8–256). | 32 | up for a steadier baseline; down to adapt faster |
| Baseline method | EWMA (smoothed mean), Median + MAD (robust), or Seasonal (per hour × weekday). | EWMA | pick per the Choosing a baseline method guide below |
| Warm-up samples | How many values it learns per entity before it may fire. | 10 | up if early data is noisy; keep ≥ a few so a cold start doesn’t false-alarm |
k = 3 corresponds to the classic “3-sigma” rule: for roughly bell-shaped
data, ~99.7% of normal values fall within 3 standard deviations, so k = 3
alerts on the ~0.3% that don’t.
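The 3-sigma figure can be verified with the normal distribution's error function. The sketch below only holds for roughly bell-shaped data, as the text says; real traffic metrics are often heavier-tailed, so treat the percentages as a guide rather than a guarantee. The function name is illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2.0, 3.0, 4.0):
    inside = fraction_within(k)
    print(f"k={k}: {inside:.4%} inside, {1 - inside:.4%} outside")
```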
Tuning: how each parameter behaves
Every knob trades one thing for another. Understanding the direction of each trade-off is most of what you need to tune a rule.
k (sensitivity, the alert threshold). A value fires when its z-score |value − center| / spread exceeds k. Higher k → fewer, stronger alerts (fewer false positives, but you may miss subtle drift); lower k → catches smaller deviations (more sensitive, more false positives). It’s the knob you reach for first when there’s too much or too little noise. k works identically for all three methods.
α (smoothing; EWMA and Seasonal only; 0 < α < 1). How much weight the newest value gets. Higher α (e.g. 0.3) → the baseline adapts fast to a new regime, but it also forgets fast: a slow, creeping anomaly can be absorbed into “normal” before it ever crosses k. Lower α (e.g. 0.05) → a long, stable memory that won’t be fooled by a slow attack, but is slower to follow a legitimate change of regime and stays twitchy longer. Rule of thumb: start at 0.1; lower it if slow drift is being learned away, raise it if the baseline lags a real, permanent change.
Window N (Median + MAD only, 8–256). How many recent values the median and MAD are computed over. Larger N → a steadier, more robust baseline that tolerates more outliers but adapts slowly; smaller N → more reactive but noisier. Constraint: warm-up ≤ N (you can’t require more samples than the window holds).
Warm-up samples — the anti-flood guard. The rule learns but never fires until an entity has been seen this many times. Keep it at a few so a brand-new host or a fresh deployment doesn’t false-alarm before “normal” is established; raise it if the first few observations are typically noisy. Combined with freeze-on-fire (below), it keeps the baseline anchored to genuinely normal behaviour.
Max keys — the memory bound. The maximum number of distinct entities a rule tracks. Past it, new keys are ignored and a single meta-alert warns you to narrow the query. It caps RAM and priming cost; the default is lowered for Seasonal (168× the state per entity). Raise it only when you deliberately track many entities and have the memory for it.
Symptom → fix. When a rule is misbehaving, work from the symptom:
| Symptom | Likely cause | Try |
|---|---|---|
| Too many alerts | k too low, or a naturally-constant metric with ~0 spread | Raise k; or pick a metric with natural variation |
| Alerts on ordinary bursts | a legitimate spike inflates the baseline / spread | Switch to Median + MAD |
| Misses a real anomaly hiding behind a past spike | one big past value masks the next (EWMA) | Switch to Median + MAD |
| Slow, creeping anomaly never fires | α too high — the drift was learned as normal | Lower α |
| Baseline lags a real, permanent change of regime | α too low / window N too large | Raise α, or lower N |
| Fires every morning / every Monday | a legitimate time-of-week pattern | Switch to Seasonal |
| Never leaves warm-up / learns nothing | too few observations, cadence too slow, or entities > Max keys | Widen the query window / cadence; narrow the entity set or raise Max keys |
A perfectly flat baseline is sensitive by design. If a metric has never varied its spread is ~0, so any change is “many standard deviations” and fires. For “bytes out” or “peer count” that’s usually what you want; if it’s too twitchy for a naturally-constant metric, raise k or choose a metric with some natural variation.
Choosing a baseline method
The Baseline method setting picks how “normal” is measured. All three are two-sided (spike and dip), silently warm up, and freeze on fire (below). They differ only in what “normal” means and what pulls it off course. You can switch a rule’s method at any time from the page — it resets the learned baseline so the new method relearns from scratch.
EWMA — the default, for smoothly-drifting metrics
Normal is an exponentially-weighted moving average and variance: recent runs weigh more than old ones, so the baseline follows a genuine trend without storing any history (three numbers per entity — mean, spread, count).
- Strengths. Cheap, bounded, adapts smoothly, tuned with a single knob (α). Great default for volumes and counts that grow or shrink gradually.
- Weakness. It has no memory of the shape of past values, only their weighted average — so a single very large past value pulls the mean up, which can mask a later spike (the average already looks high). It also has no idea of time-of-day: a nightly backup makes it alert every night.
- Reach for it when the metric drifts slowly and doesn’t have legitimate recurring spikes: bytes out per host, number of peers, sessions per minute on a steady server.
NDR example. A workstation that normally sends ~50 MB/day to the Internet and slowly climbs to ~120 MB over months: EWMA follows the climb (no false alarm on the trend) but fires the day it suddenly ships 4 GB.
Median + MAD — robust, for metrics with legitimate spikes
Normal is the median of the last N values, and the spread is the median absolute deviation (MAD, scaled by 1.4826 to match a standard deviation). Medians ignore outliers, so a freak value neither raises the baseline nor hides the next anomaly.
- Strengths. Immune to the “one big value masks the next spike” trap that bites EWMA. Exactly what you want when the metric is occasionally spiky by design — a weekly backup, a nightly batch, a monthly report job.
- Weakness. Keeps a bounded window (N values) instead of three scalars, so it costs a little more memory and reacts a step more slowly. It still has no notion of time-of-day.
- Reach for it when EWMA either misses spikes hiding behind an earlier outlier or alerts too often on ordinary bursts. Tuned with the window N (8–256, default 32) instead of α; see the tuning guide above.
NDR example. A backup server whose nightly transfer is normally ~2 GB but once a quarter legitimately hits ~30 GB. Under EWMA that quarterly spike inflates the mean for weeks and can mask a real exfil in between; under Median + MAD the 30 GB is one outlier the median shrugs off, so a genuine anomaly still stands out.
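The backup-server case can be reproduced with a small Python sketch of a windowed median and scaled MAD. The class name and sample values are hypothetical, and the code only illustrates the estimator described above (median as centre, 1.4826 × MAD as spread); it is not obserae's implementation.

```python
import math
import statistics
from collections import deque

MAD_TO_SIGMA = 1.4826  # scales MAD to match a standard deviation

class MedianMadBaseline:
    """Keeps the last `window` values; centre = median, spread = scaled MAD."""

    def __init__(self, window=32):
        self.values = deque(maxlen=window)

    def score(self, value):
        if len(self.values) < 2:
            return 0.0
        center = statistics.median(self.values)
        mad = statistics.median([abs(v - center) for v in self.values])
        spread = MAD_TO_SIGMA * mad
        if spread == 0.0:
            return 0.0 if value == center else math.copysign(math.inf, value - center)
        return (value - center) / spread

    def update(self, value):
        self.values.append(value)  # oldest value drops out once the window is full

# Hypothetical nightly transfers in GB, including one legitimate 30 GB run.
nightly = MedianMadBaseline(window=32)
for gb in [2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9, 30.0, 2.0, 2.1, 1.9]:
    nightly.update(gb)

print("z for 2.1 GB:", round(nightly.score(2.1), 2))
print("z for 6 GB:  ", round(nightly.score(6.0), 2))
```

Despite the 30 GB run sitting in the window, the centre stays at about 2 GB and a 6 GB night still scores far above k = 3.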
Seasonal — for metrics with a daily/weekly rhythm
Normal is a separate EWMA baseline per time-of-week slot — 24 hours × 7 days = 168 little baselines per entity. An observation is compared only to the same hour and weekday, so a load that is high every weekday morning (backups, market open, the 9-to-5 office pattern) is learned as normal for that slot and no longer alerts every time — while the same value at 3 a.m. Sunday still does.
- Strengths. The only method that understands “high, but normal for this hour”. Kills the recurring-pattern false positives EWMA and MAD can’t.
- Weaknesses. Needs several weeks of data to prime all 168 slots (each warms up independently — expect a long ramp-up), and holds ~168× more state per entity. obserae therefore lowers the default Max keys for seasonal rules; raise it deliberately if you track many entities. Tuned with α and k like EWMA.
- Reach for it when the metric has a strong, legitimate daily or weekly shape you keep alerting on by mistake.
NDR example. DNS query volume from an internal resolver: it’s high every weekday 9-18h and near-zero at night. EWMA fires every morning; seasonal learns the weekday-daytime peak as normal and only fires on a midnight surge — the classic beaconing / after-hours-exfil hour.
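The slotting itself is simple arithmetic. The sketch below maps a timestamp to one of the 168 time-of-week slots; the function name, the Monday-first ordering and the use of naive local timestamps are assumptions for illustration (the text does not say which time zone or slot order obserae uses).

```python
from datetime import datetime

SLOTS_PER_WEEK = 24 * 7  # 168 baselines per entity

def time_of_week_slot(ts):
    """Slot index in [0, 168): weekday (Monday = 0) * 24 + hour of day."""
    return ts.weekday() * 24 + ts.hour

# Two Monday mornings land in the same slot; Sunday 03:00 lands elsewhere.
monday_a = datetime(2024, 1, 1, 9, 30)
monday_b = datetime(2024, 1, 8, 9, 5)
sunday_night = datetime(2024, 1, 7, 3, 0)

print(time_of_week_slot(monday_a), time_of_week_slot(monday_b),
      time_of_week_slot(sunday_night))
```

Each slot would then hold its own small baseline, so an observation is only ever compared with earlier observations from the same hour and weekday.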
Quick pick
| Your metric’s profile | Method | Main knob | Memory / ramp-up |
|---|---|---|---|
| Drifts slowly, no recurring spikes | EWMA | α | 3 numbers · seconds |
| Has legitimate occasional spikes (backups, batch) | Median + MAD | window N | N values · minutes |
| Has a strong daily/weekly rhythm | Seasonal | α | ~168 slots · weeks |
Rule of thumb: start with EWMA; move a noisy metric to Median + MAD if a legitimate spike is masking real anomalies or generating noise; switch to Seasonal only when a genuine time-of-week pattern is the source of the false positives (and you can afford the multi-week priming).
Two guarantees that keep it honest
- Silent warm-up. Until a host has been seen Warm-up samples times, the rule learns but never fires. A brand-new deployment won’t flood you while it’s still figuring out what normal looks like — the same “learn first” behaviour as First seen and Heartbeat.
- Freeze on fire. When a value fires, it is reported but not folded into the baseline it broke from. One big spike therefore can’t quietly drag “normal” upward and hide the next one — the baseline stays anchored to genuinely normal behaviour.
And like every grouped rule, the per-entity state is capped by Max keys and pruned by retention, so it can’t grow without bound.
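Both guarantees come down to the order of steps on each run: score first, decide, and only then learn. The Python sketch below shows that ordering with an EWMA-style state; the function name, state layout and update formula are illustrative assumptions, not obserae's code.

```python
import math

def observe(state, value, k=3.0, alpha=0.1, warmup=10):
    """Return True if `value` fires. The baseline is updated only if it did not."""
    mean, var, count = state["mean"], state["var"], state["count"]
    fired = False
    if count >= warmup:  # silent warm-up: never fire before enough samples
        spread = math.sqrt(var)
        fired = abs(value - mean) / spread > k if spread > 0 else value != mean
    if not fired:  # freeze on fire: a firing value is not folded in
        if count == 0:
            state["mean"] = value
        else:
            delta = value - mean
            state["mean"] = mean + alpha * delta
            state["var"] = (1 - alpha) * (var + alpha * delta * delta)
        state["count"] = count + 1
    return fired

# Hypothetical per-run values for one host, then a large spike.
state = {"mean": 0.0, "var": 0.0, "count": 0}
learning = [observe(state, v) for v in [5, 6, 4, 5, 7, 5, 4, 6, 5, 5, 6, 4]]
mean_before = state["mean"]
spike_fired = observe(state, 1000)

print("fired while learning:", any(learning))
print("spike fired:", spike_fired, "| baseline moved:", state["mean"] != mean_before)
```

The spike is reported, but the mean and sample count are exactly what they were before it arrived, so the next spike is judged against the same baseline.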
Where the baseline lives (and why it stays light)
The learned baselines are held in memory while obserae runs — that is the source of truth the evaluator reads and updates on each cycle, with no database round-trip. Two things keep memory bounded: Max keys caps how many entities a rule tracks, and each entity’s baseline is a fixed, tiny amount of state (a few numbers), never a growing history.
For durability, obserae snapshots the baselines to compact Parquet files (one per rule) on a periodic cadence and again at a clean shutdown, and reloads them at boot. This deliberately keeps the baselines out of the main database file — so learning that grows with your host count never bloats the DB or its checkpoints — and matches how obserae already stores flows and sessions. A crash between snapshots loses at most the most recent learning (the baseline simply re-learns it); alert cooldowns are stored transactionally, so you never get a duplicate alert after a restart.
Retention prunes an entity’s baseline once it has been silent longer than the rule’s retention window (the same window that prunes its per-key state), and the on-disk size of the baseline store is shown on the Storage page alongside flows, sessions and backups.
Reference recipes
These nine ship as the std.anomaly rule set: install it from the Rule Sets page and all nine land ready to run (see below). The recipes below are the manual equivalent (and a starting point to adapt). They all group by the inferred client_ip (the requester) and sum the directional client columns (client_to_server_bytes and its peers). Sessions are stored internally in a canonical low-IP-first order, but that ordering is not exposed to NFQL and does not track who the client is, so the recipes always orient on client_ip / server_ip. They reference only built-in address keywords (internet4/6, internal4/6), so they run on a fresh install with no cartography setup.
Adaptive exfiltration — client upload volume to the Internet, no fixed cap (MITRE T1041):
- Query: FROM sessions | LAST 900 | WHERE server_ip == "internet4" OR server_ip == "internet6" | STATS out = SUM(client_to_server_bytes) BY client_ip
- Rule: Anomaly, metric out, group by client_ip, k 3, α 0.1, warm-up 12.
Egress fan-out — a host reaching unusually many distinct Internet servers, a C2 / beaconing breadth signal (MITRE T1071):
- Query: FROM sessions | LAST 900 | WHERE server_ip == "internet4" OR server_ip == "internet6" | STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
- Rule: Anomaly, metric peers, group by client_ip, k 3.
DNS exfiltration — abnormal DNS session count per host, DNS tunnelling (MITRE T1048.003):
- Query: FROM sessions | LAST 900 | WHERE server_port == 53 | STATS lookups = COUNT(*) BY client_ip
- Rule: Anomaly, metric lookups, group by client_ip, k 3.
Lateral spread — a workstation reaching unusually many internal peers (MITRE T1021):
- Query: FROM sessions | LAST 900 | WHERE server_ip == "internal4" OR server_ip == "internal6" | STATS peers = COUNT_DISTINCT(server_ip) BY client_ip
- Rule: Anomaly, metric peers, group by client_ip, k 3.
Lateral admin surge — a host reaching unusually many internal servers on admin ports (SSH/SMB/RDP/WinRM) (MITRE T1021):
- Query: FROM sessions | LAST 900 | WHERE (server_ip == "internal4" OR server_ip == "internal6") AND server_port IN (22, 445, 3389, 5985, 5986) | STATS admin_peers = COUNT_DISTINCT(server_ip) BY client_ip
- Rule: Anomaly, metric admin_peers, group by client_ip, k 3.
Port scan (vertical) — a source hitting many distinct ports on one host (MITRE T1046):
- Query: FROM sessions | LAST 300 | STATS ports = COUNT_DISTINCT(server_port) BY client_ip, server_ip
- Rule: Anomaly, metric ports, group by client_ip, server_ip, k 3, max keys 20000.
Host sweep (horizontal) — a source hitting many distinct hosts on one port (MITRE T1046/T1018):
- Query: FROM sessions | LAST 300 | STATS hosts = COUNT_DISTINCT(server_ip) BY client_ip, server_port
- Rule: Anomaly, metric hosts, group by client_ip, server_port, k 3, max keys 20000.
Half-open surge — a source producing many no-reply TCP connections, SYN scanning (MITRE T1595.001):
- Query: FROM sessions | LAST 300 | WHERE close_reason == "no_reply" AND protocol == TCP | STATS dead = COUNT_DISTINCT(server_ip) BY client_ip
- Rule: Anomaly, metric dead, group by client_ip, k 3.
Auth brute force — abnormal session count to auth/admin ports per source→server (MITRE T1110):
- Query: FROM sessions | LAST 300 | WHERE server_port IN (22, 3389, 445, 5985, 5986, 389, 636, 1433, 3306, 5432) AND protocol == TCP | STATS attempts = COUNT(*) BY client_ip, server_ip
- Rule: Anomaly, metric attempts, group by client_ip, server_ip, k 3, max keys 20000.
The Anomaly Detection page
Analysis → Anomaly Detection is where you turn the engine on and watch it
work. This page is only about statistical (anomaly) rules — deterministic
rules live on the Rules page, and the two lists never mix. A fresh install
ships no anomaly rules; the two fastest ways to start are (1) install the
std.anomaly rule set from the Rule Sets page, which lands nine ready
detectors covering exfiltration, C2, lateral movement, scanning and brute force,
or (2) click New anomaly rule to author your own.
Environment overview
A compact bar summarises the whole engine at a glance: how many anomaly rules exist and how many are active (learning has primed), learning (enabled, still warming up) or off; the total number of entities tracked across every baseline; and how many anomalies fired in the last 24 hours. Two small charts sit at the right — a 7-day timeline of fires and a by-severity breakdown — so a burst of anomalies is obvious without opening a single rule. (The activity charts reflect the most recent 5000 alerts.) The Rules page carries the same style of compact bar for deterministic rules.
The std.anomaly rule set
Rather than a per-page catalog, the starter detectors ship as a rule set you
install from Rule Sets (like std.community/std.enterprise). Installing
std.anomaly creates all nine as pack-owned rules that appear here immediately
and start learning; you can enable/disable each, but editing a pack-owned
rule is done by duplicating it (the pack file is the source of truth). Every
detector is client/server-correct — it groups by the inferred client_ip
and sums the true client direction — and is mapped to MITRE ATT&CK and the
detection obligations of NIS2 (Art. 21), DORA (Art. 10), CIS Controls
v8 (Control 13, Network Monitoring & Defense) and SOC 2 (CC7.2 monitoring,
CC6.x logical access). The nine are:
| Detector | Detects | MITRE | Compliance |
|---|---|---|---|
| Adaptive exfiltration | client upload to the Internet far above its own norm | T1041 / T1030 | NIS2 · DORA · CIS 13.3 · SOC2 CC7.2 |
| Egress fan-out | a host reaching unusually many distinct Internet servers (C2 breadth) | T1071 / T1571 | NIS2 · DORA · CIS 13 · SOC2 CC7.2 |
| DNS exfiltration | abnormal DNS session volume per host (DNS tunnelling) | T1048.003 / T1071.004 | NIS2 · CIS 13.9 · SOC2 CC7.2 |
| Lateral spread | a host reaching unusually many internal peers | T1021 / T1210 | NIS2 · DORA · CIS 13.4 · SOC2 CC7.2 |
| Lateral admin surge | a host reaching many internal admin services (SSH/SMB/RDP/WinRM) | T1021 | NIS2 · CIS 13.4 · SOC2 CC6.1 |
| Port scan (vertical) | many distinct ports probed on one host | T1046 | NIS2 · CIS 13 · SOC2 CC7.2 |
| Host sweep (horizontal) | many distinct hosts probed on one port | T1046 / T1018 | NIS2 · CIS 13 · SOC2 CC7.2 |
| Half-open surge | many no-reply TCP connections (SYN scan) | T1595.001 | NIS2 · CIS 13 · SOC2 CC7.2 |
| Auth brute force | abnormal session count to auth/admin ports | T1110 | NIS2 · DORA · CIS 13.6 · SOC2 CC6.1/CC6.6 |
They reference only built-in address keywords, so they compile and run on a fresh install with no cartography setup. Each learns silently for its warm-up window before it can fire, so expect no day-one alerts.
Migrating from the old in-page detector catalog. Earlier builds let you enable these detectors from a per-page catalog, which created them as ordinary in-place anomaly rules (named Adaptive exfiltration, Lateral spread, Scan surge). That catalog is gone; the detectors now ship as the std.anomaly pack. If an instance still carries the old in-place rules, install std.anomaly from Rule Sets and delete the three in-place duplicates (otherwise each detector runs twice, with separate baselines). If you deploy through a config bundle, do the same in the file: declare std.anomaly under rule_sets.packs and remove the three anomaly rules, together with their paired … (detector) saved queries, from the alerting section. Leaving a rule that references a query you removed makes the whole bundle fail to import. Pack rules relearn their baselines from cold.
Creating and editing rules in place
New anomaly rule opens a modal on this page (no redirect): pick a saved query with the searchable picker, choose the numeric metric and the group-by entity, set k / α (or the window) and the baseline method, and save. Edit rule in a rule’s drawer opens the same modal. A filter box narrows the list by name, metric, method or group-by key — the same search the Rules page offers.
Rules, baselines and per-entity visualisations
Every anomaly rule is listed with its baseline method, how many entities it tracks, whether it has started learning, and when it last fired — with an on/off switch on each row. A drawer opens when you select a rule:
- A baseline-method switch (EWMA / Median + MAD / Seasonal) lets you retune the estimator without leaving the page. Switching resets the rule’s learned baseline (an EWMA mean is meaningless once the rule is seasonal), so it asks for confirmation and the rule then relearns from cold.
- An Activity heatmap button opens a large modal (entities × time, coloured by deviation) surfacing which hosts strayed from their own normal and when — see The rule activity heatmap below.
- Baselines — one row per entity, showing the learned normal (EWMA: mean ± stddev; Median + MAD: median · window length; Seasonal: primed slots / 168), the sample count, and whether it is still in warm-up or active. This is where you watch the engine learn: a freshly enabled rule shows entities in warm-up flipping to active as data arrives. (Very large rules show the 500 most-recently-active entities.)
- Recent fires — the anomalies that fired in the last 7 days, with the observed value and the entity — so you can confirm the rule catches what you expect before it pages anyone.
The charts open large, in modals. Both visualisations below open in a full-size modal (they use the whole screen) and are drawn with real, zoomable axes, hover tooltips and a colour-scale legend — a drawer-sized thumbnail can’t convey a metric that spans orders of magnitude. Press Esc or click outside to close.
The rule activity heatmap
The drawer’s Activity heatmap button opens a heatmap of one row per entity (the most anomalous first, up to 50), one column per time bucket over the last 7 days, each cell coloured by how far that entity strayed from its own normal at that moment (its z-score — blue = below normal, dark = normal, red = above). Cells where the rule fired are outlined, and a colour-scale legend sits under the grid. It answers which host deviated, and when, at a glance — a red streak across one row is an entity spiking; a single outlined cell is a one-off fire. Hover a cell for its exact z and timestamp, and click a cell to jump straight into that entity’s chart. (A perfectly flat baseline produces a huge z, so the colour is clamped to a readable range — the tooltip still shows the true value.)
The heatmap is reconstructed on demand by replaying the rule’s query bucketed over the window (see below) — it needs no stored history. A rule whose query can’t be replayed this way shows no heatmap button.
Reading the per-entity chart
Click an entity in the Baselines table (or a cell in the activity heatmap) to open its chart. The observed-vs-envelope timeline is the default for every method — EWMA, Median + MAD and Seasonal:
- The white line is the observed metric over time; the shaded band is the moving normal envelope — centre ± k·spread recomputed at every bucket (mean ± k·σ for EWMA; median ± k·1.4826·MAD for Median + MAD; the per-time-of-week mean ± k·σ for Seasonal) — and the dashed line is the centre. Now you can see the problem: the line riding inside the band is normal; where it punches out — and how far — is the anomaly.
- Deviations are graded by severity. Every bucket whose value left the band is marked, coloured amber → orange → red by how far past the threshold it sits (a superset of what actually fired — it shows what the algorithm considers abnormal). The buckets that truly paged get an extra ring on the line.
- The warm-up span — while the rule is still learning and never fires — is shaded and labelled learning, so an empty band up front reads correctly.
- Hover any point for a card with the observed value, the expected centre, the band edges, the z-score (in σ) and a plain-language status.
- 12 h / 24 h / 7 d buttons switch the display window (the caption shows the bucket span and point count); a Linear / Log toggle switches the value axis (log makes a few-kB baseline and a multi-MB spike both readable); drag to zoom into a window.
- Seasonal rules add a Weekly grid toggle: a 24 × 7 heatmap (hour of day × weekday) whose cell colour encodes the learned mean for each time-of-week slot — the entity’s weekly rhythm made visible; a slot never observed stays empty.
If the window genuinely has no data the chart shows a short empty-state note (widen the window, or send traffic). If the query can’t be replayed at all, the chart falls back to the current (frozen) envelope with the fires on it.
Editing a rule’s full parameters still happens in the rule editor (the Edit rule link, or the Rules page).
How the series and heatmap are reconstructed (and its limits)
obserae stores only each entity’s current baseline and its fires — never a
per-tick history of observed values. Both the per-entity series and the rule
heatmap are therefore rebuilt on the fly: obserae replays the rule’s query
with a time BUCKET added to its STATS … BY, so one query returns the
observed value per entity per bucket over the display window you pick (12 h /
24 h / 7 d) — which overrides the rule’s own live window, so a rule scanning
LAST 900 still charts a full day — and then re-folds the same estimator the
engine uses to redraw the envelope that would have applied at each bucket. Nothing
new is stored, and the scan is bounded (the display window, a bucket width chosen
to stay well under a few hundred points, and a top-50 cap on the heatmap’s rows).
Two honest caveats follow from this:
- It is a reconstruction at the chosen display bucket, not a replay of the historical ticks: the moving envelope is what the estimator would have learned from these buckets, which closely tracks — but need not exactly equal — the live baseline the rule built at its own cadence. The fires shown are the authoritative ones from the journal.
- Only rules whose query ends in a plain STATS <metric> = <agg>(…) BY <entity> can be replayed (a pivot cascade, a trailing SORT/HAVING, or a source with no time column cannot). Those rules keep the frozen-envelope band and skip the heatmap; everything else on the page is unchanged.
Which do I use?
- Explore / confirm by hand → statistical operators in a query. You see the numbers and choose a threshold. Great for one-off hunts and for understanding a signal before you automate it.
- Alert on a fixed, known limit → a Threshold rule (see Alerting). Best when the limit is the same for everyone (e.g. “no host should ever hit > 500 distinct ports”).
- Alert on “unusual for this entity” → an Anomaly rule. Best when normal differs per host and you’d rather not maintain a threshold per host.
A common workflow: hunt with the operators, find a signal, then wrap it in an anomaly rule so obserae watches it per-entity from then on.
See also: the NFQL guide for the full query syntax, and Alerting for cadence, cooldown, per-entity grouping and how alerts are routed to your outputs.