Peeking discipline

Every time someone opens an experiment's results page, Splitstream logs a peek event. The peek count is surfaced on the experiment's stop audit entry alongside the outcome.

Why log peeks

Under continuous monitoring of a Bayesian posterior, the chance of crossing a fixed threshold through random noise alone grows with the number of looks. At our shipped defaults (0.995 / 20,000 / 240 min), the empirical Type-I rate is calibrated to ~5% on the assumption that the dashboard is opened once per snapshot, i.e. roughly every four hours.

If you peek twice that often (every 2 hours), the realized false-positive rate creeps up. If you peek every snapshot for two weeks and stop on the first crossing, you re-enter the regime the calibration table was designed to rule out.
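
To make the growth with the number of looks concrete, here is a toy Monte Carlo sketch. It is not Splitstream's calibration procedure: it assumes a null experiment, equal traffic per interval, and a flat-prior normal approximation in which the posterior probability that B beats A equals the normal CDF of the running z-score. The function name false_positive_rate and its parameters are illustrative only; the 0.995 cutoff is the one quoted in the defaults above.

```python
import random
from statistics import NormalDist

THRESHOLD = 0.995  # posterior cutoff quoted in the shipped defaults


def false_positive_rate(n_looks: int, n_sims: int = 5_000, seed: int = 0) -> float:
    """Fraction of simulated null experiments that cross THRESHOLD
    at any of n_looks equally spaced peeks (toy model, see assumptions)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    crossings = 0
    for _ in range(n_sims):
        walk = 0.0  # running sum of per-interval standardized differences
        for k in range(1, n_looks + 1):
            walk += rng.gauss(0.0, 1.0)   # no true effect: pure noise
            z = walk / k ** 0.5           # z-score of B vs A after k intervals
            prob_b_beats_a = std_normal.cdf(z)  # assumed posterior approximation
            if prob_b_beats_a >= THRESHOLD:
                crossings += 1            # stopped on the first crossing
                break
    return crossings / n_sims


if __name__ == "__main__":
    for looks in (1, 6, 42, 84):
        print(looks, false_positive_rate(looks))
```

Comparing false_positive_rate(1) with false_positive_rate(84) contrasts a single look with two weeks of four-hourly looks (14 days at 6 snapshots per day). The absolute numbers depend entirely on the toy assumptions; only the direction of the trend is the point.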

The peek log is evidence, not an indictment

Splitstream does not block stopping decisions based on peek count. The peek count is recorded as evidence of the decision process. A postmortem on a noisy-looking result starts with "how many times did we peek before stopping?"

The audit log entry for a stopped experiment includes:

{
  "action": "experiment.stop",
  "target_id": "01HXY…EXPC1",
  "payload": {
    "reason": "won",
    "peek_count": 12
  },
  "actor_id": "01HXY…ACTOR",
  "created_at": "2026-06-12T15:00:00Z"
}
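
For a postmortem, one way to read the peek count is relative to the number of snapshots the experiment ran for. The sketch below assumes that created_at on the stop entry is the stop time and that the experiment's start time is available from elsewhere; the helper peeks_per_snapshot and its started_at parameter are hypothetical, not part of the Splitstream API. The 240-minute cadence is the shipped default mentioned earlier.

```python
import json
from datetime import datetime

# Sample stop entry, copied from the audit log example above.
entry = json.loads("""
{
  "action": "experiment.stop",
  "target_id": "01HXY…EXPC1",
  "payload": {"reason": "won", "peek_count": 12},
  "actor_id": "01HXY…ACTOR",
  "created_at": "2026-06-12T15:00:00Z"
}
""")


def _parse_utc(timestamp: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def peeks_per_snapshot(entry: dict, started_at: str, cadence_minutes: int = 240) -> float:
    """Ratio of recorded peeks to elapsed snapshots (hypothetical helper).

    A value near 1.0 matches the calibrated cadence; well above 1.0 means
    the team looked more often than the calibration assumed.
    """
    elapsed = _parse_utc(entry["created_at"]) - _parse_utc(started_at)
    snapshots = elapsed.total_seconds() / 60 / cadence_minutes
    if snapshots <= 0:
        raise ValueError("stop time must be later than start time")
    return entry["payload"]["peek_count"] / snapshots


if __name__ == "__main__":
    # Start time here is an assumed value for illustration only.
    print(peeks_per_snapshot(entry, "2026-06-10T15:00:00Z"))
```

The ratio is only a starting point for the conversation; as noted above, the peek count is evidence about the decision process, not a gate on it.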

What peeking honestly costs

Classical fixed-horizon frequentist NHST forbids peeking outright, because its Type-I guarantee assumes a single look at a pre-set sample size. Bayesian threshold stopping with a fixed posterior cutoff tolerates peeking at a documented cadence, with the threshold chosen via empirical calibration to keep the realized Type-I rate at the nominal level under that exact cadence.

If you want the analytical guarantee (Type-I rate controlled at the nominal level under arbitrary peeking, no calibration required), that's the always-valid e-value rule, documented as a v2 contract surface (decision_rule.method = "bayesian.always_valid_evalue") but not implemented in v0.1.