Sample Ratio Mismatch (SRM)

Sample Ratio Mismatch (SRM) is a discrepancy between the observed per-variant exposure counts and the declared variant weights. The SRM check is a chi-squared goodness-of-fit test on those counts. A low p-value means the realized split deviates further from the configured split than chance alone would explain.
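As a rough illustration of the test described above, here is a minimal Python sketch using only the standard library. The function names (chi2_sf, srm_check) and the example counts are hypothetical and are not Splitstream's implementation. The 0.001 default matches the warning threshold documented on this page.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-squared variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma Q(a, y):
    # Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step up with the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def srm_check(observed, weights, threshold=0.001):
    """Chi-squared goodness-of-fit test of exposure counts against declared weights.

    Returns (statistic, p_value, flagged). `threshold` is an illustrative default.
    """
    if len(observed) != len(weights) or len(observed) < 2:
        raise ValueError("need matching counts and weights for at least two variants")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(observed)
    weight_sum = sum(weights)
    # Expected count per variant under the configured split.
    expected = [total * w / weight_sum for w in weights]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf(stat, len(observed) - 1)
    return stat, p_value, p_value < threshold


if __name__ == "__main__":
    # Hypothetical 50/50 experiment whose second variant received extra exposures.
    print(srm_check([50000, 51200], [0.5, 0.5]))
```

The sketch normalizes the weights itself, so [1, 1] and [0.5, 0.5] describe the same split. With large samples even a split that looks close to 50/50 can be flagged, which is the point of the test: the larger the sample, the smaller the deviation that chance can explain.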

Why it matters

SRM warnings usually mean a bug, not a real-world traffic skew. Typical causes are a skewed audience filter, a bucketing-pipeline change that affected one variant disproportionately, or a load balancer routing a fraction of users to a stale build that doesn't call /v1/assign. These surface as SRM before they surface as anything else. If you trust a credible interval computed under a variant allocation that differs from the one you configured, the math underneath is silently wrong.

How Splitstream checks it

The analysis worker computes the chi-squared statistic on every snapshot (default cadence: 240 minutes). The SRM monitor also runs a dedicated pass every 5 minutes that doesn't wait for the full snapshot recomputation.

Threshold: a chi-squared p-value < 0.001 raises an SRM warning on the results page. We picked 0.001 rather than the conventional 0.01 because experimentation platforms run the test many times (every snapshot, multi-arm, multi-experiment), and we wanted a low false-alarm rate at a high test count.
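To see why the number of repeated tests matters, the sketch below estimates the chance of at least one false warning over a single experiment. It treats the checks as independent, which is a simplifying assumption: successive snapshots share data, so the true rate will differ. The 14-day run length and the function name false_alarm_rate are hypothetical. Only the 240-minute cadence comes from this page.

```python
def false_alarm_rate(alpha, n_checks):
    """Chance of at least one false warning across n_checks independent checks."""
    return 1.0 - (1.0 - alpha) ** n_checks


SNAPSHOT_CADENCE_MINUTES = 240  # default snapshot cadence from this page
RUN_DAYS = 14                   # hypothetical experiment length

# Number of snapshot-level SRM checks over the whole run.
checks_per_run = RUN_DAYS * 24 * 60 // SNAPSHOT_CADENCE_MINUTES

if __name__ == "__main__":
    for alpha in (0.01, 0.001):
        rate = false_alarm_rate(alpha, checks_per_run)
        print(f"alpha={alpha}: {checks_per_run} checks, false-alarm rate ~ {rate:.3f}")
```

Under these assumptions, a 0.01 threshold would produce a spurious warning on more than half of healthy experiments, while 0.001 keeps the rate under one in ten. The dedicated 5-minute pass and multiple concurrent experiments add further checks on top of this count.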

What to do when the banner fires

Stop the experiment. Find the cause. Fix it. Start a new experiment with a fresh seed. Resuming an SRM-affected experiment with a "we'll just exclude the bad data" manual override is almost always less defensible than calling the run dead and re-running clean.

The platform does not auto-stop on SRM by default. It can't distinguish a bucketing bug from a real-world skew, and auto-stopping during a flash sale that broke the traffic mix would be worse than the warning. Auto-stop on SRM is a per-experiment opt-in.