Splitstreamportfolio demo

Bucketing

Splitstream uses a deterministic bucketing function so every SDK (TypeScript, PHP, Python, future Go) can preview the variant locally and so the server-side computation is reproducible.

The function

bucket(experiment_key, unit_id, seed) =
  parseInt(
    sha256(NFC(experiment_key) + ":" + NFC(unit_id) + ":" + seed)
      .slice(0, 8),
    16
  ) / 0xffffffff

Returns a float in [0, 1). UTF-8 NFC normalization is applied to experiment_key and unit_id before concatenation so that unit_id="café"evaluates the same regardless of the SDK's local string encoding (precomposed é vs decomposed e + ◌́).

Variant selection

Variants are selected by cumulative-weight lookup. Variants with weights [A: 50, B: 30, C: 20] map the bucket to [0, 0.5) → A, [0.5, 0.8) → B, [0.8, 1.0) → C. The lookup is a single linear pass ordered by variant position.

Per-experiment seed

Every experiment carries a server-generated random bucketing_seed(32-byte hex) so overlapping experiments at 50/50 don't bucket the same units into the same letter. The seed is set at experiment creation and never mutated.

Server wins

Despite the function being deterministic and reproducible on the client, server-side bucketing wins. The SDK reads the assignment from the /v1/assign response (which wrote the assignments row), not from a local computation. Otherwise a sufficiently motivated user could re-bucket themselves by changing their context. The bucket() helper is exported as an offline-preview convenience and for the corpus drift test.

The corpus is the contract

Every SDK runs the same stats/corpus/bucketing.json through its own implementation in CI. Drift between any implementation means a returning user is re-bucketed into a different variant — silently poisons experiment results.

The 10-row corpus covers ASCII, UTF-8 (café, ideographic 新規ユーザー), ULID-shaped identifiers, empty strings, and long identifiers. New rows are added before the code changes.

Change the function only by editing the corpus first. That sequence — corpus, then implementation — is what catches the bug that the change introduced. The other order catches nothing.