The 90% CI-floor as Rankquant's ranking primitive
By Ryan Siegal · Founder and Principal
1. Why not rank on the mean
The most common instinct — "sort by average rating" — is statistically indefensible when sample sizes vary. A product with one 5-star review has a mean of 5.0 but tells you nothing about its population quality. A product with 500 reviews averaging 4.6 is obviously the safer pick. Any ranking function worth using has to account for sample size.
There are two well-established fixes: Bayesian shrinkage (pull thin-sample products toward a prior) and confidence-interval bounds (rank on a defensibly pessimistic estimate rather than the point estimate). Rankquant v2 uses the second. The two approaches produce similar rankings under reasonable settings; we chose the CI-floor for three reasons:
- No prior to specify. Bayesian shrinkage requires picking a prior strength k; the CI-floor only requires a confidence level.
- Scale-free. The CI-floor applies identically whether you're ranking z-scores, percentages, or counts. Bayesian shrinkage has to be re-parameterized for each.
- Transparent. A user can compute it: take the mean, subtract 1.645 divided by the square root of the effective sample size. That's it.
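The recipe in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming scores are already on the z-score scale with unit variance; the function name `ci_floor`, its parameters, and the sample numbers are hypothetical and not Rankquant's published code.

```python
import math

Z_CRIT = 1.645  # critical value quoted in the text


def ci_floor(mean_z: float, n_eff: float, z_crit: float = Z_CRIT) -> float:
    """Pessimistic estimate: the mean minus z_crit standard errors.

    Assumes unit-variance z-scores, so the standard error is 1/sqrt(n_eff).
    """
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    return mean_z - z_crit / math.sqrt(n_eff)


# Illustrative numbers only: a thin-sample favourite vs. a well-reviewed product.
thin = ci_floor(mean_z=2.0, n_eff=1)
thick = ci_floor(mean_z=1.5, n_eff=500)
print(f"thin sample floor:  {thin:+.3f}")
print(f"thick sample floor: {thick:+.3f}")
```

The thin-sample product has the higher mean but the lower floor, which is the behaviour the section argues for.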
2. The CI-floor formula for each aggregation lens
| Lens | CI-floor formula and notes |
|---|---|
| R1 (pure relative) | floor_R1 = Ẑ_R1 − 1.645 · (1/√N). N is count of qualifying reviewers (n_u ≥ 2, σ_u > 0). SE scale factor is 1 because z-scores have unit variance by construction. |
| R2 (source-weighted) | floor_R2 = Ẑ_R2 − 1.645 · (1/√N_eff). N_eff uses Kish's design-effect: N_eff = (Σ w_s)² / Σ w_s². N_eff ≤ N_raw with equality only if all w_s are equal. |
| R3 (broadened) | floor_R3 = Ẑ_R3 − 1.645 · (1/√N'). N' = \|Q_i ∪ constant-rater reviewers\|. Slight SE understatement because imputed-σ z-scores have slightly non-unit variance; bounded by max(σ̃_s / σ_u) across sources. |
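The R2 row's effective sample size can be sketched as follows. This assumes each review simply carries the weight of its source; the helper names `kish_n_eff` and `floor_r2` and the example weights are hypothetical.

```python
import math
from typing import Sequence


def kish_n_eff(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / sum of w^2."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / sum_sq


def floor_r2(mean_z: float, weights: Sequence[float], z_crit: float = 1.645) -> float:
    """Source-weighted floor: mean minus z_crit / sqrt(N_eff)."""
    return mean_z - z_crit / math.sqrt(kish_n_eff(weights))


equal = [1.0] * 10                 # ten reviews, identical weights
skewed = [3.0, 3.0] + [0.5] * 8    # ten reviews, two dominant sources (made up)

print(kish_n_eff(equal), kish_n_eff(skewed))
print(floor_r2(1.0, equal), floor_r2(1.0, skewed))
```

With equal weights N_eff equals the raw count; with skewed weights it shrinks, so the same mean earns a lower floor.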
3. Worked example — three products, three reviewer counts
Product A: 4 reviewers, mean z = +2.10
SE = 1/√4 = 0.500
90% floor = 2.10 − 1.645·0.500 = +1.28
Product B: 20 reviewers, mean z = +1.80
SE = 1/√20 = 0.224
90% floor = 1.80 − 1.645·0.224 = +1.43
Product C: 80 reviewers, mean z = +1.60
SE = 1/√80 = 0.112
90% floor = 1.60 − 1.645·0.112 = +1.42
Rank by mean z: A > B > C (2.10 > 1.80 > 1.60)
Rank by CI-floor: B > C > A (1.43 > 1.42 > 1.28)
The re-ordering is not a glitch; it's the point. Product A might be exceptional, but with only 4 reviewers we can't distinguish "exceptional quality" from "lucky small sample." Products B and C have earned the confidence that their means aren't accidents. The 0.01 gap between B and C at CI-floor is a statistical tie (we flag it on-site); the 0.15 gap between B and A is a meaningful separation.
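The worked example can be reproduced with a short script. It is a sketch that uses only the reviewer counts and mean z-scores stated above, with SE = 1/√N as in the R1 formula; the variable names are illustrative.

```python
import math

Z_CRIT = 1.645

# (name, reviewer count, mean z) taken from the worked example
products = [("A", 4, 2.10), ("B", 20, 1.80), ("C", 80, 1.60)]

rows = []
for name, n, mean_z in products:
    se = 1 / math.sqrt(n)                      # unit-variance z-scores
    rows.append((name, mean_z, se, mean_z - Z_CRIT * se))

by_mean = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
by_floor = [r[0] for r in sorted(rows, key=lambda r: r[3], reverse=True)]

for name, mean_z, se, floor in rows:
    print(f"{name}: mean={mean_z:+.2f} SE={se:.3f} floor={floor:+.2f}")
print("by mean: ", " > ".join(by_mean))
print("by floor:", " > ".join(by_floor))
```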
4. Why 90% and not 95% or 99%
The confidence level is a default, and defaults matter. At higher confidence, the CI-floor sits further below the mean: the penalty for small N grows and more thin-sample products get pushed down. At lower confidence, the penalty weakens and the ranking moves closer to plain mean-ranking.
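Critical values (z) by confidence label:
80% → z = 0.842 (one-sided)
85% → z = 1.036 (one-sided)
90% → z = 1.645 ← Rankquant v2 default (lower endpoint of a two-sided 90% interval; read as a one-sided bound its coverage is 95%, and the strict one-sided 90% value would be 1.282)
95% → z = 1.960 (lower endpoint of a two-sided 95% interval; one-sided coverage 97.5%)
99% → z = 2.326 (one-sided)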
Penalty per 1/√N unit of SE, relative to 90%:
80% → 0.51× (much softer)
90% → 1.00× (our choice)
95% → 1.19× (19% harsher)
99% → 1.41× (41% harsher; too aggressive for small-N coverage)
We picked 90% after a grid search. At 95%, the CI-floor collapsed too aggressively for products with fewer than ~15 reviewers: too many legitimately good thin-sample products ended up mid-table. At 85%, the CI-floor didn't penalize thin samples hard enough: 4-reviewer darlings stayed near the top. 90% was the minimum confidence that produced rankings we'd defend in a court of peers. The constant is published and version-stable; any change requires a public version bump.
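To see how the choice of critical value plays out at different sample sizes, the sketch below recomputes the relative penalties from the z values listed above and shows the absolute penalty z/√N at a few reviewer counts. The reviewer counts (4, 15, 80) and the helper name `penalty` are illustrative; this is not the grid search itself.

```python
import math

# Critical values exactly as listed in the text, keyed by the text's labels.
CRITICAL = {"80%": 0.842, "85%": 1.036, "90%": 1.645, "95%": 1.960, "99%": 2.326}
BASE = CRITICAL["90%"]

# Penalty relative to the default constant.
ratios = {label: z / BASE for label, z in CRITICAL.items()}


def penalty(z_crit: float, n: int) -> float:
    """Distance between the mean and the floor, in z-units, for n reviewers."""
    return z_crit / math.sqrt(n)


for label, z in CRITICAL.items():
    cells = "  ".join(f"N={n}: {penalty(z, n):.2f}" for n in (4, 15, 80))
    print(f"{label}  ratio {ratios[label]:.2f}x  {cells}")
```

The ratios are constant across N; what changes with N is how many z-units of penalty each setting translates into, which is largest for thin samples.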
5. Relationship to other ranking primitives
| Primitive | Relationship to Rankquant's CI-floor |
|---|---|
| Wilson score lower bound (binary) | Reddit's "best" sort and Yelp's internal ranking use the Wilson lower bound on up/down-vote binomials. Same idea: penalize uncertainty. Rankquant's CI-floor is the continuous-scale analog on z-scores. |
| IMDb Top 250 (Bayesian shrinkage) | IMDb's Top 250 formula W = (v/(v+m))·R + (m/(v+m))·C shrinks thin-sample movies toward the global mean C. It is not a lower bound, but it penalizes thin samples in the same direction and tends to produce similar rankings under reasonable settings. |
| Hodges-Lehmann estimator | A robust rank-based estimator of a location parameter. Rankquant's empirical-CDF percentile step is rank-based (Hodges-Lehmann flavored) even though the floor itself is mean-based. |
| Meta-analysis random-effects models | DerSimonian-Laird and related random-effects estimators combine within-study and between-study variance. Rankquant's reviewer-level aggregation is a fixed-effects approximation; a random-effects variant is a future methodology upgrade. |
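For comparison, the first two primitives in the table can be sketched directly. The Wilson function uses the standard Wilson score interval formula; the shrinkage function is the formula quoted in the table. The default z = 1.645 and all example inputs are assumptions chosen for illustration, not values used by Reddit, Yelp, or IMDb.

```python
import math


def wilson_lower_bound(upvotes: int, total: int, z: float = 1.645) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if total == 0:
        return 0.0
    p = upvotes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def imdb_weighted_rating(R: float, v: int, m: float, C: float) -> float:
    """W = (v/(v+m))*R + (m/(v+m))*C: shrink the item mean R toward C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical inputs: a perfect but tiny sample vs. a large, very good one.
print(wilson_lower_bound(1, 1), wilson_lower_bound(90, 100))
print(imdb_weighted_rating(9.0, 10, 100, 7.0), imdb_weighted_rating(9.0, 10_000, 100, 7.0))
```

Both functions reward sample size: the 1-of-1 item scores below the 90-of-100 item, and the 10-vote title is pulled much closer to C than the 10,000-vote title.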
6. What users see on the product page
- Casual view (default): the R1 global percentile as one number. "89 / 100 global."
- Standard view: R1 global + R1 cohort side-by-side with the tagline interpreting any spread.
- Pro view (toggle): all three lenses (R1, R2, R3), each with its mean Ẑ, CI-floor, SE, and N_eff. "R1: mean +1.60, floor +1.42, SE 0.112, N_eff 80."
- Statistical ties: products within 0.05 on CI-floor are bracketed as "statistically similar" rather than ordinally ranked.
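One way the tie bracketing could work is sketched below. The text only gives the 0.05 threshold, so the grouping rule here is an assumption: products are sorted by CI-floor and a product joins the current bracket when it sits within the threshold of the product immediately above it. The function name `bracket_ties` is hypothetical.

```python
from typing import List, Optional, Tuple

TIE_THRESHOLD = 0.05  # threshold stated in the text


def bracket_ties(
    floors: List[Tuple[str, float]], threshold: float = TIE_THRESHOLD
) -> List[List[str]]:
    """Group products into brackets of "statistically similar" CI-floors.

    Assumption: ties chain through adjacent products in sorted order.
    """
    ordered = sorted(floors, key=lambda item: item[1], reverse=True)
    groups: List[List[str]] = []
    prev_floor: Optional[float] = None
    for name, floor in ordered:
        if prev_floor is not None and prev_floor - floor <= threshold:
            groups[-1].append(name)   # close enough to the previous product
        else:
            groups.append([name])     # start a new bracket
        prev_floor = floor
    return groups


brackets = bracket_ties([("A", 1.28), ("B", 1.43), ("C", 1.42)])
print(brackets)
```

A chained rule like this can let a long run of closely spaced products form one bracket even when its first and last members differ by more than the threshold; comparing against the bracket leader instead is an equally plausible reading.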
7. A note on one-tailed vs two-tailed
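The CI-floor uses the critical value z = 1.645, which is the lower endpoint of a two-sided 90% confidence interval. Because only the lower endpoint is ever used, the floor works as a one-sided bound, and read that way its coverage is 95% (a strict one-sided 90% bound would use z = 1.282). Why read it as one-tailed? Because we're asking a one-tailed question: "what's a defensibly low estimate of this product's quality?" We don't care about the upper bound; no one gets promoted in our rankings by having a lucky ceiling. Moving to z = 1.960, the lower endpoint of a two-sided 95% interval, would make the penalty about 19% larger; we specifically don't want that because it over-penalizes thin-sample products.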
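A quick way to check how a critical value maps to one-sided and two-sided coverage is Python's standard-library `statistics.NormalDist`. The sketch assumes the normal approximation used throughout this page; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for z in (1.282, 1.645, 1.960):
    one_sided = std_normal.cdf(z)      # P(Z <= z)
    two_sided = 2 * one_sided - 1      # P(-z <= Z <= z)
    print(f"z={z:.3f}  one-sided={one_sided:.3f}  two-sided={two_sided:.3f}")

# Going the other way: the z that delivers a given one-sided coverage.
z_one_sided_90 = std_normal.inv_cdf(0.90)
z_one_sided_95 = std_normal.inv_cdf(0.95)
print(f"{z_one_sided_90:.3f} {z_one_sided_95:.3f}")
```

Under this check, z = 1.645 corresponds to roughly 95% one-sided and 90% two-sided coverage, and z = 1.960 to roughly 97.5% and 95%.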