The 90% CI-floor as Rankquant's ranking primitive

By Ryan Siegal · Founder and Principal

Published 2026-04-24

1. Why not rank on the mean

The most common instinct — "sort by average rating" — is statistically indefensible when sample sizes vary. A product with one 5-star review has a mean of 5.0 but tells you nothing about its population quality. A product with 500 reviews averaging 4.6 is obviously the safer pick. Any ranking function worth using has to account for sample size.

There are two well-established fixes: Bayesian shrinkage (pull thin-sample products toward a prior) and confidence-interval bounds (rank on a defensibly pessimistic estimate rather than the point estimate). Rankquant v2 uses the second. The two approaches produce similar rankings under reasonable settings; we chose the CI-floor for three reasons:

No prior to specify. Bayesian shrinkage requires picking a prior strength k; the CI-floor only requires a confidence level.
Scale-free. The CI-floor applies identically whether you're ranking z-scores, percentages, or counts. Bayesian shrinkage has to be re-parameterized for each.
Transparent. A user can compute it: take the mean, subtract 1.645 divided by the square root of the effective sample size. That's it.
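
As a rough illustration of the two fixes on the z-score scale, the sketch below computes both. The function names, the prior mean of 0, the prior strength k = 10, and the two sample products are assumptions made up for the example, not Rankquant settings or data.

```python
import math

Z_CRIT = 1.645  # multiplier quoted in the text


def ci_floor(mean_z: float, n_eff: float, z_crit: float = Z_CRIT) -> float:
    """Mean minus z_crit standard errors, with SE = 1/sqrt(n_eff)."""
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    return mean_z - z_crit / math.sqrt(n_eff)


def shrunk_mean(mean_z: float, n: float, k: float = 10.0, prior: float = 0.0) -> float:
    """Shrinkage toward `prior`; k is a hypothetical prior strength."""
    return (n * mean_z + k * prior) / (n + k)


# Hypothetical products: (mean z, reviewer count)
thin = (2.0, 4)
thick = (1.5, 100)

print("CI-floor:", ci_floor(*thin), ci_floor(*thick))
print("Shrunk:  ", shrunk_mean(*thin), shrunk_mean(*thick))
```

With these made-up inputs both methods place the 100-reviewer product above the 4-reviewer one, but only the shrinkage result depends on a choice of k.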

2. The CI-floor formula for each aggregation lens

For each lens R1/R2/R3, the CI-floor is Ẑ − 1.645·SE. Only N_eff differs.
R1 (pure relative)	floor_R1 = Ẑ_R1 − 1.645 · (1/√N). N is the count of qualifying reviewers (n_u ≥ 2, σ_u > 0). The SE scale factor is 1 because z-scores have unit variance by construction.
R2 (source-weighted)	floor_R2 = Ẑ_R2 − 1.645 · (1/√N_eff). N_eff is Kish's effective sample size: N_eff = (Σ w_s)² / Σ w_s². N_eff ≤ N_raw, with equality only if all w_s are equal.
R3 (broadened)	floor_R3 = Ẑ_R3 − 1.645 · (1/√N'). N' = |Q_i ∪ constant-rater reviewers|. The SE is slightly understated because imputed-σ z-scores have slightly non-unit variance; the understatement is bounded by max(σ̃_s / σ_u) across sources.
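
The R2 row can be sketched in a few lines. This is a minimal illustration that assumes Ẑ_R2 is a weighted mean of z-scores; the function names and the example weights are hypothetical.

```python
import math
from typing import Sequence


def kish_n_eff(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / sum_sq


def weighted_floor(z_scores: Sequence[float], weights: Sequence[float],
                   z_crit: float = 1.645) -> float:
    """Weighted mean z (an assumption about Z_R2) minus z_crit / sqrt(N_eff)."""
    if len(z_scores) != len(weights):
        raise ValueError("z_scores and weights must have the same length")
    mean_z = sum(w * z for w, z in zip(weights, z_scores)) / sum(weights)
    return mean_z - z_crit / math.sqrt(kish_n_eff(weights))


print(kish_n_eff([1, 1, 1, 1]))  # equal weights: N_eff equals N_raw -> 4.0
print(kish_n_eff([3, 1, 1, 1]))  # unequal weights: N_eff drops -> 3.0
print(weighted_floor([1.0, 2.0, 2.0, 2.0], [3, 1, 1, 1]))
```

Unequal weights shrink N_eff below the raw count, which widens the SE and lowers the floor.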

3. Worked example — three products, three reviewer counts

Product A:  4 reviewers,   mean z = +2.10
   SE  = 1/√4  = 0.500
   90% floor = 2.10 − 1.645·0.500 = +1.28

Product B:  20 reviewers,  mean z = +1.80
   SE  = 1/√20 = 0.224
   90% floor = 1.80 − 1.645·0.224 = +1.43

Product C:  80 reviewers,  mean z = +1.60
   SE  = 1/√80 = 0.112
   90% floor = 1.60 − 1.645·0.112 = +1.42

Rank by mean z:        A > B > C    (2.10 > 1.80 > 1.60)
Rank by CI-floor:      B > C > A    (1.43 > 1.42 > 1.28)

The re-ordering is not a glitch — it's the point. Product A might be exceptional, but with only 4 reviewers we can't distinguish "exceptional quality" from "lucky small sample." Products B and C have earned the confidence that their means aren't accidents. The 0.01 gap between B and C at CI-floor is a statistical tie (we flag it on-site); the 0.15 gap between B and A is a meaningful separation.
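
The arithmetic above can be reproduced with a short script. It uses only the three products and the 1.645 multiplier from the example; the variable names are illustrative.

```python
import math

Z_CRIT = 1.645

# name -> (reviewer count, mean z), taken from the worked example
products = {"A": (4, 2.10), "B": (20, 1.80), "C": (80, 1.60)}

floors = {
    name: mean_z - Z_CRIT / math.sqrt(n)
    for name, (n, mean_z) in products.items()
}

by_mean = sorted(products, key=lambda name: products[name][1], reverse=True)
by_floor = sorted(floors, key=floors.get, reverse=True)

for name, (n, mean_z) in products.items():
    print(f"{name}: N={n:3d}  mean={mean_z:+.2f}  floor={floors[name]:+.2f}")

print(by_mean)   # ['A', 'B', 'C']
print(by_floor)  # ['B', 'C', 'A']
```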

4. Why 90% and not 95% or 99%

The confidence level is a default that matters. At higher confidence, the CI-floor sits further below the mean: small N is penalized more, and more thin-sample products get pushed down. At lower confidence, the penalty weakens and the ranking moves closer to mean-ranking.

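Critical values by labeled level (the 90% and 95% rows are endpoints of symmetric two-sided intervals; the 80%, 85%, and 99% rows are one-tailed):
  80% →  z = 0.842     (one-tailed)
  85% →  z = 1.036     (one-tailed)
  90% →  z = 1.645     (two-sided 90% endpoint; one-tailed 95%)     ← Rankquant v2 default
  95% →  z = 1.960     (two-sided 95% endpoint; one-tailed 97.5%)
  99% →  z = 2.326     (one-tailed)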

Penalty per 1/√N unit of SE, relative to 90%:
  80% → 0.51×   (much softer)
  90% → 1.00×   (our choice)
  95% → 1.19×   (19% harsher)
  99% → 1.41×   (41% harsher — too aggressive for small-N coverage)

We picked 90% after a grid search. At 95%, the CI-floor collapsed too aggressively for products with fewer than ~15 reviewers — too many legitimately good thin-sample products ended up mid-table. At 85%, the CI-floor didn't penalize thin samples hard enough — 4-reviewer darlings stayed near the top. 90% was the minimum confidence that produced rankings we'd defend in a court of peers. The constant is published and version-stable; any change requires a public version bump.
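
To get a feel for how the multiplier moves a ranking, the toy sweep below re-ranks the three products from section 3 at each listed multiplier. It is not the grid search itself, which is not described here; the function name is an assumption.

```python
import math

# name -> (reviewer count, mean z), from the worked example in section 3
products = {"A": (4, 2.10), "B": (20, 1.80), "C": (80, 1.60)}
multipliers = [0.842, 1.036, 1.645, 1.960, 2.326]
DEFAULT_Z = 1.645


def rank(z_crit: float) -> list:
    """Order product names by mean - z_crit / sqrt(N), best first."""
    floors = {
        name: mean_z - z_crit / math.sqrt(n)
        for name, (n, mean_z) in products.items()
    }
    return sorted(floors, key=floors.get, reverse=True)


for z in multipliers:
    ratio = z / DEFAULT_Z
    print(f"z={z:.3f}  penalty={ratio:.2f}x  order={' > '.join(rank(z))}")
```

On this toy data the 4-reviewer product A stays on top at 0.842 and 1.036, drops to last at 1.645, and B and C swap places at 1.960 and above.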

5. Relationship to other ranking primitives

Rankquant's CI-floor sits in a well-known family of small-sample-safe ranking primitives.
Wilson score lower bound (binary)	Reddit's "best" sort and Yelp's internal ranking use the Wilson lower bound on up/down-vote binomials. Same idea: penalize uncertainty. Rankquant's CI-floor is the continuous-scale analog on z-scores.
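IMDb Top 250 (Bayesian shrinkage)	IMDb's Top 250 formula W = (v/(v+m))·R + (m/(v+m))·C shrinks thin-sample movies toward the global mean C, where R is the movie's mean rating, v its vote count, and m a minimum-votes threshold that acts as the prior strength. Similar in effect to a lower-bound approach, though not identical: the pull toward C scales as m/(v+m) rather than 1/√v.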
Hodges-Lehmann estimator	A robust rank-based estimator of a location parameter. Rankquant's empirical-CDF percentile step is rank-based (Hodges-Lehmann flavored) even though the floor itself is mean-based.
Meta-analysis random-effects models	DerSimonian-Laird and related random-effects estimators combine within-study and between-study variance. Rankquant's reviewer-level aggregation is a fixed-effects approximation; a random-effects variant is a future methodology upgrade.
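
For comparison, the first two entries can be written out directly. This is a sketch of the textbook formulas, not of any site's production code; the default z = 1.645 and the example inputs are choices made here for illustration.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.645) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def imdb_weighted_rating(R: float, v: float, m: float, C: float) -> float:
    """W = (v/(v+m))*R + (m/(v+m))*C, shrinking R toward the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical inputs: 4 of 4 upvotes versus 90 of 100 upvotes
print(wilson_lower_bound(4, 4), wilson_lower_bound(90, 100))

# Hypothetical movie: mean 9.0 from 10 votes, m = 100, global mean 7.0
print(imdb_weighted_rating(R=9.0, v=10, m=100, C=7.0))
```

In both cases the thin sample is pulled well below its raw score, which is the same behavior the CI-floor gives on z-scores.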

6. What users see on the product page

Casual view (default): the R1 global percentile as one number. "89 / 100 global."
Standard view: R1 global + R1 cohort side-by-side with the tagline interpreting any spread.
Pro view (toggle): R1/R2/R3 all three, each with their mean Ẑ, CI-floor, SE, and N_eff. "R1: mean +1.60, floor +1.42, SE 0.112, N_eff 80."
Statistical ties: products within 0.05 on CI-floor are bracketed as "statistically similar" rather than ordinally ranked.
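
One way the tie bracketing could work is sketched below. It assumes a bracket holds every product within 0.05 of that bracket's top floor; the actual grouping rule (for example, chaining adjacent products) is not specified above, and the function name is hypothetical.

```python
from typing import Dict, List

TIE_WIDTH = 0.05  # threshold quoted in the text


def bracket_ties(floors: Dict[str, float], width: float = TIE_WIDTH) -> List[List[str]]:
    """Group products whose floor is within `width` of their bracket's leader."""
    ordered = sorted(floors.items(), key=lambda item: item[1], reverse=True)
    brackets: List[List[str]] = []
    leader = 0.0
    for name, value in ordered:
        if brackets and leader - value <= width:
            brackets[-1].append(name)   # close enough to the current leader
        else:
            brackets.append([name])     # start a new bracket
            leader = value
    return brackets


# CI-floors from the worked example in section 3
print(bracket_ties({"A": 1.28, "B": 1.43, "C": 1.42}))  # [['B', 'C'], ['A']]
```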

7. A note on one-tailed vs two-tailed

The CI-floor uses z = 1.645, the lower endpoint of a symmetric (two-sided) 90% confidence interval. Because we report only that lower endpoint, it can equally be read as a one-sided 95% lower bound; the one-tailed 90% critical value would be 1.282. Why keep only the lower side? Because we're asking a one-sided question: "what's a defensibly low estimate of this product's quality?" We don't care about the upper bound; no one gets promoted in our rankings by having a lucky ceiling. Moving to z = 1.960, the endpoint of a 95% symmetric CI, would make the penalty about 19% larger; we specifically don't want that because it over-penalizes thin-sample products.
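
For readers checking the constant against a normal table, this sketch converts between multipliers and coverage using Python's standard-library NormalDist. The helper names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def one_sided_coverage(z: float) -> float:
    """P(Z <= z): confidence of a lower bound at mean - z*SE."""
    return std_normal.cdf(z)


def two_sided_coverage(z: float) -> float:
    """P(-z <= Z <= z): coverage of the symmetric interval mean +/- z*SE."""
    return 2 * std_normal.cdf(z) - 1


def one_sided_critical(level: float) -> float:
    """Multiplier z such that P(Z <= z) equals `level`."""
    return std_normal.inv_cdf(level)


for z in (1.282, 1.645, 1.960):
    print(f"z={z:.3f}  one-sided={one_sided_coverage(z):.3f}  "
          f"two-sided={two_sided_coverage(z):.3f}")
```

For z = 1.645 this gives roughly 95% one-sided coverage and 90% two-sided coverage, so the floor is the lower end of a symmetric 90% interval.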

Frequently asked questions

Why 90% and not 95%?

Consumer review ranking is not a drug trial. At 95% the CI-floor collapses too aggressively for products with fewer than about 15 reviewers: it pushes legitimately good thin-sample products too far down the list. At 85% thin samples are not penalized hard enough. 90% is the lowest confidence level that produced defensible rankings in our internal grid search. The choice is published and version-stable.

Is the CI-floor visible to AI answer engines?

Yes. We surface it via schema.org/Product additionalProperty with a QuantitativeValue carrying the CI-floor, the mean z-score, and the effective sample size. We also include it in /llms-full.txt. AI engines that want to cite the uncertainty get it; those that just want the percentile get that too.
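
One plausible shape for that markup is sketched below as a Python dict serialized to JSON-LD. The propertyID strings, the property names, and the product name are hypothetical; only the schema.org types (Product, PropertyValue, QuantitativeValue) and the additionalProperty field are real vocabulary, and the actual markup on the site may differ. The numbers are the Pro-view example from section 6.

```python
import json


def prop(property_id: str, name: str, value: float) -> dict:
    """Wrap a number as a PropertyValue whose value is a QuantitativeValue."""
    return {
        "@type": "PropertyValue",
        "propertyID": property_id,  # hypothetical identifier
        "name": name,
        "value": {"@type": "QuantitativeValue", "value": value},
    }


product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example product",  # placeholder
    "additionalProperty": [
        prop("r1_ci_floor", "R1 CI-floor", 1.42),
        prop("r1_mean_z", "R1 mean z-score", 1.60),
        prop("r1_n_eff", "R1 effective sample size", 80),
    ],
}

payload = json.dumps(product_jsonld, indent=2)
print(payload)
```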

What's the CI-floor of a cohort percentile?

Trick question — cohort is a ranking view, not a separate estimator. The cohort percentile is a deterministic re-ranking of R1 global CI-floors among cohort members. There's no separate CI. If you want uncertainty on the cohort score, look at the underlying R1 CI-floor.
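
A minimal sketch of that re-ranking, assuming the percentile is the share of cohort members whose R1 CI-floor is at or below the product's own. The exact percentile convention is not stated here, and the cohort values and function name are made up.

```python
from typing import Dict


def cohort_percentile(product: str, cohort_floors: Dict[str, float]) -> float:
    """Percent of cohort members with an R1 CI-floor <= this product's floor."""
    own = cohort_floors[product]
    at_or_below = sum(1 for value in cohort_floors.values() if value <= own)
    return 100.0 * at_or_below / len(cohort_floors)


# Hypothetical cohort of four products and their R1 global CI-floors
cohort = {"A": 1.28, "B": 1.43, "C": 1.42, "D": 0.90}

for name in cohort:
    print(name, cohort_percentile(name, cohort))
```

The function adds no new estimate: it only reorders floors that already exist, which is why there is no separate interval to report.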

How does CI-floor width change with N?

SE scales as 1/√N, so the penalty (1.645 × SE) halves when N quadruples. Going from N=4 to N=16 cuts the penalty by half. N=100 gives a penalty of 0.165 — small but not zero. N=1000 gives 0.052 — negligible. The floor approaches the mean asymptotically as N grows.
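
The scaling is easy to tabulate. The sketch below also inverts the relationship to estimate how many reviewers bring the penalty under a target; both function names and the 0.10 target are illustrative.

```python
import math

Z_CRIT = 1.645


def penalty(n: float, z_crit: float = Z_CRIT) -> float:
    """Distance between the mean and the CI-floor: z_crit / sqrt(N)."""
    return z_crit / math.sqrt(n)


def reviewers_needed(target_penalty: float, z_crit: float = Z_CRIT) -> int:
    """Smallest N whose penalty is at or below target_penalty."""
    return math.ceil((z_crit / target_penalty) ** 2)


for n in (4, 16, 100, 1000):
    print(f"N={n:5d}  penalty={penalty(n):.4f}")

print(reviewers_needed(0.10))  # 271
```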

Does the CI-floor penalize new products too harshly?

Yes, by design, and we own that. New products with genuinely good reviews but few of them will show a lower global percentile than their mean would suggest. We surface this with a "recently launched, limited coverage" flag whenever N < 10 reviewers, plus a "CI-floor rising over time as coverage grows" chart on the product page.
