Theory & Derivations

Theory & derivations

Start here: the founding metrics

New to the methodology? Read Founding metrics — the five statistics primitives behind every Rankquant percentile first. It is a first-principles tour of the five textbook ingredients (z-score, standard error, 90% CI-floor, empirical CDF, Kish design effect) the rest of the pipeline composes.

The three load-bearing concepts

Each concept has a dedicated sub-page with derivation and worked examples.
Founding metrics (start here)	First-principles tour of the five primitives the pipeline rests on: z-score, standard error, 90% CI-floor, empirical CDF, Kish design effect. Each primitive paired with its canonical statistical reference.
Degrees of freedom (df)	Why reviewer σ_u uses n_u − 1 (Bessel), why we require n_u ≥ 2, how effective sample size N_eff enters the R2 weighted aggregation, and why a reviewer with df = 1 is admitted but implicitly penalized at the CI-floor step.
Confidence intervals (CI-floor ranking)	Why we rank on the lower bound of the 90% CI rather than the mean. Derivation of SE(Ẑ) = 1/√N_eff for R1, R2, R3. Why 90% and not 95%. Analytical comparison to Wilson score and IMDb-style Bayesian shrinkage.
Inter-rater reliability (reviewer-level)	ICC(1,1) variance decomposition across reviewers, how per-reviewer normalization removes reviewer main-effects, Cohen's κ for binary reviewer decisions, and how low agreement informs R2 weight calibration.

Each concept has a dedicated sub-page with derivation and worked examples.

Each step of the pipeline, with its statistical justification

Rankquant's methodology (/methodology) is a four-step compositional estimator. The table below maps each step to the statistical property that keeps it rigorous, and points at the deep-dive page that justifies it.

Step	Estimator	Statistical property
1	z_u,i = (r_u,i − μ_u) / σ_u	Per-reviewer studentization. σ_u uses Bessel correction (df = n_u − 1). Removes reviewer main-effects; dimensionless output.
2a	R1(i) = mean_u z_u,i	Unweighted reviewer-fixed-effects aggregation. Unbiased estimator of the product's population mean standardized rating.
2b	R2(i) = weighted mean with w_s	Source-weighted aggregation. Effective sample size N_eff = (Σ w_s)² / Σ w_s² (Kish).
2c	R3(i) with imputed σ̃_s	Broadened pool includes σ_u = 0 reviewers via pooled source-level SD imputation. Trades a small bias for higher N; see /theory/degrees-of-freedom/.
3	floor = Ẑ − 1.645 · SE(Ẑ)	One-tailed 90% CI lower bound. SE(Ẑ) = 1/√N_eff under unit-variance z-scores. Penalizes uncertainty; same family as Wilson score / Reddit's "best" sort.
4	p = empirical-CDF rank of floor	Nonparametric ranking. Robust to non-normality of the underlying floor distribution. Global p uses all products; cohort p uses the ±20%-price peer set.
Cohort	re-rank same floor within Cohort(i)	No new estimator. Cohort score is a view, not a computation. Guarantees internal consistency: cohort rank is a deterministic function of CI-floors.

Each sub-tab above unpacks one of the three load-bearing concepts in full. Read in order (Degrees of freedom → Confidence intervals → Inter-rater reliability), or jump to the one you need. All pages cite primary statistical sources; nothing here is proprietary theory — what's proprietary is publishing the details, committing to the constants, and running the pipeline on real data at scale.

Prerequisites: basic familiarity with mean, standard deviation, and the normal distribution. For term-by-term definitions see /glossary/statistics-terms.