Rankquant
MethodologyAbout⌕ Search

Theory & Derivations

Theory & derivations

Start here: the founding metrics

New to the methodology? Read Founding metrics — the five statistics primitives behind every Rankquant percentile first. It is a first-principles tour of the five textbook ingredients (z-score, standard error, 90% CI-floor, empirical CDF, Kish design effect) the rest of the pipeline composes.

The three load-bearing concepts

Each concept has a dedicated sub-page with derivation and worked examples.
Founding metrics (start here)First-principles tour of the five primitives the pipeline rests on: z-score, standard error, 90% CI-floor, empirical CDF, Kish design effect. Each primitive paired with its canonical statistical reference.
Degrees of freedom (df)Why reviewer σ_u uses n_u − 1 (Bessel), why we require n_u ≥ 2, how effective sample size N_eff enters the R2 weighted aggregation, and why a reviewer with df = 1 is admitted but implicitly penalized at the CI-floor step.
Confidence intervals (CI-floor ranking)Why we rank on the lower bound of the 90% CI rather than the mean. Derivation of SE(Ẑ) = 1/√N_eff for R1, R2, R3. Why 90% and not 95%. Analytical comparison to Wilson score and IMDb-style Bayesian shrinkage.
Inter-rater reliability (reviewer-level)ICC(1,1) variance decomposition across reviewers, how per-reviewer normalization removes reviewer main-effects, Cohen's κ for binary reviewer decisions, and how low agreement informs R2 weight calibration.
Each concept has a dedicated sub-page with derivation and worked examples.

Each step of the pipeline, with its statistical justification

Rankquant's methodology (/methodology) is a four-step compositional estimator. The table below maps each step to the statistical property that keeps it rigorous, and points at the deep-dive page that justifies it.

StepEstimatorStatistical property
1zu,i = (ru,i − μu) / σuPer-reviewer studentization. σu uses Bessel correction (df = nu − 1). Removes reviewer main-effects; dimensionless output.
2aR1(i) = meanu zu,iUnweighted reviewer-fixed-effects aggregation. Unbiased estimator of the product's population mean standardized rating.
2bR2(i) = weighted mean with wsSource-weighted aggregation. Effective sample size Neff = (Σ ws)² / Σ ws² (Kish).
2cR3(i) with imputed σ̃sBroadened pool includes σu = 0 reviewers via pooled source-level SD imputation. Trades a small bias for higher N; see /theory/degrees-of-freedom/.
3floor = Ẑ − 1.645 · SE(Ẑ)One-tailed 90% CI lower bound. SE(Ẑ) = 1/√Neff under unit-variance z-scores. Penalizes uncertainty; same family as Wilson score / Reddit's "best" sort.
4p = empirical-CDF rank of floorNonparametric ranking. Robust to non-normality of the underlying floor distribution. Global p uses all products; cohort p uses the ±20%-price peer set.
Cohortre-rank same floor within Cohort(i)No new estimator. Cohort score is a view, not a computation. Guarantees internal consistency: cohort rank is a deterministic function of CI-floors.

Each sub-tab above unpacks one of the three load-bearing concepts in full. Read in order (Degrees of freedom → Confidence intervals → Inter-rater reliability), or jump to the one you need. All pages cite primary statistical sources; nothing here is proprietary theory — what's proprietary is publishing the details, committing to the constants, and running the pipeline on real data at scale.

Prerequisites: basic familiarity with mean, standard deviation, and the normal distribution. For term-by-term definitions see /glossary/statistics-terms.