Degrees of freedom in the Rankquant pipeline
By Ryan Siegal · Founder and Principal
1. The classical definition
Degrees of freedom counts the dimensionality of a parameter space minus the number of constraints applied to the sample. For a sample x_1, x_2, …, x_n with computed sample mean x̄, the deviations (x_i − x̄) sum to zero:
Σ (x_i − x̄) = 0 ← one linear constraint
therefore exactly (n − 1) of the (x_i − x̄) are free to vary;
the nth is determined by the constraint.
That constraint is why sample variance divides by n − 1 rather than n. Dividing by n produces an estimator biased low; dividing by n − 1 (Bessel's correction) yields an unbiased estimator of the population variance:
σ̂² = (1 / (n − 1)) · Σ (x_i − x̄)² ← Bessel's correction, E[σ̂²] = σ²
The uncorrected (divide-by-n) variance of a sample underestimates, on average, the true variance of the population it was drawn from. Multiplying it by the correction factor n/(n − 1) gives the Bessel-corrected estimate.
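
The bias claim can be checked exactly rather than by simulation. The sketch below is illustrative only: the four-value population and the sample size are made-up assumptions, and `variance` is a hypothetical helper. It enumerates every equally likely sample drawn with replacement and averages both estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product


def variance(sample, ddof):
    """Sum of squared deviations divided by (n - ddof), computed exactly."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((Fraction(x) - mean) ** 2 for x in sample) / (n - ddof)


# Hypothetical population; its true variance divides by the population size.
population = [1, 2, 3, 4]
sigma2 = variance(population, ddof=0)

# Every equally likely sample of size n drawn with replacement.
n = 3
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples (its exact expectation).
mean_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)
mean_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(sigma2, mean_biased, mean_unbiased)
```

The divide-by-(n − 1) average equals the population variance, while the divide-by-n average is smaller by exactly the factor (n − 1)/n.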
2. Where df enters Rankquant's four-step pipeline
| Quantity | Degrees of freedom and role |
|---|---|
| σ_u (reviewer personal SD) | df = n_u − 1. Bessel-corrected. Reviewer normalization requires σ_u > 0, which is automatic once a reviewer has any rating variance across their history. |
| Admission rule n_u ≥ 2 | Minimum df = 1 for σ_u to be defined at all. A reviewer with n_u = 1 has no μ_u / σ_u — they go on file and are excluded from R1 and R2. Reviewers with n_u ≥ 2 but σ_u = 0 are excluded from R1/R2 but included in R3 via imputed σ̃_s. |
| R1 aggregation df | df = N − 1 for the per-product mean z-score, where N is the count of qualifying reviewers of that product. Used implicitly in SE(R1) = 1/√N (scale factor 1 because z-scores have unit variance by construction). |
| R2 effective df (Kish) | N_eff = (Σ w_s)² / Σ w_s². When weights are equal N_eff = N; when one source dominates N_eff collapses toward 1. SE(R2) = 1/√N_eff. |
| R3 aggregation df | df = N' − 1 where N' = |Q_i ∪ constant-rater reviewers|. Imputed σ̃_s for constant raters introduces a small downward bias in SE(R3); we document this in the worked example below. |
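
The admission rules in the table above can be sketched as a small classifier plus a per-reviewer z-score. This is a minimal illustration, not Rankquant's implementation: the function names and the status labels are hypothetical, and the only assumption about the data is that a reviewer's history is a plain list of numeric ratings.

```python
from statistics import mean, stdev


def reviewer_status(ratings):
    """Classify one reviewer's rating history under the admission rules."""
    if len(ratings) < 2:
        return "on_file"         # n_u = 1: sigma_u is undefined (df = 0)
    if stdev(ratings) == 0:
        return "constant_rater"  # excluded from R1/R2, eligible for R3
    return "normalized"          # mu_u and sigma_u are both usable


def z_scores(ratings):
    """Per-reviewer z-scores using the Bessel-corrected SD (df = n_u - 1)."""
    mu_u = mean(ratings)
    sigma_u = stdev(ratings)     # statistics.stdev divides by n - 1
    return [(x - mu_u) / sigma_u for x in ratings]


print(reviewer_status([90]))
print(reviewer_status([90, 90, 90]))
print(reviewer_status([86, 90, 94]), z_scores([86, 90, 94]))
```

`z_scores` should only be called on histories whose status is `"normalized"`; the other two cases would raise an error or divide by zero.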
3. Why reviewer-level normalization (not source-level)
The choice to normalize at the reviewer grain rather than the source grain is a df choice. A wine publication might have 40 staff critics over 20 years; pooling them into one "source distribution" would throw away the fact that each critic uses a different personal scale. Per-reviewer normalization gives us one μ_u and one σ_u per critic: finer-grained, more honest.
The cost: reviewers with few reviews have noisy μ and σ estimates. A critic with n_u = 2 has df = 1 on σ_u, which means their σ_u is essentially a single data point. We admit them anyway, but the z-scores they produce are noisy, and that noise propagates to the product-level aggregate, where it widens the SE and lowers the CI-floor. Thinness is penalized at the product level, not through exclusion.
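
To make the n_u = 2 case concrete: with two ratings a and b, the Bessel-corrected SD reduces to |a − b| / √2, so σ_u is determined entirely by one difference. The short check below uses made-up rating pairs, and `sigma_from_pair` is a hypothetical helper name.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def sigma_from_pair(a, b):
    """Bessel-corrected SD of two ratings, written in closed form."""
    return abs(a - b) / sqrt(2)


for a, b in [(88, 92), (90, 91), (85, 95)]:
    sigma_u = stdev([a, b])  # df = 1
    assert isclose(sigma_u, sigma_from_pair(a, b))
    # z-score of the lower of the two ratings
    z_low = (min(a, b) - mean([a, b])) / sigma_u
    print(a, b, round(sigma_u, 3), round(z_low, 3))
```

A side effect worth noting is that a two-review critic's z-scores always come out as ±1/√2 (about ±0.707), whatever the raw ratings were, provided the two ratings differ.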
4. Effective sample size under R2 source weighting
R2's source-weighted aggregation raises a classical survey-statistics question: what's the effective sample size of a non-uniformly-weighted sample? Kish's (1965) design-effect formula is the standard answer:
N_eff = ( Σ_u w_s(u) )² / Σ_u w_s(u)²
where w_s(u) is the source-credibility weight for reviewer u's source.
Worked example (a wine with 12 reviewers split across three sources):
| Source | w_s | Count | Contribution to Σw | Contribution to Σw² |
|---|---|---|---|---|
| Wine Advocate | 10 | 2 | 20 | 200 |
| Wine Spectator | 10 | 2 | 20 | 200 |
| Vivino (crowd) | 2 | 8 | 16 | 32 |
| Total | | 12 | Σw = 56 | Σw² = 432 |

N_eff = 56² / 432 = 3136 / 432 ≈ 7.26
(N_raw = 12; weighting collapses effective size by ~40%.)
So R2's CI-floor uses SE(R2) ≈ 1/√7.26 ≈ 0.371 for this product, not 1/√12 ≈ 0.289. The weighted aggregate rewards the professional sources' credibility but pays a variance-inflation cost that the CI-floor correctly books.
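
The worked example can be reproduced in a few lines. This is a sketch under the assumption that each reviewer simply carries their source's weight; `kish_n_eff` is a hypothetical helper name, not part of any library.

```python
from math import sqrt


def kish_n_eff(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# One weight per reviewer: 2 at weight 10, 2 at weight 10, 8 at weight 2.
weights = [10] * 2 + [10] * 2 + [2] * 8

n_eff = kish_n_eff(weights)
se_r2 = 1 / sqrt(n_eff)

print(f"N_eff = {n_eff:.2f}, SE(R2) = {se_r2:.3f}")
# With equal weights the formula returns the raw count.
print(kish_n_eff([5] * 12))
```

The first line prints N_eff = 7.26 and SE(R2) = 0.371, matching the figures above; the equal-weight call returns 12.0.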
5. Why R3 broadening is a df trade
R3 admits constant-rater reviewers (n_u ≥ 2 but σ_u = 0) by imputing σ̃_s, the pooled SD of their source's variance-producing reviewers. This adds reviewers (more df on the product-level mean) at the cost of slightly biasing the imputed z-scores. The bias is bounded: the imputed σ̃_s is always the source's typical dispersion, so the imputed z-score is the reviewer's rating expressed in source-typical units. Over large samples this converges to an unbiased estimator of the product's relative quality under the assumption that constant raters would, if they expressed opinions, use source-typical dispersion.
R3 only exists because the information is otherwise wasted. A reviewer who has rated 12 wines all at 90 points is not useless: they have clearly signalled something about those 12 wines. R3 extracts that signal; R1 and R2 throw it away. When R3 diverges meaningfully from R1, that's a finding in itself and our tagline reports it.
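
One way the imputed σ̃_s could be computed is sketched below. The text does not specify the pooling formula, so the df-weighted pooled variance used here is an assumption, and `pooled_sd` and the sample histories are hypothetical. The sketch stops at σ̃_s and does not compute the imputed z-score, since the centering used for constant raters is not described here.

```python
from math import sqrt
from statistics import variance


def pooled_sd(histories):
    """Pooled SD over reviewers whose own SD is defined and non-zero.

    Assumed formula: sqrt( sum((n_u - 1) * s_u^2) / sum(n_u - 1) ).
    """
    num = 0.0
    den = 0
    for ratings in histories:
        if len(ratings) < 2:
            continue  # n_u = 1: no variance estimate at all
        s2 = variance(ratings)  # Bessel-corrected, df = n_u - 1
        if s2 == 0:
            continue  # constant raters carry no variance information
        df = len(ratings) - 1
        num += df * s2
        den += df
    if den == 0:
        raise ValueError("no variance-producing reviewers in this source")
    return sqrt(num / den)


# Hypothetical source: two variance-producing critics and one constant rater.
source_histories = [[86, 90, 94], [88, 92], [90] * 12]
print(round(pooled_sd(source_histories), 3))
```

Weighting by df gives reviewers with longer histories more influence on σ̃_s, which is the usual reason to pool variances this way; an unweighted average of per-reviewer SDs would be a different, equally defensible choice.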
6. The t-vs-z question under per-reviewer normalization
A standard statistics instinct says: "for small N, use Student's t instead of z for the critical value." Under per-reviewer normalization, that instinct mostly doesn't apply. The z-scores z_{u,i} are already approximately unit-variance by construction. The sampling distribution of the mean z-score is asymptotically normal under the Central Limit Theorem, with small-N departures driven by (a) reviewer-σ estimation noise and (b) skew in the underlying raw-rating distribution.
We confirmed via simulation that for N ≥ 6 (the minimum R1 sample size we admit), the 90% CI coverage of Ẑ − 1.645·(1/√N) is within 1.5 percentage points of nominal on realistic review distributions. For N < 6 we flag the product as "limited coverage" and show the CI-floor with a visible warning.
SE(Ẑ) = 1 / √N_eff
N_eff = 4 → SE ≈ 0.500 (limited-coverage flag shown)
N_eff = 6 → SE ≈ 0.408 (minimum admitted for R1 CI-floor)
N_eff = 30 → SE ≈ 0.183 (acceptable; coverage near nominal)
N_eff = 100 → SE ≈ 0.100 (comfortable)
N_eff = 1000 → SE ≈ 0.032 (floor essentially equals mean)
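
The CI-floor and the SE table above follow from two lines of arithmetic. The sketch below is illustrative: `ci_floor` is a hypothetical function name, the mean z-score of 0.80 is a made-up input, and applying the limited-coverage threshold to N_eff (rather than raw N) is an assumption.

```python
from math import sqrt

Z_CRIT = 1.645  # critical value quoted in the text
MIN_N = 6       # below this, the limited-coverage warning applies


def ci_floor(z_mean, n_eff):
    """Return (floor, limited_coverage) for a mean z-score and sample size."""
    se = 1 / sqrt(n_eff)
    return z_mean - Z_CRIT * se, n_eff < MIN_N


# Same sample sizes as the table, with a hypothetical mean z-score of 0.80.
for n_eff in (4, 6, 30, 100, 1000):
    floor, limited = ci_floor(0.80, n_eff)
    se = 1 / sqrt(n_eff)
    print(f"N_eff={n_eff:>4}  SE={se:.3f}  floor={floor:+.3f}  limited={limited}")
```

At N_eff = 4 the floor for this hypothetical product drops slightly below zero, while at N_eff = 1000 it sits within about 0.05 of the mean, which is the sense in which the floor "essentially equals" the mean for large samples.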