Rankquant

Statistics & methodology glossary

Every statistical concept behind Rankquant's normalized percentiles, in plain language. Each entry is linkable as #term and exposes schema.org/DefinedTerm structured data for AI retrieval.

Normalized percentile
A 0–100 number indicating where a product's CI-floor ranks relative to every other product's CI-floor in the Rankquant database. 90 means top 10%; 50 means median; 10 means bottom 10%. Derived from the empirical cumulative distribution function (empirical CDF) of all products' CI-floors.
Global percentile
A product's normalized percentile against the full Rankquant database. Stable because the reference set is very large: adding or removing a handful of products barely moves any product's percentile. Useful for big-picture "is this product any good" questions.
Cohort percentile
A product's normalized percentile against only products in the same category and within ±20% of its list price. Mathematically it is a pure re-ranking of the same CI-floor used for the global percentile — no new computation. Useful for "is this good for what it costs?" questions.
Rating inflation
The phenomenon where online review scores cluster in a compressed high range (Amazon 4.4/5, Yelp 4.2/5, Goodreads 4.1+). Caused by self-selection bias, social signaling, and platform incentives. Makes raw averages nearly useless for consumer buying decisions — the problem Rankquant's per-reviewer normalization is designed to solve.
Per-reviewer z-score normalization
Rankquant's first operation: for each reviewer u, subtract their personal mean μ_u and divide by their personal SD σ_u. Converts a reviewer's rating to a z-score z_{u,i} = (r_{u,i} − μ_u) / σ_u in their personal units. Removes the reviewer main effect so different reviewers' scores become commensurable.
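A minimal sketch of this step in Python. It assumes one reviewer's ratings arrive as a hypothetical `{product_id: rating}` dict; `reviewer_z_scores` is an illustrative name, not Rankquant's code. `statistics.stdev` applies the n − 1 (Bessel) denominator.

```python
from statistics import mean, stdev


# Illustrative sketch; the function name and input shape are hypothetical.
def reviewer_z_scores(ratings: dict[str, float]) -> dict[str, float]:
    """Convert one reviewer's {product_id: rating} map into z-scores.

    mu_u and sigma_u are pooled over every rating the reviewer has given,
    and sigma_u uses the n - 1 (Bessel) denominator via statistics.stdev.
    """
    values = list(ratings.values())
    if len(values) < 2:
        raise ValueError("n_u < 2: sigma_u is undefined")
    mu_u = mean(values)
    sigma_u = stdev(values)
    if sigma_u == 0:
        raise ValueError("sigma_u = 0: constant rater, not eligible for R1/R2")
    return {item: (r - mu_u) / sigma_u for item, r in ratings.items()}


generous = reviewer_z_scores({"a": 5, "b": 3, "c": 4})
harsh = reviewer_z_scores({"a": 3, "b": 1, "c": 2})
# Both are {"a": 1.0, "b": -1.0, "c": 0.0}: the same opinions on different scales.
```

A generous reviewer and a harsh one who order the products identically end up with identical z-scores, which is the commensurability the normalization is meant to deliver.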
Z-score
The number of standard deviations a value sits above or below its mean. In Rankquant, z-scores are computed per reviewer, so each reviewer's ratings end up with mean 0 and standard deviation 1 in their personal distribution. A reviewer's z of +1.6 for a product means that product sits 1.6 personal-SDs above their personal mean.
Reviewer personal mean (μ_u)
The arithmetic mean of all ratings a reviewer u has produced in our dataset, pooled across every category they've rated. Recomputed as new reviews arrive.
Reviewer personal standard deviation (σ_u)
The Bessel-corrected sample standard deviation of all ratings a reviewer u has produced, pooled cross-category. Must be > 0 for inclusion in R1 and R2; if σ_u = 0 (reviewer gives every product the same rating) the reviewer is admitted only to the R3 broadened lens with an imputed σ̃_s from their source pool.
Cross-category reviewer pooling
Rankquant's decision to compute a reviewer's μ_u and σ_u across every category they've rated rather than per-category. Assumes a person's rating calibration is roughly stable across product types — an assumption we continue to validate empirically. Keeps n_u large and σ_u stable.
Qualifying reviewer
A reviewer with n_u ≥ 2 and σ_u > 0. Only qualifying reviewers enter R1 and R2. Reviewers with n_u ≥ 2 and σ_u = 0 are admitted only to R3. Reviewers with n_u = 1 are excluded from all three lenses: their μ_u is just their single rating, and their Bessel-corrected σ_u is undefined (the n_u − 1 denominator is 0).
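The admission rules reduce to a small decision function. This is an illustrative sketch: the name `admitted_lenses` and the plain list-of-ratings input are assumptions, not the published implementation.

```python
from statistics import stdev


# Illustrative sketch; the function name is hypothetical.
def admitted_lenses(ratings: list[float]) -> set[str]:
    """Return the aggregates a reviewer may enter, given all their ratings."""
    if len(ratings) < 2:
        return set()  # n_u < 2: Bessel-corrected sigma_u undefined, excluded everywhere
    if stdev(ratings) > 0:
        return {"R1", "R2", "R3"}  # qualifying reviewer
    return {"R3"}  # constant rater: R3 only, with an imputed sigma


single = admitted_lenses([4])          # set()
constant = admitted_lenses([4, 4, 4])  # {"R3"}
varied = admitted_lenses([3, 5])       # {"R1", "R2", "R3"}
```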
Reviewer main effect
The portion of raw-rating variance explained by "which reviewer is rating" rather than "which product is being rated." Per-reviewer z-score normalization mathematically zeros the reviewer main effect, leaving only the product-quality signal plus residual noise.
R1 (pure relative)
Rankquant's headline aggregate. The unweighted mean z-score across every qualifying reviewer of a product. Every reviewer counts equally. Answers: "What does the crowd think, relative to each reviewer's own scale?"
R2 (source-weighted)
A weighted-mean z-score aggregate in which reviewers from more credible sources count more. Weights w_s are published per source and version-stable. Complements R1 by asking: "what does the crowd think if we listen more to the most rigorous reviewers?"
R3 (broadened)
An R1-style unweighted mean z-score that additionally includes reviewers with n_u ≥ 2 but σ_u = 0 via an imputed σ̃_s drawn from their source's pooled dispersion. Adds signal from consistent-rater reviewers otherwise excluded.
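The R1 and R2 means can be sketched as plain functions over one product's reviewer z-scores. The function names and the example weights (picked from the typical 1–10 range) are hypothetical; R3 would reuse the R1 mean over the broadened reviewer set.

```python
# Illustrative sketch; function names and example weights are hypothetical.
def r1(z_scores: list[float]) -> float:
    """R1: unweighted mean of one product's reviewer z-scores."""
    return sum(z_scores) / len(z_scores)


def r2(z_scores: list[float], weights: list[float]) -> float:
    """R2: mean weighted by w_s, where weights[k] is reviewer k's source weight."""
    return sum(w * z for w, z in zip(weights, z_scores)) / sum(weights)


z = [1.0, 0.0, -0.5]  # three reviewers' z-scores for one product
w = [8, 2, 2]         # hypothetical source weights
headline = r1(z)      # 0.5 / 3 ≈ 0.167
weighted = r2(z, w)   # 7 / 12 ≈ 0.583: the heavily weighted +1.0 reviewer counts more
```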
Source weight (w_s)
A published number (typically 1–10) encoding Rankquant's editorial judgment of a review source's historical credibility. Affects R2 only; R1 and R3 treat all qualifying reviewers equally. Weights live in the open-source repo and any change requires a public version bump.
Imputed σ̃_s
A pooled standard-deviation estimate for a source s, computed from all variance-producing reviewers writing for that source. Used only in R3 as a stand-in σ when a specific reviewer has σ_u = 0. Introduces a small bounded bias in R3 in exchange for admitting additional reviewers.
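The entry does not give the pooling formula. The sketch below assumes the classic pooled-variance estimator, which weights each reviewer's Bessel-corrected variance by its n_u − 1 degrees of freedom; both that choice and the name `pooled_source_sigma` are assumptions.

```python
from math import sqrt
from statistics import variance


# Illustrative sketch; the pooling formula and function name are assumptions.
def pooled_source_sigma(reviewers: list[list[float]]) -> float:
    """Pooled SD over a source's variance-producing reviewers.

    Each reviewer's Bessel-corrected variance is weighted by n_u - 1.
    """
    num = 0.0
    dof = 0
    for ratings in reviewers:
        if len(ratings) < 2:
            continue  # sigma_u undefined
        v = variance(ratings)
        if v == 0:
            continue  # constant raters carry no dispersion information
        num += (len(ratings) - 1) * v
        dof += len(ratings) - 1
    if dof == 0:
        raise ValueError("source has no variance-producing reviewers")
    return sqrt(num / dof)


# Variances 1 (df 2) and 2 (df 1); the constant rater is skipped.
sigma_s = pooled_source_sigma([[5, 3, 4], [2, 4], [4, 4, 4]])  # sqrt(4/3) ≈ 1.155
```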
CI-floor (90% confidence-interval lower bound)
The ranking primitive Rankquant uses in place of raw mean z-scores. For each aggregate Ẑ (R1, R2, or R3): floor = Ẑ − 1.645 · SE(Ẑ). Penalizes products with few reviewers: a thin-sample 4-reviewer +2.1 mean loses to an 80-reviewer +1.6 mean because the former's SE is much wider. Same family as Wilson score (Reddit) and IMDb Top 250 shrinkage.
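The worked comparison above can be checked in a few lines, taking SE(Ẑ) = 1/√N with N the reviewer count (the R1 case). `ci_floor` is a hypothetical helper name.

```python
from math import sqrt

Z_CRIT = 1.645  # the critical value quoted in the entry


# Illustrative sketch; the helper name is hypothetical.
def ci_floor(mean_z: float, n: float, z_crit: float = Z_CRIT) -> float:
    """Lower bound Z_hat - z_crit * SE, with SE = 1 / sqrt(n)."""
    return mean_z - z_crit / sqrt(n)


thin = ci_floor(2.1, 4)    # 2.1 - 1.645 * 0.500 ≈ 1.28
thick = ci_floor(1.6, 80)  # 1.6 - 1.645 * 0.112 ≈ 1.42
# thick > thin: the 80-reviewer product ranks above the 4-reviewer one.
```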
Standard error (SE)
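The standard deviation of a sampling distribution. For a mean z-score, treating each contributing reviewer's z-score as having unit variance, SE(Ẑ) = 1/√N_eff. Shrinks as 1/√N, so N=100 cuts the SE to about one-third of its N=10 value (1/√10 ≈ 0.32). Rankquant's CI-floor penalty scales linearly with SE.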
Effective sample size (N_eff)
For R1 and R3, N_eff is simply the count of contributing reviewers. For R2, N_eff = (Σ w_s)² / Σ w_s² (Kish's design-effect formula). Weighted aggregates always have N_eff ≤ N_raw; the ratio tells you how much the weighting collapses the effective sample.
Kish design-effect formula
The standard survey-statistics formula for the effective sample size of a weighted mean: N_eff = (Σ w)² / Σ w². When weights are equal, N_eff = N. When one observation dominates the weights, N_eff collapses toward 1. Used in Rankquant for R2's CI-floor denominator.
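A direct transcription of the formula; `kish_n_eff` is an illustrative name. The examples show equal weights recovering N and one dominant weight collapsing N_eff toward 1.

```python
# Illustrative sketch; the function name is hypothetical.
def kish_n_eff(weights: list[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


equal = kish_n_eff([1] * 10)        # 10.0: equal weights give N_eff = N
skewed = kish_n_eff([10, 1, 1, 1])  # 169 / 103 ≈ 1.64: one weight dominates
```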
One-tailed critical value
The z or t value defining a one-sided rejection region. Rankquant uses z = 1.645, the 95th percentile of the standard normal. Subtracting 1.645 · SE gives the lower endpoint of a two-sided 90% interval, which is the same number as a one-sided 95% lower bound: a defensibly pessimistic floor on the mean z-score. Only the lower bound is used because we care about the downside of the estimate, not both tails.
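The constants can be confirmed with Python's standard-library `statistics.NormalDist`. This only checks the normal quantiles and assumes nothing about Rankquant's code.

```python
from statistics import NormalDist

std_normal = NormalDist()        # mean 0, standard deviation 1
z_95 = std_normal.inv_cdf(0.95)  # ≈ 1.6449, the constant used in the CI-floor
z_90 = std_normal.inv_cdf(0.90)  # ≈ 1.2816, what a one-sided 90% bound would use
# Probability mass between -1.645 and +1.645: the two-sided coverage.
two_sided = std_normal.cdf(1.645) - std_normal.cdf(-1.645)  # ≈ 0.90
```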
Wilson score interval
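A confidence interval for a binomial proportion, obtained by inverting the score test; unlike the simple normal-approximation (Wald) interval, it stays well-behaved at small n and near 0% or 100%. Used by Reddit's "best" sort and Yelp's internal ranking. Rankquant's CI-floor is the continuous-scale analog, in the same family of small-sample-safe ranking primitives.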
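For comparison, a sketch of the Wilson lower bound for k positive votes out of n, using the standard closed form. The default z = 1.645 mirrors Rankquant's constant for illustration only; it is not a claim about the z that Reddit or Yelp use, and the function name is hypothetical.

```python
from math import sqrt


# Illustrative sketch; the function name and default z are assumptions.
def wilson_lower_bound(positive: int, n: int, z: float = 1.645) -> float:
    """Lower end of the Wilson score interval for positive / n."""
    if n == 0:
        return 0.0
    p_hat = positive / n
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


few = wilson_lower_bound(4, 4)      # ≈ 0.60 despite a perfect 4/4
many = wilson_lower_bound(80, 100)  # ≈ 0.73: more evidence, higher floor
```

The pattern matches the CI-floor example: a perfect score on a tiny sample loses to a strong score on a large one.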
Empirical cumulative distribution function (empirical CDF)
A step function giving the fraction of sample points at or below any given value. Rankquant computes the empirical CDF of all CI-floors in the database and uses it to convert each product's CI-floor to a 0–100 percentile. Robust to non-normality of the underlying CI-floor distribution.
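A sketch of the CI-floor-to-percentile conversion. It assumes the "at or below" convention for ties (the entry does not specify one) and uses a made-up list of CI-floors; a real pipeline would likely sort once and reuse the sorted list rather than sorting per lookup.

```python
from bisect import bisect_right


# Illustrative sketch; the function name and tie convention are assumptions.
def ecdf_percentile(value: float, all_floors: list[float]) -> float:
    """Percentage of CI-floors at or below `value` (0-100)."""
    ordered = sorted(all_floors)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


floors = [-0.8, -0.2, 0.1, 0.4, 0.9, 1.3, 1.4, 1.6, 2.0, 2.2]  # made-up CI-floors
top = ecdf_percentile(2.0, floors)      # 90.0
bottom = ecdf_percentile(-0.8, floors)  # 10.0
```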
Rank-based ranking
A ranking scheme that uses a value's rank among peers rather than its absolute magnitude. Unchanged by any strictly increasing transformation of the values, and robust to outliers and non-normality. Rankquant's percentile is a rank-based transformation of the CI-floor.
Cohort (±20% price, same category)
The narrow peer set used for Rankquant's cohort percentile: products in the same category and within a symmetric ±20% price band. A $100 product's cohort is $80–$120. When the natural band produces fewer than 20 cohort members, we expand the band until ≥20 are in scope and log the expansion on the product page.
Cohort re-ranking
Converting a product's CI-floor percentile from global (against all products) to cohort (against cohort members only) by pure re-ranking of the same CI-floor. No new CI computation. Guarantees cohort rank is a deterministic function of CI-floors and cohort membership — auditable by hand.
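A sketch of cohort selection plus re-ranking. The band-widening step (5 percentage points per round), counting the product itself as a cohort member, and the `Product` record are all assumptions; the entries only say the band expands until at least 20 members are in scope.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:  # hypothetical record shape
    pid: str
    category: str
    price: float
    ci_floor: float


# Illustrative sketch; names, the step size, and self-membership are assumptions.
def cohort_members(products, target, band=0.20, min_size=20, step=0.05):
    """Same-category products priced within +/-band of the target's price.

    The band widens by `step` until at least `min_size` products are in scope
    or the whole category is included. Returns (members, band_used).
    """
    if target.price <= 0:
        raise ValueError("price bands need a positive target price")
    same_category = [p for p in products if p.category == target.category]
    while True:
        lo, hi = target.price * (1 - band), target.price * (1 + band)
        members = [p for p in same_category if lo <= p.price <= hi]
        if len(members) >= min_size or len(members) == len(same_category):
            return members, band
        band += step


def cohort_percentile(products, target, **kwargs):
    """Re-rank the target's existing CI-floor against its cohort only."""
    members, band = cohort_members(products, target, **kwargs)
    floors = sorted(p.ci_floor for p in members)
    return 100.0 * bisect_right(floors, target.ci_floor) / len(floors), band


items = [Product(f"h{i}", "headphones", 100 + i, i / 10) for i in range(-10, 11)]
pct, used = cohort_percentile(items, items[10])  # items[10]: $100, CI-floor 0.0
# 21 members at the default ±20% band; 11 floors are <= 0.0, so pct ≈ 52.4
```

No CI-floor is recomputed: the cohort percentile only changes which floors the target is compared against.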
Degrees of freedom (df)
The number of independent values in a dataset after constraints. In Rankquant, σ_u uses df = n_u − 1 (Bessel's correction). For the product-level aggregate, the minimum admission rule is df ≥ 1 per reviewer (n_u ≥ 2). Deep dive at /theory/degrees-of-freedom/.
Bessel's correction
The n−1 denominator in the sample-variance formula (rather than n). Corrects the downward bias in σ² that would otherwise arise when the sample mean is estimated from the same data. Rankquant uses Bessel-corrected σ_u for every reviewer.
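The effect of the correction on one reviewer's ratings, using the standard library's population (`pvariance`) and sample (`variance`, `stdev`) functions. The ratings are made up.

```python
from statistics import pvariance, stdev, variance

ratings = [5, 4, 5, 3]        # one reviewer's ratings (made up); mean 4.25
n = len(ratings)
biased = pvariance(ratings)   # squared deviations / n       = 2.75 / 4 = 0.6875
unbiased = variance(ratings)  # squared deviations / (n - 1) = 2.75 / 3 ≈ 0.9167
sigma_u = stdev(ratings)      # sqrt of the Bessel-corrected variance ≈ 0.957
# The two variance estimates differ exactly by the factor n / (n - 1).
```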
Standard deviation (σ)
The square root of variance: a measure of spread around the mean. In Rankquant, each reviewer has a personal σ_u and (for R3 imputation) each source has a pooled σ̃_s. Z-score normalization uses σ_u as the per-reviewer scale factor.
Bias–variance tradeoff
The fundamental statistical tension: reducing bias in an estimator typically increases variance and vice versa. Rankquant's choice of CI-floor ranking (rather than mean ranking) accepts a slight, deliberately conservative bias in exchange for protection against the high variance of thin-sample means. Similar intuition drives Bayesian shrinkage.
Reviewer fixed effects
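A regression-modeling framework in which each reviewer is assigned their own intercept, effectively removing their main effect from the product-quality estimate. Rankquant's per-reviewer normalization is a simpler analog: subtracting μ_u plays the role of a reviewer-specific intercept, and dividing by σ_u adds a reviewer-specific scale. It approximates rather than reproduces a fitted fixed-effects model, because μ_u is a raw mean rather than being estimated jointly with the product effects.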
Central Limit Theorem
The theorem that the distribution of a sample mean becomes approximately normal as sample size grows, regardless of the underlying distribution's shape. Justifies treating the sampling distribution of Ẑ (the product-level mean z-score) as approximately normal for the CI-floor computation once N ≥ ~6.
Intraclass correlation coefficient (ICC)
A measure of reliability for continuous ratings. ICC = σ²_between / (σ²_between + σ²_within). In Rankquant, we compute reviewer-level ICC(1,1) on the z-scored data to quantify how much reviewers agree about product quality after personal-scale differences are removed. Deep dive at /theory/inter-rater-reliability/.
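A sketch of the one-way ANOVA estimator for ICC(1,1), (MS_between − MS_within) / (MS_between + (k − 1)·MS_within). It assumes a balanced table in which every product has the same number k of z-scored ratings; real review data is unbalanced, so treat this as illustrative. `icc_1_1` is a hypothetical name.

```python
# Illustrative sketch; assumes a balanced products x ratings table.
def icc_1_1(table: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1). table[i] holds product i's k z-scores.

    Needs at least 2 products, k >= 2, and not every value identical.
    """
    n = len(table)
    k = len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(table, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


perfect = icc_1_1([[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]])  # 1.0: reviewers agree exactly
```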
Cohen's kappa (κ)
Chance-corrected agreement statistic for two raters on a categorical scale. κ = 0 means chance-level agreement; κ = 1 means perfect. Used for binary reviewers (Rotten Tomatoes fresh/rotten, Michelin star/no-star) that can't be z-score normalized in the standard way.
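A sketch of κ = (p_o − p_e) / (1 − p_e) for two raters, where p_o is observed agreement and p_e is the agreement expected from each rater's own label frequencies. The labels and the function name are illustrative.

```python
from collections import Counter


# Illustrative sketch; the function name is hypothetical.
def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """(p_o - p_e) / (1 - p_e) for two raters labelling the same items."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    # Chance agreement: both raters independently pick the same label.
    p_e = sum(count_a[x] * count_b[x] for x in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (one shared label only)


critic_1 = ["fresh", "fresh", "rotten", "rotten"]
critic_2 = ["fresh", "rotten", "rotten", "rotten"]
kappa = cohens_kappa(critic_1, critic_2)  # p_o = 0.75, p_e = 0.5 -> 0.5
```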
Fleiss' kappa
A chance-corrected agreement statistic for more than two raters rating the same items categorically. Often described as a generalization of Cohen's κ; strictly, it generalizes Scott's π. Rankquant uses it for panels of binary reviewers, where pairwise Cohen's κ would produce combinatorially many numbers.
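A sketch of Fleiss' κ from an items × categories count table, assuming every item is rated by the same number of raters (the standard setting for the formula). `fleiss_kappa` is an illustrative name.

```python
# Illustrative sketch; the function name is hypothetical.
def fleiss_kappa(counts: list[list[int]]) -> float:
    """counts[i][j] = number of raters who put item i in category j.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # P_i: fraction of rater pairs on item i that agree.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # p_j: share of all assignments that landed in category j.
    p_cats = [sum(col) / (n_items * n_raters) for col in zip(*counts)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


# Three inspectors, star / no-star, on two restaurants:
unanimous = fleiss_kappa([[3, 0], [0, 3]])  # 1.0
split = fleiss_kappa([[2, 1], [1, 2]])      # -1/3: less agreement than chance
```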
Variance decomposition
Splitting total variance into additive components — in review data typically between-product, between-reviewer, and residual. Per-reviewer normalization mathematically zeros out the between-reviewer component, leaving between-product + residual for ICC computation.
Self-selection bias
A bias in voluntary review systems: consumers with strongly positive (or sometimes strongly negative) experiences are much more likely to leave reviews than consumers with middling experiences. Drives rating inflation. Per-reviewer normalization partially corrects for it by re-centering each reviewer's distribution on their personal mean and rescaling it to unit variance.
Right-skew
Informal label for a distributional shape where most observations cluster near the top of the scale. Online review distributions are typically ceiling-clustered in this sense. In strict statistical terms, a pile-up near the top with a long tail toward low ratings is negative (left) skew; see Skewness. Rank-based percentile mapping is robust to this shape; mean-based aggregation is not.
Outlier filtering
Removing reviews that diverge sharply from the bulk of the distribution, typically obvious manipulation such as review bombs or 1-star content-farm reviews, before normalization. Rankquant applies a simple IQR-based filter on the raw-rating distribution before computing reviewer statistics.
Review bomb
A coordinated effort to lower a product's review score via large numbers of 1-star reviews from newly created or low-activity accounts. Detected via reviewer-behavior heuristics; in addition, single-review accounts (n_u = 1, σ_u undefined) are excluded from R1, R2, and R3 automatically.
Reproducibility
The property that running Rankquant's normalization on the same inputs (reviewer ratings + source weights + published constants) produces identical outputs. All inputs are published; users can verify any percentile themselves by running the open-source code.
Maximum likelihood estimation (MLE)
A method for estimating parameters by maximizing the likelihood of observing the data under a model. Rankquant uses Bessel-corrected variance (the unbiased estimator, df = n−1) rather than the MLE variance (biased, uses n) because unbiasedness matters more than asymptotic efficiency at small sample sizes.
Empirical Bayes
A framework where prior parameters are estimated from the data rather than specified a priori. Rankquant's imputed σ̃_s (used in R3) is empirical-Bayes in spirit: the "prior" dispersion of constant-rater reviewers is estimated from their source's pooled SD.
Shrinkage estimator
An estimator that pulls raw estimates toward a central value (the shrinkage target), trading a little bias for reduced variance. IMDb's Top 250 formula uses Bayesian shrinkage toward a global mean. Rankquant's CI-floor is a close cousin: instead of shrinking the estimate toward a prior, it subtracts a standard-error term. Both penalize small samples in a principled way.
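A sketch of the Bayesian-average form v/(v+m)·R + m/(v+m)·C, the weighted-rating formula commonly cited for IMDb's Top 250 (R = item mean, v = its vote count, C = global mean, m = prior weight in pseudo-votes). The function name and parameter values are invented for illustration.

```python
# Illustrative sketch; the function name and example numbers are hypothetical.
def bayesian_average(mean_rating: float, n_votes: int, prior_mean: float, prior_weight: float) -> float:
    """Shrink an item's mean toward prior_mean; prior_weight acts like pseudo-votes."""
    return (n_votes * mean_rating + prior_weight * prior_mean) / (n_votes + prior_weight)


niche = bayesian_average(9.5, 10, prior_mean=7.0, prior_weight=100)     # ≈ 7.23
popular = bayesian_average(8.5, 5000, prior_mean=7.0, prior_weight=100)  # ≈ 8.47
```

The 10-vote item is pulled most of the way to the prior, while the 5000-vote item barely moves; the CI-floor reaches a similar ordering by subtracting an SE term instead.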
James–Stein estimator
Stein's 1956 finding, made explicit by James and Stein in 1961: when simultaneously estimating three or more normal means, shrinking them all toward a common point dominates the unshrunken sample means in total squared error (shrinking toward the estimated grand mean needs four or more). Rankquant's empirical-CDF percentile step is a rank-based analog of this shrinkage intuition.
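A sketch of the positive-part James–Stein estimator shrinking toward the grand mean, assuming each input is a noisy mean with the same known sampling variance σ². It illustrates the shrinkage idea itself, not a step of Rankquant's pipeline; the function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.
def james_stein_grand_mean(y: list[float], sigma2: float) -> list[float]:
    """Positive-part James-Stein estimate toward the grand mean.

    Each y[i] is a noisy mean with known sampling variance sigma2.
    Shrinking toward an estimated grand mean needs at least 4 means.
    """
    p = len(y)
    if p < 4:
        raise ValueError("need at least 4 means to shrink toward the grand mean")
    grand = sum(y) / p
    s = sum((v - grand) ** 2 for v in y)
    if s == 0:
        return [grand] * p
    factor = max(0.0, 1 - (p - 3) * sigma2 / s)  # clipped at 0 (positive part)
    return [grand + factor * (v - grand) for v in y]


shrunk = james_stein_grand_mean([1.0, 2.0, 3.0, 4.0, 5.0], sigma2=1.0)
# factor = 1 - 2 * 1 / 10 = 0.8 -> [1.4, 2.2, 3.0, 3.8, 4.6]
```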
Fixed-effects vs random-effects aggregation
In meta-analysis, fixed-effects models assume all studies share one true effect; random-effects models allow per-study heterogeneity. Rankquant's reviewer-level aggregation is fixed-effects (we treat each reviewer as producing noisy observations of the same product-quality signal). A random-effects upgrade is a future methodology version.
Bootstrap resampling
A nonparametric method: draw B resamples (with replacement) from the original data, compute the statistic on each, and use the resulting distribution as an estimate of the sampling distribution. Rankquant uses bootstrap to validate analytical CI-floor coverage at small N and to estimate uncertainty in the percentile rank itself.
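A sketch of a percentile bootstrap for the lower bound of a mean z-score. B = 2000, the fixed seed, and taking the 5th percentile (to line up with the 1.645 floor) are assumptions, as is the function name. Unlike the analytical floor, which takes the z-scores to have unit variance, the bootstrap uses the spread actually observed in the sample.

```python
import random
from statistics import mean, quantiles


# Illustrative sketch; the function name, B, and seed are assumptions.
def bootstrap_floor(z_scores: list[float], b: int = 2000, seed: int = 0) -> float:
    """5th percentile of B bootstrap means: a resampling-based lower bound."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(z_scores)
    boot_means = [mean(rng.choices(z_scores, k=n)) for _ in range(b)]
    return quantiles(boot_means, n=20)[0]  # first of 19 cut points = 5th percentile


observed = [2.3, 1.9, 2.6, 1.5]  # made-up z-scores from four reviewers
floor_boot = bootstrap_floor(observed)
```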
Kolmogorov–Smirnov test
A nonparametric test for whether a sample comes from a specified distribution (one-sample) or whether two samples share a distribution (two-sample). Rankquant uses KS in its pipeline diagnostics to flag reviewer distributions with unusual shapes (strong bimodality, hard truncation) that merit editorial inspection.
Skewness
A measure of distributional asymmetry. Positive skew = long right tail; negative skew = long left tail; a symmetric distribution has skew 0 (though skew 0 alone does not guarantee symmetry). Online review distributions have strong negative skew (ceiling clustering). Per-reviewer normalization does not remove skew, since it only changes location and scale, which is why we use rank-based percentile mapping instead of a z-to-percentile normal CDF in Step 4.
Kurtosis
A measure of tail heaviness. Normal distribution has kurtosis = 3 (excess kurtosis = 0). Review-bombed products tend to have low kurtosis: a bimodal split between 1-star and top ratings puts most of the mass in two shoulders, with little in the center or in extreme tails. Well-calibrated reviewer distributions have kurtosis close to normal.
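Population (moment-based) skewness and kurtosis in a few lines; `moment_shape` is an illustrative helper. A perfectly bimodal 1-or-5 pattern hits the minimum possible kurtosis of 1, and a ceiling-clustered pattern gives negative skew.

```python
# Illustrative sketch; the helper name is hypothetical.
def moment_shape(x: list[float]) -> tuple[float, float]:
    """Population skewness and kurtosis (a normal distribution has kurtosis 3)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


bombed = moment_shape([1, 1, 5, 5])      # (0.0, 1.0): symmetric, minimal kurtosis
ceiling = moment_shape([5, 5, 5, 4, 1])  # skew ≈ -1.29: long tail toward low ratings
```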
Interquartile range (IQR)
The range between the 25th and 75th percentiles (Q3 − Q1). A robust measure of spread, unaffected by extreme outliers. Used in Rankquant's pre-normalization outlier filter: ratings more than 1.5·IQR below Q1 or above Q3 are flagged for editorial review.
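A sketch of the 1.5·IQR flag (Tukey's fences) using `statistics.quantiles`, whose default "exclusive" method is one of several quartile conventions. Which convention Rankquant uses is not stated, so treat that, and the function name, as assumptions.

```python
from statistics import quantiles


# Illustrative sketch; the function name and quartile convention are assumptions.
def tukey_flags(ratings: list[float], k: float = 1.5) -> list[float]:
    """Ratings more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(ratings, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r for r in ratings if r < lo or r > hi]


flagged = tukey_flags([4, 4, 5, 5, 4, 5, 4, 5, 1])  # Q1 = 4, Q3 = 5 -> [1]
```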
Median absolute deviation (MAD)
A robust alternative to standard deviation: MAD = median(|x_i − median(x)|). Resistant to outliers. Rankquant uses MAD as a cross-check on σ_u when a reviewer's distribution is suspected of contamination (e.g. bot-pattern ratings).
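A sketch of MAD alongside the ordinary SD. Multiplying MAD by about 1.4826 (1 / Φ⁻¹(0.75)) makes it comparable to σ for normally distributed data; the cross-check helper and its name are assumptions, not Rankquant's published rule.

```python
from statistics import median, stdev


# Illustrative sketch; function names are hypothetical.
def mad(x: list[float]) -> float:
    """Median absolute deviation from the median."""
    med = median(x)
    return median([abs(v - med) for v in x])


def sigma_cross_check(x: list[float]) -> tuple[float, float]:
    """(Bessel-corrected SD, MAD scaled to match sigma under normality)."""
    return stdev(x), 1.4826 * mad(x)


sd, robust = sigma_cross_check([1, 2, 3, 4, 100])  # one contaminated value
# sd ≈ 43.6, robust ≈ 1.48: a gap this large may point to contamination
```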
Cohen's d
Standardized effect size: d = (μ_1 − μ_2) / σ_pooled. A Rankquant global-90th-percentile product has Cohen's d of roughly +1.3 vs the database mean, since under a normal approximation the 90th percentile sits Φ⁻¹(0.90) ≈ 1.28 SDs above the mean. That is a large effect, meaning the product is genuinely distinguishable from average.
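A sketch of Cohen's d with a pooled, Bessel-corrected SD, plus the normal quantile of the 90th percentile (≈ 1.28 SDs above the mean under a normal approximation). The sample data and function name are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


# Illustrative sketch; the function name is hypothetical.
def cohens_d(group_1: list[float], group_2: list[float]) -> float:
    """(mean_1 - mean_2) / pooled SD, pooling the Bessel-corrected variances."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_var)


d_example = cohens_d([2, 3, 4], [1, 2, 3])  # 1.0: means one pooled SD apart
# Under a normal approximation, the 90th percentile sits this many SDs above the mean:
d_at_p90 = NormalDist().inv_cdf(0.90)       # ≈ 1.28
```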
Statistical power (1 − β)
The probability of correctly rejecting a false null hypothesis. In Rankquant's context, power is the probability that two meaningfully-different-quality products receive distinguishable CI-floors. At N = 30 per product, power to distinguish effect sizes of d = 0.5 at our 90% confidence level is roughly 0.75.
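The entry does not say which test produced the 0.75 figure. One normal-approximation reading that reproduces it is a two-sample comparison of means with 30 reviewers per product and a one-sided α = 0.10, which gives about 0.74; with the 1.645 critical value (one-sided α = 0.05) the same calculation gives about 0.61. The function below is a hypothetical sketch of that approximation, not Rankquant's power analysis.

```python
from math import sqrt
from statistics import NormalDist


# Illustrative sketch; the test setup and function name are assumptions.
def power_two_sample(n_per_product: int, d: float, alpha: float) -> float:
    """Normal-approximation power of a one-sided two-sample test of means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Standardized distance between the two means' sampling distributions.
    shift = d * sqrt(n_per_product / 2)
    return std_normal.cdf(shift - z_crit)


loose = power_two_sample(30, 0.5, 0.10)   # ≈ 0.74
strict = power_two_sample(30, 0.5, 0.05)  # ≈ 0.61
```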

See also: the full methodology · theory & derivations · wine glossary