Statistics & methodology glossary

Every statistical concept behind Rankquant's normalized percentiles, in plain language. Each entry is linkable as #term and exposes schema.org/DefinedTerm structured data for AI retrieval.

Normalized percentile: A 0–100 number indicating where a product's CI-floor ranks relative to every other product's CI-floor in the Rankquant database. 90 means top 10%; 50 means median; 10 means bottom 10%. Derived from the empirical cumulative distribution function (empirical CDF) of all products' CI-floors.
Global percentile: A product's normalized percentile against the full Rankquant database. Stable from one update to the next because the reference set is very large, so adding or removing a few products barely moves any single product's rank. Useful for big-picture "is this product any good" questions.
Cohort percentile: A product's normalized percentile against only products in the same category and within ±20% of its list price. Mathematically it is a pure re-ranking of the same CI-floor used for the global percentile — no new computation. Useful for "is this good for what it costs?" questions.
Rating inflation: The phenomenon where online review scores cluster in a compressed high range (Amazon 4.4/5, Yelp 4.2/5, Goodreads 4.1+). Caused by self-selection bias, social signaling, and platform incentives. Makes raw averages nearly useless for consumer buying decisions — the problem Rankquant's per-reviewer normalization is designed to solve.
Per-reviewer z-score normalization: Rankquant's first operation: for each reviewer u, subtract their personal mean μ_u and divide by their personal SD σ_u. Converts a reviewer's rating to a z-score z_{u,i} = (r_{u,i} − μ_u) / σ_u in their personal units. Removes the reviewer main effect so different reviewers' scores become commensurable.
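The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not Rankquant's published code: the function name reviewer_z_scores and the sample ratings are hypothetical, and returning None for non-qualifying reviewers is an assumed convention.

```python
from statistics import mean, stdev

def reviewer_z_scores(ratings):
    """Map {product_id: rating} for ONE reviewer to {product_id: z-score}.

    Returns None when the reviewer does not qualify (fewer than two
    ratings, or zero spread), mirroring the n_u >= 2 and sigma_u > 0 rule.
    """
    values = list(ratings.values())
    if len(values) < 2:
        return None
    mu = mean(values)
    sigma = stdev(values)  # Bessel-corrected (n - 1 denominator)
    if sigma == 0:
        return None
    return {pid: (r - mu) / sigma for pid, r in ratings.items()}

# Two hypothetical reviewers with very different personal scales.
harsh = {"a": 2, "b": 3, "c": 4}
generous = {"a": 4.0, "b": 4.5, "c": 5.0}

harsh_z = reviewer_z_scores(harsh)
generous_z = reviewer_z_scores(generous)
print(harsh_z)
print(generous_z)
```

Both reviewers end up with z-scores of −1, 0, and +1 for products a, b, and c, even though their raw ratings never overlap. That is what "commensurable" means here.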
Z-score: The number of standard deviations a value sits above or below its mean. In Rankquant, z-scores are computed per reviewer so each reviewer's ratings become unit-variance in their personal distribution. A reviewer's z of +1.6 for a product means that product sits 1.6 personal-SDs above their personal mean.
Reviewer personal mean (μ_u): The arithmetic mean of all ratings a reviewer u has produced in our dataset, pooled across every category they've rated. Computed once and updated as new reviews arrive.
Reviewer personal standard deviation (σ_u): The Bessel-corrected sample standard deviation of all ratings a reviewer u has produced, pooled cross-category. Must be > 0 for inclusion in R1 and R2; if σ_u = 0 (reviewer gives every product the same rating) the reviewer is admitted only to the R3 broadened lens with an imputed σ̃_s from their source pool.
Cross-category reviewer pooling: Rankquant's decision to compute a reviewer's μ_u and σ_u across every category they've rated rather than per-category. Assumes a person's rating calibration is roughly stable across product types — an assumption we continue to validate empirically. Keeps n_u large and σ_u stable.
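Qualifying reviewer: A reviewer with n_u ≥ 2 and σ_u > 0. Only qualifying reviewers enter R1 and R2. Reviewers with n_u ≥ 2 and σ_u = 0 are admitted only to R3. Reviewers with n_u = 1 are excluded from all three lenses: with a single rating, σ_u is undefined (the Bessel-corrected formula divides by n_u − 1 = 0) and μ_u is just that one rating, so no z-score can be formed.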
Reviewer main effect: The portion of raw-rating variance explained by "which reviewer is rating" rather than "which product is being rated." Per-reviewer z-score normalization mathematically zeros the reviewer main effect, leaving only the product-quality signal plus residual noise.
R1 (pure relative): Rankquant's headline aggregate. The unweighted mean z-score across every qualifying reviewer of a product. Every reviewer counts equally. Answers: "What does the crowd think, relative to each reviewer's own scale?"
R2 (source-weighted): A weighted-mean z-score aggregate in which reviewers from more credible sources count more. Weights w_s are published per source and version-stable. Complements R1 by asking: "what does the crowd think if we listen more to the most rigorous reviewers?"
R3 (broadened): An R1-style unweighted mean z-score that additionally includes reviewers with n_u ≥ 2 but σ_u = 0 via an imputed σ̃_s drawn from their source's pooled dispersion. Adds signal from consistent-rater reviewers otherwise excluded.
Source weight (w_s): A published number (typically 1–10) encoding Rankquant's editorial judgment of a review source's historical credibility. Affects R2 only; R1 and R3 treat all qualifying reviewers equally. Weights live in the open-source repo and any change requires a public version bump.
Imputed σ̃_s: A pooled standard-deviation estimate for a source s, computed from all variance-producing reviewers writing for that source. Used only in R3 as a stand-in σ when a specific reviewer has σ_u = 0. Introduces a small bounded bias in R3 in exchange for admitting additional reviewers.
CI-floor (90% confidence-interval lower bound): The ranking primitive Rankquant uses in place of raw mean z-scores. For each aggregate Ẑ (R1, R2, or R3): floor = Ẑ − 1.645 · SE(Ẑ). Penalizes products with few reviewers: a thin-sample 4-reviewer +2.1 mean (floor ≈ +1.28) loses to an 80-reviewer +1.6 mean (floor ≈ +1.42) because the former's SE is much larger. Same family as Wilson score (Reddit) and IMDb Top 250 shrinkage.
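The 4-reviewer versus 80-reviewer comparison can be checked by hand. The sketch below assumes SE = 1/√N_eff, as given in the Standard error entry; the names Z_CRIT and ci_floor are illustrative.

```python
import math

Z_CRIT = 1.645  # critical value quoted in the CI-floor entry

def ci_floor(mean_z, n_eff):
    """Mean z-score minus Z_CRIT standard errors, with SE = 1 / sqrt(N_eff)."""
    return mean_z - Z_CRIT / math.sqrt(n_eff)

thin = ci_floor(2.1, 4)    # 2.1 - 1.645 * 0.5, about 1.28
deep = ci_floor(1.6, 80)   # 1.6 - 1.645 / sqrt(80), about 1.42
print(round(thin, 2), round(deep, 2))
print("80-reviewer product ranks higher:", deep > thin)
```

The higher raw mean loses because its penalty (about 0.82) is more than four times the penalty on the well-sampled product (about 0.18).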
Standard error (SE): The standard deviation of a sampling distribution. For a mean z-score, SE(Ẑ) = 1/√N_eff. Shrinks in proportion to 1/√N, so N=100 cuts the SE to about one-third (1/√10 ≈ 0.32) of its value at N=10. Rankquant's CI-floor penalty scales linearly with SE.
Effective sample size (N_eff): For R1 and R3, N_eff is simply the count of contributing reviewers. For R2, N_eff = (Σ w_s)² / Σ w_s² (Kish's design-effect formula). Weighted aggregates always have N_eff ≤ N_raw; the ratio tells you how much the weighting collapses the effective sample.
Kish design-effect formula: The standard survey-statistics formula for the effective sample size of a weighted mean: N_eff = (Σ w)² / Σ w². When weights are equal, N_eff = N. When one observation dominates the weights, N_eff collapses toward 1. Used in Rankquant for R2's CI-floor denominator.
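A short sketch of the formula, showing both limiting cases described above. The function name kish_n_eff and the weight lists are made up for illustration.

```python
def kish_n_eff(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

equal = kish_n_eff([1, 1, 1, 1])       # equal weights -> 4.0, same as N
dominated = kish_n_eff([10, 1, 1, 1])  # 169 / 103, about 1.64
print(equal, round(dominated, 2))
```

With one source weighted 10 and three weighted 1, four reviewers carry roughly the statistical information of 1.6 equally weighted ones, so the R2 CI-floor penalty is correspondingly larger.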
One-tailed critical value: The z or t value defining a one-sided rejection region. Rankquant uses z = 1.645, the 95% one-tailed critical value of the standard normal, which is also the lower endpoint of the two-sided 90% confidence interval named in the CI-floor entry. It gives a defensibly pessimistic lower bound on the mean z-score. We use a one-sided bound because we care about the downside of the estimate, not both tails.
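Wilson score interval: A confidence interval for a binomial proportion, obtained by inverting the score test. It stays well-behaved at small samples and for proportions near 0 or 1, and has an optional continuity-corrected variant. Used by Reddit's "best" sort and Yelp's internal ranking. Rankquant's CI-floor is the continuous-scale analog, from the same family of small-sample-safe ranking primitives.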
Empirical cumulative distribution function (empirical CDF): A step function that gives the fraction of sample points at or below any given value. Rankquant computes the empirical CDF of all CI-floors in the database and uses it to convert each product's CI-floor to a 0–100 percentile. Robust to non-normality of the underlying CI-floor distribution.
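As a rough sketch, the percentile lookup is a sorted-list search. The function name ecdf_percentile and the ten CI-floors are hypothetical, and counting ties as "at or below" is an assumption about tie handling rather than a documented Rankquant rule.

```python
from bisect import bisect_right

def ecdf_percentile(value, all_floors_sorted):
    """Percent of CI-floors that are <= value (empirical CDF times 100)."""
    return 100.0 * bisect_right(all_floors_sorted, value) / len(all_floors_sorted)

# Hypothetical database of ten CI-floors.
floors = sorted([-0.8, -0.2, 0.1, 0.4, 0.9, 1.3, 1.4, 1.6, 1.9, 2.2])

pct = ecdf_percentile(1.4, floors)  # 7 of 10 floors are <= 1.4
print(pct)
```

Only the ordering of the floors matters here, which is why the mapping makes no assumption about the shape of their distribution.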
Rank-based ranking: A ranking scheme that uses a value's rank among peers rather than its absolute magnitude. Robust to monotone transformations, outliers, and non-normality. Rankquant's percentile is a rank-based transformation of the CI-floor.
Cohort (±20% price, same category): The narrow peer set used for Rankquant's cohort percentile: products in the same category and within a symmetric ±20% price band. A $100 product's cohort is $80–$120. When the natural band produces fewer than 20 cohort members, we expand the band until ≥20 are in scope and log the expansion on the product page.
Cohort re-ranking: Converting a product's CI-floor percentile from global (against all products) to cohort (against cohort members only) by pure re-ranking of the same CI-floor. No new CI computation. Guarantees cohort rank is a deterministic function of CI-floors and cohort membership — auditable by hand.
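A minimal sketch of the re-ranking, assuming products are plain dictionaries with category, price, and ci_floor keys (hypothetical field names). It omits the band-expansion step used when fewer than 20 cohort members are found, and it counts ties as "at or below".

```python
def cohort_percentile(product, catalog, band=0.20):
    """Re-rank a product's existing CI-floor against its price/category cohort."""
    lo = product["price"] * (1 - band)
    hi = product["price"] * (1 + band)
    cohort = [
        p["ci_floor"]
        for p in catalog
        if p["category"] == product["category"] and lo <= p["price"] <= hi
    ]
    at_or_below = sum(1 for f in cohort if f <= product["ci_floor"])
    return 100.0 * at_or_below / len(cohort)

# Hypothetical catalog; the product being ranked is itself a cohort member.
catalog = [
    {"id": "A", "category": "kettle", "price": 100, "ci_floor": 1.2},
    {"id": "B", "category": "kettle", "price": 85, "ci_floor": 0.4},
    {"id": "C", "category": "kettle", "price": 110, "ci_floor": 1.5},
    {"id": "D", "category": "kettle", "price": 130, "ci_floor": 2.0},   # outside band
    {"id": "E", "category": "toaster", "price": 95, "ci_floor": 1.9},   # other category
    {"id": "F", "category": "kettle", "price": 95, "ci_floor": 0.9},
]

result = cohort_percentile(catalog[0], catalog)  # cohort is A, B, C, F
print(result)
```

No CI-floor is recomputed anywhere in the function; only the comparison set changes, which is what makes the cohort rank auditable by hand.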
Degrees of freedom (df): The number of independent values in a dataset after constraints. In Rankquant, σ_u uses df = n_u − 1 (Bessel's correction). For the product-level aggregate, the minimum admission rule is df ≥ 1 per reviewer (n_u ≥ 2). Deep dive at /theory/degrees-of-freedom/.
Bessel's correction: The n−1 denominator in the sample-variance formula (rather than n). Corrects the downward bias in σ² that would otherwise arise when the sample mean is estimated from the same data. Rankquant uses Bessel-corrected σ_u for every reviewer.
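The difference between the two denominators is easy to see with Python's standard library, where variance divides by n − 1 and pvariance divides by n. The three ratings are a hypothetical reviewer.

```python
from statistics import pvariance, variance

ratings = [3, 4, 5]            # hypothetical ratings from one reviewer

biased = pvariance(ratings)    # divides by n      -> 2/3
unbiased = variance(ratings)   # divides by n - 1  -> 1
print(biased, unbiased)
```

At n = 3 the uncorrected figure is a third smaller than the corrected one. The gap shrinks as n grows, which is why the correction matters most for reviewers with few ratings.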
Standard deviation (σ): The square root of variance: a measure of spread around the mean. In Rankquant, each reviewer has a personal σ_u and (for R3 imputation) each source has a pooled σ̃_s. Z-score normalization uses σ_u as the per-reviewer scale factor.
Bias–variance tradeoff: The fundamental statistical tension: reducing bias in an estimator typically increases variance and vice versa. Rankquant's choice of CI-floor ranking (rather than mean ranking) accepts a deliberate conservative bias, since every estimate is pulled down by 1.645 · SE, in exchange for rankings that are far less sensitive to the high variance of thin-sample products. Similar intuition drives Bayesian shrinkage.
Reviewer fixed effects: A regression-modeling framework in which each reviewer is assigned their own intercept, effectively removing their main effect from the product-quality estimate. Rankquant's per-reviewer normalization is a normalized analog: subtracting μ_u and dividing by σ_u is equivalent to absorbing a reviewer-specific intercept and scale.
Central Limit Theorem: The theorem that the distribution of a sample mean becomes approximately normal as sample size grows, regardless of the underlying distribution's shape. Justifies treating the sampling distribution of Ẑ (the product-level mean z-score) as approximately normal for the CI-floor computation once N ≥ ~6.
Intraclass correlation coefficient (ICC): A measure of reliability for continuous ratings. ICC = σ²_between / (σ²_between + σ²_within). In Rankquant, we compute reviewer-level ICC(1,1) on the z-scored data to quantify how much reviewers agree about product quality after personal-scale differences are removed. Deep dive at /theory/inter-rater-reliability/.
Cohen's kappa (κ): Chance-corrected agreement statistic for two raters on a categorical scale. κ = 0 means chance-level agreement; κ = 1 means perfect. Used for binary reviewers (Rotten Tomatoes fresh/rotten, Michelin star/no-star) that can't be z-score normalized in the standard way.
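For illustration, κ for two binary raters can be computed directly from observed and chance agreement. The function name cohens_kappa and the two label lists (1 = fresh, 0 = rotten) are hypothetical. The sketch does not guard against the degenerate case where chance agreement equals 1.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    return (observed - expected) / (1 - expected)

critic_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
critic_b = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

kappa = cohens_kappa(critic_a, critic_b)  # observed 0.80, expected 0.52
print(round(kappa, 3))
```

The critics agree on 8 of 10 items, but because both say "fresh" 60% of the time they would agree on 52% by chance alone, so κ comes out near 0.58 rather than 0.80.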
Fleiss' kappa: A generalization of Cohen's κ to more than two raters rating the same items categorically. Rankquant uses it for panels of binary reviewers where pairwise Cohen's would produce combinatorially many numbers.
Variance decomposition: Splitting total variance into additive components — in review data typically between-product, between-reviewer, and residual. Per-reviewer normalization mathematically zeros out the between-reviewer component, leaving between-product + residual for ICC computation.
Self-selection bias: A bias in voluntary review systems: consumers with strongly positive (or sometimes strongly negative) experiences are much more likely to leave reviews than consumers with middling experiences. Drives rating inflation. Per-reviewer normalization partially corrects for it by rescaling each reviewer's distribution to unit variance.
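Right-skew: A distributional shape with a long tail toward high values, so most observations cluster near the bottom of the scale. Online review distributions are the mirror image: ceiling-clustered ratings with a long tail toward low scores are left-skewed (negative skew), although they are often loosely described as "right-skewed". Rank-based percentile mapping is robust to skew in either direction; mean-based aggregation is not.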
Outlier filtering: Removing reviews that diverge extremely from the bulk distribution — typically obvious manipulation (review bombs, 1-star content-farm reviews) before normalization. Rankquant applies a simple IQR-based filter on the raw-rating distribution before computing reviewer statistics.
Review bomb: A coordinated effort to lower a product's review score via large numbers of 1-star reviews from newly-created or low-activity accounts. Detected via reviewer-behavior heuristics. In addition, single-review accounts (n_u = 1, so σ_u is undefined) are excluded automatically from R1, R2, and R3.
Reproducibility: The property that running Rankquant's normalization on the same inputs (reviewer ratings + source weights + published constants) produces identical outputs. All inputs are published; users can verify any percentile themselves by running the open-source code.
Maximum likelihood estimation (MLE): A method for estimating parameters by maximizing the likelihood of observing the data under a model. Rankquant uses Bessel-corrected variance (the unbiased estimator, df = n−1) rather than the MLE variance (biased, uses n) because unbiasedness matters more than asymptotic efficiency at small sample sizes.
Empirical Bayes: A framework where prior parameters are estimated from the data rather than specified a priori. Rankquant's imputed σ̃_s (used in R3) is empirical-Bayes in spirit: the "prior" dispersion of constant-rater reviewers is estimated from their source's pooled SD.
Shrinkage estimator: An estimator that moves raw observations toward a central value (shrinkage target) to trade bias for reduced variance. IMDb's Top 250 formula uses Bayesian shrinkage toward a global mean. Rankquant's CI-floor is a close cousin — instead of shrinking the estimate toward a prior, it subtracts a standard-error term. Both penalize small samples in a principled way.
James–Stein estimator: The estimator introduced by James and Stein (1961), building on Stein's 1956 inadmissibility result: when simultaneously estimating three or more means, shrinking them all toward a fixed point dominates the unshrunken sample means in total squared error (shrinking toward the data-estimated grand mean requires four or more). Rankquant's empirical-CDF percentile step is a loose rank-based analog of this shrinkage intuition.
Fixed-effects vs random-effects aggregation: In meta-analysis, fixed-effects models assume all studies share one true effect; random-effects models allow per-study heterogeneity. Rankquant's reviewer-level aggregation is fixed-effects (we treat each reviewer as producing noisy observations of the same product-quality signal). A random-effects upgrade is a future methodology version.
Bootstrap resampling: A nonparametric method: draw B resamples (with replacement) from the original data, compute the statistic on each, use the resulting distribution as the sampling distribution. Rankquant uses bootstrap to validate analytical CI-floor coverage at small N and to estimate uncertainty in the percentile rank itself.
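A minimal percentile-bootstrap sketch of a lower bound on a product's mean z-score. The function name bootstrap_floor, the resample count, the seed, and the sample z-scores are all illustrative assumptions. The 5th percentile of the resampled means is used as the counterpart of subtracting 1.645 standard errors.

```python
import random
from statistics import mean

def bootstrap_floor(z_scores, n_resamples=2000, quantile=0.05, seed=0):
    """Lower quantile of the bootstrap distribution of the mean z-score."""
    rng = random.Random(seed)          # fixed seed keeps the run reproducible
    n = len(z_scores)
    means = sorted(
        mean(rng.choices(z_scores, k=n))   # resample with replacement
        for _ in range(n_resamples)
    )
    return means[int(quantile * n_resamples)]

# Hypothetical z-scores from eight reviewers of one product.
z = [0.2, 1.1, 1.8, 0.9, 1.5, 2.3, 0.7, 1.2]

floor = bootstrap_floor(z)
print(round(mean(z), 3), round(floor, 3))
```

Comparing this resampled floor with the analytical Ẑ − 1.645 · SE(Ẑ) on the same data is one way to see how well the normal approximation holds at small N.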
Kolmogorov–Smirnov test: A nonparametric test for whether a sample comes from a specified distribution (one-sample) or whether two samples share a distribution (two-sample). Rankquant uses KS in its pipeline diagnostics to flag reviewer distributions with unusual shapes (strong bimodality, hard truncation) that merit editorial inspection.
Skewness: A measure of distributional asymmetry. Positive skew = long right tail; negative skew = long left tail; 0 = symmetric. Online review distributions have strong negative skew (ceiling clustering). Per-reviewer normalization does not remove skew — it removes scale and location — which is why we use rank-based percentile mapping instead of a z-to-percentile normal CDF in Step 4.
Kurtosis: A measure of tail heaviness. Normal distribution has kurtosis = 3 (excess kurtosis = 0). Review-bombed products have low kurtosis: their ratings are bimodal, with mass piled at both ends of the scale and little in the middle. Well-calibrated reviewer distributions have kurtosis close to normal.
Interquartile range (IQR): The range between the 25th and 75th percentile (Q3 − Q1). A robust measure of spread, unaffected by extreme outliers. Used in Rankquant's pre-normalization outlier filter: ratings below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged for editorial review.
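The 1.5·IQR rule can be sketched as follows. The function name iqr_fences and the ratings are hypothetical. The quartiles come from statistics.quantiles with its default "exclusive" method, which is an assumption: other quartile conventions give slightly different fences.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)   # default "exclusive" method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical raw ratings for one product.
ratings = [4, 5, 4, 5, 5, 4, 3, 5, 4, 1]

low, high = iqr_fences(ratings)          # Q1 = 3.75, Q3 = 5.0
flagged = [r for r in ratings if r < low or r > high]
print(low, high, flagged)
```

Here the fences are 1.875 and 6.875, so the single 1-star rating is flagged while the 3-star rating is not.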
Median absolute deviation (MAD): A robust alternative to standard deviation: MAD = median(|x_i − median(x)|). Resistant to outliers. Rankquant uses MAD as a cross-check on σ_u when a reviewer's distribution is suspected of contamination (e.g. bot-pattern ratings).
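A small sketch of why MAD is useful as a cross-check. The function name mad and both rating lists are hypothetical; the value 50 stands in for a corrupted or bot-generated entry.

```python
from statistics import median, stdev

def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(x - m) for x in values)

clean = [1, 2, 3, 3, 4, 4, 5]
contaminated = clean + [50]      # one corrupted value appended

mad_clean = mad(clean)
mad_dirty = mad(contaminated)
print(mad_clean, mad_dirty)                    # MAD is 1 in both cases
print(round(stdev(clean), 2), round(stdev(contaminated), 2))
```

The single bad value leaves the MAD at 1 while the standard deviation jumps by an order of magnitude. A large gap between the two for the same reviewer is the kind of signal such a cross-check would look for.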
Cohen's d: Standardized effect size: d = (μ_1 − μ_2) / σ_pooled. A Rankquant global-90th-percentile product has Cohen's d of roughly +1.3 vs the database mean — a large effect, meaning the product is genuinely distinguishable from average.
Statistical power (1 − β): The probability of correctly rejecting a false null hypothesis. In Rankquant's context, power is the probability that two meaningfully-different-quality products receive distinguishable CI-floors. At N = 30 per product, power to distinguish effect sizes of d = 0.5 at our 90% confidence level is roughly 0.75.