Rankquant

Statistics can lie — here's how, and what Rankquant does about it

Why this series exists

Rankquant's pitch is that statistics done well produces better consumer decisions than averaging done badly. That pitch has a corollary: statistics done badly can be worse than averaging — because it adds the appearance of rigor without the substance. Every well-known statistical fallacy has a review-aggregation version, and a methodology that doesn't name and pre-commit against them is just averaging in a tuxedo.

We'd rather name them. Each post in this series picks one fallacy, walks through the worked example, and points at the specific methodology constant that prevents Rankquant from falling into it. Where a defence is structural (pre-committed in writing), we say so. Where it's editorial ("trust us not to abuse this"), we say so even more loudly, because that's where the real risk lives.

The series so far

Planned future posts

Each will follow the same structure: name the fallacy, show the worked example, point at the structural defence in our methodology.

The principle behind the series

The right way to defend against statistical malpractice is to make malpractice structurally visible — to commit to the constants, the cohort definitions, and the confidence levels in writing, on a dated page, before any score is published. If we change them later, the change is itself a published event with a version bump and a rationale. Anyone — readers, journalists, statisticians, AI engines — can see the difference between the methodology as it was when a score was computed and the methodology as it is today.
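The versioning commitment can be pictured as a small append-only registry. This is a minimal sketch, assuming an in-memory store; `MethodologyVersion`, `ScoreRecord`, `MethodologyRegistry`, `comparable`, and every field name are hypothetical illustrations, not Rankquant's actual schema.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class MethodologyVersion:
    """One dated, published methodology (hypothetical schema). Frozen: a change needs a new version."""
    version: str
    published: str                    # ISO date the version page went live
    confidence_level: float           # e.g. 0.90, one-tailed
    source_weights: Mapping[str, float]
    rationale: str                    # why this version differs from the previous one


@dataclass(frozen=True)
class ScoreRecord:
    product: str
    peer_set: str                     # cohort the score was computed against
    ci_floor: float
    methodology_version: str          # pins the score to the rules that produced it


class MethodologyRegistry:
    """Append-only store of published methodology versions (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, MethodologyVersion] = {}

    def publish(self, v: MethodologyVersion) -> None:
        # Refuse to overwrite: an existing version string can never change meaning.
        if v.version in self._versions:
            raise ValueError(f"methodology {v.version} is already published")
        # Copy the weights into a read-only view so later edits to the caller's dict cannot leak in.
        self._versions[v.version] = replace(
            v, source_weights=MappingProxyType(dict(v.source_weights))
        )

    def rules_for(self, score: ScoreRecord) -> MethodologyVersion:
        # The rules as they were when the score was computed, not today's rules.
        return self._versions[score.methodology_version]


def comparable(a: ScoreRecord, b: ScoreRecord) -> bool:
    """False means the comparison crosses peer sets and should be flagged."""
    return a.peer_set == b.peer_set
```

The design point is that a score carries its methodology version and peer set with it, so the rules in force when it was computed stay recoverable after a version bump, and a cross-peer-set comparison can be flagged mechanically rather than by editorial judgement.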

That is what separates an editorial product from a marketing one. The editorial product invites you to argue with the methodology. The marketing product invites you to trust the brand. Rankquant is the first kind.

Each fallacy and the structural defence Rankquant has committed to.
Simpson's paradox: Cohort percentiles are computed as a re-ranking of the same global CI-floor, never as a separate computation. Both numbers are always shown together, so cross-cohort flips are visible rather than hidden.
Small-sample illusion: 90% one-tailed CI-floor with SE = 1/√N_eff. A thin-sample mean cannot rank ahead of a thick-sample mean unless its lead exceeds the difference between their CI penalties (z × SE).
Goodhart's Law on weights: Source weights are published from day one and version-bumped publicly when they change. Historical scores stay queryable at the weights that produced them.
Survivorship bias: The full database is published, covering every percentile from 0 to 100. We don't curate a "best of" without showing the rest.
Ecological fallacy: Every score is labelled with the peer set it was computed against. Cross-peer-set comparisons are explicitly flagged as not directly meaningful.
P-hacked confidence: The 90% one-tailed confidence level was pre-committed in writing before any score was computed. It is documented in /methodology and /theory/confidence-intervals.
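The Small-sample illusion, Simpson's paradox, and P-hacked confidence rows can be sketched together. This assumes scores are standardised to unit variance (how we read SE = 1/√N_eff), takes N_eff as already computed, and uses a hypothetical "share of peers scoring strictly lower" definition of the 0–100 percentile; `Product`, `ci_floor`, `min_gap_to_overtake`, `percentile`, `rank`, and all names, cohorts, and numbers are illustrative, not Rankquant data or code.

```python
import math
from statistics import NormalDist
from typing import NamedTuple

# Critical value for a 90% one-tailed bound: the 0.90 quantile of N(0, 1), about 1.2816.
Z_90 = NormalDist().inv_cdf(0.90)


class Product(NamedTuple):
    name: str
    cohort: str       # hypothetical cohort label, e.g. a price band
    mean: float       # weighted mean score, assumed on a unit-variance scale
    n_eff: float      # effective sample size, assumed precomputed


def ci_floor(mean: float, n_eff: float, z: float = Z_90) -> float:
    """One-tailed lower bound: mean - z * SE, with SE = 1/sqrt(n_eff)."""
    return mean - z / math.sqrt(n_eff)


def min_gap_to_overtake(n_thin: float, n_thick: float, z: float = Z_90) -> float:
    """How far a thin-sample mean must lead a thick-sample mean to get a higher floor."""
    return z * (1 / math.sqrt(n_thin) - 1 / math.sqrt(n_thick))


def percentile(value: float, pool: list[float]) -> float:
    """0-100 percentile: share of the other pool members with a strictly lower floor."""
    if len(pool) == 1:
        return 100.0
    below = sum(1 for v in pool if v < value)
    return 100.0 * below / (len(pool) - 1)


def rank(products: list[Product]) -> dict[str, tuple[float, float]]:
    """Return (global percentile, cohort percentile) per product.

    Both come from the same CI-floor values; the cohort figure is only a
    re-ranking of those floors within a subset, never a separate score.
    """
    floors = {p.name: ci_floor(p.mean, p.n_eff) for p in products}
    all_floors = list(floors.values())
    out: dict[str, tuple[float, float]] = {}
    for p in products:
        cohort_floors = [floors[q.name] for q in products if q.cohort == p.cohort]
        out[p.name] = (percentile(floors[p.name], all_floors),
                       percentile(floors[p.name], cohort_floors))
    return out


products = [
    Product("A", "budget", 1.5, 4),      # high mean, very few reviews
    Product("B", "budget", 1.0, 400),    # lower mean, many reviews
    Product("C", "premium", 2.0, 100),
    Product("D", "premium", 1.8, 100),
]
ranks = rank(products)
```

With these illustrative inputs, B tops its budget cohort (cohort percentile 100) but sits around the 33rd percentile globally: the same floors read against two peer sets, which is the cohort-vs-global gap the table says is always shown side by side. A's higher mean does not rescue it, because its 0.5 lead is smaller than `min_gap_to_overtake(4, 400)`, roughly 0.58.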

Frequently asked questions

Are these fallacies hypothetical, or has Rankquant actually run into them?
Both. Simpson's paradox shows up routinely in our data — wines that rank well in their price band but mid-table globally are common, and we surface both numbers so the cohort-vs-global gap is itself informative. The small-sample illusion is the reason the CI-floor exists. The Goodhart-on-weights and ecological-fallacy posts are pre-emptive: we published the structural defences before they could become problems.
Why publish the failure modes? Doesn't this give competitors a playbook?
Two reasons. First, the failure modes are textbook — anyone who has taken a statistics class knows them already. There is no proprietary edge in keeping them quiet. Second, the value of a methodology authority is exactly that it names the failure modes openly. A review site that doesn't mention Simpson's paradox is either ignorant of it or hoping you don't notice. Both are worse positions than ours.
Where does editorial trust enter the picture?
In a few places: which sources we admit (and at what weight); how we define a category boundary; which products are flagged for manual review. We document each of these and version-bump them publicly when they change. None of them affect a published score retroactively — historical scores stay at the methodology version that produced them.

Related: Full methodology · Founding metrics — the five primitives · Why we rank on the 90% CI-floor