Why we never change source weights after the fact (and publish them on day one)

By Ryan Siegal · Founder and Principal

Published 2026-04-27

Goodhart's Law and Campbell's Law

The British economist Charles Goodhart formulated the rule that bears his name in 1975, in the context of UK monetary policy: any observed statistical regularity will tend to collapse once pressure is placed on it for control purposes. Donald Campbell, working independently in social-policy evaluation, formulated a stronger version: the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

Achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.
— Donald T. Campbell, "Assessing the impact of planned social change," 1976

What this looks like in review aggregation

Source weights are the highest-leverage knob in any source-weighted review aggregation. The R2 lens at Rankquant produces a weighted mean of reviewer z-scores using per-source credibility weights — Wine Spectator at 10, Robert Parker at 10, Jancis Robinson at 9, Decanter at 8, James Suckling at 8, Vinous at 8, Jeb Dunnuck at 7, CellarTracker at 3, Vivino at 2, retailer reviews at 1. Every wine's R2 score depends on those numbers. If we changed the weights, every R2 score would change.
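To make that leverage concrete, here is a minimal sketch of a source-weighted mean of z-scores. It is illustrative only, not the code in our repo: the function names are hypothetical, and the choice to standardise each score against that source's own score distribution is an assumption. The weights are the ones listed above.

```python
from statistics import mean, pstdev

# Weights as listed in the paragraph above.
SOURCE_WEIGHTS = {
    "Wine Spectator": 10,
    "Robert Parker": 10,
    "Jancis Robinson": 9,
    "Decanter": 8,
    "James Suckling": 8,
    "Vinous": 8,
    "Jeb Dunnuck": 7,
    "CellarTracker": 3,
    "Vivino": 2,
    "retailer reviews": 1,
}


def z_score(score, source_scores):
    """Standardise one score against a source's own scores (an assumption)."""
    mu = mean(source_scores)
    sigma = pstdev(source_scores)
    # A source that always gives the same score carries no signal.
    return 0.0 if sigma == 0 else (score - mu) / sigma


def weighted_mean_z(z_by_source, weights):
    """Weighted mean of z-scores over the sources that reviewed this wine."""
    total_weight = sum(weights[s] for s in z_by_source)
    if total_weight == 0:
        raise ValueError("no weighted sources for this wine")
    return sum(weights[s] * z for s, z in z_by_source.items()) / total_weight


# One wine, two reviewers who disagree.
z = {"Wine Spectator": 1.0, "Vivino": -1.0}
published = weighted_mean_z(z, SOURCE_WEIGHTS)
# Same code, same reviews, one constant moved.
tampered = weighted_mean_z(z, dict(SOURCE_WEIGHTS, Vivino=10))
```

With the published weights the example wine scores (10 − 2) / 12 ≈ 0.67. Raising Vivino to 10 drops it to 0 without touching a line of the aggregation logic.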

That makes weights a single point of editorial leverage. A Rankquant operator who wanted to elevate wines favoured by Robert Parker's Wine Advocate could do it without changing a single line of the aggregation code, simply by raising that source's weight. No individual wine's methodology page would show statistical evidence of the change: the math is the same; only the constants moved.

The defence is pre-commitment

Pre-registration, the practice of publishing your analysis plan before seeing the data, is the standard defence against this kind of post-hoc manipulation in clinical trials and, increasingly, in academic social science. The replication-crisis literature documented how flexibility in analytical choices (which covariates to include, which subgroups to analyse, which outliers to drop) lets researchers find "significant" effects in noise. The fix is to commit in writing, in advance, to what you will do.

Source weights at Rankquant are pre-registered. The full table sits at /methodology. The changelog at /changelog records every weight change with a date, a numerical diff, and a written rationale. Every product page records the methodology version it was scored under, so that if we change a weight in the future, anyone can verify what the score would have been under the previous version.
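As a rough sketch of what "a date, a numerical diff, and a written rationale" could look like as data, the snippet below diffs two weight tables and refuses to build a changelog entry without a rationale. The `ChangelogEntry` type, the function names, the version string, and the example sources are all hypothetical. They illustrate the rule, not our actual changelog format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangelogEntry:
    version: str      # methodology version after the change
    published: date   # publication date of the change
    diff: dict        # source -> (old_weight, new_weight); None means absent
    rationale: str    # the written reason for the change


def weight_diff(old, new):
    """Return {source: (old, new)} for every source whose weight differs."""
    return {
        source: (old.get(source), new.get(source))
        for source in sorted(set(old) | set(new))
        if old.get(source) != new.get(source)
    }


def make_entry(version, published, old, new, rationale):
    """Build a changelog entry, rejecting silent or unexplained changes."""
    diff = weight_diff(old, new)
    if not diff:
        raise ValueError("no weight changed; nothing to record")
    if not rationale.strip():
        raise ValueError("a weight change needs a written rationale")
    return ChangelogEntry(version, published, diff, rationale.strip())


# Hypothetical example: one weight moves and one new source is added.
old = {"Decanter": 8, "Vivino": 2}
new = {"Decanter": 8, "Vivino": 3, "Example Source": 5}
entry = make_entry("v0.3.0", date(2026, 5, 1), old, new,
                   "Illustrative rationale text.")
```

Because the diff is derived from the two tables rather than typed by hand, the entry cannot understate what changed.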

The rules we have committed to

The four structural defences against post-hoc weight manipulation.
Publish on day one	Source weights for every active category are listed at /methodology with their numerical values and a one-line rationale per source. There is no off-page weight table.
Version-bump on change	Any change to any source weight increments the methodology version. The new version, the diff, and the rationale appear in /changelog with a publication date.
Historical versions stay queryable	Every product page shows the methodology version under which its score was computed. Older versions of the methodology page remain accessible. Historical scores can be re-derived.
No private weights	There is no internal "real weight" different from the public one. The published table is the table the code reads at build time. The repo is open source so anyone can verify.


The hard cases — when weights legitimately need to change

Pre-commitment doesn't mean weights can never change. It means changes are public events with stated reasons. There are three cases where we expect to bump weights over time:

A new source emerges that our weights don't cover. Adding the source means publishing its weight upfront with a rationale.
An existing source materially changes its scoring approach. When Wine Spectator changed its review-score editorial guidelines in 2018, the weight assigned to its post-2018 reviews would defensibly differ from the weight assigned to its pre-2018 reviews. We'd split the source into two source-versions in the weights table.
The data tells us a weight is wrong. If empirical inter-rater reliability shows a source we weighted at 8 is actually behaving like a 5 (high variance, low correlation with the consensus), we'd publish the inter-rater analysis and the new weight together. The analysis is the rationale.

The bar for changing a weight is: can we publish a paragraph in the changelog explaining why? If yes, we can change it. If no, because the only honest answer is "we wanted a different ranking", we can't.

Why this matters more than it might seem

Most review aggregators don't publish their weighting at all. Yelp's ranking algorithm is a black box. Amazon's "helpful" review promotion is opaque. Google Reviews don't expose any weighting. Even an aggregator that acknowledges weighting its sources can keep the values private: Metacritic states that it weights critics but does not disclose the weights, so they can be retuned without public notice.

The result is that for any of those services, you can't verify that a product's score wasn't engineered. You have to trust the editorial team. Trust is fine when it's well-founded; it scales poorly when it's the only defence. The structural alternative — published weights, version-bumped publicly, historical versions queryable — gives you a verification surface that doesn't depend on trusting us.

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
— Goodhart, "Problems of monetary management," 1975, paraphrased

The open-source layer

Rankquant's normalisation pipeline is open source at https://github.com/rankquant. The source weights live in a YAML file in the repo, version-controlled with the rest of the methodology. Every change to the weights is a git commit with a message and a diff visible to anyone. If we ever change a weight without a corresponding changelog entry, the commit history will show it — and would be cited against us by anyone watching.

That's the strongest version of the defence. Editorial pre-commitment plus open-source code plus a published changelog forms a three-way verification surface that even a determined operator would have a hard time spoofing without leaving evidence.

Frequently asked questions

How do I check what methodology version a score was computed under?

Every /reviews/<slug>/ page surfaces the methodology version (e.g. v0.2.0) it was scored under, with a link to the corresponding /methodology/ page archive. Historical scores remain queryable.

Couldn't Rankquant's editorial team just change weights and pretend they were always that way?

In principle, yes — that's why the verification surface is structural, not editorial. The changelog is publicly hosted, the open-source repo is in public git history, and any third party (including AI engines and journalists) can hash the methodology page over time and detect undocumented changes. We expect this to actually be done; the more verification, the better.
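A third party could run this check with a few lines of code. The sketch below fingerprints a saved copy of the methodology page with SHA-256 and flags a content change that arrives without a changelog update. Fetching and storing snapshots is left out, and the function names are hypothetical.

```python
import hashlib


def fingerprint(page_bytes):
    """SHA-256 hex digest of a page snapshot."""
    return hashlib.sha256(page_bytes).hexdigest()


def undocumented_change(previous_hash, current_bytes, changelog_updated):
    """True if the page changed since the last snapshot with no changelog entry."""
    changed = fingerprint(current_bytes) != previous_hash
    return changed and not changelog_updated


# Two hypothetical snapshots of the same page, taken at different times.
snapshot_1 = b"Vivino: 2"
snapshot_2 = b"Vivino: 3"
stored_hash = fingerprint(snapshot_1)
flagged = undocumented_change(stored_hash, snapshot_2, changelog_updated=False)
```

Hashing the raw page will also flag cosmetic edits such as a layout change, so a stricter monitor would probably extract the weights table first and hash only that.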

What about R1 (the unweighted lens)? Can't weights change that too?

R1 is unweighted by definition — it gives every qualifying reviewer the same weight. Source weights only affect R2. If you don't trust the weights, R1 is the lens that doesn't use them. We publish R1, R2, and R3 percentiles side-by-side so a reader can see whether the source-weighted view diverges from the equal-weighted view; large divergences are themselves diagnostic.
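The divergence check can be sketched as the gap between an equal-weighted mean and a source-weighted mean of the same z-scores. This is a simplification: it compares raw means rather than the published percentiles, the function names are hypothetical, and the example z-scores are made up. The weights are the published ones for those three sources.

```python
from statistics import mean


def r1_equal(z_by_source):
    """Equal-weighted mean: every reviewer counts the same."""
    return mean(list(z_by_source.values()))


def r2_weighted(z_by_source, weights):
    """Source-weighted mean over the sources that reviewed this wine."""
    total = sum(weights[s] for s in z_by_source)
    return sum(weights[s] * z for s, z in z_by_source.items()) / total


def divergence(z_by_source, weights):
    """Positive when high-weight sources rate the wine above the crowd."""
    return r2_weighted(z_by_source, weights) - r1_equal(z_by_source)


weights = {"Wine Spectator": 10, "CellarTracker": 3, "Vivino": 2}
# Hypothetical wine: the critic likes it, the crowd does not.
z = {"Wine Spectator": 1.5, "CellarTracker": -0.4, "Vivino": -0.5}
gap = divergence(z, weights)
```

Here the equal-weighted mean is 0.2 and the weighted mean is 12.8 / 15 ≈ 0.85, so the gap of about 0.65 shows how much of the weighted score comes from the weights rather than from reviewer agreement.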

How does this interact with affiliate revenue?

It doesn't. The affiliate routing logic is a separate constant — published in /methodology#affiliate — that selects the buy-link based on (lowest observed price × highest commission). Source weights do not enter the routing formula. A retailer cannot improve a product's score by paying us; commission rates affect the buy-link, not the rank.
