Why we never change source weights after the fact (and publish them on day one)
By Ryan Siegal · Founder and Principal
Goodhart's Law and Campbell's Law
The British economist Charles Goodhart formulated the rule that bears his name in 1975, in the context of UK monetary policy: any observed statistical regularity will tend to collapse once pressure is placed on it for control purposes. Donald Campbell, working independently in social-policy evaluation, formulated a stronger version: the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.
Achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.
What this looks like in review aggregation
Source weights are the highest-leverage knob in any source-weighted review aggregation. The R2 lens at Rankquant produces a weighted mean of reviewer z-scores using per-source credibility weights — Wine Spectator at 10, Robert Parker at 10, Jancis Robinson at 9, Decanter at 8, James Suckling at 8, Vinous at 8, Jeb Dunnuck at 7, CellarTracker at 3, Vivino at 2, retailer reviews at 1. Every wine's R2 score depends on those numbers. If we changed the weights, every R2 score would change.
That makes weights a single point of editorial leverage. A Rankquant operator who wanted to elevate wines favoured by Robert Parker's Wine Advocate could do it without changing a single line of the aggregation code, simply by raising the Robert Parker weight. There would be no statistical evidence of the change on any individual wine's methodology page: the math is the same; only the constants moved.
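A minimal sketch of that leverage, assuming R2 is a plain weighted mean over the sources that reviewed a wine. The weights are taken from the list above; the reviewer z-scores and the function name `weighted_mean_z` are invented for illustration and are not Rankquant's actual code.

```python
# Weights copied from the published list above (subset for brevity).
WEIGHTS = {
    "Wine Spectator": 10,
    "Robert Parker": 10,
    "Jancis Robinson": 9,
    "CellarTracker": 3,
}


def weighted_mean_z(z_scores, weights):
    """Weighted mean of z-scores over the sources that reviewed the wine."""
    total_weight = sum(weights[source] for source in z_scores)
    if total_weight == 0:
        raise ValueError("no weighted sources for this wine")
    return sum(weights[s] * z for s, z in z_scores.items()) / total_weight


# Hypothetical wine: one source loved it, the others were lukewarm.
z = {
    "Wine Spectator": 0.2,
    "Robert Parker": 1.5,
    "Jancis Robinson": 0.1,
    "CellarTracker": 0.4,
}

before = weighted_mean_z(z, WEIGHTS)

# Same function, same inputs; only one constant moves.
tilted = {**WEIGHTS, "Robert Parker": 20}
after = weighted_mean_z(z, tilted)

print(f"before={before:.3f} after={after:.3f}")
```

The aggregation function is untouched between the two calls, which is why a code review alone would not catch the tilt.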
The defence is pre-commitment
Pre-registration, the practice of publishing your analysis plan before seeing the data, is the standard defence against this kind of post-hoc manipulation in clinical trials and increasingly in academic social science. The replication-crisis literature documented how flexibility in analytical choices (which covariates to include, which subgroups to analyse, which outliers to drop) lets researchers find "significant" effects in noise. The fix is to commit in writing, in advance, to what you will do.
Source weights at Rankquant are pre-registered. The full table sits at /methodology. The changelog at /changelog records every weight change with a date, a numerical diff, and a written rationale. Every product page records the methodology version it was scored under, so that if we change a weight in the future, anyone can verify what the score would have been under the previous version.
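As a rough illustration of how a score could be re-derived under an earlier methodology version, the sketch below keeps one weight table per version. The version labels, the changed Vinous value in `v2`, the z-scores, and the `page` record are all hypothetical; only the `v1` weights come from the list above.

```python
# Hypothetical version history: "v2" lowers one weight for illustration.
METHODOLOGY_VERSIONS = {
    "v1": {"Decanter": 8, "Vinous": 8, "Vivino": 2},
    "v2": {"Decanter": 8, "Vinous": 7, "Vivino": 2},
}


def score_under(version, z_scores):
    """Recompute the weighted mean z-score using a specific version's weights."""
    weights = METHODOLOGY_VERSIONS[version]
    total = sum(weights[source] for source in z_scores)
    return sum(weights[s] * z for s, z in z_scores.items()) / total


# A product page records the version it was scored under (invented data).
page = {
    "methodology_version": "v1",
    "z_scores": {"Decanter": 0.5, "Vinous": 1.0, "Vivino": -0.5},
}

published = score_under(page["methodology_version"], page["z_scores"])
rederived_v2 = score_under("v2", page["z_scores"])

print(f"v1={published:.4f} v2={rederived_v2:.4f}")
```

Because old tables are never overwritten, the same inputs can be scored under any version and the difference attributed to the weight change alone.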
The rules we have committed to
| Rule | What we commit to |
|---|---|
| Publish on day one | Source weights for every active category are listed at /methodology with their numerical values and a one-line rationale per source. There is no off-page weight table. |
| Version-bump on change | Any change to any source weight increments the methodology version. The new version, the diff, and the rationale appear in /changelog with a publication date. |
| Historical versions stay queryable | Every product page shows the methodology version under which its score was computed. Older versions of the methodology page remain accessible. Historical scores can be re-derived. |
| No private weights | There is no internal "real weight" different from the public one. The published table is the table the code reads at build time. The repo is open source so anyone can verify. |
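The version-bump rule can be sketched as a function that refuses to produce a changelog entry without both a numerical diff and a written rationale. The `ChangelogEntry` shape, the integer version numbers, and the `bump` helper are assumptions made for this example, not the format of the real /changelog.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangelogEntry:
    version: int
    published: date
    diff: dict  # source -> (old weight, new weight); None means absent
    rationale: str


def bump(version, old, new, rationale, published):
    """Build the changelog entry for a weight change, or refuse to."""
    diff = {
        source: (old.get(source), new.get(source))
        for source in sorted(set(old) | set(new))
        if old.get(source) != new.get(source)
    }
    if not diff:
        raise ValueError("no weight changed; nothing to publish")
    if not rationale.strip():
        raise ValueError("a weight change needs a written rationale")
    return ChangelogEntry(version + 1, published, diff, rationale)


# Invented example: one weight lowered, one new source added.
entry = bump(
    3,
    {"Vinous": 8},
    {"Vinous": 7, "NewSource": 5},
    "Example rationale text.",
    date(2025, 1, 1),
)
print(entry.version, entry.diff)
```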
The hard cases — when weights legitimately need to change
Pre-commitment doesn't mean weights can never change. It means changes are public events with stated reasons. There are three cases where we expect to bump weights over time:
- A new source emerges that our weights don't cover. Adding the source means publishing its weight upfront with a rationale.
- An existing source materially changes its scoring approach. When Wine Spectator changed its review-score editorial guidelines in 2018, the weight assigned to its post-2018 reviews would defensibly differ from its pre-2018 reviews. We'd split into two source-versions in the weights table.
- The data tells us a weight is wrong. If empirical inter-rater reliability shows a source we weighted at 8 is actually behaving like a 5 (high variance, low correlation with the consensus), we'd publish the inter-rater analysis and the new weight together. The analysis is the rationale.
The bar for changing a weight is: can we publish a paragraph in the changelog explaining why? If yes, we can change it. If no, because the only honest answer is "we wanted a different ranking", we can't.
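For the third case, one simple way to check whether a source tracks the consensus is to correlate its scores with the mean of every other source, wine by wine. This is a sketch of one possible reliability measure (leave-one-out Pearson correlation), not necessarily the analysis Rankquant would publish; the source names A, B, C and all scores are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def agreement_with_consensus(scores, source):
    """Correlate one source with the mean of all other sources per wine."""
    others = [name for name in scores if name != source]
    n_wines = len(scores[source])
    consensus = [
        sum(scores[o][i] for o in others) / len(others) for i in range(n_wines)
    ]
    return pearson(scores[source], consensus)


# Invented z-scores for five wines from three hypothetical sources.
scores = {
    "A": [1.0, 0.5, 0.0, -0.5, -1.0],
    "B": [0.9, 0.6, 0.1, -0.4, -1.1],
    "C": [-0.2, 1.1, -0.9, 0.8, -0.3],
}

for name in scores:
    print(name, round(agreement_with_consensus(scores, name), 2))
```

In this toy data, source C agrees with the other two far less than A does, which is the kind of evidence that would accompany a proposed weight reduction.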
Why this matters more than it might seem
Most review aggregators don't publish their weighting at all. Yelp's ranking algorithm is a black box. Amazon's "helpful" review promotion is opaque. Google Reviews don't expose any weighting. The few aggregators that do publish weights — Metacritic publishes critic weights; Rotten Tomatoes weights publication clout — reserve the right to retune those weights without public notice.
The result is that for any of those services, you can't verify that a product's score wasn't engineered. You have to trust the editorial team. Trust is fine when it's well-founded; it scales poorly when it's the only defence. The structural alternative — published weights, version-bumped publicly, historical versions queryable — gives you a verification surface that doesn't depend on trusting us.
The open-source layer
Rankquant's normalisation pipeline is open source at https://github.com/rankquant. The source weights live in a YAML file in the repo, version-controlled with the rest of the methodology. Every change to the weights is a git commit with a message and a diff visible to anyone. If we ever change a weight without a corresponding changelog entry, the commit history will show it — and would be cited against us by anyone watching.
That's the strongest version of the defence. Editorial pre-commitment plus open-source code plus a published changelog forms a three-way verification surface that even a determined operator would have a hard time spoofing without leaving evidence.
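An outside observer could audit the commit history along these lines. The sketch assumes each commit has already been reduced to a snapshot of (commit id, methodology version, weights), which is a hypothetical structure; how those snapshots would be extracted from the real repository is not shown.

```python
def silent_changes(snapshots):
    """Flag commits where weights changed but the version label did not.

    snapshots: list of (commit_id, methodology_version, weights) tuples,
    oldest first.
    """
    flagged = []
    for previous, current in zip(snapshots, snapshots[1:]):
        _, prev_version, prev_weights = previous
        commit_id, version, weights = current
        if weights != prev_weights and version == prev_version:
            flagged.append(commit_id)
    return flagged


# Invented history: commit "c3" changes a weight without bumping the version.
history = [
    ("a1", "v1", {"Vivino": 2}),
    ("b2", "v1", {"Vivino": 2}),
    ("c3", "v1", {"Vivino": 3}),
    ("d4", "v2", {"Vivino": 4}),
]

print(silent_changes(history))
```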
Frequently asked questions
How do I check what methodology version a score was computed under?
Couldn't Rankquant's editorial team just change weights and pretend they were always that way?
What about R1 (the unweighted lens)? Can't weights change that too?
How does this interact with affiliate revenue?