What is rating inflation? The statistical problem behind every 4.5-star average
By Ryan Siegal · Founder and Principal
The problem, in one chart you don't need to see
Imagine you're buying headphones. You open Amazon. You see:
- Sony WH-1000XM6 — 4.5 stars
- Bose QuietComfort Ultra — 4.5 stars
- AirPods Max — 4.5 stars
- Sennheiser Momentum 4 — 4.5 stars
Every one of the top products is rated the same. The number is useless for deciding. This is what 20 years of rating inflation looks like: the statistic technically exists, but all the distinguishing information has been squeezed out of it.
Amazon's long-run cross-category average rating across billions of reviews.
Marketplace Pulse review analyses
Share of Goodreads books rated below 3.5 stars. Ratings pile up so heavily at the top of the scale that the entire bottom half has almost no mass.
Aggregate Goodreads audit, 2024
Share of Yelp restaurants rated 3.5+ stars in major metros. Yelp's "average" restaurant is already above the midpoint of the scale.
Yelp transparency & academic review-data analyses
Why it happens
Three structural forces drive review inflation on almost every public platform:
- Self-selection. People who bought a product and loved it are more likely to leave a review than people who felt indifferent. The silent middle of the buyer distribution is invisible in the data.
- Rating-as-signal, not feedback. A 5-star review is often a social signal of satisfaction with the transaction, not a careful assessment of the product. Buyers reward their own decisions.
- Platform incentives. Amazon, Goodreads, Yelp, and Booking.com all benefit commercially from looking positive. Low-rating reviews are sometimes actively suppressed (flagged as abusive, hidden behind filters, or algorithmically de-weighted).
These forces compound. The result: averages drift upward year-over-year until almost nothing in the database sits below 4-and-change. The tail of the distribution flattens.
Online reviews are subject to a strong self-selection bias: consumers with extreme opinions (especially positive ones) are far more likely to write reviews, producing bimodal and right-skewed distributions that substantially mislead simple averages.
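To make the self-selection effect concrete, here is a minimal sketch in Python. Every number in it is a made-up assumption chosen for illustration, not measured data: `true_share` is a hypothetical spread of opinions across all buyers, and `review_prob` is an assumed chance that a buyer holding each opinion bothers to post.

```python
# Hypothetical numbers throughout: neither table is measured data.

# Share of ALL buyers who would give each star rating if everyone reviewed.
true_share = {1: 0.10, 2: 0.15, 3: 0.30, 4: 0.25, 5: 0.20}

# Assumed probability that a buyer with that opinion actually posts a review.
# Extreme opinions (especially 5-star ones) post more often than the middle.
review_prob = {1: 0.04, 2: 0.01, 3: 0.01, 4: 0.03, 5: 0.10}


def mean_rating(weights):
    """Weighted mean star rating; weights do not need to sum to 1."""
    total = sum(weights.values())
    return sum(stars * w for stars, w in weights.items()) / total


# Posted reviews = true opinions filtered by who chooses to write.
observed = {stars: true_share[stars] * review_prob[stars] for stars in true_share}

true_mean = mean_rating(true_share)
observed_mean = mean_rating(observed)

print(f"mean if every buyer reviewed: {true_mean:.2f}")     # 3.30
print(f"mean of the posted reviews:   {observed_mean:.2f}")  # 4.06
```

Under these assumptions the published average lands about three quarters of a star above what buyers as a whole think, and the 3-star middle shrinks from 30% of buyers to roughly 8% of posted reviews, with no fake reviews involved.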
Why the usual fixes don't work
Platforms know about this. The ones that have tried to fix it usually apply the wrong tool:
- Weighted recent reviews. Helps with drift but doesn't address the scale-compression problem. The average is still squeezed at the top of the range.
- Verified-purchase badges. Reduces fake-review pressure but doesn't change the self-selection bias that produces inflation in real reviews.
- Rotten-Tomatoes-style binning. Replaces a rich score with a binary "fresh/rotten" that throws away most of the information. Reduces inflation by redefining "good" but destroys signal in the process.
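The information loss from binning is easy to see in a small sketch. The two score lists and the 3.5 cutoff below are hypothetical values picked for illustration; the cutoff is an assumption, not any platform's actual rule.

```python
# Hypothetical review scores on a 1-to-5 scale for two products.
solid = [3.5, 3.5, 4.0, 3.5, 4.0, 3.5]
outstanding = [5.0, 4.5, 5.0, 5.0, 4.5, 5.0]

FRESH_THRESHOLD = 3.5  # assumed cutoff for counting a review as "fresh"


def fresh_share(scores, threshold=FRESH_THRESHOLD):
    """Fraction of reviews at or above the cutoff (the binary score)."""
    return sum(score >= threshold for score in scores) / len(scores)


def mean_score(scores):
    """Plain arithmetic mean of the raw scores."""
    return sum(scores) / len(scores)


for name, scores in (("solid", solid), ("outstanding", outstanding)):
    print(f"{name}: mean {mean_score(scores):.2f}, fresh {fresh_share(scores):.0%}")
# solid: mean 3.67, fresh 100%
# outstanding: mean 4.83, fresh 100%
```

Both products come out as 100% fresh even though their raw means differ by more than a full star, which is the signal the binary score discards.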
The right fix: category-relative normalization
The information you actually want when buying isn't "this product's absolute average," it's "how does this product compare to its real alternatives?" That's a statistically different question, and the right tool is category-relative normalization.
Z-score normalization asks: how many standard deviations above or below the category mean does this product sit? That statistic is robust to inflation because a z-score does not change when every rating in the category is shifted or compressed by the same amount, so the relative position still carries signal even when the absolute means have drifted. Combined with Bayesian prior adjustment (which protects against thin-sample outliers) and percentile re-mapping (which puts the output back on a familiar 1-to-5 scale), you get the tool that review aggregation should have adopted two decades ago.
This is the approach Rankquant publishes at /methodology and open-sources at github.com/rankquant/normalize. Nothing novel mathematically; what's new is applying it rigorously to reviews and showing the work on every output.
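As a rough illustration of how the three steps fit together, here is a minimal Python sketch. It is not taken from the Rankquant repository and should not be read as that implementation: the function names, the `PRIOR_WEIGHT` of 20 pseudo-reviews, the unweighted category mean, the simple rank-based percentile mapping, and the product data are all assumptions made for this example.

```python
from statistics import mean, pstdev

PRIOR_WEIGHT = 20  # assumed: the prior counts as 20 reviews at the category mean


def shrunk_mean(avg, n, category_mean, prior_weight=PRIOR_WEIGHT):
    """Bayesian-style average: pull thin samples toward the category mean."""
    return (prior_weight * category_mean + n * avg) / (prior_weight + n)


def normalize_category(products):
    """products maps name -> (average rating, review count).

    Returns name -> (z_score, score re-mapped onto a 1-to-5 scale).
    """
    # Assumption: the category mean is the unweighted mean of product averages.
    category_mean = mean([avg for avg, _ in products.values()])
    adjusted = {
        name: shrunk_mean(avg, n, category_mean)
        for name, (avg, n) in products.items()
    }

    mu = mean(list(adjusted.values()))
    sigma = pstdev(list(adjusted.values()))

    # Rank-based percentile: worst product maps to 1.0, best to 5.0.
    # Ties are broken by sort order in this simplified version.
    ranked = sorted(adjusted, key=adjusted.get)
    last = len(ranked) - 1

    result = {}
    for rank, name in enumerate(ranked):
        z = (adjusted[name] - mu) / sigma if sigma > 0 else 0.0
        percentile = rank / last if last > 0 else 0.5
        result[name] = (z, 1 + 4 * percentile)
    return result


# Hypothetical category: D has the highest raw average but only 3 reviews.
headphones = {
    "A": (4.7, 1200),
    "B": (4.5, 800),
    "C": (4.4, 950),
    "D": (4.9, 3),
}

scores = normalize_category(headphones)
for name, (z, rescaled) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: z = {z:+.2f}, rescaled = {rescaled:.2f}")
```

With these inputs the shrinkage step moves D from first place on raw average to second behind A, because three reviews are not enough to outweigh the prior. Adding the same constant to every raw average leaves the z-scores unchanged, which is the sense in which the relative position survives uniform inflation.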
Frequently asked questions
Is rating inflation actually bad for consumers?
Isn't this just the "grade inflation" problem from universities?
Does verified-purchase help?
Why not just use median instead of mean?
Related: Bayesian averaging is the right tool for review aggregation · How Goodreads broke book reviews · The full Rankquant methodology