Rankquant

What is rating inflation? The statistical problem behind every 4.5-star average

The problem, in one chart you don't need to see

Imagine you're buying headphones. You open Amazon and scan the top results.

Every one of the top products is rated the same. The number is useless for deciding. This is what 20 years of rating inflation looks like: the statistic technically exists, but all the distinguishing information has been squeezed out of it.

4.4/5: Amazon's long-run cross-category average rating across billions of reviews (source: Marketplace Pulse review analyses).

Under 3%: share of Goodreads books rated below 3.5 stars. Ratings on the platform cluster so tightly at the top of the scale that its entire bottom half holds almost no books (source: aggregate Goodreads audit, 2024).

97%: share of Yelp restaurants rated 3.5+ stars in major metros. Yelp's "average" restaurant already sits above the midpoint of the scale (source: Yelp transparency and academic review-data analyses).

Why it happens

Three structural forces drive review inflation on almost every public platform:

  1. Self-selection. People who bought a product and loved it are more likely to leave a review than people who felt indifferent. The silent middle of the buyer distribution is invisible in the data.
  2. Rating-as-signal, not feedback. A 5-star review is often a social signal of satisfaction with the transaction, not a careful assessment of the product. Buyers reward their own decisions.
  3. Platform incentives. Amazon, Goodreads, Yelp, and Booking.com all benefit commercially from looking positive. Low-rating reviews are sometimes actively suppressed (flagged as abusive, hidden behind filters, or algorithmically de-weighted).

These forces compound. The result: averages drift upward year over year until almost nothing in the database sits below 4-and-change, and the lower tail of the distribution thins out.

Online reviews are subject to a strong self-selection bias: consumers with extreme opinions (especially positive ones) are far more likely to write reviews, producing bimodal and right-skewed distributions that substantially mislead simple averages.

Hu, Zhang & Pavlou, Decision Support Systems, 2006
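The self-selection effect takes only a few lines of arithmetic to reproduce. In this hypothetical Python sketch, both the buyers' true opinions and the chance that each kind of buyer writes a review are invented numbers, chosen only to mimic the pattern described above: delighted buyers post most, angry buyers post sometimes, and indifferent buyers mostly stay silent.

```python
# Hypothetical illustration of self-selection bias in reviews.
# Both distributions are invented for illustration, not measured data.

# Assumed share of buyers holding each opinion, 1 to 5 stars (sums to 1.0).
true_opinion = {1: 0.10, 2: 0.15, 3: 0.30, 4: 0.25, 5: 0.20}

# Assumed probability that a buyer with that opinion actually leaves a review.
review_propensity = {1: 0.20, 2: 0.05, 3: 0.02, 4: 0.08, 5: 0.30}


def mean_rating(weights: dict[int, float]) -> float:
    """Weighted mean star rating; the weights need not sum to 1."""
    total = sum(weights.values())
    return sum(stars * w for stars, w in weights.items()) / total


# Posted reviews = buyer distribution filtered by each group's propensity to post.
observed = {s: true_opinion[s] * review_propensity[s] for s in true_opinion}

print(f"mean opinion of all buyers: {mean_rating(true_opinion):.2f}")
print(f"mean of posted reviews:     {mean_rating(observed):.2f}")
```

With these assumed numbers the typical buyer is lukewarm (mean 3.30), yet the posted reviews average 3.81, with over half of them at 5 stars and a smaller bump at 1 star: the bimodal, top-heavy shape the quote describes, produced without a single fake review.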

Why the usual fixes don't work

Platforms know about this. The ones that have tried to fix it usually apply the wrong tool:

The right fix: category-relative normalization

The information you actually want when buying isn't "this product's absolute average," it's "how does this product compare to its real alternatives?" That's a statistically different question, and the right tool is category-relative normalization.

Z-score normalization asks: how many standard deviations above or below the category mean does this product sit? That statistic survives inflation as long as inflation shifts and compresses a whole category roughly uniformly: the absolute means drift, but each product's relative position still carries signal. Combined with Bayesian prior adjustment (which protects against thin-sample outliers) and percentile re-mapping (which puts the output back on a familiar 1-to-5 scale), you get the tool that review aggregation should have adopted two decades ago.
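A minimal sketch of those three steps for a single category, in Python. It is not Rankquant's published implementation: the `Product` type, the `prior_weight` of 20 phantom reviews, the review-weighted category mean used as the prior, and the use of the standard normal CDF for the percentile step are all assumptions made for illustration, and the sample products are invented.

```python
from dataclasses import dataclass
from statistics import NormalDist, fmean, pstdev


@dataclass
class Product:
    """One listing in a category (hypothetical type for this sketch)."""
    name: str
    avg_rating: float  # raw platform average on the 1-5 star scale
    n_reviews: int


def bayesian_average(p: Product, prior_mean: float, prior_weight: float) -> float:
    """Shrink a product's raw average toward the category prior.

    prior_weight behaves like that many phantom reviews at prior_mean, so a
    product with few reviews moves a lot and one with thousands barely moves.
    """
    return (prior_weight * prior_mean + p.n_reviews * p.avg_rating) / (
        prior_weight + p.n_reviews
    )


def normalize_category(
    products: list[Product], prior_weight: float = 20.0
) -> dict[str, tuple[float, float]]:
    """Return {name: (z_score, score_on_1_to_5)} for a single category."""
    # Step 1: Bayesian prior adjustment. The prior is the review-weighted
    # category mean (an assumption; other priors are possible).
    total_reviews = sum(p.n_reviews for p in products)
    prior_mean = sum(p.avg_rating * p.n_reviews for p in products) / total_reviews
    adjusted = {p.name: bayesian_average(p, prior_mean, prior_weight) for p in products}

    # Step 2: z-score each adjusted average against this category only.
    values = list(adjusted.values())
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("no spread in this category; z-scores are undefined")

    # Step 3: percentile re-mapping. Assumes z is roughly standard normal and
    # maps its percentile in (0, 1) linearly onto the 1-5 scale.
    std_normal = NormalDist()
    result = {}
    for name, x in adjusted.items():
        z = (x - mu) / sigma
        result[name] = (z, 1 + 4 * std_normal.cdf(z))
    return result


# Invented example: four headphones with similar raw averages.
headphones = [
    Product("A", 4.5, 12_000),
    Product("B", 4.4, 800),
    Product("C", 4.6, 9),
    Product("D", 4.3, 5_000),
]
for name, (z, score) in normalize_category(headphones).items():
    print(f"{name}: z={z:+.2f}  normalized={score:.2f}")
```

Product C shows why the prior matters: its 4.6 from nine reviews is the highest raw average, but after shrinkage it lands just below A's 4.5 from twelve thousand reviews. The z-score step is a linear rescaling, so on its own it cannot reorder products within a category; its value is that a z of +1.0 means the same thing in a heavily inflated category as in a mildly inflated one. An empirical percentile rank would be a reasonable alternative to the normal-CDF mapping when a category's distribution is clearly non-normal.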

This is the approach Rankquant publishes at /methodology and open-sources at github.com/rankquant/normalize. Nothing novel mathematically; what's new is applying it rigorously to reviews and showing the work on every output.

Frequently asked questions

Is rating inflation actually bad for consumers?
Yes. When 90% of products are rated 4+ on a 1-5 scale, the score can't be used to discriminate between them. Consumers still rely on the number — they just have to guess which 4.5 is actually the best, often by reading individual reviews, which is slow and biased by recency effects.
Isn't this just the "grade inflation" problem from universities?
Structurally similar. Both arise from self-selection pressure and platform incentives to avoid negative signals. The statistical response — normalize grades against class or peer-set distribution — has been standard university practice for decades. Applying the same tool to consumer reviews is overdue.
Does verified-purchase help?
Partially. Verified-purchase filters out the most blatant paid-review manipulation, but the self-selection bias that drives inflation operates just as strongly on real buyers. A real buyer who loved a product is still far more likely to review it than one who was indifferent.
Why not just use median instead of mean?
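Medians are more robust to outliers, but they don't fix scale compression. If 80% of a product's reviews are 4 or 5 stars, its median is 4 or 5, and because star ratings are whole numbers, medians collapse into a handful of possible values, so ties between products are common. The median loses discrimination power the same way the mean does. Z-score normalization addresses the compression itself by measuring each product against the spread of its category, not just its central tendency.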
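To see the tie problem concretely, this hypothetical Python sketch builds star histograms for three invented products. Their medians come out identical, their raw means sit within about a third of a star of each other, and z-scoring those means expresses each product's position in units of the category's spread. Treating these three products as the whole category is an assumption made to keep the example small; a real category baseline would include many more listings.

```python
from statistics import fmean, median, pstdev

# Invented star-count histograms: counts of 1-, 2-, 3-, 4-, 5-star reviews.
histograms = {
    "X": [5, 3, 7, 25, 60],
    "Y": [10, 5, 10, 20, 55],
    "Z": [2, 2, 6, 35, 55],
}


def expand(counts: list[int]) -> list[int]:
    """Turn a histogram into the list of individual star ratings."""
    return [stars for stars, n in zip(range(1, 6), counts) for _ in range(n)]


medians = {name: median(expand(h)) for name, h in histograms.items()}
means = {name: fmean(expand(h)) for name, h in histograms.items()}

# Category baseline (here: just these three products, an assumption).
mean_values = list(means.values())
mu, sigma = fmean(mean_values), pstdev(mean_values)
z_scores = {name: (m - mu) / sigma for name, m in means.items()}

for name in histograms:
    print(f"{name}: median={medians[name]}  mean={means[name]:.2f}  z={z_scores[name]:+.2f}")
```

Every product has a median of 5, so the median cannot separate them at all; the means and z-scores still preserve the ordering.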

Related: Bayesian averaging is the right tool for review aggregation · How Goodreads broke book reviews · The full Rankquant methodology