The state of rating inflation, 2026

Name: Cross-platform rating inflation observations, 2026
Published: 2026-04-23
License: https://creativecommons.org/licenses/by/4.0/
Keywords: rating inflation, review aggregation, z-score, Bayesian averaging, consumer reviews

By Ryan Siegal · Founder and Principal

Refresh cadence: monthly.

Headline numbers — 2026 platform averages

4.4/5

Amazon's long-run average rating across billions of product reviews. Scale midpoint is 3.0; the platform sits 1.4 points above midpoint — 47% above the middle of its own scale.

Marketplace Pulse audits + Keepa historical data

4.2/5

Yelp's long-run average rating across US restaurants. Elite-vetted reviews trend slightly lower at 4.0; non-Elite and older reviews average higher.

Yelp transparency disclosures + academic review-data analyses

4.1+/5

Goodreads' average rating for literary fiction in English. Genre fiction (romance, fantasy, YA, thriller) averages 4.3+.

Goodreads Year-in-Books aggregates 2019-2024

8.4/10

Booking.com's long-run hotel average. Properties below 8.0 are actively flagged as 'low-rated', so meaningful differentiation happens almost entirely in roughly the top 20% of the nominal scale.

Booking.com scoring bands documentation

85–100

Effective range of professional wine scores since 1978. A 50-point nominal scale compressed into 15 usable points — the most inflated scoring regime in consumer reviews.

Wine industry retrospective analyses

68 / 100

Metacritic's long-run average Metascore for movies released since 2000. Closest to statistical midpoint of any platform we measured, thanks to professional-critic editorial norms.

Metacritic historical aggregate data

Distribution compression by platform

Rating inflation isn't just about the mean drifting upward; it's about distribution shape. Inflated platforms have tight, left-skewed distributions (a thin tail of low scores trailing away from a crowded top end), with most mass in the top 15% of the nominal scale.

Effective vs nominal scale range across major review surfaces. The 'usable range' is the span where at least 5% of products land. Everything else is vestigial scale.
Amazon (1-5 stars)	Usable range: 3.5–5.0. Bottom 2.5 points of the scale hold <3% of all products.
Goodreads (1-5)	Usable range: 3.4–4.8. Ratings below 3.0 are vanishingly rare.
Yelp (1-5)	Usable range: 3.5–5.0 for restaurants. Closed restaurants skew the full distribution downward.
Booking.com (1-10)	Usable range: 7.5–9.5. Sub-7 scores concentrated in budget/hostel category only.
Wine Spectator (50-100)	Usable range: 85–100. The entire 50–84 range is used almost exclusively for flawed or critically panned wines.
Wine Advocate (50-100)	Usable range: 85–100. Same compression as Wine Spectator; Parker-era scoring normalized around 90.
Metacritic (0-100)	Usable range: 25–95. The most statistically well-behaved scale of any major consumer platform.
IMDb (1-10)	Usable range: 5.0–9.0 for films above minimum-vote threshold. Genre fans pull certain films (superhero, animated features) upward.
Rotten Tomatoes (0-100)	Technically 0-100, but effective range is bimodal: films cluster at 10-40% (Rotten) and 75-100% (Fresh) due to binary bucketing of underlying reviews.
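
The 'usable range' definition above can be sketched in a few lines. This is a minimal illustration, not Rankquant's audit code: the function name `usable_range`, the half-star binning, and the product counts are all hypothetical, and it assumes "at least 5% of products land" is applied per rating level.

```python
from typing import Optional


def usable_range(
    counts: dict[float, float], threshold: float = 0.05
) -> Optional[tuple[float, float]]:
    """Return the (lowest, highest) rating level holding at least `threshold` of products.

    `counts` maps a rating level (e.g. a half-star bin) to the number of
    products whose average rating falls in that bin.
    """
    total = sum(counts.values())
    if total <= 0:
        return None
    kept = [level for level, n in counts.items() if n / total >= threshold]
    if not kept:
        return None
    return min(kept), max(kept)


# Hypothetical histogram for a 1-5 star platform (invented counts, 1,000 products).
example_counts = {
    1.0: 2, 1.5: 3, 2.0: 5, 2.5: 8, 3.0: 11,
    3.5: 91, 4.0: 260, 4.5: 420, 5.0: 200,
}

print(usable_range(example_counts))
```

With these invented counts, every bin below 3.5 holds under 5% of products, so the function reports a usable range of 3.5 to 5.0. A stricter or looser threshold moves the lower bound, which is why the 5% cutoff should be stated alongside any usable-range figure.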

The two-decade drift, 2003–2026

Platforms that have existed for multiple decades all show the same pattern: steady upward drift of the mean. This isn't because products got better. It's because the social norms of reviewing drifted toward positivity:

Long-run mean-rating drift on consumer review platforms, earliest available baseline (2003 for the oldest platform) vs 2026. Drift measured from publicly-available historical aggregates and academic studies.
Amazon	2003: 3.9 average → 2026: 4.4 average (+0.5 over 23 years).
Yelp	2005: 3.7 restaurant average → 2026: 4.2 average (+0.5 over 21 years).
Goodreads	2012: 3.8 literary fiction average → 2026: 4.1+ (+0.3 over 14 years; compressed baseline).
Booking.com	2010: 7.6 average → 2026: 8.4 average (+0.8 over 16 years, on 1-10 scale).
Wine Spectator	Historical average for rated wines has drifted from ~87 in the 1990s to ~89 in 2020s.
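
Because the platforms use different scales and cover different time spans, the raw drift figures above are hard to compare directly. One hedged way to normalize them is drift as a share of scale width per decade. The sketch below uses the table's own numbers; the `Drift` class and the normalization are illustrative assumptions rather than part of the report's methodology, Goodreads' "4.1+" is treated as 4.1, and Wine Spectator is left out because the table gives only approximate decade-level figures.

```python
from dataclasses import dataclass


@dataclass
class Drift:
    platform: str
    start_year: int
    start_mean: float
    end_year: int
    end_mean: float
    scale_min: float
    scale_max: float

    def share_of_scale_per_decade(self) -> float:
        """Mean drift as a fraction of the scale width, per 10 years."""
        width = self.scale_max - self.scale_min
        years = self.end_year - self.start_year
        return (self.end_mean - self.start_mean) / width / years * 10


# Figures copied from the drift table; scale bounds follow the nominal scales quoted in this report.
drifts = [
    Drift("Amazon", 2003, 3.9, 2026, 4.4, 1, 5),
    Drift("Yelp", 2005, 3.7, 2026, 4.2, 1, 5),
    Drift("Goodreads", 2012, 3.8, 2026, 4.1, 1, 5),
    Drift("Booking.com", 2010, 7.6, 2026, 8.4, 1, 10),
]

for d in drifts:
    print(f"{d.platform}: {d.share_of_scale_per_decade():.1%} of scale width per decade")
```

Under this normalization all four platforms land between roughly 5% and 6% of scale width per decade, which is consistent with the claim that the drift is a shared pattern rather than a platform-specific quirk.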

Online consumer reviews exhibit consistent self-selection bias driven by the differential propensity of consumers with extreme experiences to write reviews. Observed distributions are therefore systematically biased relative to the true quality distribution.
— Hu, Zhang & Pavlou, Decision Support Systems, 2006
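
The mechanism in the quote can be made concrete with a small calculation: if the probability of writing a review depends on the experience, the observed distribution is the true distribution reweighted by that probability. The shares and propensities below are invented purely for illustration and are not estimates for any platform.

```python
def observed_distribution(
    true_shares: dict[int, float], propensity: dict[int, float]
) -> dict[int, float]:
    """Reweight the true rating distribution by each level's chance of being reviewed."""
    weights = {stars: true_shares[stars] * propensity[stars] for stars in true_shares}
    total = sum(weights.values())
    return {stars: w / total for stars, w in weights.items()}


def mean_rating(dist: dict[int, float]) -> float:
    return sum(stars * share for stars, share in dist.items())


# Hypothetical "true" experience distribution across all buyers (sums to 1).
true_shares = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.30, 5: 0.10}
# Hypothetical probability that a buyer with that experience writes a review:
# extreme experiences review more often, and the delighted most of all.
propensity = {1: 0.06, 2: 0.02, 3: 0.01, 4: 0.03, 5: 0.10}

observed = observed_distribution(true_shares, propensity)
print(f"true mean:     {mean_rating(true_shares):.2f}")
print(f"observed mean: {mean_rating(observed):.2f}")
```

With these made-up inputs the true mean is 3.25 while the observed mean is about 3.69, even though no individual review is dishonest. The gap comes entirely from who chooses to review.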

The information-loss cost

Distribution compression has a quantifiable information-theoretic cost. A rating scale carries at most log₂(N) bits of information per rating, where N is the effective number of distinguishable levels, and that ceiling is reached only when every level is used equally often. When the effective range compresses, information per rating drops:

Information content per rating across platforms. Calculated as log₂(effective-range-levels), assuming every discrete level is equally likely; the figures are therefore upper bounds.
Amazon 1-5 stars	Nominal 5 levels = 2.3 bits. Effective (usable) 3 levels = 1.6 bits. Information loss: ~30%.
Wine Spectator 50-100	Nominal 51 levels = 5.7 bits. Effective 16 levels = 4.0 bits. Information loss: ~30%.
Rotten Tomatoes binary	Nominal 100 levels = 6.6 bits per reviewer. Effective 1 bit per reviewer (fresh/rotten). Information loss: ~85%.
Metacritic 0-100	Nominal 101 levels = 6.7 bits. Effective ~60 levels = 5.9 bits. Information loss: ~12%. Best of the major surfaces.
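
The table's arithmetic can be checked with a short script. This is a minimal sketch under the same simplifying assumption as the table (every level equally likely, so log₂(N) is an upper bound); the function names are illustrative, and the level counts are taken directly from the rows above, with Rotten Tomatoes' fresh/rotten outcome entered as 2 effective levels.

```python
import math


def bits(levels: int) -> float:
    """Maximum information per rating for a scale with `levels` equally likely levels."""
    if levels < 1:
        raise ValueError("a scale needs at least one level")
    return math.log2(levels)


def information_loss(nominal_levels: int, effective_levels: int) -> float:
    """Fraction of the nominal bits lost when only `effective_levels` are in use."""
    if nominal_levels < 2:
        raise ValueError("nominal scale must have at least two levels")
    return 1 - bits(effective_levels) / bits(nominal_levels)


# (nominal levels, effective levels) as stated in the table.
platforms = {
    "Amazon 1-5 stars": (5, 3),
    "Wine Spectator 50-100": (51, 16),
    "Rotten Tomatoes binary": (100, 2),
    "Metacritic 0-100": (101, 60),
}

for name, (nominal, effective) in platforms.items():
    print(
        f"{name}: {bits(nominal):.1f} -> {bits(effective):.1f} bits, "
        f"loss {information_loss(nominal, effective):.0%}"
    )
```

Run as written, the losses come out at about 32%, 29%, 85%, and 11%, matching the table's rounded figures to within a point or two.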

Metacritic's comparatively good behavior comes from two editorial choices: cardinal (not binary) aggregation, and hand-assigned source weights. The Tomatometer's information loss is structurally baked into its binary design.

What this means for Rankquant's methodology

Rating inflation is not a problem raw averaging can fix. Adding more reviews to an inflated platform doesn't recover signal — it just makes the mean more precise within the already-compressed range. The three-layer approach Rankquant uses addresses inflation at the distributional level:

Source-weighted mean with source weights assigned by editorial rigor, not uniformly. Inflated platforms (Yelp 3, Vivino 2, Goodreads 3) receive less influence than professionally-edited sources (Wine Spectator 10, NYT Book Review 9, Michelin Keys 9).
Bayesian adjustment protecting against thin-sample manipulation. A new product with three 5-star reviews gets pulled toward the peer-set mean until more data arrives.
Within-category z-scoring against the narrowest stable peer set. Inflation affects absolute levels but not relative ranks — z-scores recover the ordinal information that averaging discards.
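
The three layers above can be sketched as three small functions. This is an illustration of the general techniques, not Rankquant's actual implementation: the function names, the `prior_strength` of 20 pseudo-reviews, and all example scores are assumptions. Only the two source weights (Wine Spectator 10, Vivino 2) come from the text, and the example scores are assumed to be already mapped to a common 0-100 scale.

```python
from statistics import mean, pstdev


def source_weighted_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Layer 1: average scores (on a common scale) weighted by editorial rigor."""
    total_weight = sum(weights[source] for source in scores)
    return sum(scores[source] * weights[source] for source in scores) / total_weight


def bayesian_adjust(
    item_mean: float, n_reviews: int, peer_mean: float, prior_strength: float = 20.0
) -> float:
    """Layer 2: shrink a thin-sample mean toward the peer-set mean.

    `prior_strength` acts like a number of pseudo-reviews at the peer mean;
    the value 20 is an arbitrary placeholder.
    """
    return (prior_strength * peer_mean + n_reviews * item_mean) / (
        prior_strength + n_reviews
    )


def z_scores(peer_scores: dict[str, float]) -> dict[str, float]:
    """Layer 3: express each score relative to its peer set's mean and spread."""
    values = list(peer_scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {name: 0.0 for name in peer_scores}
    return {name: (score - mu) / sigma for name, score in peer_scores.items()}


# Layer 1: hypothetical scores for one wine, weights as quoted in the text.
weights = {"Wine Spectator": 10, "Vivino": 2}
scores = {"Wine Spectator": 90.0, "Vivino": 84.0}
blended = source_weighted_mean(scores, weights)

# Layer 2: a new product with three 5-star reviews in a category averaging 4.4.
adjusted = bayesian_adjust(item_mean=5.0, n_reviews=3, peer_mean=4.4)

# Layer 3: hypothetical adjusted means for a narrow peer set.
relative = z_scores({"A": 4.2, "B": 4.4, "C": 4.6})

print(blended, round(adjusted, 2), relative)
```

In this example the three 5-star reviews produce an adjusted mean of about 4.48 rather than 5.0, and the z-scores spread products A, B, and C around zero even though all three sit inside an inflated 4.2 to 4.6 band.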

Methodology notes for this report

Platform averages here are drawn from public aggregates (Marketplace Pulse, Keepa, Goodreads Year-in-Books, Booking.com scoring-bands documentation), academic review-distribution studies, and internal Rankquant audits of publicly-visible review data. Sample sizes per platform vary (Amazon: billions; Booking.com: hundreds of millions; Wine Spectator: hundreds of thousands annually). All numbers are directional; the scales themselves prevent higher precision than the one-decimal-point format used above.

This report will be updated monthly as new data becomes available. Next scheduled update: 2026-05-23.

Frequently asked questions

Why is rating inflation getting worse over time?

Three structural forces compound: self-selection (happy customers are more likely to leave reviews than indifferent ones), social incentive (1-star reviews carry reputational cost in friend-visible networks like Goodreads), and platform incentive (platforms that look positive sell more products, so low-rating reviews are quietly down-weighted by ranking algorithms). None of these is easily fixed by the platforms themselves.

Are there any platforms with low rating inflation?

Metacritic is the least-inflated widely-used consumer platform, averaging 68/100 — close to its nominal midpoint. Professional trade-press review sources (NYT Book Review, Kirkus, Michelin Keys, Forbes Travel) are also relatively uninflated because they operate under editorial rigor rather than consumer-facing social pressure.

Does verified-purchase fix inflation?

Partially — verified-purchase filters reduce overt manipulation but don't address the self-selection bias that drives most of the inflation. Booking.com reviews are verified-stay and still average 8.4/10; Amazon verified-purchase reviews still average 4.4/5.

Can I see the raw data behind this report?

The source citations on each stat card link to the publicly-available data. We don't republish platform data directly (many have ToS restrictions) but we describe our aggregation methodology in enough detail for other researchers to reproduce the findings.

When is the next update?

This report updates monthly on the 23rd. Subscribe to Rankquant's RSS feed (coming with the main launch) or check back for revised platform averages and extended time-series.