The state of rating inflation, 2026
By Ryan Siegal · Founder and Principal
Refresh cadence: monthly.
Headline numbers — 2026 platform averages
Amazon: 4.4. Long-run average rating across billions of product reviews. The scale midpoint is 3.0, so the platform sits 1.4 points above it, about 47% above the middle of its own scale.
Source: Marketplace Pulse audits + Keepa historical data
Yelp: 4.2. Long-run average rating across US restaurants. Elite-vetted reviews trend slightly lower at 4.0; non-Elite and older reviews average higher.
Source: Yelp transparency disclosures + academic review-data analyses
Goodreads: 4.1. Average rating for literary fiction in English. Genre fiction (romance, fantasy, YA, thriller) averages 4.3+.
Source: Goodreads Year-in-Books aggregates 2019-2024
Booking.com: 8.4. Long-run hotel average. Properties below 8.0 are actively flagged as 'low-rated', so only about the top 20% of the nominal scale is effectively in use.
Source: Booking.com scoring bands documentation
Professional wine scores: 85–100. Effective range since 1978. A 50-point nominal scale compressed into 15 usable points, the most inflated scoring regime in consumer reviews.
Source: Wine industry retrospective analyses
Metacritic: long-run average Metascore for movies released since 2000. Closest to the statistical midpoint of any platform we measured, thanks to professional-critic editorial norms.
Source: Metacritic historical aggregate data
Distribution compression by platform
Rating inflation isn't just about the mean drifting upward; it's about distribution shape. Inflated platforms have tight, left-skewed distributions (a thin tail toward low scores), with most mass in the top 15% of the nominal scale.
| Platform (scale) | Usable range and notes |
|---|---|
| Amazon (1-5 stars) | Usable range: 3.5–5.0. Bottom 2.5 points of the scale hold <3% of all products. |
| Goodreads (1-5) | Usable range: 3.4–4.8. Ratings below 3.0 are vanishingly rare. |
| Yelp (1-5) | Usable range: 3.5–5.0 for restaurants. Closed restaurants skew the full distribution downward. |
| Booking.com (1-10) | Usable range: 7.5–9.5. Sub-7 scores concentrated in budget/hostel category only. |
| Wine Spectator (50-100) | Usable range: 85–100. The entire 50–84 range is used almost exclusively for critical or flawed wines. |
| Wine Advocate (50-100) | Usable range: 85–100. Same compression as Wine Spectator; Parker-era scoring normalized around 90. |
| Metacritic (0-100) | Usable range: 25–95. The most statistically well-behaved scale of any major consumer platform. |
| IMDb (1-10) | Usable range: 5.0–9.0 for films above minimum-vote threshold. Genre fans pull certain films (superhero, animated features) upward. |
| Rotten Tomatoes (0-100) | Technically 0-100, but effective range is bimodal: films cluster at 10-40% (Rotten) and 75-100% (Fresh) due to binary bucketing of underlying reviews. |
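One way to compare compression across scales of different widths is to express each usable range as a share of the nominal scale. The sketch below does this for the ranges in the table above. The function name `usable_fraction` and the choice to measure width as high minus low are illustrative assumptions, not a method stated in this report.

```python
# Usable range as a share of the nominal scale.
# Tuples are (scale_min, scale_max, usable_low, usable_high), taken from the table above.
RANGES = {
    "Amazon": (1, 5, 3.5, 5.0),
    "Goodreads": (1, 5, 3.4, 4.8),
    "Yelp": (1, 5, 3.5, 5.0),
    "Booking.com": (1, 10, 7.5, 9.5),
    "Wine Spectator": (50, 100, 85, 100),
    "Metacritic": (0, 100, 25, 95),
    "IMDb": (1, 10, 5.0, 9.0),
}


def usable_fraction(scale_min, scale_max, low, high):
    """Width of the usable range divided by the width of the nominal scale."""
    return (high - low) / (scale_max - scale_min)


fractions = {name: usable_fraction(*r) for name, r in RANGES.items()}

# Print platforms from most compressed to least compressed.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {frac:.0%} of nominal scale in use")
```

On these figures Booking.com uses the smallest share of its scale and Metacritic the largest.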
The 20-year drift, 2003–2026
Platforms that have existed for multiple decades all show the same pattern: steady upward drift of the mean. This isn't because products got better. It's because the social norms of reviewing drifted toward positivity:
| Platform | Drift in average rating |
|---|---|
| Amazon | 2003: 3.9 average → 2026: 4.4 average (+0.5 over 23 years). |
| Yelp | 2005: 3.7 restaurant average → 2026: 4.2 average (+0.5 over 21 years). |
| Goodreads | 2012: 3.8 literary fiction average → 2026: 4.1+ (+0.3 over 14 years; compressed baseline). |
| Booking.com | 2010: 7.6 average → 2026: 8.4 average (+0.8 over 16 years, on 1-10 scale). |
| Wine Spectator | Historical average for rated wines has drifted from ~87 in the 1990s to ~89 in 2020s. |
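Because the platforms use different scales and cover different time spans, the raw deltas above are hard to compare directly. The sketch below converts each row to drift per decade, both in raw points and as a share of scale width. It assumes linear drift between the two endpoints, which the table does not establish, and the name `drift_per_decade` is hypothetical.

```python
# Drift per decade from the endpoint figures in the table above.
# Tuples are (start_year, start_avg, end_year, end_avg, scale_min, scale_max).
DRIFT = {
    "Amazon": (2003, 3.9, 2026, 4.4, 1, 5),
    "Yelp": (2005, 3.7, 2026, 4.2, 1, 5),
    "Goodreads": (2012, 3.8, 2026, 4.1, 1, 5),
    "Booking.com": (2010, 7.6, 2026, 8.4, 1, 10),
}


def drift_per_decade(y0, a0, y1, a1, scale_min, scale_max):
    """Return (points per decade, share of scale width per decade).

    Assumes the drift is linear between the two endpoints.
    """
    points = (a1 - a0) / (y1 - y0) * 10
    return points, points / (scale_max - scale_min)


for name, row in DRIFT.items():
    points, share = drift_per_decade(*row)
    print(f"{name:12s} {points:+.2f} points/decade ({share:.1%} of scale width)")
```

Normalized this way, the four platforms drift at a similar pace, roughly 5 to 6 percent of scale width per decade.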
Online consumer reviews exhibit consistent self-selection bias driven by the differential propensity of consumers with extreme experiences to write reviews. Observed distributions are therefore systematically biased relative to the true quality distribution.
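To make the self-selection effect concrete, the sketch below applies a per-star propensity to review to a "true" experience distribution and compares the resulting means. Every number in it is invented for illustration (the true distribution, the propensities, and the names `TRUE_SHARE` and `REVIEW_PROPENSITY`); none of it is measured data from this report.

```python
# Illustrative only: all shares and propensities below are hypothetical.
STARS = [1, 2, 3, 4, 5]

# Hypothetical share of customers whose experience matches each star level.
TRUE_SHARE = [0.05, 0.10, 0.30, 0.35, 0.20]

# Hypothetical probability that a customer at each level writes a review.
# Extreme experiences (1 and 5 stars) are assumed more likely to be reviewed.
REVIEW_PROPENSITY = [0.10, 0.03, 0.02, 0.04, 0.12]


def weighted_mean(values, weights):
    """Mean of values under the given (unnormalized) weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Observed review volume per star = true share x propensity to review.
observed_weight = [s * p for s, p in zip(TRUE_SHARE, REVIEW_PROPENSITY)]

true_mean = weighted_mean(STARS, TRUE_SHARE)
observed_mean = weighted_mean(STARS, observed_weight)

print(f"true mean:     {true_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
```

Under these assumed inputs the observed mean sits above the true mean because satisfied extremes outnumber dissatisfied ones among reviewers. Different assumed propensities would shift the gap, or reverse it.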
The information-loss cost
Distribution compression has an exact information-theoretic cost. A rating scale carries at most log₂(N) bits of information per rating, where N is the effective number of distinguishable levels. When the effective range compresses, information per rating drops:
| Scale | Nominal vs. effective information |
|---|---|
| Amazon 1-5 stars | Nominal 5 levels = 2.3 bits. Effective (usable) 3 levels = 1.6 bits. Information loss: ~30%. |
| Wine Spectator 50-100 | Nominal 51 levels = 5.7 bits. Effective 16 levels = 4.0 bits. Information loss: ~30%. |
| Rotten Tomatoes binary | Nominal 100 levels = 6.6 bits per reviewer. Effective 1 bit per reviewer (fresh/rotten). Information loss: ~85%. |
| Metacritic 0-100 | Nominal 101 levels = 6.7 bits. Effective ~60 levels = 5.9 bits. Information loss: ~12%. Best of the major surfaces. |
Metacritic's comparatively good behavior comes from two editorial choices: cardinal (not binary) aggregation, and hand-assigned source weights. The Tomatometer's information loss is structurally baked into its binary design.
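The figures in the table can be reproduced with a few lines of arithmetic. The sketch below uses the level counts given above and computes loss as one minus the ratio of effective to nominal bits. The function names are illustrative. Note that log₂(N) is a ceiling: it is reached only if all N levels are used equally often, so real ratings carry less than this.

```python
import math

# (nominal levels, effective levels) as stated in the table above.
SCALES = {
    "Amazon 1-5 stars": (5, 3),
    "Wine Spectator 50-100": (51, 16),
    "Rotten Tomatoes binary": (100, 2),
    "Metacritic 0-100": (101, 60),
}


def bits(levels):
    """Maximum information per rating, in bits, for a scale with this many levels."""
    return math.log2(levels)


def information_loss(nominal_levels, effective_levels):
    """Fraction of the nominal per-rating capacity lost to compression."""
    return 1.0 - bits(effective_levels) / bits(nominal_levels)


for name, (nominal, effective) in SCALES.items():
    loss = information_loss(nominal, effective)
    print(
        f"{name:24s} {bits(nominal):.1f} bits -> {bits(effective):.1f} bits "
        f"(loss {loss:.0%})"
    )
```

Small differences from the table's percentages come from rounding the bit counts to one decimal before dividing.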
What this means for Rankquant's methodology
Rating inflation is not a problem raw averaging can fix. Adding more reviews to an inflated platform doesn't recover signal — it just makes the mean more precise within the already-compressed range. The three-layer approach Rankquant uses addresses inflation at the distributional level:
- Source-weighted mean with source weights assigned by editorial rigor, not uniformly. Inflated platforms (Yelp 3, Vivino 2, Goodreads 3) receive less influence than professionally-edited sources (Wine Spectator 10, NYT Book Review 9, Michelin Keys 9).
- Bayesian adjustment protecting against thin-sample manipulation. A new product with three 5-star reviews gets pulled toward the peer-set mean until more data arrives.
- Within-category z-scoring against the narrowest stable peer set. Inflation affects absolute levels but not relative ranks — z-scores recover the ordinal information that averaging discards.
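The three layers above can be sketched in a few small functions. This is a minimal illustration under stated assumptions, not Rankquant's actual implementation: the shrinkage formula is the common "pseudo-review" form of Bayesian averaging, `prior_strength` is a hypothetical tuning parameter, the per-source scores are invented, and source scores are assumed to be already mapped onto a common 0-1 scale. Only the source weights (Wine Spectator 10, Vivino 2) come from the text.

```python
from statistics import mean, pstdev


def source_weighted_mean(scores, weights):
    """Layer 1: weighted mean of per-source scores on a common 0-1 scale."""
    total_weight = sum(weights[src] for src in scores)
    return sum(scores[src] * weights[src] for src in scores) / total_weight


def bayesian_adjust(observed_mean, n, prior_mean, prior_strength):
    """Layer 2: shrink a thin-sample mean toward the peer-set mean.

    prior_strength behaves like a count of pseudo-reviews placed at
    prior_mean (a hypothetical parameter, not a published Rankquant value).
    """
    return (prior_strength * prior_mean + n * observed_mean) / (prior_strength + n)


def z_score(value, peers):
    """Layer 3: position of a score within its peer set, in standard deviations."""
    spread = pstdev(peers)
    if spread == 0:
        return 0.0  # all peers identical: no relative position to report
    return (value - mean(peers)) / spread


# Weights from the text; the per-source scores are invented for illustration.
weights = {"Wine Spectator": 10, "Vivino": 2}
scores = {"Wine Spectator": 0.80, "Vivino": 0.90}
blended = source_weighted_mean(scores, weights)

# A new product with three 5-star reviews, pulled toward a 4.4 peer mean.
adjusted = bayesian_adjust(observed_mean=5.0, n=3, prior_mean=4.4, prior_strength=10)

# Relative position of a 4.6 product within a small hypothetical peer set.
relative = z_score(4.6, [4.2, 4.4, 4.6])

print(f"blended={blended:.3f} adjusted={adjusted:.3f} z={relative:.2f}")
```

The shrinkage weakens as `n` grows, so a product with hundreds of reviews is left almost untouched, while the z-score stays meaningful even when every peer sits inside a compressed range.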
Methodology notes for this report
Platform averages here are drawn from public aggregates (Marketplace Pulse, Keepa, Goodreads Year-in-Books, Booking.com scoring-bands documentation), academic review-distribution studies, and internal Rankquant audits of publicly-visible review data. Sample sizes per platform vary (Amazon: billions; Booking.com: hundreds of millions; Wine Spectator: hundreds of thousands annually). All numbers are directional; the scales themselves prevent higher precision than the one-decimal-point format used above.
This report will be updated monthly as new data becomes available.
Frequently asked questions
Why is rating inflation getting worse over time?
Are there any platforms with low rating inflation?
Does verified-purchase fix inflation?
Can I see the raw data behind this report?
When is the next update?