Why Rotten Tomatoes' Tomatometer is statistically meaningless

By Ryan Siegal · Founder and Principal

Published 2026-04-23

What the Tomatometer actually measures

Despite looking like a 0-100 score, the Tomatometer is a percentage of reviewers, not a measure of quality. Each critic's full-text review is assigned a binary fresh/rotten label by a Rotten Tomatoes editor using a threshold (typically around 60% on the reviewer's native scale, or positive-overall sentiment where no score is given). The Tomatometer is the fraction of those labels that come out "fresh".

So a film where 100 critics all wrote glowing 95/100 reviews has the same Tomatometer as one where 100 critics all wrote barely-positive 62/100 reviews: 100%. The first film is universally acclaimed; the second is universally considered mediocre-but-watchable. The Tomatometer cannot distinguish them.
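The two films above can be compared in a minimal sketch. It assumes a hard cutoff at 60 on a 0-100 scale as a stand-in for the editorial fresh/rotten decision; the function names `tomatometer` and `mean_score` are illustrative and not part of any Rotten Tomatoes API.

```python
def tomatometer(scores, fresh_threshold=60):
    """Percentage of reviews at or above the (assumed) fresh threshold."""
    fresh = sum(1 for s in scores if s >= fresh_threshold)
    return 100 * fresh / len(scores)


def mean_score(scores):
    """Plain unweighted mean, used here as the simplest cardinal aggregate."""
    return sum(scores) / len(scores)


acclaimed = [95] * 100   # 100 glowing reviews
watchable = [62] * 100   # 100 barely-positive reviews

print(tomatometer(acclaimed), mean_score(acclaimed))  # 100.0 95.0
print(tomatometer(watchable), mean_score(watchable))  # 100.0 62.0
```

The binary aggregate is identical for both films, while even an unweighted mean separates them by 33 points.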

2 buckets

The Tomatometer is a binary — every critic review is reduced to one bit of information.

Rotten Tomatoes methodology documentation

0-100

Metacritic's weighted Metascore preserves the cardinal score with editor-assigned source weights.

Metacritic methodology

~60%

The threshold Rotten Tomatoes uses to label a reviewer's score 'fresh'. A 59 is rotten; a 61 is fresh. A two-point difference in the review flips that critic's vote in the aggregate.

Published editorial guidelines

The information loss is quantifiable

In information-theoretic terms, a binary label carries at most 1 bit of information. A 100-point score carries up to about 6.6 bits (log₂ 100 ≈ 6.64). Since 1 / 6.64 ≈ 15%, the Tomatometer throws away roughly 85% of the information in each critic's review before aggregation.
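The arithmetic behind those figures can be checked in a few lines. This is a sketch of the upper bound only: it assumes every one of the 100 score values is equally likely, which real review distributions are not, so the true per-review information is lower on both sides.

```python
import math

bits_binary = math.log2(2)       # fresh / rotten: 1 bit at most
bits_cardinal = math.log2(100)   # 100 distinct score values: ~6.64 bits at most

# Share of the per-review upper bound that the binary label discards.
loss = 1 - bits_binary / bits_cardinal

print(round(bits_cardinal, 2))   # 6.64
print(round(loss, 2))            # 0.85
```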

That loss compounds at the aggregate level. With 100 critic reviews of a film:

Cardinal aggregate (Metacritic): 100 critics × 6.6 bits ≈ 660 bits of input signal at most, before weighting; the weighted mean still reflects how strongly each critic liked the film.
Tomatometer: 100 critics × 1 bit = 100 bits of input signal at most; the percentage reflects only how many critics cleared the threshold.

Empirically, this shows up as the Tomatometer's inability to distinguish "broadly liked" from "universally acclaimed." Both can pin at 95-100%. A cardinal aggregate would show the former at 70-80 and the latter at 90-95.

Why the binary exists anyway

The fresh/rotten binary made sense when RT launched in 1998: most critics weren't using numeric scales, and a binary classification worked around the heterogeneity of star counts, letter grades, and pure-prose reviews. "Recommendation or not" was a lowest common denominator.

It's 2026 and almost every publication now publishes a numeric score or at minimum supplies a 5-star equivalent. The technical justification for the binary has disappeared. The Tomatometer format survives because it's instantly understandable and has become a marketing signal in its own right.

Rotten Tomatoes has become, despite itself, the Michelin Guide of American cinema. A binary built as a convenience now shapes what studios finance.
— Harper's Magazine, on movie-rating culture

Metacritic's approach has its own problems

Metacritic's Metascore is a weighted mean of critic scores on a 0-100 scale, with source weights assigned by Metacritic editors. That preserves cardinal information, but it has problems of its own:

The source weights are opaque — Metacritic doesn't publish them.
Not all critics supply a numeric score; Metacritic's editors convert prose to a number using unpublished heuristics.
The scale still suffers inflation pressure from contemporary critic norms.

Metacritic is better than Rotten Tomatoes but still not what a consumer needs for a confident recommendation. The ideal tool combines the Metacritic-style cardinal aggregate with within-category z-scoring — which is what Rankquant does for movies. Source weights are published, scale is deterministic, and the output is a 1-5 score where a 5 means top 5% of the peer set.

What Rankquant does for movies

Rankquant's movie source weights (published at /methodology):

Metacritic (pre-weighted professional aggregate) — 8
Rotten Tomatoes (critics) — 6 (binary loses info)
Letterboxd (enthusiast audience, cleaner signal than IMDb) — 6
IMDb (weighted average) — 5

The Tomatometer is included but down-weighted because of the binary-information-loss problem. Metacritic carries more weight because it preserves cardinal information. Letterboxd is a cleaner crowd signal because its audience is cinephiles rather than general viewers.
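To make the weighting concrete, here is a minimal sketch of a weighted mean using the four weights listed above. Only the weights come from the list; the combination rule (a plain weighted mean), the assumption that every source has already been rescaled to 0-100, the `weighted_score` helper, and the example film's scores are all hypothetical illustrations rather than Rankquant's actual pipeline.

```python
# Weights as listed above; the dictionary keys are illustrative names.
WEIGHTS = {
    "metacritic": 8,
    "rotten_tomatoes": 6,
    "letterboxd": 6,
    "imdb": 5,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted mean over the sources present, each assumed to be on 0-100."""
    present = {name: value for name, value in scores.items() if name in weights}
    if not present:
        raise ValueError("no recognised sources in scores")
    total_weight = sum(weights[name] for name in present)
    weighted_sum = sum(weights[name] * value for name, value in present.items())
    return weighted_sum / total_weight


# Made-up scores for a hypothetical film, already rescaled to 0-100.
film = {"metacritic": 74, "rotten_tomatoes": 91, "letterboxd": 76, "imdb": 72}
print(round(weighted_score(film), 2))  # 78.16
```

Dividing by the weights of the sources actually present means a film missing one source is not penalised for the gap; whether that is the right behaviour is a design choice this sketch does not settle.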

Normalized across the peer set (genre × era × budget tier), the resulting 1-5 score distinguishes "broadly liked" from "universally acclaimed" — something no single source alone can do.
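The normalization step can be sketched as a within-peer-set z-score. This assumes each film already has a single aggregate score and that the peer set has been chosen upstream; the `z_scores` helper and the three-film peer set are hypothetical. The mapping from z-scores to the final 1-5 scale is not described here, so the sketch stops at the z-score.

```python
from statistics import mean, pstdev


def z_scores(peer_scores):
    """Standardise each film's aggregate against its own peer set."""
    values = list(peer_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the peer set
    if sigma == 0:
        # Every film scored the same, so none stands out.
        return {title: 0.0 for title in peer_scores}
    return {title: (score - mu) / sigma for title, score in peer_scores.items()}


# Hypothetical peer set: same genre, era, and budget tier.
peers = {"Film A": 60, "Film B": 70, "Film C": 80}
for title, z in z_scores(peers).items():
    print(title, round(z, 2))
# Film A -1.22
# Film B 0.0
# Film C 1.22
```

Because the score is relative to the peer set, the same raw aggregate can land in different places for, say, a low-budget horror film and a prestige drama.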

Frequently asked questions

Is Rotten Tomatoes useless?

Not useless — just low-information per critic review. The Tomatometer is a useful popularity-of-approval signal, but it doesn't distinguish intensity of approval. If you need to know whether a film is broadly recommended, the Tomatometer works. If you need to rank films by quality, it doesn't.

Doesn't RT also have an "Audience Score"?

Yes, and it is displayed on the same 0-100 scale, but it is also a thresholded percentage: the share of users who rated the film 3.5 stars or higher out of 5. It therefore discards intensity of approval in the same way the Tomatometer does, and it adds the usual crowd-platform biases (self-selection toward fans, review-bomb campaigns for contested releases).

What's the argument FOR binary aggregates?

Clarity. A 91% on the Tomatometer is instantly communicable: 91% of critics think it's good. A Metacritic 74 requires interpretation. The binary format sacrifices signal for legibility. For marketing purposes that's acceptable; for buying decisions it's not.

Does Rankquant include audience scores or only critics?

Both. Crowd sources (Letterboxd for cinephiles, IMDb for general viewers) are weighted in addition to professional critic sources. The weighting reflects our belief in each source's signal quality; all weights are published.