Plain-English version of the formula, plus its known limits.
The 0–100 Bias Index
The Bias Index is a statistical leaning score, not an accusation. It compares a referee's per-match decision patterns to the league-wide average across the same season, and rolls five sub-scores into one composite 0–100 number.
0–33 · close to or under league average — low.
34–66 · noticeable asymmetry — mid.
67–100 · pronounced asymmetry — high, worth a closer look.
High values do not imply wrongdoing. They flag that the data pattern deviates from the league baseline by an amount large enough to be interesting.
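As an illustration, the three bands can be written as a small lookup. This is a sketch, not the site's actual code: the function name `bias_band` is hypothetical, and it assumes the bands apply to the score after rounding to the nearest integer.

```python
def bias_band(score: float) -> str:
    """Map a 0-100 Bias Index to its band label (low / mid / high)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    # Assumption: the published bands are defined on integer scores,
    # so the score is rounded before it is compared to the cut-offs.
    rounded = round(score)
    if rounded <= 33:
        return "low"
    if rounded <= 66:
        return "mid"
    return "high"


print(bias_band(51))  # falls in the 34-66 band
```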
Sub-scores & weights (ADR-009)
Each sub-score is a 0–100 normalisation of one decision asymmetry.
| Sub-score | Weight | Status |
| --- | --- | --- |
| Penalty imbalance (home vs away) | 0.55 | live |
| Card imbalance (home vs away) | 0.45 | live |
| VAR overruled rate | 0.00 | N/A · weighted out |
| Stoppage-time bias | 0.00 | N/A · weighted out |
| Disallowed-goals bias | 0.00 | N/A · weighted out |
The three "N/A" sub-scores have no public data source we trust (ADR-012 logs the eight discovery URLs we tested), so their weight has been redistributed onto the two live sub-scores. That's statistically cleaner than fixing them at a fake-neutral 50 — the score now reflects only what we measured. If a VAR data feed becomes available in Phase 4+, the composite tightens automatically.
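The composition described above can be sketched as a weighted average over only the sub-scores that carry weight. The weights are the ones listed in the table; everything else (the dictionary keys, the function name `composite_bias`, and the use of `None` for an unavailable sub-score) is a hypothetical naming choice for illustration, not the pipeline's real interface.

```python
from typing import Dict, Optional

# Weights as published in this section; the key names are hypothetical.
WEIGHTS: Dict[str, float] = {
    "penalty_imbalance": 0.55,
    "card_imbalance": 0.45,
    "var_overruled_rate": 0.00,
    "stoppage_time_bias": 0.00,
    "disallowed_goals_bias": 0.00,
}


def composite_bias(sub_scores: Dict[str, Optional[float]],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of the live 0-100 sub-scores.

    Sub-scores that are missing (None) or have zero weight are left out
    entirely instead of being pinned at a neutral 50.
    """
    live = {
        name: score
        for name, score in sub_scores.items()
        if score is not None and weights.get(name, 0.0) > 0.0
    }
    total_weight = sum(weights[name] for name in live)
    if total_weight == 0.0:
        raise ValueError("no live sub-scores to combine")
    # Dividing by the live weight keeps the result on the 0-100 scale
    # even when only a subset of sub-scores is available.
    return sum(weights[name] * score for name, score in live.items()) / total_weight


example = {
    "penalty_imbalance": 60.0,
    "card_imbalance": 40.0,
    "var_overruled_rate": None,
    "stoppage_time_bias": None,
    "disallowed_goals_bias": None,
}
print(round(composite_bias(example)))  # 0.55 * 60 + 0.45 * 40 = 51
```

Because the sum is divided by the weight of the live sub-scores, giving a future VAR sub-score a non-zero weight would need no change to the function, only to the weight table.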
Data source & coverage
All match data comes from FBref via the open-source soccerdata library (ADR-008). Currently covered: Bundesliga, Premier League, La Liga, Serie A, Ligue 1 and Primeira Liga, seasons 2024-2025 and 2025-2026.
Only referees with at least 10 matches in a given (league, season) appear in the leaderboard. Smaller samples produce extreme bias values from a single fluke match and aren't statistically meaningful.
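A minimal sketch of that eligibility rule, assuming each match is available as a `(league, season, referee)` tuple. The tuple layout, the referee names, and the function name `eligible_referees` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MIN_MATCHES = 10  # threshold stated above

MatchKey = Tuple[str, str, str]  # (league, season, referee)


def eligible_referees(matches: Iterable[MatchKey],
                      min_matches: int = MIN_MATCHES) -> Dict[MatchKey, int]:
    """Count matches per (league, season, referee) and keep those at or above the threshold."""
    counts = Counter(matches)
    return {key: n for key, n in counts.items() if n >= min_matches}


# Hypothetical data: "Ref A" has 10 matches, "Ref B" only 9.
sample = (
    [("Bundesliga", "2024-2025", "Ref A")] * 10
    + [("Bundesliga", "2024-2025", "Ref B")] * 9
)
print(eligible_referees(sample))
```

Counting per (league, season) rather than per referee overall means a referee with 6 matches in each of two seasons would not appear in either leaderboard.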
Why the range? Confidence intervals
Every referee page also shows a 90% confidence interval next to the bias index, e.g. "Bias 51, 90% CI 32–70". Read it as the range of plausible values for the referee's "true" long-run bias, i.e. the value we would measure if the referee officiated infinitely many comparable matches under the same league conditions. The procedure is built so that about 90% of intervals computed this way contain that true value.
We compute it as a Wald interval (point estimate ± 1.645 standard errors for a 90% band) from the per-match Poisson standard error of the live sub-scores (penalties + cards), propagated through the weighted bias-index composition. Wide intervals indicate a small sample (~11 matches) and high uncertainty; narrow intervals indicate ~25+ matches and a tighter estimate.
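The sketch below shows one way such an interval could be assembled. It is not the ADR-011 derivation. It assumes the two live sub-scores are independent, that their standard errors are already expressed on the 0–100 scale, and that the interval is clipped to the 0–100 range. The function names and the example standard errors (16 and 14) are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def poisson_rate_se(total_events: int, n_matches: int) -> float:
    """Standard error of a per-match event rate under a Poisson model.

    For a Poisson total k observed over n matches the rate is k / n
    and its standard error is sqrt(k) / n.
    """
    if n_matches <= 0:
        raise ValueError("n_matches must be positive")
    return math.sqrt(total_events) / n_matches


def wald_interval(estimate: float,
                  sub_score_ses: Sequence[float],
                  weights: Sequence[float],
                  z: float = Z_90) -> Tuple[float, float]:
    """Wald interval for a weighted sum of independent sub-scores."""
    # Assumption: independence, so Var(sum w_i * X_i) = sum (w_i * se_i)^2.
    composite_se = math.sqrt(sum((w * se) ** 2 for w, se in zip(weights, sub_score_ses)))
    lower = max(0.0, estimate - z * composite_se)
    upper = min(100.0, estimate + z * composite_se)
    return lower, upper


# Hypothetical standard errors for the penalty and card sub-scores.
low, high = wald_interval(51.0, sub_score_ses=[16.0, 14.0], weights=[0.55, 0.45])
print(f"Bias 51, 90% CI {low:.0f}-{high:.0f}")
```

Since `poisson_rate_se` shrinks as the number of matches grows (for a fixed per-match rate it falls roughly with the square root of the match count), a referee with ~25 matches gets a visibly narrower band than one with ~11.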
90% instead of 95% was a conscious choice: at our typical sample sizes a 95% interval would be ~25 points wide and visually drown out the point estimate. 90% keeps the band readable while still honestly reporting that small samples are noisy. See ADR-011 for the derivation.
Update frequency
The pipeline runs every day at 05:05 UTC via GitHub Actions (ADR-014), so new match data reaches whistlewatch.fans within ~24 hours of the FBref upload. If no matches were played the previous day, the deploy step is skipped to keep the git history clean.
Known limitations
No VAR review counts (FBref does not expose per-match VAR fields)
No stoppage-time-per-half breakdown
No disallowed-goals counter
The weights are provisional — Phase 3+ will empirically calibrate them (ADR-010 planned)