Whistlewatch

How we calculate Bias

Plain-English version of the formula, plus its known limits.

The 0–100 Bias Index

The Bias Index is a statistical leaning score, not an accusation. It compares a referee's per-match decision patterns to the league-wide average across the same season, and rolls five sub-scores into one composite 0–100 number.

  • 0–33 · close to or under league average — low.
  • 34–66 · noticeable asymmetry — mid.
  • 67–100 · pronounced asymmetry — high, worth a closer look.

High values do not imply wrongdoing. They flag that the data pattern deviates from the league baseline by an amount large enough to be interesting.
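The three bands above can be sketched as a small lookup. This is only an illustration: the function name `bias_band` is hypothetical, and the handling of fractional values (anything below 34 counts as low, anything below 67 as mid) is an assumption, since the page only states integer ranges.

```python
def bias_band(index: float) -> str:
    """Map a 0-100 Bias Index to the band names used on this page.

    Assumption: fractional values fall into the band of the next
    integer boundary (e.g. 33.5 is still "low").
    """
    if not 0 <= index <= 100:
        raise ValueError("Bias Index must be between 0 and 100")
    if index < 34:
        return "low"   # 0-33: close to or under league average
    if index < 67:
        return "mid"   # 34-66: noticeable asymmetry
    return "high"      # 67-100: pronounced asymmetry, worth a closer look
```

A referee with an index of 70 would land in the "high" band, which marks a pattern worth a closer look and nothing more.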

Sub-scores & weights (ADR-009)

Each sub-score is a 0–100 normalisation of one decision asymmetry.

Sub-score                          Weight   Status
Penalty imbalance (home vs away)   0.30     live
Card imbalance (home vs away)      0.20     live
VAR overruled rate                 0.20     N/A · held at neutral 50
Stoppage-time bias                 0.15     N/A · held at neutral 50
Disallowed-goals bias              0.15     N/A · held at neutral 50

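The three "N/A" sub-scores are held at 50 (perfectly neutral), so they add a constant to the composite and only the two live sub-scores move the index. As data sources for VAR, stoppage time and disallowed goals are added in later phases, these placeholders will be replaced by live values and the composite will become more precise.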
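As a rough sketch of how the weights could combine, the snippet below assumes the composite is a plain weighted sum of the five sub-scores. The page does not spell out the exact formula, so treat that as an assumption. The weights come from the table above; the key names and the function `composite_index` are hypothetical.

```python
# Weights from the sub-score table (ADR-009). Key names are illustrative.
WEIGHTS = {
    "penalty_imbalance": 0.30,
    "card_imbalance": 0.20,
    "var_overruled_rate": 0.20,
    "stoppage_time_bias": 0.15,
    "disallowed_goals_bias": 0.15,
}

NEUTRAL = 50.0  # value used for any sub-score without a live data source


def composite_index(live_scores: dict[str, float]) -> float:
    """Weighted sum of sub-scores; missing sub-scores are held at NEUTRAL.

    Assumption: the composite is a simple weighted sum. The real
    pipeline may round or rescale the result.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        total += weight * live_scores.get(name, NEUTRAL)
    return total


# Only the two live sub-scores are supplied; the other three default to 50.
example = composite_index({"penalty_imbalance": 80, "card_imbalance": 60})
```

Here `example` works out to about 61 (0.30 × 80 + 0.20 × 60 + 0.50 × 50). Under this weighted-sum assumption, the composite can only range from 25 to 75 while three sub-scores are held at 50.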

Data source & coverage

All match data comes from FBref via the open-source soccerdata library (ADR-008). Currently covered: Bundesliga, Premier League, La Liga, Serie A, Ligue 1 and Primeira Liga, seasons 2024-2025 and 2025-2026.

Only referees with at least 10 matches in a given (league, season) appear in the leaderboard. In smaller samples a single fluke match can produce an extreme bias value, so the result isn't statistically meaningful.
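The 10-match threshold could be applied with a filter along these lines. This is a minimal sketch: the tuple layout of the input and the name `eligible_referees` are assumptions for illustration, not the pipeline's actual data model.

```python
from collections import Counter

MIN_MATCHES = 10  # threshold stated above


def eligible_referees(matches):
    """Return the (referee, league, season) keys with enough matches.

    `matches` is assumed to be an iterable of (referee, league, season)
    tuples, one entry per officiated match.
    """
    counts = Counter(matches)
    return {key for key, n in counts.items() if n >= MIN_MATCHES}
```

Because the count is keyed on (referee, league, season), a referee with 9 matches in one season and 12 in another would appear only for the second.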

Update frequency

The pipeline runs every day at 05:05 UTC via GitHub Actions (ADR-014), so new match data reaches whistlewatch.fans within about 24 hours of the FBref upload. If no matches were played the previous day, the deploy step is skipped to keep the git history clean.
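The skip rule could be expressed as a check like the one below. It is a hypothetical sketch: the page does not describe how the pipeline detects an empty day, so the function name `should_deploy` and its inputs are assumptions.

```python
from datetime import date, timedelta


def should_deploy(match_dates, run_date: date) -> bool:
    """Return True if at least one match was played the day before run_date.

    `match_dates` is assumed to be an iterable of datetime.date values,
    one per match in the freshly fetched data.
    """
    previous_day = run_date - timedelta(days=1)
    return previous_day in set(match_dates)
```

A run on a day with no matches the day before would return False, and the deploy step would be skipped.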

Known limitations

  • No VAR review counts (FBref does not expose per-match VAR fields)
  • No stoppage-time-per-half breakdown
  • No disallowed-goals counter
  • No statistical-significance bands yet — Phase 3+ will add them (ADR-011 planned)
  • The weights themselves are provisional — Phase 3+ will empirically calibrate them (ADR-010 planned)