2025 Tournament — Model Performance
Trained on 2013–2024 only · 2025 was a true holdout the model never saw during training
Accuracy by Round
How the model performed at each stage of the tournament
Methodology: The model was trained exclusively on 2013–2024 NCAA Tournament data
(693 games across 11 seasons; the 2020 tournament was cancelled). The 2025 season was held out entirely — these predictions represent
exactly what the model would have said on Selection Sunday 2025, before a single tournament game
was played. Features include adjusted offensive/defensive efficiency, shooting percentages,
rebounding rates, tempo, turnover rates, seeding, and conference affiliation.
No in-tournament or real-time data was used.
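The season-based holdout described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline; the data shape and field names are placeholders:

```python
# Minimal sketch of the temporal holdout: train strictly on 2013-2024,
# hold out all of 2025. Records here are illustrative placeholders.
games = [
    {"season": 2013, "round": 1},
    {"season": 2024, "round": 1},
    {"season": 2025, "round": 1},  # holdout: never seen during training
]

train = [g for g in games if 2013 <= g["season"] <= 2024]
holdout = [g for g in games if g["season"] == 2025]

# Leakage check: no 2025 game may appear in the training set.
assert not any(g["season"] == 2025 for g in train)
```

The split is purely temporal, which is what makes the 2025 numbers an honest out-of-sample test rather than a backtest on seen data.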
Round by Round Results
Green border = correct · Red border = incorrect · Blue badge = upset game
Biggest Wins & Misses
Biggest Wins = correct picks with highest confidence · Biggest Misses = wrong picks the model was most confident about
Calibration Analysis
A well-calibrated model’s actual win rate should match its predicted confidence. Gold = actual · Gray = expected
Actual win rate
Expected (confidence midpoint)
Calibration Data
Sample counts and accuracy per bucket
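A per-bucket table like the one above can be produced with a single bucketing pass. A minimal sketch, assuming predictions arrive as `(confidence, was_correct)` pairs; the function name and 10-point bucket width are assumptions, not the site's code:

```python
def calibration_table(preds, bucket_width=0.1):
    """preds: list of (confidence, was_correct) pairs, confidence in [0.5, 1.0]."""
    buckets = {}
    for conf, correct in preds:
        # Snap each prediction to the lower edge of its confidence bucket.
        lo = min(int(conf / bucket_width) * bucket_width, 1.0 - bucket_width)
        buckets.setdefault(round(lo, 2), []).append(correct)
    rows = []
    for lo in sorted(buckets):
        outcomes = buckets[lo]
        rows.append({
            "bucket": f"{lo:.0%}-{lo + bucket_width:.0%}",
            "expected": lo + bucket_width / 2,        # confidence midpoint
            "actual": sum(outcomes) / len(outcomes),  # actual win rate
            "n": len(outcomes),                       # sample count
        })
    return rows
```

A well-calibrated model has `actual` close to `expected` in every bucket with a meaningful sample count; small buckets are noisy, which is why the table shows `n` alongside accuracy.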
Bracket Score Simulation
How would this model’s deterministic pre-tournament bracket have scored on ESPN Tournament Challenge?
Unlike the accuracy analysis (which evaluates each game independently), bracket scoring compounds —
a wrong pick in Round 1 busts every downstream slot that team was projected to reach.
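To put a number on that compounding: under ESPN's per-round point values, a projected champion upset in Round 1 forfeits the points for every later round on its path. A back-of-the-envelope sketch:

```python
# ESPN Tournament Challenge points per correct pick, rounds 1 through 6.
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

# One Round 1 upset of the projected champion kills six slots at once:
# the bracket forfeits 10 + 20 + 40 + 80 + 160 + 320 points.
forfeited = sum(ROUND_POINTS)
print(forfeited)  # 630 of the 1,920 possible points
```

A single game, in other words, can cost roughly a third of the maximum score, which is why per-game accuracy and bracket score diverge.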
How the Score Compares
ESPN Tournament Challenge 2025 national benchmarks (approximate)
Points Earned by Round
■ Correct
■ Direct miss — model picked wrong team from the two that actually played
■ Cascade miss — model’s pick never made it to this round
Every Pick
How bracket scoring works:
ESPN awards 10→20→40→80→160→320 points per correct pick each round.
The maximum possible score is 1,920 points.
A “cascade miss” means the model’s pick for that slot was eliminated in an earlier round,
so the slot was already worth zero before the game was played. This is fundamentally different
from the per-game accuracy metric, which evaluates every game independently against its actual participants.
The model’s 84.1% per-game accuracy does not translate linearly to bracket score,
because errors compound in single-elimination play: a single early miss can zero out every later slot on that team’s path.
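The cascade rule can be implemented as a scoring pass that marks a picked team dead the first time it loses, so every later slot it occupied scores zero. A simplified sketch with assumed data shapes, not the site's implementation:

```python
# ESPN points per correct pick, rounds 1 through 6 (32, 16, 8, 4, 2, 1 slots).
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

def score_bracket(picks, actual):
    """picks, actual: per-round lists of slot winners (predicted vs. real).
    Slots are positional, so picks[r][i] and actual[r][i] refer to the
    same place in the fixed bracket tree."""
    total = 0
    eliminated = set()
    for rnd, (p_slots, a_slots) in enumerate(zip(picks, actual)):
        for pick, winner in zip(p_slots, a_slots):
            if pick in eliminated:
                continue                    # cascade miss: slot already dead
            if pick == winner:
                total += ROUND_POINTS[rnd]  # correct pick
            else:
                eliminated.add(pick)        # direct miss; pick is out from here on
    return total
```

For example, in a 4-team mini bracket where the model picks A and C in round one and A as champion, but reality produces A, D, then D: A's round-one win scores 10, C's loss is a direct miss, and the championship slot scores zero because A loses there. A perfect 6-round bracket scores the full 1,920.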