2025 Tournament — Model Performance
Trained on 2013–2024 only · 2025 was a true holdout that the model never saw during training
Accuracy by Round
How the model performed at each stage of the tournament
Methodology: The model was trained exclusively on 2013–2024 NCAA Tournament data (693 games, 11 seasons). The 2025 season was held out entirely — these predictions represent exactly what the model would have said on Selection Sunday 2025, before a single tournament game was played. Features include adjusted offensive/defensive efficiency, shooting percentages, rebounding rates, tempo, turnover rates, seeding, and conference affiliation. No in-tournament or real-time data was used.
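The season-level holdout described above can be sketched as a simple partition. The record format and `season` key are illustrative placeholders, not the model's actual schema; note that 2020 is excluded because that tournament was cancelled, which is why 2013–2024 spans 11 played seasons:

```python
# Season-based holdout split: train on 2013-2024, hold out 2025 entirely.
# Record shape is a placeholder; only the season-level partition matters.
def split_by_season(games, holdout_season=2025):
    """Partition game records so the holdout season never leaks into training."""
    train = [g for g in games if g["season"] != holdout_season]
    holdout = [g for g in games if g["season"] == holdout_season]
    return train, holdout

# 2013-2025, skipping 2020 (tournament cancelled): 11 training seasons + 1 holdout.
games = [{"season": s} for s in range(2013, 2026) if s != 2020]
train, holdout = split_by_season(games)
```

Keeping the split at the season level (rather than randomly by game) is what makes the 2025 predictions equivalent to a Selection Sunday forecast: no 2025 information of any kind reaches training.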
Round by Round Results
Green border = correct · Red border = incorrect · Blue badge = upset game
Biggest Wins & Misses
Biggest Wins = correct picks with highest confidence · Biggest Misses = wrong picks the model was most confident about
Calibration Analysis
A well-calibrated model’s actual win rate should match its predicted confidence. Gold = actual · Gray = expected
■ Actual win rate   ■ Expected (confidence midpoint)
Calibration Data
Sample counts and accuracy per bucket
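The per-bucket numbers above can be computed in a few lines. The 10-point bucket width and the `(confidence_pct, correct)` pair format are assumptions for illustration, not the dashboard's actual pipeline:

```python
# Group predictions into fixed-width confidence buckets and compare each
# bucket's actual win rate to its midpoint ("expected"). Integer percent
# confidences sidestep floating-point bucket-boundary surprises.
def calibration_buckets(preds, width=10):
    """preds: iterable of (confidence_pct, correct) pairs, confidence 0-100."""
    buckets = {}
    for pct, correct in preds:
        lo = min((int(pct) // width) * width, 100 - width)
        buckets.setdefault(lo, []).append(correct)
    return [
        {
            "bucket": (lo, lo + width),
            "n": len(outcomes),
            "actual": sum(outcomes) / len(outcomes),  # observed win rate
            "expected": lo + width / 2,               # bucket midpoint
        }
        for lo, outcomes in sorted(buckets.items())
    ]

# Toy data: three picks in the 70-80% bucket, one in the 90-100% bucket.
rows = calibration_buckets([(72, 1), (78, 1), (76, 0), (93, 1)])
```

A well-calibrated model yields `actual` close to `expected` in every bucket with enough samples; with the small per-bucket counts typical of a single tournament, some gap is expected noise.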
Bracket Score Simulation
How would this model’s deterministic pre-tournament bracket have scored on ESPN Tournament Challenge? Unlike the accuracy analysis (which evaluates each game independently), bracket scoring compounds — a wrong pick in Round 1 busts every downstream slot that team was projected to reach.
How the Score Compares
ESPN Tournament Challenge 2025 national benchmarks (approximate)
Points Earned by Round
■ Correct   ■ Direct miss — model picked wrong team from the two that actually played   ■ Cascade miss — model’s pick never made it to this round
Every Pick
How bracket scoring works: ESPN awards 10→20→40→80→160→320 points per correct pick in each successive round, for a maximum possible score of 1,920 points. A “cascade miss” means the model’s pick for that slot was eliminated in an earlier round, so the slot was already worth zero before the game was played. This is fundamentally different from the per-game accuracy metric, which evaluates every game independently against the actual participants. The model’s 84.1% per-game accuracy therefore does not translate linearly to bracket score, because errors compound exponentially in single-elimination play.
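The compounding described above can be made concrete with a minimal scorer. The per-round set-of-winners representation is an assumption for illustration, not ESPN's data model; because a team eliminated in an earlier round can never appear among a later round's actual winners, cascade misses score zero automatically:

```python
# ESPN-style scoring: per-pick points double each round.
# 32*10 + 16*20 + 8*40 + 4*80 + 2*160 + 1*320 = 1,920 maximum.
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

def bracket_score(picks, actual):
    """picks/actual: per-round sets of teams predicted/observed to win
    that round. A pick scores only if it is among that round's actual
    winners, so picks eliminated earlier (cascade misses) earn nothing."""
    return sum(
        pts * len(picked & winners)
        for pts, picked, winners in zip(ROUND_POINTS, picks, actual)
    )

# Hypothetical 4-team pod, two rounds: the model picked A and C to win
# round 1 and A to win round 2; in reality A and D won round 1, D round 2.
score = bracket_score([{"A", "C"}, {"A"}], [{"A", "D"}, {"D"}])
```

In the toy example only the round-1 pick of A scores (10 points): C is a direct miss, and the round-2 pick of A is a direct miss as well since A reached the game but lost. Had C been picked to win round 2 instead, that slot would have been a cascade miss, dead the moment C was eliminated.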