2025 Tournament — Model Performance
Trained on 2013–2024 only · 2025 was a true holdout the model never saw during training
Accuracy by Round
How the model performed at each stage of the tournament
Methodology: The model was trained exclusively on 2013–2024 NCAA Tournament data
(693 games across 11 seasons; the 2020 tournament was cancelled). The 2025 season was held out entirely — these predictions represent
exactly what the model would have said on Selection Sunday 2025, before a single tournament game
was played. Features include adjusted offensive/defensive efficiency, shooting percentages,
rebounding rates, tempo, turnover rates, seeding, and conference affiliation.
No in-tournament or real-time data was used.
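The season-based holdout described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline; the data shape and field names are placeholders:

```python
# Minimal sketch of the temporal holdout: train strictly on 2013-2024,
# hold out all of 2025. Records here are illustrative placeholders.
games = [
    {"season": 2013, "round": 1},
    {"season": 2024, "round": 1},
    {"season": 2025, "round": 1},  # holdout: never seen during training
]

train = [g for g in games if 2013 <= g["season"] <= 2024]
holdout = [g for g in games if g["season"] == 2025]

# Leakage check: no 2025 game may appear in the training set.
assert not any(g["season"] == 2025 for g in train)
```

The split is purely temporal, which is what makes the 2025 numbers an honest out-of-sample test rather than a backtest on seen data.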
Round by Round Results
Green border = correct · Red border = incorrect · Blue badge = upset game
Biggest Wins & Misses
Biggest Wins = correct picks with highest confidence · Biggest Misses = wrong picks the model was most confident about
Calibration Analysis
A well-calibrated model’s actual win rate should match its predicted confidence. Gold = actual · Gray = expected
Actual win rate
Expected (confidence midpoint)
Calibration Data
Sample counts and accuracy per bucket
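A per-bucket table like the one above can be produced with a single bucketing pass. A minimal sketch, assuming predictions arrive as `(confidence, was_correct)` pairs; the function name and 10-point bucket width are assumptions, not the site's code:

```python
def calibration_table(preds, bucket_width=0.1):
    """preds: list of (confidence, was_correct) pairs, confidence in [0.5, 1.0]."""
    buckets = {}
    for conf, correct in preds:
        # Snap each prediction to the lower edge of its confidence bucket.
        lo = min(int(conf / bucket_width) * bucket_width, 1.0 - bucket_width)
        buckets.setdefault(round(lo, 2), []).append(correct)
    rows = []
    for lo in sorted(buckets):
        outcomes = buckets[lo]
        rows.append({
            "bucket": f"{lo:.0%}-{lo + bucket_width:.0%}",
            "expected": lo + bucket_width / 2,        # confidence midpoint
            "actual": sum(outcomes) / len(outcomes),  # actual win rate
            "n": len(outcomes),                       # sample count
        })
    return rows
```

A well-calibrated model has `actual` close to `expected` in every bucket with a meaningful sample count; small buckets are noisy, which is why the table shows `n` alongside accuracy.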
Bracket Score Simulation
How would this model’s deterministic pre-tournament bracket have scored on ESPN Tournament Challenge?
Unlike the accuracy analysis (which evaluates each game independently), bracket scoring compounds —
a wrong pick in Round 1 busts every downstream slot that team was projected to reach.
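To put a number on that compounding: under ESPN's per-round point values, a projected champion upset in Round 1 forfeits the points for every later round on its path. A back-of-the-envelope sketch:

```python
# ESPN Tournament Challenge points per correct pick, rounds 1 through 6.
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

# One Round 1 upset of the projected champion kills six slots at once:
# the bracket forfeits 10 + 20 + 40 + 80 + 160 + 320 points.
forfeited = sum(ROUND_POINTS)
print(forfeited)  # 630 of the 1,920 possible points
```

A single game, in other words, can cost roughly a third of the maximum score, which is why per-game accuracy and bracket score diverge.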
How the Score Compares
ESPN Tournament Challenge 2025 national benchmarks (approximate)
Points Earned by Round
■ Correct
■ Direct miss — model picked wrong team from the two that actually played
■ Cascade miss — model’s pick never made it to this round
Every Pick
How bracket scoring works:
ESPN awards 10→20→40→80→160→320 points per correct pick each round.
The maximum possible score is 1,920 points.
A “cascade miss” means the model’s pick for that slot was eliminated in an earlier round,
so the slot was already worth zero before the game was played. This is fundamentally different
from the per-game accuracy metric, which evaluates every game independently against its actual participants.
The model’s 84.1% per-game accuracy does not translate linearly to bracket score,
because errors compound in single-elimination play: a single early miss can zero out every later slot on that team’s path.
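The cascade rule can be implemented as a scoring pass that marks a picked team dead the first time it loses, so every later slot it occupied scores zero. A simplified sketch with assumed data shapes, not the site's implementation:

```python
# ESPN points per correct pick, rounds 1 through 6 (32, 16, 8, 4, 2, 1 slots).
ROUND_POINTS = [10, 20, 40, 80, 160, 320]

def score_bracket(picks, actual):
    """picks, actual: per-round lists of slot winners (predicted vs. real).
    Slots are positional, so picks[r][i] and actual[r][i] refer to the
    same place in the fixed bracket tree."""
    total = 0
    eliminated = set()
    for rnd, (p_slots, a_slots) in enumerate(zip(picks, actual)):
        for pick, winner in zip(p_slots, a_slots):
            if pick in eliminated:
                continue                    # cascade miss: slot already dead
            if pick == winner:
                total += ROUND_POINTS[rnd]  # correct pick
            else:
                eliminated.add(pick)        # direct miss; pick is out from here on
    return total
```

For example, in a 4-team mini bracket where the model picks A and C in round one and A as champion, but reality produces A, D, then D: A's round-one win scores 10, C's loss is a direct miss, and the championship slot scores zero because A loses there. A perfect 6-round bracket scores the full 1,920.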