Simulating Sports: The Inputs and the Engines
Paul Bessire
Product Manager, Quantitative Analysis and Content
FOX Sports Interactive, WhatIfSports.com
July 15, 2009
Table of Contents
• WhatIfSports.com Overview
• Challenges with Simulating Baseball
• Plate Appearance Decision Tree, or “Improving the log5 Normalization Model for Batter/Pitcher Matchups”
• Pedro vs. Ruth (mostly second presentation)
About WhatIfSports.com
• February 2000 – Launched in Cincinnati with SimMatchup
• 2001 – SimLeague Baseball (like Strat-O-Matic) and Basketball; Paul Bessire runs free leagues
• 2002 – SimLeague Football and Hockey; Paul wins own baseball league with “Streaking Ho-Hos”
• 2004 – Hoops Dynasty and Gridiron Dynasty; Paul joins WIS part-time “between school”
• 2005 – WhatIfSports.com acquired by FOX Interactive Media; Paul comes on full-time
• 2006 – Hardball Dynasty and Clutch Racing Dynasty; all simulations rewritten with Paul’s help
• 2008 – FC Dynasty
• Present – 600,000+ registered users, part of FOX Sports TV group
Sports Simulation
• Play-by-play
  • A “play” means something different for each sport
  • Probabilities for every individual outcome
  • Random number generation
  • Pitch-by-pitch (or basketball/hockey pass-by-pass) not needed
• Account for every possible statistical interaction during a game
• Can be recreated quickly
  • 200+ games/second
• All data tracked
• Every outcome is different
  • Boxscore (link)
• Many relevant applications (second presentation)
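The play-by-play idea above (a probability for every outcome, resolved by a random number) can be sketched as follows. This is a minimal illustration, not the WhatIfSports engine: the outcome labels, the probabilities in `PLAY_PROBABILITIES`, and the function names are all hypothetical.

```python
import random
from itertools import accumulate

# Hypothetical per-play probabilities, for illustration only.
PLAY_PROBABILITIES = {
    "out": 0.68,
    "walk": 0.08,
    "single": 0.16,
    "double": 0.05,
    "home_run": 0.03,
}

def sample_outcome(probabilities, rng):
    """Pick one outcome label from a dict of outcome -> probability."""
    outcomes = list(probabilities)
    cumulative = list(accumulate(probabilities[o] for o in outcomes))
    # Scale the draw in case the probabilities do not sum to exactly 1.
    draw = rng.random() * cumulative[-1]
    for outcome, threshold in zip(outcomes, cumulative):
        if draw < threshold:
            return outcome
    return outcomes[-1]  # guard against floating-point edge cases

def simulate_plays(n_plays, seed=0):
    """Simulate n_plays independent plays and count each outcome."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    counts = dict.fromkeys(PLAY_PROBABILITIES, 0)
    for _ in range(n_plays):
        counts[sample_outcome(PLAY_PROBABILITIES, rng)] += 1
    return counts

if __name__ == "__main__":
    print(simulate_plays(10_000))
```

Changing the seed gives a different sequence of plays, while reusing a seed reproduces the same sequence exactly.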
Baseball Challenges
• Missing Player Data
• Defensive Metrics
• Ballpark Effects
• Era Adjustments
• Assigning Value (Salaries ~ RC27# * PA or ERC# * BF + Fielding + Extremes)
• Career “Seasons” (Pujols #3 in career $/PA, Musial #16; Gibson #31 in $/IP)
• Fatigue (Projected PA vs. Actual PA/162, or Projected IP & GP% vs. Actual IP/162 & Historical GP%)
Missing Player Data
• Typically solved with Regression
  • Linear: Pitchers’ 2B or 3B per hit allowed, or Pitches Thrown per BF
  • Multivariate: Ballpark Effects
  • May be Era and/or Ballpark Adjusted
• Discriminant Analysis/Cluster Analysis
  • Catcher’s Arm Ratings
  • Basketball Positional Effectiveness
• Fitting to a curve/distribution
  • Player Generation and Development
  • Assigning Ratings or Grades
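As a sketch of the linear-regression approach listed above, the snippet below fits a one-predictor least-squares line on seasons where a stat is recorded and uses it to estimate the stat where it is missing. The choice of predictor (strikeouts per BF), the sample numbers, and the function names are assumptions made for illustration; the slides do not say which predictors WhatIfSports uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Estimate the missing stat for a player with predictor value x."""
    return slope * x + intercept

if __name__ == "__main__":
    # Hypothetical seasons with known data: strikeouts per BF vs. pitches per BF.
    k_per_bf = [0.10, 0.15, 0.20, 0.25, 0.30]
    pitches_per_bf = [3.45, 3.60, 3.72, 3.88, 4.01]
    slope, intercept = fit_line(k_per_bf, pitches_per_bf)
    # Estimate pitches per BF for a season where pitch counts were not recorded.
    print(round(predict(slope, intercept, 0.18), 2))
```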
Significant Stats (# has missing data)

Pitchers
• HBP/BF
• BB/(BF – HBP)
• OAV
• 1B/Hit Allowed
• 2B/Hit Allowed # (regression)
• 3B/Hit Allowed # (regression)
• HR/Hit Allowed
• K/Out # (regression)
• GO/FO # (regression for GO)
• BF # (approx. outs + hits + BB + HBP)
• Pitches Thrown/BF # (regression)
• Relative Range Factor # (WIS formula)
• Fielding Percentage # (fit to curve for grade)
• Handedness (historical impact)
• Ballpark Effects # (multivariate regression)
• League Averages

Hitters
• HBP/PA
• BB/(PA – HBP)
• AVG
• 1B/Hit
• 2B/Hit
• 3B/Hit
• HR/Hit
• K/Out # (regression)
• GO/FO # (regression for GO)
• PA
• Relative Range Factor # (WIS formula)
• Fielding Percentage # (fit to curve for grade)
• Catcher Arm Rating # (discriminant analysis)
• CS% (Runner) # (regression for CS)
• Speed Rating # (WIS formula)
• Handedness (historical impact)
• Ballpark Effects # (multivariate regression)
• League Averages
Insignificant Stats

Pitchers
• Wins
• Losses
• Saves
• Holds
• Complete Games
• Shutouts
• ERA (kind of – 2B and 3B approx.)
• Unearned Runs
• Games Started
• Pitch Types
• Performance in Counts
• Other Situational Stats

Hitters
• RBI
• IBB
• Runs (kind of – in Speed Formula)
• GIDP (kind of – in Speed Formula)
• SF (kind of – in PA, but also situational)
• SH (kind of – in PA, but also situational)
• SBA (kind of – attempts, but also setting)
• Performance in Counts
• Other Situational Stats
WIS Relative Range Factor
• Range Factor
  • Important because range can turn hits into outs and outs into hits
  • Generally defined as (Putouts + Assists)/(Innings/9)
  • Reliant on many factors
  • Wildly inconsistent across eras
  • Does not include errors
  • Need another metric…
• WIS Relative Range Factor
  • Similar to Bill James RRF, but not as robust (data limitations)
  • Approximates plays made/possible plays made
  • Used to approximate + and – plays
  • Includes errors
  • Era-adjusted
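The general Range Factor definition quoted above is simple enough to write out directly. The sketch below covers only that public definition; the WIS Relative Range Factor formula is not given in the slides and is not reproduced here. The function name and sample totals are hypothetical.

```python
def range_factor(putouts, assists, innings):
    """Range Factor as generally defined: (Putouts + Assists) / (Innings / 9)."""
    return (putouts + assists) / (innings / 9)

if __name__ == "__main__":
    # Hypothetical season totals for one fielder.
    print(range_factor(putouts=250, assists=400, innings=1300))
```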
Ballpark Effects • LINK
PA Decision Tree – Normalization
Every step in the PA uses modified* log5 normalization (Bill James AVG example):
  H/AB = ((AVG * OAV) / LgAVG) / ((AVG * OAV) / LgAVG + (1 - AVG) * (1 - OAV) / (1 - LgAVG))
  where LgAVG = (PLgAVG + BLgAVG) / 2
2000 Pedro vs. 1923 Ruth example:
  H/AB = ((.393 * .167) / .2791) / ((.393 * .167) / .2791 + (1 - .393) * (1 - .167) / (1 - .2791))
  where LgAVG = (.283 + .276) / 2 or .2791
  Result = .2504
* Modified because the formula above assumes that the batter and pitcher carry equal (50/50) weight on each possible outcome of the PA, which is flawed. The modified version also accounts for handedness and ballpark.
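The unmodified log5 formula above translates directly into code. The sketch below uses the rounded inputs shown on the slide (.393, .167, .283, .276), so its output is close to, but not exactly, the slide's figure, which was presumably computed from unrounded values. Function and parameter names are illustrative.

```python
def combined_league_rate(pitcher_league_rate, batter_league_rate):
    """LgAVG: simple mean of the pitcher's and batter's league averages."""
    return (pitcher_league_rate + batter_league_rate) / 2

def log5(batter_rate, pitcher_rate, league_rate):
    """Unmodified log5 matchup probability (equal batter/pitcher weight)."""
    matchup = batter_rate * pitcher_rate / league_rate
    complement = (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return matchup / (matchup + complement)

if __name__ == "__main__":
    league = combined_league_rate(0.276, 0.283)
    # 1923 Ruth AVG = .393, 2000 Pedro OAV = .167 (rounded slide values).
    print(round(log5(0.393, 0.167, league), 4))
```

A useful sanity check on the formula: against a pitcher whose OAV equals the league average, the batter's expected average is unchanged.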
PA Decision Tree – Steps 1*
Plate Appearance
• Unusual Event (IBB, WP, PB, SB, CS, SH, Hit and Run, Pickoff, Balk)
• Normal PA
  • HBP (per PA or BFP)
  • Not HBP
    • BB (per PA or BFP – HBP)
    • At Bat…
* No ballpark or handedness adjustments made yet.
PA Decision Tree – Steps 2*
* Historical handedness adjustment and ballpark hits multiplier used.
PA Decision Tree – Steps 3
Hit*
• HR* (HR/Hit)
• Normal – In Play
  • Out (Plus Play)
  • Normal Hit
    • 3B* (3B/Hit * multiplier for lost HR)
    • 2B* (2B/Hit * multiplier for lost HR)
    • 1B
* Ballpark multipliers used.
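The three steps of the decision tree can be sketched as a sequence of conditional draws. Several details here are assumptions, since the slides do not spell them out: the Step 2 branch is taken to be a hit-or-out decision, the "multiplier for lost HR" is taken to be 1 / (1 - HR/Hit), and every probability is assumed to be already normalized for the matchup, handedness, and ballpark. The dictionary keys and function name are hypothetical.

```python
import random

def resolve_plate_appearance(p, rng):
    """Walk the PA tree top-down; p maps each step to a conditional probability."""
    # Step 1: unusual event, then HBP, then BB; otherwise it is an at bat.
    if rng.random() < p["unusual_event"]:
        return "unusual event"
    if rng.random() < p["hbp"]:
        return "HBP"
    if rng.random() < p["bb"]:
        return "BB"
    # Step 2 (assumed): the at bat ends in a hit or an out.
    if rng.random() >= p["hit"]:
        return "out"
    # Step 3: home run first, then the ball in play.
    if rng.random() < p["hr_per_hit"]:
        return "HR"
    if rng.random() < p["plus_play"]:
        return "out (plus play)"  # fielder's range turns the hit into an out
    # Assumed "multiplier for lost HR": rescale per-hit rates to non-HR hits.
    lost_hr = 1.0 / (1.0 - p["hr_per_hit"])
    triple = p["triple_per_hit"] * lost_hr
    double = p["double_per_hit"] * lost_hr
    draw = rng.random()
    if draw < triple:
        return "3B"
    if draw < triple + double:
        return "2B"
    return "1B"

if __name__ == "__main__":
    # Hypothetical conditional probabilities, for illustration only.
    example = {
        "unusual_event": 0.02, "hbp": 0.01, "bb": 0.09, "hit": 0.27,
        "hr_per_hit": 0.12, "plus_play": 0.03,
        "triple_per_hit": 0.02, "double_per_hit": 0.20,
    }
    rng = random.Random(2009)
    print([resolve_plate_appearance(example, rng) for _ in range(10)])
```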
PA Decision Tree – Matchup Weights
Addresses the previous 50/50 assumption by using League-Adjusted Variance to form batter and pitcher weights for each step:
Matchup Weights: What does this mean?
• Batter always has more control (even with HBP and BB)
  • Makes final decision (swing or not)
  • Dictates strike zone
  • Less consistent
• Doubles and Triples are (mostly) out of pitcher’s control (BABIP)
• Does not necessarily mean batting is more important
  • 9 vs. 1
  • Fewer pitcher outliers means elite pitchers are more valuable
PA Decision Tree – Normalization
Batting Average example using Matchup Weights:
  H/AB = ((1.066*AVG * .934*OAV) / LgAVG) / ((1.066*AVG * .934*OAV) / LgAVG + (1.066 - 1.066*AVG) * (.934 - .934*OAV) / (1 - LgAVG))
  where LgAVG = (.934*PLgAVG + 1.066*BLgAVG) / 2
2000 Pedro vs. 1923 Ruth example (with handedness):
  H/AB = ((1.066*.393 * .934*.167) / .2795) / ((1.066*.393 * .934*.167) / .2795 + (1.066 - 1.066*.393) * (.934 - .934*.167) / (1 - .2795))
  where LgAVG = (1.066*.283 + .934*.276) / 2 or .2795
  Result * Handedness = .2502 * 1.045
  Final Result = .2614
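The weighted formula can be sketched the same way as the unmodified one. The weights 1.066 and .934 and the handedness multiplier 1.045 are taken from the slide; the function and parameter names are illustrative, and the inputs are the slide's rounded values, so the output lands near the slide's .2502 and .2614 without matching them to the last digit.

```python
def weighted_log5(batter_rate, pitcher_rate,
                  batter_league_rate, pitcher_league_rate,
                  batter_weight=1.066, pitcher_weight=0.934,
                  handedness_multiplier=1.0):
    """log5 with matchup weights applied as written on the slide."""
    league_rate = (pitcher_weight * pitcher_league_rate
                   + batter_weight * batter_league_rate) / 2
    matchup = ((batter_weight * batter_rate)
               * (pitcher_weight * pitcher_rate) / league_rate)
    complement = ((batter_weight - batter_weight * batter_rate)
                  * (pitcher_weight - pitcher_weight * pitcher_rate)
                  / (1 - league_rate))
    return handedness_multiplier * matchup / (matchup + complement)

if __name__ == "__main__":
    # 1923 Ruth (AVG .393, league .283) vs. 2000 Pedro (OAV .167, league .276).
    base = weighted_log5(0.393, 0.167, 0.283, 0.276)
    final = weighted_log5(0.393, 0.167, 0.283, 0.276,
                          handedness_multiplier=1.045)
    print(round(base, 4), round(final, 4))
```

One property of the formula as written: the product of the two weights appears in every term of the ratio and cancels, so in this sketch the weights change the result only through LgAVG.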
Thanks
Questions? At lunch or after the second presentation
Email: PBessire@WhatIfSports.com
Phone: 513-291-0321
See me for a business card with promo code