What is Beta_Rank
Updated: Nov 20, 2019
Many of you are probably familiar with advanced statistics by now, but I think a good number of fans and media are still consuming and using statistics in football that are less than ideal. Any measure where you are using
metric=(value/number of games)
is not very descriptive of how good a team actually is. Since not every team runs the same type of offense, comparing a slow-it-down offense against a hurry-up offense over a season is not apples to apples. That doesn't even get into whether it is fair to compare yards or points per game, drive, or play across very different schedule strengths. It isn't.
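To make the pace problem concrete, here is a minimal sketch with made-up numbers (the two teams and their totals are hypothetical, not real data). It shows how a per-game average can favor the hurry-up team even when the slow-it-down team does more with each drive.

```python
# Hypothetical season totals, chosen only to illustrate the pace problem.
hurry_up = {"points": 420, "games": 12, "drives": 180}   # many drives per game
slow_down = {"points": 360, "games": 12, "drives": 120}  # few drives per game


def per_game(team):
    """The metric = value / number of games style of measure."""
    return team["points"] / team["games"]


def per_drive(team):
    """The same points, divided by opportunities instead of games."""
    return team["points"] / team["drives"]


hurry_ppg, slow_ppg = per_game(hurry_up), per_game(slow_down)
hurry_ppd, slow_ppd = per_drive(hurry_up), per_drive(slow_down)

print(f"points per game : hurry-up {hurry_ppg:.1f}, slow-down {slow_ppg:.1f}")
print(f"points per drive: hurry-up {hurry_ppd:.2f}, slow-down {slow_ppd:.2f}")
```

The per-game view ranks the hurry-up team first, the per-drive view ranks the slow-it-down team first, and neither view says anything yet about who those points were scored against.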
Trying to make Football Stats like Baseball Stats
I am going to argue that what works in baseball, getting down to foundation statistics like On Base Percentage, doesn't work as well in football. I would also argue it doesn't work that well in baseball for predicting team or game outcomes. You can't simply work off of yards and play level data alone in football, because the relationship between yards and points is very different from the one between bases and runs. Getting into the endzone in football can be as simple as an explosive play, like a home run, but chaining together a bunch of play calls that turn into first downs is not much at all like chaining together a bunch of singles or walks. There is more choice involved, more chance for execution error, far more moving parts, and a far weaker relationship between one event and another. I am not saying yards per play level measures are not important. They are, but if you have not placed them in a system that also factors them into a path-dependent drive that leads to points, you are missing something. It's part of why pure yards-based models do badly at predicting college football scores compared to pure points-based models. Sure, there is some turnover luck involved in points, but there is also a measurable skill involved in chaining plays that gain yards together into a drive that gets you points. You can't just skip over points and stick to yards any more than you should skip over yards and stick to points.
A more sophisticated approach to College Football Advanced Stats
Beta_Rank is my answer. I was frustrated as I delved deeper into college football advanced stats. Some measures generate nonsense results, and some produce results that are just Yards per Play with a weak adjustment for schedule. The more I read about the methodology, the more convinced I became that I could do better. After all, it's what I do for a living. I have 10+ years of professional experience as an econometrician (or data scientist if you like). I have built online targeting algorithms that decide whether you should see an ad, what we should bid, and what ad you should see in less than half a second. I have built multi-touch attribution models that help companies correctly assess digital performance, and most importantly I have professional experience in Multi-Level Hierarchical Bayesian models. I didn't do this for Joe Blow company you have never heard of. I have worked at American Express, The New York Times, Blackwood Seven, and I am currently the Lead Data Scientist at an Auto Insurance firm. I did my graduate work in economics at Vanderbilt University. I have the chops, if you will.
College football data is actually highly regularized and small compared to the big data challenges I solve professionally. Beta_Rank is built on drive level data from Sports Source Analytics, the same company that provides data to the College Football Playoff Committee. Even though I often refer to it as "the model" as shorthand, it's actually several interlocking models that produce better predictions using better math that attempts to replicate the relationship from plays to yards to drives to points. Within the model, yards-based metrics are used as dependent variables in the first stage to create team measures on these factors. These are then factored into a points dependent variable model that captures drive factors. It is completely feasible for two teams with the same average yards per play to put up very different points per drive, and the model explicitly takes this into account. Each metric has a dynamic score that is retrained every week with more data.
What you get is an idea of what teams are actually good and bad at, with a strong opponent adjustment. Who you play really matters in Beta_Rank, and the model is constantly readjusting what it thinks about a team as the season goes on. This may seem like a deep dive, but it's worth pointing out that all model outputs (posterior distributions) are essentially
(prior distributional assumption)x(actual data).
If you haven't played anyone very good yet (for example, we don't have data for you against a top ten opponent), then any model is going to fill that prediction with the prior exclusively. So if your team has been playing a bunch of lousy bums and suddenly plays Alabama, your score could change quite a bit, because suddenly we actually know something (not everything) about how you play against top competition instead of just projecting it. This is true for any part of the prior distribution, so if you have a murderers' row of a schedule and then play the Little Sisters of the Poor, you could move quite a bit too. I would argue this is a good thing. There are folks in the advanced analytics space who make statements praising the stability of their metrics late in the football season, but that is wrongheaded. In non-probabilistic space, metric stability is crucial and simpler to achieve because metrics often do not have wide drill down. But pretending that probabilistic metrics with wide drill down (like a rank order of CFB teams) should not move late in the season is a fundamental misunderstanding of probability theory and a case of over-engineering. Teams are unlikely to move a lot late in the season because we have a weight of data behind them, but where the new and old data sit on the prior distribution, and what those data are, should determine how much new data moves rankings, not human decisions.
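The prior-versus-data point can be illustrated with the simplest textbook case, a normal prior updated by normal observations with known variance. This is only a sketch of the general idea, not Beta_Rank's actual multi-level model, and every number and name in it is a made-up assumption.

```python
def posterior_normal(prior_mean, prior_var, obs, obs_var):
    """Conjugate update: normal prior, normal data with known variance.

    Returns (posterior_mean, posterior_variance).
    """
    n = len(obs)
    if n == 0:
        # No data in this part of the distribution: the prior is all we have.
        return prior_mean, prior_var
    prior_precision = 1.0 / prior_var
    data_precision = n / obs_var
    post_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(obs) / n
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, post_var


# Hypothetical: prior says "average" (0), observed performances all come in at 6.
no_games = posterior_normal(0.0, 4.0, [], 4.0)
one_game = posterior_normal(0.0, 4.0, [6.0], 4.0)
eight_games = posterior_normal(0.0, 4.0, [6.0] * 8, 4.0)

for label, (mean, var) in [("0 games", no_games), ("1 game", one_game), ("8 games", eight_games)]:
    print(f"{label}: mean {mean:.2f}, variance {var:.2f}")
```

With no games the estimate is the prior, one game moves it a long way, and by eight games each additional result moves it much less. Nothing in the code forces late-season stability; it falls out of how much data is already there.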
What are Beta_Rank's Measures?
Beta_Rank: The combination of the Offensive, Defensive, and Special Teams Beta_Rank scores. Offense and Special Teams are added and Defense is subtracted, because very good defenses have negative Defensive Beta_Rank scores.
O_Score: The Offensive Beta_Rank score for a team. It is the sum of the team's scores on 4 measures of offensive performance: Drive Efficiency, Play Efficiency, Explosiveness, and Avoiding Negative Drives.
D_Score: The Defensive Beta_Rank score for a team. It is the sum of the team's scores on 4 measures of defensive performance: Drive Efficiency, Play Efficiency, Explosiveness, and Causing Negative Drives. Naturally, for Defense the values are reversed.
Spcl_Tm_Score: The sum of offensive and defensive special teams scores regarding punts, punt returns, punts inside the 20, kickoffs, and kick returns. Field Goals are included here, where I use an expected points dependent variable subtracted from an actual points dependent variable.
Sched_Strength: The sum of Offensive and Defensive Schedule Strength divided by the number of games played. If you played a murderers' row of Offenses, your Offensive score would be very high. If you played a murderers' row of Defenses, you would have a very negative score. The sum of these scores is your total Sched_Strength. It takes into account where your games were played: home, road, or neutral.
Record_Strength: The value of the games you actually won.
Drive_Efficiency: The team offensive and defensive residual in the model. This captures the point value left unexplained by the other components of the model (yards per play, explosive drives, negative drives, starting field position, special teams, home/away). In other words, it shows who still scores more points even when controlling for the other offensive factors.
Play_Efficiency: This is the effect of yards per play on all points on NCAA drives multiplied by the team specific yards per play controlling for opponent, starting field position, and home/away.
Explosive Drives: This is the effect of a binary where yards per play >7.5 on all points on NCAA drives multiplied by the team offense and defense specific yards per play controlling for opponent, starting field position, and home/away. It corrects for multiple linear solutions to the yards per play to points relationship.
Negative Drives: This is the effect of a binary where yards per play < 3.3333 and turnovers on all points on NCAA drives multiplied by the team offense and defense specific yards per play controlling for opponent, starting field position, and home/away. It corrects for multiple linear solutions to the yards per play to points relationship.
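As a rough illustration of the definitions above, the sketch below builds the two drive-level binaries from the quoted thresholds and combines the three team scores into one number. The function names are hypothetical, and reading "and turnovers" as "any drive that ends in a turnover also counts as negative" is my assumption, not something the definitions spell out.

```python
EXPLOSIVE_YPP = 7.5     # threshold quoted in the Explosive Drives definition
NEGATIVE_YPP = 3.3333   # threshold quoted in the Negative Drives definition


def drive_flags(yards, plays, turnover=False):
    """Return (explosive, negative) 0/1 flags for a single drive."""
    if plays <= 0:
        raise ValueError("a drive needs at least one play")
    ypp = yards / plays
    explosive = int(ypp > EXPLOSIVE_YPP)
    # Assumption: a turnover marks the drive negative regardless of yards per play.
    negative = int(ypp < NEGATIVE_YPP or turnover)
    return explosive, negative


def beta_rank(o_score, d_score, spcl_tm_score):
    """Offense and special teams are added; defense is subtracted,
    since very good defenses carry negative D_Score values."""
    return o_score + spcl_tm_score - d_score


print(drive_flags(80, 8))                 # 10.0 yards per play
print(drive_flags(6, 3))                  # 2.0 yards per play
print(drive_flags(40, 8, turnover=True))  # 5.0 yards per play, ends in a turnover
print(beta_rank(o_score=10.0, d_score=-5.0, spcl_tm_score=2.0))
```

In the model these flags are inputs whose effect on points is estimated across all drives. The sketch only shows how a drive would be labeled, not how the effects are fit.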
The model output is a rating, or pure prediction, of what a team should do if they played tomorrow. It is not a ranking of what team most deserves to be somewhere. Deserve is an interesting question, and as I have learned about humans since leaving graduate school, humans really care a lot about fairness (even if we don't put it in our economic models). I do generate a "deserve" metric that looks at who you beat, called "Strength of Record" (the Record_Strength measure above). Since I want to predict who would win tomorrow, I care less about data from the beginning of the season as the season goes on. There is a time decay built into the model that weights more recent performance more than early performance. Doing this makes the model more accurate at prediction, which is the ultimate goal. It allows the model to capture teams that surge late, like 2016 USC, without punishing them too much for playing badly early. It also allows the model to make a better attempt to correct for injuries.
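Here is a minimal sketch of what a time decay does to a late-surging team. The exponential form, the `half_life` parameter, and the weekly numbers are all assumptions for illustration. The post does not say which decay function Beta_Rank uses.

```python
def decay_weights(weeks, current_week, half_life=4.0):
    """Weight for each week's result; older weeks get exponentially less weight.

    half_life is a hypothetical tuning parameter: a result that is
    half_life weeks old counts half as much as this week's result.
    """
    return [0.5 ** ((current_week - w) / half_life) for w in weeks]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical weekly performance scores for a team that starts badly and surges.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [-5.0, -4.0, -3.0, 2.0, 6.0, 8.0, 9.0, 10.0]

weights = decay_weights(weeks, current_week=8)
plain = sum(scores) / len(scores)
decayed = weighted_mean(scores, weights)

print(f"unweighted average  : {plain:.2f}")
print(f"time-decayed average: {decayed:.2f}")
```

The decayed average sits closer to the recent results, which is the behavior described for a team like 2016 USC: the bad early weeks still count, just less.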
It's not perfect. I don't control for injuries, because trying to do so would take all of my time and wildly saturate the model. I also don't have a ton of data early in the season, so I run a separate preseason predictive model that tries to establish some foundational factors for a program and predict how they should do in the coming year.
prediction=B0+B1(recruiting)+B2(prior performance)+B3(returning production)+e
It's more complex than that by a good margin. I have made some changes to increase model fit, and I would like a few more seasons' worth of data to train it on, but it is pretty good and gets most of the teams within +/-10 spots of where they end up in the final ranking. This model is blended with the actual data on a decaying weight through week 6. As more data comes in, I decrease the weight of the preseason model until I eventually phase it out.
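The preseason formula and its phase-out can be sketched as follows. The coefficients, the input values, and the linear weight schedule are all hypothetical. The post only says the preseason model's weight decays through week 6 and is then phased out, so the exact schedule here is an assumption.

```python
def preseason_prediction(recruiting, prior_performance, returning_production,
                         b0=0.0, b1=0.3, b2=0.5, b3=0.2):
    """prediction = B0 + B1(recruiting) + B2(prior performance) + B3(returning production).

    The coefficient values are placeholders, not fitted estimates.
    """
    return b0 + b1 * recruiting + b2 * prior_performance + b3 * returning_production


def preseason_weight(week, last_preseason_week=6):
    """Assumed linear schedule: full weight before any games are played,
    some weight through last_preseason_week, zero afterwards."""
    if week <= 0:
        return 1.0
    return max(0.0, 1.0 - week / (last_preseason_week + 1))


def blended_rating(preseason, in_season, week, last_preseason_week=6):
    w = preseason_weight(week, last_preseason_week)
    return w * preseason + (1.0 - w) * in_season


pre = preseason_prediction(recruiting=10.0, prior_performance=8.0, returning_production=5.0)
for week in (0, 3, 6, 7):
    print(week, round(blended_rating(pre, in_season=20.0, week=week), 2))
```

Early in the season the blended rating leans on the preseason number, and once the phase-out week has passed it is driven entirely by what the team has done on the field.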
There is also the question of grouped data, also called conference play. Any topological mapping in a metric space of the college football season will make this problem pretty apparent. By the end of the season, all college football models are better at predicting conference games than out-of-conference games, but the level of accuracy isn't so different as to prevent us from doing a full rank order of CFB teams. Beta_Rank was only 3% more accurate on the final conference games of the season than on bowl games.
I do think there are ways the model could improve, but most of those have to do with data collection and labeling. I am always open to suggestions, unless your suggestion is juicing prediction with a random selector from a distribution and setting the number of draws below convergence to the distribution, because that is just crap math.
You can listen to me talk about Beta_Rank on some podcasts:
Here with Bryant on 12 Pac Radio.
Here with Parker Fleming, @statsowar, on the Frogs-O-War pod.
If you have any questions hit me at @beta_rank_fb