For several years, Voxitatis has been reporting some statistics with Greek letters for every varsity high school football game in Illinois. This post explains how these statistics are calculated and their empirical reliability in predicting the outcomes of football games at the high school level.
First, it must be pointed out that these statistics are designed to predict the winner of a hypothetical game between two teams. To do that, we had to devise a systematic way of estimating how strong a team is: a formula for team strength based on valid, up-to-date data we could reliably collect about the teams in Illinois.
We therefore decided to use points scored and points allowed. These numbers are reliably available from the Illinois High School Association for every varsity football game played by Illinois high schools. The logic goes that the more points a team scores and the fewer points it allows, the stronger that team is. A team that scores more points and gives up fewer points, on average, than a hypothetical opponent will be predicted by our statistics to win a game between the two teams.
We first tested whether the average points a team scored was a reliable predictor of that team’s ability to win a game against an opponent who scored fewer points, on average. We assigned the variable a to the average points scored per game and ran a test with data from 2007. We got this:
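The correlations reported here are standard Pearson coefficients. This pure-Python sketch shows the calculation; the numbers are made up for illustration, not drawn from the 2007 data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: average points scored vs. season wins for five teams.
# A nearly linear relationship gives a coefficient close to 1.
print(pearson_r([14, 21, 28, 35, 42], [2, 4, 5, 7, 9]))
```

A coefficient near ±1 indicates a strong linear relationship; values near 0.5, like the one above, told us the predictor alone was not good enough.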
This plot shows the line of best fit and a low correlation coefficient of 0.545. We had to do better. So we tested the actual point differential in games against the average number of points a team gave up per game, which we assigned the variable c. That graph is shown here:
This one was even worse, with a correlation coefficient of about –0.53. Not quite a random distribution, but we still needed to do better. We reasoned that teams that play weak opponents tend to score more points, so we corrected the a statistic by subtracting the average number of points given up by the team’s opponents over the season, which we assigned the variable b. Likewise, we corrected the points-allowed average by subtracting the average number of points scored by that team’s opponents (variable d).
Now we have a new statistic, which we called mu (μ), using the formula

μ = (a – b) – (c – d)
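As a rough illustration, assuming per-game scores have already been collected, the mu calculation might look like this in Python. The data layout and the sample numbers are ours, not the actual Voxitatis pipeline:

```python
def averages(games):
    """games: list of (points_scored, points_allowed) tuples for one team.
    Returns (a, c): average points scored and allowed per game."""
    n = len(games)
    a = sum(scored for scored, _ in games) / n
    c = sum(allowed for _, allowed in games) / n
    return a, c

def mu(a, b, c, d):
    """mu = (a - b) - (c - d), where b is the opponents' average points
    allowed and d is the opponents' average points scored."""
    return (a - b) - (c - d)

# Hypothetical team: averages 30 scored and 10 allowed per game; its
# opponents average 20 points allowed and 15 scored.
print(mu(30, 20, 10, 15))  # (30 - 20) - (10 - 15) = 15
```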
We call the difference between the two teams’ mu values delta (Δ). The team with the higher mu value would be expected to win the game between those two teams, producing a positive value for Δ when we subtract its opponent’s mu value from its own. Likewise, the team with a negative value for Δ would be expected to lose the game.
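The Δ prediction rule can be sketched as follows; the team names and mu values are hypothetical:

```python
def delta(mu_team, mu_opponent):
    """Delta for a team: its mu minus its opponent's mu."""
    return mu_team - mu_opponent

def predicted_winner(name_a, mu_a, name_b, mu_b):
    """Pick the team with the higher mu (positive delta); None if tied."""
    d = delta(mu_a, mu_b)
    if d > 0:
        return name_a
    if d < 0:
        return name_b
    return None  # equal mu values: too close to call

print(predicted_winner("Team A", 22.0, "Team B", 5.5))  # prints "Team A"
```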
Early in the 2008 and 2009 seasons, we tested Δ again and found it to be a decent predictor: it picked the winner about 80% of the time when its absolute value was greater than about 25, and the higher the absolute value of Δ, the more reliable the prediction. However, as the season went on, we found that a team’s ability to win games depended much more on its own averages than on its opponents’ averages. In other words, a regression analysis showed that the correction factors b and d contributed less to the predictor than the values for a and c.
Therefore, we calculated a new statistic called alpha (α). This is based on recomputed values for mu, which we now labeled nu (ν):

ν = (a – 0.5b) – (c – 0.25d)
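The down-weighted correction can be sketched the same way; the weights 0.5 and 0.25 come from the formula above, while the sample numbers are hypothetical:

```python
def nu(a, b, c, d):
    """nu = (a - 0.5*b) - (c - 0.25*d): the opponent corrections b and d
    are down-weighted, since regression showed they matter less than a and c."""
    return (a - 0.5 * b) - (c - 0.25 * d)

def alpha(nu_team, nu_opponent):
    """Alpha: the difference between the two teams' nu values."""
    return nu_team - nu_opponent

# Same hypothetical team as before: a=30, b=20, c=10, d=15.
print(nu(30, 20, 10, 15))  # (30 - 10) - (10 - 3.75) = 13.75
```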
We named the difference between the two teams’ nu values as alpha (α) and ran our tests early in the 2010 season. As you see below, α was a slightly better predictor of the winner than Δ, even early in the season.
To see data from late in the season, examine the spreadsheet below:
As shown in the double bar graph and spreadsheet, α tends to work better, at least in the seasons we have tested. As expected, later in the season, when more data is available about each team’s ability to score points and hold its opponents to fewer, the statistics are better able to predict a winner. In fact, when the absolute values of the statistics are high late in the season, they are almost never wrong. We continue to evaluate the reliability of our statistics and will issue a more comprehensive report at the end of the 2012 football season.
Finally, as we noted in our initial 2007 report, outliers tend to skew results, as in the case where a game finishes with a score of 77–6. Strong teams may stop running up the score after only, say, 45 points. Using only points in our calculations, the team that runs up the score against a very weak opponent appears stronger than the team that stops after a sufficient margin of victory is attained, even though the latter may actually be much stronger than the former. That’s the nature of football, and it is not possible to account for these coaching decisions or other “human factors” in our statistics, given that we don’t measure what each team is “capable of,” simply what they actually did on the field.
Also, we have noticed over the years that when very strong teams play each other or very weak teams play each other, the margin of victory is highly uncertain, almost “too close to call.” No matter how much we may pretend that our statistics predict the outcome of games, the score of a game between equally matched teams often comes down to a simple lucky bounce of the football or an intangible quality of the game. Our analysis, we hope, does nothing to reduce the significance of the fundamentals of football, which are shown on the field only, and not on a stat sheet.