Friday, August 26, 2016

2016 Season Simulation: The Problem of Upsets

My 2016 season simulation assumes that all teams will perform as their ratings say they will.  There are no upsets.  Because of this, the simulation can help give insight into the role of upsets in a real season.
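The no-upset rule can be sketched in a few lines of Python. The team names, ratings, and schedule below are made-up stand-ins for illustration, not the actual 2016 ratings or schedule:

```python
# Deterministic "no upsets" season simulation: in every game, the
# higher-rated team wins. All ratings and pairings are hypothetical.

def simulate_season(ratings, schedule):
    """Return {team: (wins, losses)} assuming no upsets ever occur."""
    records = {team: [0, 0] for team in ratings}
    for home, away in schedule:
        winner, loser = (home, away) if ratings[home] > ratings[away] else (away, home)
        records[winner][0] += 1
        records[loser][1] += 1
    return {team: tuple(record) for team, record in records.items()}

# Hypothetical four-team round robin
ratings = {"A": 0.62, "B": 0.58, "C": 0.55, "D": 0.50}
schedule = [("A", "B"), ("A", "C"), ("A", "D"),
            ("B", "C"), ("B", "D"), ("C", "D")]
print(simulate_season(ratings, schedule))
# Team A goes 3-0, B 2-1, C 1-2, D 0-3 -- no upsets by construction.
```

By construction the highest-rated team finishes undefeated, which is why a simulation like this produces far more 0-loss teams than a real season does.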

According to the simulation, there will be 7 teams with 0 losses and 11 teams with 1 loss.  The 7 teams with 0 losses are:


The 11 teams with 1 loss are:


In the real world, it isn't going to happen that way.  Over the last nine years, there have been a total of 9 undefeated teams -- an average of 1 per year -- and a total of 28 teams with only 1 loss -- an average of 3.1 per year.  Indeed, over just the first week of the current season, of the 0-loss teams in the simulation, Dayton and Southern California already have suffered losses (Dayton has suffered two).  Likewise, of the 1-loss teams, Idaho, Monmouth, Sacramento State, and Wisconsin already have lost games the simulation projected them to win.

The simple message:  There are plenty of games in which teams don't perform in accord with their ratings.  There are upsets, and almost every team is likely to be upset, or to pull off an upset, at some point during the season.  Yes, it's true that my simulation is based on a lot of assumptions and isn't perfect.  But there will be a significant number of upsets under any rating system.  Even for ratings based not on pre-season information but on the actual game results during the season, almost every team will have been upset, or have pulled off an upset, relative to its rating one or more times over the course of the season.  Indeed, using the NCAA's current ARPI rating system over the last 9 years, even in the most one-sided 5% of games according to the ratings, the higher-rated team has tied 9 games and lost 1.  It doesn't matter how good you are and how poor your opponent is; there always is the possibility of an upset.
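A little arithmetic shows why upsets are inevitable over a full season. If the better team wins each game with probability p, the chance of at least one upset in n games is 1 - p^n. The 95% figure below is an illustrative favorite's win probability, not a number taken from the ARPI data:

```python
# Even a heavy favorite loses eventually. With per-game win
# probability p for the favorite, the chance of at least one upset
# across n independent games is 1 - p**n. The p = 0.95 favorite and
# the 20-game season are illustrative assumptions.

def prob_at_least_one_upset(p, n):
    return 1 - p ** n

# A 95%-favorite over a 20-game season still gets upset at least
# once about 64% of the time:
print(round(prob_at_least_one_upset(0.95, 20), 3))  # -> 0.642
```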

So what does this mean about what you are seeing when you look at ratings?  I'll use Santa Clara this year as an example.  The simulation says that Santa Clara ends up as the #60 team.  Yet in the first weekend of the season, they beat Southern California (#7) and California (#23).  Following the weekend, I created an updated simulation using the actual results of the games played the first weekend and the simulated results for the balance of the season.  In the updated simulation, Santa Clara moved up to #34.  So here's the question:  Does Santa Clara's updated simulated rank mean the original rank was wrong?  Or does it mean the original rank was fine, and they simply pulled off some of the inevitable upsets that occur over the course of a season?  Or is it a combination of the two?  The honest answer is:  We'll never know for sure.
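The updated-simulation procedure -- actual results where games have been played, the no-upset rule everywhere else -- can be sketched as follows. The names, ratings, and schedule are hypothetical stand-ins, not the real 2016 data:

```python
# Blend actual results with simulated ones: use the real winner for
# games already played, and fall back to the "higher rating wins" rule
# for the rest. All ratings and pairings below are hypothetical.

def updated_records(ratings, schedule, actual):
    """actual maps a (home, away) pairing to its real-world winner."""
    records = {team: [0, 0] for team in ratings}
    for home, away in schedule:
        if (home, away) in actual:
            winner = actual[(home, away)]
        else:
            winner = home if ratings[home] > ratings[away] else away
        loser = away if winner == home else home
        records[winner][0] += 1
        records[loser][1] += 1
    return {team: tuple(record) for team, record in records.items()}

ratings = {"SantaClara": 0.50, "USC": 0.62, "Cal": 0.58, "Other": 0.45}
schedule = [("SantaClara", "USC"), ("SantaClara", "Cal"), ("SantaClara", "Other")]
# First weekend: the lower-rated team actually beat both favorites.
actual = {("SantaClara", "USC"): "SantaClara",
          ("SantaClara", "Cal"): "SantaClara"}
print(updated_records(ratings, schedule, actual))
# Santa Clara finishes 3-0 here, versus 1-2 under the original no-upset rule.
```

The ambiguity described above shows up directly: the function can't distinguish "the ratings were wrong" from "those two games were upsets" -- it just records what happened.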

Ratings produced by mathematical systems, whether they're of the predictive type (telling how teams will do in future games) or of the retrodictive type (telling how teams have done in past games), or a combination of the two, are inherently ambiguous.  This is because it's impossible, except in the most extreme cases, to know whether a game result that is contrary to the ratings is due to (1) the ratings being wrong or (2) the game simply being among the inevitable upsets.

The NCAA uses the Adjusted RPI (ARPI) as one of the main factors for making at-large selections and seeding teams in the NCAA Tournament.  The NCAA is clear that the ARPI is a measure (however imperfect) of teams' performances over the course of the season.  Teams that pulled off true upsets get credit for those wins as though the games were not upsets; and teams that are upset get dinged for those losses as though the games were not upsets.  Thus the ARPI does not tell, nor purport to tell, who the best teams are.  It can't do that, because it can't tell the difference between an incorrect rating and an upset.

I don't want, however, to single out the ARPI.  The same is true for any mathematical system.  When a result is contrary to the ratings, the system can't tell whether (1) the ratings are incorrect due to the nature of the system or (2) the game simply is an upset.

In sports where teams play a great number of games in a season (such as major league baseball), a team's record, and its rank based on its record, is a pretty good indicator of how strong the team is.  It's played enough games for upsets, in both directions, to have evened out.  On the other hand, for Division 1 women's soccer, where teams play an average of between 19 and 20 games per season (regular season and conference tournaments), the problem of upsets is significant.  There, in particular, ratings produced by mathematical systems include an inherent level of ambiguity.
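The "more games even out the upsets" point can be made quantitative. Treating each game as an independent trial a team wins with probability p, the standard deviation of its season win fraction is sqrt(p(1-p)/n), so it shrinks as the schedule grows. The p = 0.6 "true strength" below is an illustrative assumption:

```python
import math

# Model each game as a coin flip the team wins with probability p.
# The standard deviation of the season win fraction is sqrt(p*(1-p)/n),
# so a short season gives a much noisier read on team strength.
# p = 0.6 is an illustrative "true" strength, not a measured value.

def win_fraction_sd(p, n):
    return math.sqrt(p * (1 - p) / n)

print(round(win_fraction_sd(0.6, 20), 3))   # ~20-game soccer season
print(round(win_fraction_sd(0.6, 162), 3))  # 162-game baseball season
```

Under these assumptions the 20-game win fraction is nearly three times as noisy as the 162-game one, which is the ambiguity the paragraph above describes.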
