[Coming Next: Pre-Season NCAA Tournament Seed and At Large Selection Candidates]
In the preceding post, I showed how I assign pre-season strength ratings and ranks to teams. In this post, I show an approximation of what teams' end-of-season ranks will be, given their schedules, if the assigned strength ratings and ranks are correct.
Here is the process I use to generate the end-of-season ranks:
1. After downloading all the team schedules for the coming year, I calculate, for each game, the pre-season strength rating difference between the teams. I then adjust the rating difference to account for home-field advantage. (In neutral-site games, there is no location adjustment.) In rating terms, based on games played since 2010, home-field advantage is worth 0.0145 on average. So if the better-rated team is the home team, I increase the rating difference between the teams by 0.0145, and if the better-rated team is the away team, I decrease the difference by 0.0145.
2. Using the location-adjusted rating difference for a game, I then determine a predicted outcome for that game. To do this, I use a table that shows, for each rating difference level (to four decimal places), the likelihood of the better-rated team winning, tying, or losing the game. The table is based on the location-adjusted rating differences and results for all games played since 2010.
In predicting the outcomes, if a team's win likelihood is 50% or greater, I predict a win for that team and a loss for its opponent. If the better-rated team's win likelihood is less than 50%, I predict a tie, even though one team is more likely to win than the other. I do this because if the better-rated team's win likelihood is less than 50% and I predict a win for it, I am more likely than not to be wrong: the result is more likely to be a tie or a loss than a win. Of course, predicting a tie also is more likely than not to be wrong, since the result is more likely to be a win or a loss. I have chosen to predict a tie because, although it is more likely than not to be wrong, it will be closer to the right result than if I had predicted a win and the result was a loss. One side effect is that the system predicts more ties than are likely to occur: for the upcoming season, it predicts 28% of games as ties, whereas the historical tie rate is 21%.
There is another side effect of assuming that a team with a win likelihood of 50% or more will win the game: it overstates those teams' wins and, correspondingly, their opponents' losses. For example, suppose a team has a 75% win likelihood in each of four games. My system says it will win all 4 games. From a statistical perspective, however, one would expect it to win 3 games and lose (or tie) 1. Unfortunately, my system's design does not currently account for that.
3. For conference tournaments, I use the predicted in-conference game results to determine conference standings and set the conference tournament brackets. For conference tournament games predicted to be ties, I treat the team with the better location-adjusted rating as the winner. This continues through each round of the conference tournament.
4. With all of the predicted game results for the season, I then calculate simulated end-of-season team ratings and ranks for both the current NCAA RPI and my Balanced RPI.
5. Since the simulated end-of-season ratings and ranks are based on every game result being consistent with teams’ assigned pre-season strength ratings and ranks, one might think that the end-of-season ranks should match the pre-season strength ranks. They do not. Here is why:
a. This year, including conference tournaments, the average number of games per team will be 18.4. This is slightly fewer than the 18.7 average since 2013 (excluding Covid-affected 2020).
b. For a mathematical rating system for sports teams to be truly reliable, the teams need to play about 25 to 30 games. In general, the more games teams play, the more reliable the system is, and the fewer they play, the less reliable it is. The NCAA RPI staff publicly recognized years ago that you have to have enough games for a rating system to be reliable:
"Sports like softball and baseball actually play the most games and it could be argued that they [their RPI ratings] are the most accurate because the sample is larger. Soccer falls somewhere in the middle of the RPI sports in terms of number of games. A football RPI would be very difficult to use since each game would have such an enormous impact on a team’s rating. In soccer, Division I teams play at least 20 games, and many play at least 25."
With teams averaging only 18.4 games this year, the ratings are less reliable than they would be over 25 to 30 games, which is why the simulated end-of-season ranks differ from the pre-season strength ranks even though every game result is consistent with those strength ratings.
6. After going through these steps, here are (a) teams
The first table puts the teams in Rank Order and the second in Alphabetical Order:
Rank Order:
Alphabetical Order:
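The location adjustment in Step 1 can be sketched in Python as follows. This is a minimal sketch, not my actual code: the function name and the signed-difference convention (positive when team A is the better-rated team) are assumptions for illustration. Working with a signed difference reproduces the rule above: adding 0.0145 when team A is at home widens the gap if A is the better-rated team and narrows it if A is the weaker one, and a small gap can even flip sign.

```python
from typing import Optional

# Average home-field advantage in rating terms (games since 2010).
HOME_FIELD_ADVANTAGE = 0.0145


def location_adjusted_difference(rating_a: float, rating_b: float,
                                 home: Optional[str]) -> float:
    """Return team A's rating minus team B's, adjusted for game location.

    home is "a" if team A is the home team, "b" if team B is, or None for a
    neutral site (no adjustment). A positive result means team A is the
    better-rated team after the adjustment.
    """
    diff = rating_a - rating_b
    if home == "a":
        diff += HOME_FIELD_ADVANTAGE
    elif home == "b":
        diff -= HOME_FIELD_ADVANTAGE
    return diff


# Better-rated team at home: the gap grows from 0.05 to 0.0645.
print(round(location_adjusted_difference(0.60, 0.55, "a"), 4))
# Better-rated team away: the gap shrinks to 0.0355.
print(round(location_adjusted_difference(0.60, 0.55, "b"), 4))
```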
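The prediction rule in Step 2 comes down to a table lookup followed by a 50% threshold. In the sketch below, the table entries are invented placeholders, not the historical figures the table is built from. Keying the table by the difference in ten-thousandths is an assumption that mirrors the four-decimal levels while avoiding floating-point dictionary keys. The function returns the predicted result for the better-rated team; its opponent gets the mirror result.

```python
from typing import Dict, Tuple

# Hypothetical excerpt of the outcome table. Keys are location-adjusted rating
# differences in ten-thousandths (0.0050 -> 50); values are the better-rated
# team's (win, tie, loss) likelihoods. These numbers are placeholders only.
OUTCOME_TABLE: Dict[int, Tuple[float, float, float]] = {
    50: (0.42, 0.27, 0.31),
    200: (0.51, 0.24, 0.25),
}


def predict_outcome(adjusted_diff: float,
                    table: Dict[int, Tuple[float, float, float]] = OUTCOME_TABLE) -> str:
    """Predicted result for the better-rated team: "W" or "T".

    A win likelihood of 50% or more is predicted as a win (and a loss for the
    opponent); anything lower is predicted as a tie.
    """
    key = round(abs(adjusted_diff) * 10_000)
    win, _tie, _loss = table[key]
    return "W" if win >= 0.5 else "T"


print(predict_outcome(0.0200))   # better-rated team predicted to win
print(predict_outcome(-0.0050))  # predicted tie, even though 42% > 31%
```

A complete table would need an entry for every four-decimal level; this excerpt raises a `KeyError` for any other difference.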
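The overstatement described under Step 2 follows from linearity of expectation: a team's expected number of wins is the sum of its per-game win likelihoods, whereas the 50% threshold counts every such game as a full win. A minimal comparison, with illustrative function names:

```python
def deterministic_wins(win_probs):
    """Wins under the rule 'predict a win when the win likelihood is >= 50%'."""
    return sum(1 for p in win_probs if p >= 0.5)


def expected_wins(win_probs):
    """Statistically expected number of wins: the sum of the win likelihoods."""
    return sum(win_probs)


four_games = [0.75] * 4
print(deterministic_wins(four_games))  # threshold rule
print(expected_wins(four_games))       # expectation
```

Ties are not modeled separately here: the 25% of each game that is not a win could end as a loss or a tie.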
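Under the rules in Step 3, a predicted win always goes to the team with the better location-adjusted rating, and a predicted tie is awarded to that same team, so every conference tournament game is decided by the sign of the location-adjusted rating difference alone. The sketch below walks a bracket on that basis. It rests on hypothetical assumptions not stated above: the higher seed hosts each game, the field is a power of two with no byes, and an exactly equal adjusted rating goes to the host. Setting the bracket from the predicted standings is not shown.

```python
from typing import List, NamedTuple

# Average home-field advantage in rating terms.
HOME_FIELD_ADVANTAGE = 0.0145


class Team(NamedTuple):
    seed: int      # seed from the predicted conference standings (1 = top)
    name: str
    rating: float  # pre-season strength rating


def game_winner(a: Team, b: Team, higher_seed_hosts: bool = True) -> Team:
    """Winner of one tournament game.

    Whether the better location-adjusted team is predicted to win outright or
    to tie, it advances, so only the sign of the adjusted difference matters.
    Hosting by the higher seed is a hypothetical assumption.
    """
    host, visitor = (a, b) if a.seed < b.seed else (b, a)
    bonus = HOME_FIELD_ADVANTAGE if higher_seed_hosts else 0.0
    diff = host.rating + bonus - visitor.rating
    return host if diff >= 0 else visitor


def run_tournament(bracket: List[Team], higher_seed_hosts: bool = True) -> Team:
    """Play rounds until one team remains.

    bracket lists teams in first-round pairing order, (0 v 1), (2 v 3), ...,
    and winners keep that order, so the pairings stay fixed across rounds.
    """
    while len(bracket) > 1:
        bracket = [game_winner(bracket[i], bracket[i + 1], higher_seed_hosts)
                   for i in range(0, len(bracket), 2)]
    return bracket[0]


field = [Team(1, "A", 0.620), Team(4, "D", 0.550),
         Team(2, "B", 0.600), Team(3, "C", 0.610)]
print(run_tournament(field).name)
```

In this made-up field, seed 2 beats the higher-rated seed 3 only because of the home-field adjustment; at a neutral site (`higher_seed_hosts=False`) seed 3 would advance.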
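Step 4 runs the full set of simulated results through each rating formula. The sketch below reproduces neither the current NCAA RPI's detailed rules nor the Balanced RPI. As a rough illustration only, it computes the classic three-element RPI: 25% winning percentage (WP), 50% opponents' winning percentage (OWP, excluding games against the team being rated), and 25% opponents' opponents' winning percentage (OOWP). Counting a tie as half a win is an assumption, exposed as the hypothetical `tie_value` parameter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (team, opponent, result for team): "W", "L" or "T"
Game = Tuple[str, str, str]


def classic_rpi(games: List[Game], tie_value: float = 0.5) -> Dict[str, float]:
    """Classic 25/50/25 RPI; tie_value (credit for a tie) is an assumption."""
    credit = {"W": 1.0, "T": tie_value, "L": 0.0}
    mirror = {"W": "L", "T": "T", "L": "W"}
    # record[team] holds one (opponent, credit) entry per game played.
    record = defaultdict(list)
    for team, opp, result in games:
        record[team].append((opp, credit[result]))
        record[opp].append((team, credit[mirror[result]]))

    def wp(team: str, excluding: str = "") -> float:
        # Winning percentage, ignoring games against `excluding`.
        credits = [c for o, c in record[team] if o != excluding]
        return sum(credits) / len(credits) if credits else 0.0

    def owp(team: str) -> float:
        # Averaged per game, so an opponent met twice counts twice.
        return sum(wp(o, excluding=team) for o, _ in record[team]) / len(record[team])

    def oowp(team: str) -> float:
        return sum(owp(o) for o, _ in record[team]) / len(record[team])

    return {t: 0.25 * wp(t) + 0.50 * owp(t) + 0.25 * oowp(t) for t in record}


results = [("A", "B", "W"), ("B", "C", "W"), ("C", "A", "T")]
for team, value in sorted(classic_rpi(results).items(), key=lambda kv: -kv[1]):
    print(team, round(value, 4))
```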
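One way to see why the number of games in item 5b matters is the following simplification, which is not part of my system: treat each game as an independent win-or-not trial. The binomial standard error of an observed winning percentage is then sqrt(p(1 - p)/n). For an evenly matched team (p = 0.5), that is roughly 0.117 at 18.4 games versus about 0.091 at 30, so results over a short season scatter more widely around a team's true strength.

```python
import math


def win_pct_standard_error(p: float, games: float) -> float:
    """Binomial standard error of an observed winning percentage.

    Simplification: games are independent trials with a constant win
    probability p, and ties are ignored.
    """
    return math.sqrt(p * (1 - p) / games)


# This year's average games per team, and the 25-30 range cited above.
for n in (18.4, 25, 30):
    print(n, round(win_pct_standard_error(0.5, n), 3))
```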