Introduction
I will post reports over the course of the season related to team RPI ratings and ranks and NCAA Tournament prospects. For the first part of the season, I will base the reports on simulated team ratings and ranks as applied to the full season schedule in order to produce (1) simulated game results for the entire season and (2) accompanying simulated NCAA Tournament at large selections and seeds. Each week as teams play games, I will replace simulated game results with actual game results. When team actual RPI ratings become reliable enough for the NCAA to start publishing RPI ranks, I will switch over to using team actual RPI ratings as the basis for simulating results of games not yet played. Also, at or around that time, I will publish additional information showing which teams are possible contenders for NCAA Tournament at large positions and seeds even though my current simulations show them as not getting at large selections or seeds.
In this quite long post, I will provide detailed background information to explain how I develop the simulation, how to use the information in the upcoming reports, and the information's limitations. If you have trouble understanding what I have posted, have questions about it, or have suggestions about it, you can email me at cpthomas@q.com. Especially for coaches who monitor my simulations as we go through the season as a help in evaluating their teams’ prospects, I strongly recommend reviewing what I have written below.
The specific topics I will cover below are:
Simulated Ratings and Ranks
Simulated Game Results
Simulated NCAA Tournament At Large Selections and Seeds
Data, by Team
What If My Top 50 Results Are Different Than the Simulation Calls For?
Refined Likelihood of Getting an At Large Selection
Simulated Ratings and Ranks
My data base covers all regular season and conference tournament games played since 2007. Since I focus on the information available to the Women’s Soccer Committee when it makes its NCAA Tournament decisions on at large selections and seeds, in other words available to the Committee before the Tournament, my data base does not include NCAA Tournament games.
In addition, although I have the data for the 2020-21 season (that ended in Spring 2021), I do not include it in the data base I use for simulated ratings and ranks and for simulated NCAA Tournament at large selections and seeds. I exclude it because I consider it unreliable for those purposes due to teams as a whole not having played enough games and not having played enough non-conference and out-of-region games. (For more on this, see my earlier posts about the 2020-21 season.) This means that if a team, relative to its past trends, did unusually well (e.g., Rice) or poorly (e.g., Stanford) during the 2020-21 season, it will not show up in my simulated ratings and ranks and simulated NCAA Tournament bracket. This is unfortunate, but there is no practical way around it. As I replace simulated game results with actual results and then shift over to using actual RPIs as the basis for simulating results of games not yet played, this problem gradually will disappear. I also will explain some things that a coach with a team in this situation can do, during the season, to get a picture of where his or her team will stand if it has results more similar to the team’s 2020-21 performance.
To simulate team 2021 ratings and ranks, I look at team rating and rank trends over the period from 2007 through 2019. I use Kenneth Massey ranking trends rather than RPI ranking trends, as his rankings do a better job of ranking teams from stronger and weaker conferences within a single ranking system and otherwise are at least as good as the RPI. I use computer generated trend lines to see what a team rank will be next year, if its ranking trend continues. For each team, I consider a straight-line trend and a polynomial order 2 trend. A straight-line trend best represents a team that is going in a steady direction: getting better, getting poorer, or staying the same, all at about the same rate over time. A polynomial order 2 trend best represents a team that seems to have a baseline position where it usually resides, has had a period where it has performed better or poorer than the baseline, and now appears to be returning to the baseline. In addition, where a head coach has not been with a team since 2007 but has been there at least 4 years, I consider the same two trends but only for the period from 1 year before the coach arrived to 2019. Once I have the trend lines, I have rules I follow to select a simulated rank for the team:
I must assign a rank based on either the straight-line trend or the polynomial trend, or I can assign the same rank the team had in the most recent data base year, which for this year is 2019. If the 2019 Massey Rank is from 1 to 30, I determine the straight line trended rank for the current year using a computer-generated straight line trend formula and the polynomial rank using a computer-generated polynomial trend formula. If either rank is within 5 positions of the 2019 Massey rank, then that will be my simulated Massey rank for the team. If both are within 5, then the one closest to the 2019 Massey rank will be the simulated Massey rank. If neither is within 5, then the 2019 rank will be the simulated Massey rank. This process is the same for teams with 2019 Massey ranks from 31 to 60, except that the rank difference I use is 10 rather than 5. For teams from 61 to 100 the rank difference is 15. For teams over 100, the rank difference is 20.
For teams in the coach-arrived-after-2007 group, I have additional straight line and polynomial trends to consider and use these trends if they produce simulated ranks closer to the 2019 ranks than the full-period trends, again subject to the 5, 10, 15, and 20 position rules of the preceding paragraph.
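The selection rule in the two preceding paragraphs can be sketched in Python. This is an illustration only, not the actual program behind the simulation. The function names are hypothetical, "within 5 positions" is assumed to include a difference of exactly 5, and when two trended ranks are equally close the sketch simply keeps the first one listed, which the rule above does not address.

```python
def tolerance_for(last_rank):
    """Allowed rank difference, based on the team's most recent Massey rank."""
    if last_rank <= 30:
        return 5
    if last_rank <= 60:
        return 10
    if last_rank <= 100:
        return 15
    return 20


def select_simulated_rank(last_rank, trend_ranks):
    """Pick a simulated Massey rank for one team.

    last_rank   -- the team's rank in the most recent data base year (2019)
    trend_ranks -- trended ranks to consider: straight-line and polynomial,
                   plus the coach-era versions where they apply
    """
    tol = tolerance_for(last_rank)
    # Keep only trended ranks close enough to the most recent rank.
    eligible = [r for r in trend_ranks if abs(r - last_rank) <= tol]
    if not eligible:
        # Neither trend is close enough: fall back to the most recent rank.
        return last_rank
    # Assumption: ties in closeness go to the first rank listed.
    return min(eligible, key=lambda r: abs(r - last_rank))
```

For example, a team ranked 20 in 2019 with trended ranks of 23 and 28 would get a simulated rank of 23, since only 23 is within 5 positions.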
Once I have all teams’ simulated 2021 Massey ranks, I must come up with all teams’ simulated RPI ranks. To do this, I put the teams in the order of their simulated Massey Ranks from best to poorest and assign RPI ranks from #1 to #342 accordingly. I then assign each team, as its simulated 2021 RPI rating, the average rating since 2007 of teams that have had that team's simulated 2021 RPI rank. (For teams new to Division I in 2020 and 2021, I have other rules I follow to assign them ranks and ratings.)
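The conversion from simulated Massey ranks to simulated RPI ranks and ratings can be sketched as follows. The function name and the sample numbers are hypothetical; the table of average historical ratings by rank would come from the data base described above, and the special rules for teams new to Division I are not covered.

```python
def assign_simulated_rpi(massey_ranks, avg_rating_by_rank):
    """Map each team to a simulated (RPI rank, RPI rating) pair.

    massey_ranks       -- dict of team name -> simulated Massey rank (1 is best)
    avg_rating_by_rank -- dict of RPI rank -> average historical RPI rating
                          of teams that have held that rank
    """
    # Order teams from best to poorest simulated Massey rank.
    ordered = sorted(massey_ranks, key=massey_ranks.get)
    # Hand out RPI ranks 1, 2, 3, ... in that order, with the matching rating.
    return {
        team: (rpi_rank, avg_rating_by_rank[rpi_rank])
        for rpi_rank, team in enumerate(ordered, start=1)
    }


# Hypothetical three-team example; the ratings are made up for illustration.
example = assign_simulated_rpi(
    {"Team A": 14, "Team B": 3, "Team C": 57},
    {1: 0.70, 2: 0.68, 3: 0.67},
)
```

Note that the Massey ranks only set the order. Gaps between them (3, 14, 57 in the example) disappear once the teams are renumbered 1, 2, 3.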
I follow the above process with no exceptions.
Simulated Game Results
To simulate the result of a game, I start with the two teams' simulated RPI ratings and calculate the difference between them. I then adjust this difference if one of the teams has home field advantage. Home field, on average across all Division I teams, is worth 0.0148 in relation to the difference between two opponents' RPI ratings.
Once I have computed a game's location-adjusted RPI difference, if it is 0.0133 or less, then I assign the game a simulated game result of a tie. I do this because at that rating difference level, each team's likelihood of winning the game is only 50% or less. If the location-adjusted rating difference is greater than 0.0133, then one of the teams has a greater than 50% chance of winning and I assign that team a simulated game result of a win and the other team a loss.
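As a rough sketch, the game rule described above looks like this in Python. The two constants come from the text; the function name, the neutral_site flag, and the "home"/"away"/"tie" labels are assumptions made for the illustration.

```python
HOME_FIELD_ADVANTAGE = 0.0148  # average value of home field, in RPI rating terms
TIE_THRESHOLD = 0.0133         # at or below this difference, the game is a tie


def simulate_game(home_rating, away_rating, neutral_site=False):
    """Return "home", "away", or "tie" for one simulated game.

    At a neutral site, "home" simply means the first team passed in.
    """
    diff = home_rating - away_rating
    if not neutral_site:
        # Adjust the rating difference for game location.
        diff += HOME_FIELD_ADVANTAGE
    if abs(diff) <= TIE_THRESHOLD:
        return "tie"
    return "home" if diff > 0 else "away"
```

Under these numbers, two teams with identical ratings tie at a neutral site, but the home team wins if one of them hosts, because the home field adjustment alone (0.0148) exceeds the tie threshold (0.0133).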
There are limitations to this way of assigning simulated game results. One limitation is that in real life, there will be fewer ties than the simulation produces. The other limitation is that in real life, one does not expect a team to win all games in which its likelihood of winning is greater than 50%. For example, suppose a team plays 10 games with a location-adjusted rating difference in each game that gives it a 70% win likelihood. In real life, one would not expect the team to win all 10 games, but rather would expect it to win only 7. Nevertheless, the simulation has it winning all 10 games, since this is the only way to do the simulation in a manner that produces simulated end-of-season RPI ratings and ranks.
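The gap between expected wins and simulated wins in the 10-game example can be checked with a few lines of Python. The function names are hypothetical, and the win likelihoods are taken as given rather than derived from rating differences.

```python
def expected_wins(win_probabilities):
    """Wins one would expect in real life: the sum of the win likelihoods."""
    return sum(win_probabilities)


def simulated_wins(win_probabilities):
    """Wins the simulation assigns: every game with a likelihood above 50%."""
    return sum(1 for p in win_probabilities if p > 0.5)


ten_games = [0.70] * 10
real_life = expected_wins(ten_games)    # about 7
simulation = simulated_wins(ten_games)  # all 10 games counted as wins
```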
Simulated NCAA Tournament At Large Selections and Seeds
I have a complex program I use at the end of the season to generate likely Committee at large selections and seeds based on the Committee's decisions since 2007. For purposes of simulating the selections and seeds over the course of the season, however, this year I am going to use a different method that I believe will be more useful for coaches. It also is simpler. The method focuses on four factors:
Team RPI Ratings
Team RPI Ranks
Team Top 50 Results Scores
Team Top 50 Results Ranks
I developed the Top 50 Results Scores factor about 10 years ago based on my observation that in making at large selections, the Committee seemed to favor teams that had good results -- wins or ties -- against highly ranked opponents, with the Committee decisions heavily slanted towards very good results. The scoring system I developed is in the following table:
Simulated Record: Wins, Losses, Ties
Simulated RPI Element 1: Winning record (Wins + 0.5 x Ties)/(Wins + Losses + Ties)
Simulated RPI Element 2: Average of Opponents' Winning Percentages
Simulated RPI Element 3: Average of Opponents' Opponents' Winning Percentages
Simulated Adjusted RPI: (Element 1 + 2 x Element 2 + Element 3)/4, adjusted with NCAA bonuses and penalties for certain good or poor non-conference results
Simulated Adjusted RPI Rank
Simulated Total Top 50 Results Score
Simulated Total Top 50 Results Rank
Simulated paired RPI Rank and Top 50 Results Rank factor score
Simulated paired RPI Rank and Top 50 Results Rank factor rank
Simulated paired RPI Rank and Top 50 Results Score factor score
The next to last of these is the paired factor I use to simulate At Large Selections.
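The Element 1 and RPI formulas in the list above can be written out as a short Python sketch. The function names are hypothetical, and the sketch stops at the unadjusted RPI: it does not apply the NCAA bonuses and penalties, whose amounts are not given here.

```python
def rpi_element_1(wins, losses, ties):
    """Element 1: (Wins + 0.5 x Ties) / (Wins + Losses + Ties)."""
    games = wins + losses + ties
    if games == 0:
        raise ValueError("team has no games")
    return (wins + 0.5 * ties) / games


def unadjusted_rpi(element_1, element_2, element_3):
    """(Element 1 + 2 x Element 2 + Element 3) / 4, before bonuses and penalties."""
    return (element_1 + 2 * element_2 + element_3) / 4


# Hypothetical team: 12 wins, 4 losses, 4 ties, with made-up Elements 2 and 3.
e1 = rpi_element_1(12, 4, 4)           # (12 + 2) / 20 = 0.70
rpi = unadjusted_rpi(e1, 0.55, 0.50)   # (0.70 + 1.10 + 0.50) / 4 = 0.575
```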
What If My Top 50 Results Are Different Than the Simulation Calls For?
If a coach wants to see how his or her team's NCAA Tournament prospects will change if the team has different Top 50 Results than the simulation calls for, there is a way for the coach to do it. This will involve using the paired RPI Rank and Top 50 Results Score factor score, which is the last item on the preceding list.
Here are instructions for how to do this:
1. Determine the Proposed Top 50 Results Changes.
a. With the team schedule in hand, use the simulated RPI numbers to see what game result the simulation has called for in each game on the schedule. This will produce the overall win-loss-tie record indicated by the weekly report.
b. Decide what different results against Top 50 opponents you want to try out.
2. Determine What Your Revised RPI and RPI Rank Will Be.
a. Based on the different Top 50 results you decided on, determine what your resulting win-loss-tie record will be. With this revised record, calculate your revised RPI Element 1. The formula for this calculation is under Information for Teams, above.
b. Determine what your revised RPI will be. To do this, use your revised RPI Element 1 and your unchanged Elements 2 and 3 (which will not change significantly even though your game results have changed). The formula for this calculation likewise is under Information for Teams, above. Don't worry about the adjusted RPI bonuses and penalties; they are not significant enough to make a difference for purposes of this process.
c. Look to see where your revised RPI will put you, in the overall RPI Rank list.
3. Determine What Your Revised Top 50 Results Score and Rank Will Be.
a. Use the above scoring system table to see what your team's revised Top 50 Results Score will be.
b. Look to see where your revised Top 50 Results Score will put you, in the overall Top 50 Results Rank list.
4. Calculate Your Revised Paired RPI Rank and Top 50 Results Rank Factor Score.
a. The formula for this is RPI Rank + (1.0261 x Top 50 Results Rank).
5. See Where Your New Revised Paired RPI Rank and Top 50 Results Rank Factor Score Will Put You, in the Overall List for That Paired Factor Score.
a. If you are among the Top 33 that are not Automatic Qualifiers, this means you are likely to get an At Large Selection.
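Steps 4 and 5 can be sketched in Python as follows. The 1.0261 weight and the 33 at large positions come from the steps above; the function names and sample teams are hypothetical, and the sketch assumes that a lower paired factor score is better, since both inputs are ranks where 1 is best.

```python
TOP_50_RANK_WEIGHT = 1.0261  # weight on the Top 50 Results Rank, from step 4
AT_LARGE_SLOTS = 33          # at large positions, from step 5


def paired_rank_score(rpi_rank, top50_rank):
    """Paired factor score: RPI Rank + (1.0261 x Top 50 Results Rank)."""
    return rpi_rank + TOP_50_RANK_WEIGHT * top50_rank


def likely_at_large(teams, automatic_qualifiers, slots=AT_LARGE_SLOTS):
    """List the non-Automatic Qualifier teams with the best paired scores.

    teams -- dict of team name -> (RPI rank, Top 50 Results rank)
    Assumption: lower paired scores are better.
    """
    candidates = [t for t in teams if t not in automatic_qualifiers]
    candidates.sort(key=lambda t: paired_rank_score(*teams[t]))
    return candidates[:slots]


# Hypothetical four-team example with only 2 at large positions.
sample = {"A": (10, 12), "B": (40, 30), "C": (25, 60), "D": (5, 3)}
picks = likely_at_large(sample, automatic_qualifiers={"D"}, slots=2)
```

In the example, team D has the best paired score but is left out as an Automatic Qualifier, so the two at large positions go to A and B.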
Refined Likelihood of Getting an At Large Selection
It also is possible to do a refined calculation of a team's likelihood of getting an at large selection:
1. From a weekly report, find the team's paired RPI Rank and Top 50 Results Score factor score.
2. With that paired factor score, use the following table to determine the team's approximate likelihood of getting an at large selection: