Tuesday, February 23, 2021

NCAA TOURNAMENT: KEEPING THE RPI HONEST - PART 2

In the preceding post, I described two tests I created for evaluating whether the RPI will be usable for 2020-21 NCAA Tournament at-large selections.  In this post, I will show how those tests apply to simulated end-of-season RPI rankings based on the full season schedule as of February 19.

I will not go into a full discussion here of how I do simulated rankings, but here is a brief outline:

1.  For each team, I do a statistical analysis of its rank trend over the period since 2007 and also over the period since a year before the coach's arrival, if the coach arrived after 2007.  Based on this, I assign the team a simulated rank and rating for this year.

2.  Using the full season calendar for this year, for each game I use the opponents' simulated ratings, adjusted for home field advantage, to determine a simulated result of win, loss, or tie.  When I do this, if a team's location-adjusted rating advantage over its opponent is big enough that its win likelihood statistically is over 50 percent, I treat the better-rated team as winning.  (This differs from real life, where a team that statistically should win sometimes ties and sometimes loses.)

3.  After determining all of the simulated game results, I apply the RPI formula to the results, to calculate simulated RPI ratings and ranks for all teams.
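The three steps above can be sketched in code.  This is a minimal illustration, not my actual program: the tie band, the home-field bonus, and the function names are assumptions made for the sketch, and the RPI weighting shown is the NCAA's published 25/50/25 combination of winning percentage, opponents' winning percentage, and opponents' opponents' winning percentage.

```python
# Illustrative sketch only; the tie band, home bonus, and names are
# assumptions, not the author's actual simulation code.

def simulate_result(rating_a, rating_b, home_bonus=0.0):
    """Step 2: deterministic result from location-adjusted ratings.

    If team A's adjusted rating edge implies a win likelihood over
    50 percent, A wins; a deficit of the same size means A loses;
    anything inside the (assumed) band is treated as a tie.
    """
    TIE_BAND = 0.005  # assumed rating-difference band mapping to ~50%
    diff = (rating_a + home_bonus) - rating_b
    if diff > TIE_BAND:
        return "win"
    if diff < -TIE_BAND:
        return "loss"
    return "tie"

def rpi(wp, owp, oowp):
    """Step 3: the standard NCAA RPI element weights (25/50/25)."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp
```

Running every scheduled game through `simulate_result` yields the simulated season record, and `rpi` then converts each team's resulting percentages into a simulated rating.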

For this year, here are the Top 60 teams in the simulated rankings I developed in Step 1 above.  In a normal year, I do not expect the final actual rankings to match these.  In most cases, a team's simulated ranking will be in the rough vicinity of its final actual ranking.  Some teams, however, will have final actual rankings significantly different from their simulated rankings.


Whereas some individual teams ordinarily will have final actual rankings significantly different from these simulated rankings, with conferences the differences should be smaller, and with larger groups they should be smaller still.  Thus I expect the number of teams a conference has in the Top 60 in the final actual rankings to be reasonably similar to the number of teams the conference has in my simulated Top 60.  And, I expect the number of Top 60 teams in the preceding article's highlighted and not-highlighted conference groups to be quite similar for the final actual rankings and my simulated rankings.

To test whether my expectation is right regarding conferences and the highlighted and not-highlighted groups, I compared the final actual 2019 RPI rankings to my pre-season simulated 2019 RPI rankings.  The following table shows the result:


This table shows that for the numbers of teams individual conferences had in the Top 60 and Top 30, there was some variation between the actual end-of-season RPI rankings and my pre-season simulated rankings.  On the other hand, for the highlighted and not-highlighted groups, the numbers of teams the groups had in the final real Top 60 and Top 30 are almost identical.  Thus, so far as the highlighted and not-highlighted groups are concerned, my pre-season full season simulation is a good predictor of how many teams the groups will have in the Top 60 and Top 30 in the final actual rankings.

This means it is fair to use my pre-season simulation for 2020-21 as a reasonable indicator of how many teams the highlighted and not-highlighted groups are likely to have in the Top 60 and Top 30 of the actual final rankings.

Here are two tables.  The first table shows my 2020 pre-season simulation Top 60, after going through the three simulation process steps described near the top of this article.  The second table shows what this means in terms of conferences and the highlighted and not-highlighted groups, comparing the simulated 2020 numbers to the average numbers since 2013.



Applying my two tests from the preceding article to the highlighted and not-highlighted group numbers at the bottom of the table just above:

Top 60 Test:  The RPI Top 60 should include roughly 49 teams from the highlighted conferences and 11 teams from the not-highlighted conferences.  The actual numbers can range on either side of these test numbers, but 45 teams should be the minimum from the highlighted group and 15 the maximum from the not-highlighted group.

Rather than the average 49-11 split called for by the test, or even the historically most extreme 45-15 split, the simulated split between the highlighted and not-highlighted conferences is 29-31.

Top 30 Test:  The RPI Top 30 should include roughly 28 teams from the highlighted conferences and 2 teams from the not-highlighted conferences.  The actual numbers can range on either side of these test numbers, but 27 teams should be the minimum from the highlighted group and 3 the maximum from the not-highlighted group.

Rather than the average 28-2 split called for by the test, or even the historically most extreme 27-3 split, the simulated split between the highlighted and not-highlighted conferences is 17-13.
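The two tests reduce to simple threshold checks.  In this hypothetical sketch, the thresholds (45/15 for the Top 60, 27/3 for the Top 30) and the simulated splits (29-31 and 17-13) come straight from the article; the function and variable names are illustrative.

```python
# Thresholds and splits are from the article; names are illustrative.

def passes_test(highlighted, not_highlighted,
                min_highlighted, max_not_highlighted):
    """True if a Top 60/Top 30 split stays within the historical extremes."""
    return (highlighted >= min_highlighted
            and not_highlighted <= max_not_highlighted)

# Top 60 test: at least 45 highlighted, at most 15 not-highlighted.
top60_ok = passes_test(29, 31, min_highlighted=45, max_not_highlighted=15)

# Top 30 test: at least 27 highlighted, at most 3 not-highlighted.
top30_ok = passes_test(17, 13, min_highlighted=27, max_not_highlighted=3)

print(top60_ok, top30_ok)  # prints "False False": both tests fail
```

Both simulated splits fall far outside the historical extremes, which is the point the numbers above make.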

You Be the Judge:  These numbers, of course, are only an early season indicator of what the Women’s Soccer Committee will be seeing.  What the Committee actually will see will start coming into view next week when the NCAA starts releasing its weekly RPI reports.  I will apply my two tests to each weekly report and will provide the results here, to create a running picture on the viability of the RPI as a selection tool.

Given my simulation numbers for the highlighted and not-highlighted groups, however, is it likely the RPI will be usable as an at large selection tool for the 2020-21 NCAA Tournament?  This is what the Committee will have to decide.
