Friday, October 9, 2020

THE 2020-2021 SEASON: RPI RATING AND RANKING ISSUES

This will be the first of three posts.  In this one, I will summarize publicly available information about the 2020-21 season structure.  I also will describe issues that will make it impossible to have reliable RPI ratings and ranks based on the planned Fall schedules.  Finally, I will give some thoughts on how we might end up having at least somewhat usable ratings and ranks come the Spring.

In my second post, I will compare three sets of pre-season rankings for the conferences that are playing conference schedules in the Fall.  The rankings are those of the coaches of each conference, Chris Henderson’s, and mine.

In my third post, I will describe some new work I have done on the patterns the Women’s Soccer Committee has followed over the last 13 years in its at large selections and seeding for the NCAA tournament.

Current Information About the 20-21 Season.  Of the 31 conferences, 4 are playing Fall conference schedules:  the ACC, Big 12, SEC, and Sun Belt.  Of the other conferences, none are playing in the Fall except that a few will have some non-conference games:  CUSA, Missouri Valley, Patriot, Southern, and Southland.  Sixty teams are playing at least 1 Fall game, with 281 not playing in the Fall.

Two conferences, the Southland and the Southwestern (SWAC), have announced their conference formats for Spring soccer (and the SWAC has announced the dates of all its conference games).  The Southland definitely is allowing non-conference games.  The Southwestern has not made a statement about non-conference games, but its teams will have some dates on which they will be able to play non-conference games if allowed.

There presently are 243 Fall games scheduled, not counting conference tournaments and also not counting 8 postponed games likely to be re-scheduled.  This compares to about 3,000 games during a normal season.

The teams that are playing at least some games in the Fall have scheduled an average of 9.4 games each (including conference tournaments):  8.3 conference and 1.1 non-conference.  That comes out to 88.3% conference and 11.7% non-conference.  With one exception (Pittsburgh), the ACC, Big 12, and SEC are playing conference only, so the bulk of the non-conference games involve the Sun Belt or other conferences that are playing a few Fall non-conference games.

Over the last 7 years (since the last major conference realignment), teams played an average of 18.2 games per year.  Of these, 54% were conference games and 46% non-conference.

For the Spring, according to the schedule adopted by the Division I Board of Directors, the regular season will run from February 3 to April 17.  The Committee will make its NCAA Tournament at large selections and seeds on Sunday, April 18.  (Ending the season on a Saturday with the Committee making its decisions on Sunday, which appears to be what has been decided, is a little different from the normal pattern, in which the season ends on a Sunday and the NCAA announces the bracket on Monday.)  The bracket will consist of 31 conference automatic qualifiers plus 17 at large selections for a total of 48 teams.  It seems reasonable to assume that if some conferences do not play in the Spring, then the tournament bracket still will be 48 teams and the number of at large teams will increase.  Games teams play in the Fall will count as part of their season schedules for purposes of NCAA tournament bracket formation.  The NCAA tournament will run from Saturday, April 24 to Sunday, May 16.  The exact overall scheduling and game locations for the tournament are yet to be announced, but the NCAA Board of Governors has directed that all tournament game sites will be pre-determined and the number of sites will be reduced, for health and safety and operational management purposes.

Ratings and Ranks Reliability.  So far as RPI ratings and ranks are concerned, they will not be meaningful during the Fall.

As a general rule, the more games teams play the more reliable are mathematical rating systems like the RPI.  For most rating systems to function well, teams must play between 25 and 30 games.  As the numbers of games decrease, the ratings’ reliability decreases.

The NCAA has not published information on the minimum number of games it believes teams need to play in order for the RPI to have an acceptable level of reliability.  Regarding football, however, it has said that "a [Football Championship Subdivision] RPI would be very difficult to use" because of the small number of games FCS teams play.  This is why the NCAA does not use the RPI in selecting at large teams for the FCS NCAA Tournament.  FCS teams play up to 12 pre-NCAA Tournament games per year.  Given this, we know the NCAA believes 12 games per year is not enough for the RPI to be reliable.

To determine how many games teams need to play for the RPI to be reliable, I used the 2019 season as a case study.  In 2019, teams played an average of 18.74 games.  I started with teams’ actual end-of-season RPI ranks.  I then deleted all games from the first weekend of the season, which reduced the average number of games per team to 17.04.  All of the deleted games were non-conference.  I then recalculated the RPI and determined teams’ revised ranks.  After that, I repeated the process for each successive week, each time determining the new average number of games per team and teams’ revised RPI ranks.

At the bottom of this post is a table that shows the results of this project.  The table shows the Top 250 teams, in order, with their actual end-of-season RPI ranks and their ranks after each successive deletion of games.  It also shows the teams’ NCAA tournament seeds and which unseeded teams got at large selections.  Using this table, I looked for the number of games at which the RPI no longer was able to produce rankings that were reasonably close to teams’ actual end-of-season ranks.  I highlighted in green team RPI ranks that seemed unreasonably poor compared to their actual final ranks and in orange team ranks that seemed unreasonably good.  My conclusion from the table is that even slightly over 15 games per team produces too many unreasonable rankings for the RPI to be usable.  (In considering the table, it is important to know that historically, teams ranked 58 and poorer have not gotten at large selections; and the poorest rank getting a #1 seed has been 8, #2 has been 14, #3 has been 23, and #4 has been 26.)
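For readers who want to try this kind of week-by-week deletion themselves, here is a minimal Python sketch of the procedure.  It rests on assumptions that are not part of this post: it uses the basic unadjusted RPI formula (25% a team's winning percentage, 50% its opponents' winning percentage excluding games against the team, 25% its opponents' opponents' winning percentage), counts a tie as half a win, and leaves out the bonus and penalty adjustments the NCAA applies.  The function names and the four-team schedule are made up for illustration; this is not the code or the data behind the table.

```python
from collections import defaultdict

def rpi_ratings(games):
    """Unadjusted RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP (ties count as half a win).

    games: list of (week, team_a, team_b, result), where result is
    1.0 if team_a won, 0.0 if team_b won, and 0.5 for a tie.
    """
    points = defaultdict(lambda: defaultdict(float))  # points[t][o]: t's points vs o
    played = defaultdict(lambda: defaultdict(int))    # played[t][o]: games t played vs o
    for _week, a, b, result in games:
        points[a][b] += result
        points[b][a] += 1.0 - result
        played[a][b] += 1
        played[b][a] += 1

    def wp(team, exclude=None):
        # Winning percentage, optionally leaving out games against `exclude`.
        n = sum(c for o, c in played[team].items() if o != exclude)
        p = sum(v for o, v in points[team].items() if o != exclude)
        # Assumption: an opponent with no other games contributes 0.0.
        return p / n if n else 0.0

    def owp(team):
        # Opponents' winning percentage, weighted by games played against each.
        total = sum(played[team].values())
        return sum(c * wp(o, exclude=team) for o, c in played[team].items()) / total

    owp_by_team = {t: owp(t) for t in played}

    def oowp(team):
        total = sum(played[team].values())
        return sum(c * owp_by_team[o] for o, c in played[team].items()) / total

    return {t: 0.25 * wp(t) + 0.50 * owp_by_team[t] + 0.25 * oowp(t) for t in played}

def ranks_after_deleting_weeks(games, weeks_deleted):
    """Drop all games from weeks 1..weeks_deleted, then re-rank on what is left."""
    kept = [g for g in games if g[0] > weeks_deleted]
    ratings = rpi_ratings(kept)
    ordered = sorted(ratings, key=lambda t: (-ratings[t], t))
    ranks = {team: i + 1 for i, team in enumerate(ordered)}
    games_per_team = 2 * len(kept) / len(ratings)
    return ranks, games_per_team

# Hypothetical four-team, three-week schedule (not real results).
games = [
    (1, "A", "B", 1.0), (1, "C", "D", 0.5),
    (2, "A", "C", 1.0), (2, "B", "D", 1.0),
    (3, "A", "D", 0.0), (3, "B", "C", 1.0),
]

for weeks_deleted in range(3):
    ranks, games_per_team = ranks_after_deleting_weeks(games, weeks_deleted)
    print(weeks_deleted, round(games_per_team, 2), ranks)
```

Each printed line gives the number of weeks deleted, the resulting average games per team, and the re-computed ranks, which is the same comparison the table makes for the 2019 season.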

Thus, for Fall 2020’s games alone, the planned 9.4 games per team is not enough for the RPI to be usable.

In addition, as stated above, historically 54.0% of games have been in-conference and 46.0% non-conference.  As shown in detail at the RPI: Non-Conference RPI page of the RPI for Division I Women’s Soccer website, even with this percentage of non-conference games, the RPI has trouble ranking teams from different conferences properly in relation to each other: It tends to rate teams from stronger conferences too poorly and teams from weaker conferences too well. When teams significantly reduce their numbers of non-conference games, this problem gets worse.  You can see this in the table at the bottom of this post, where as I deleted successive numbers of non-conference games, the rankings of teams from the strongest conferences tend to get poorer and the rankings of teams from other conferences tend to get better.  You can see this problem from a different perspective in the following table based on the 2019 season:


In this table, the second and third columns from the left show the 2019 actual conference average ARPI ratings and the resulting conference ranks.  The next column to look at is the fifth from the left, which shows the conference average ARPI ratings derived only from the conference regular season competition.  As you can see, all conferences have an identical 0.5000 rating except for the ACC at 0.4999.  The only reason for the ACC difference is that Syracuse v Virginia got canceled, so not all teams played the same number of games.  So long as all teams play the same number of games in conference regular season competition, conferences always will have 0.5000 ratings if the ratings are based on only their conference competition.  In other words, based only on conference regular season competition, the RPI cannot distinguish among the conferences.  If you look at the fourth column from the left, you will see that conference tournaments slightly disrupt this equality.  This, however, has nothing to do with how strong the conferences are; it has to do only with the structure of their conference tournaments.  Finally, if you look at the sixth and seventh columns from the left, you will see the RPI ratings and ranks of the conferences derived only from their non-conference games.  The eighth column shows the difference between these ranks and the actual RPI ranks.  In most cases, and particularly for the stronger conferences, these differences are small.  Thus in-conference games pull conference and conference team ratings to 0.5000, whereas non-conference games pull conference and conference team ratings to the level of their true strength among the entire body of Division I teams.  Because of this, any reduction in the proportion of non-conference games will result in teams from stronger conferences being underrated and teams from other conferences being overrated.  The table at the bottom of this post shows this effect.
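The 0.5000 result can be checked with a small simulation.  The Python sketch below is only an illustration under stated assumptions: it uses the basic unadjusted RPI formula (25% winning percentage, 50% opponents' winning percentage excluding games against the team, 25% opponents' opponents' winning percentage), a tie counts as half a win, and the conference is a made-up ten-team group playing a single round robin with random results.  The function name and team names are hypothetical.  Exact fractions are used so that rounding does not blur the result.

```python
import random
from fractions import Fraction
from itertools import combinations

def round_robin_rpi(teams, seed=0):
    """Unadjusted RPI for one conference playing a single round robin."""
    rng = random.Random(seed)
    outcomes = [Fraction(1), Fraction(1, 2), Fraction(0)]  # win, tie, loss
    # pts[a][b]: points team a earned against team b
    pts = {t: {} for t in teams}
    for a, b in combinations(teams, 2):
        result = rng.choice(outcomes)
        pts[a][b] = result
        pts[b][a] = 1 - result

    def wp(team, exclude=None):
        # Winning percentage, optionally leaving out the game against `exclude`.
        results = [p for o, p in pts[team].items() if o != exclude]
        return sum(results, Fraction(0)) / len(results)

    owp = {t: sum(wp(o, exclude=t) for o in pts[t]) / len(pts[t]) for t in teams}
    oowp = {t: sum(owp[o] for o in pts[t]) / len(pts[t]) for t in teams}
    return {t: wp(t) / 4 + owp[t] / 2 + oowp[t] / 4 for t in teams}

# Hypothetical ten-team conference; every team plays every other team once.
ratings = round_robin_rpi([f"Team{i}" for i in range(10)])
average = sum(ratings.values()) / len(ratings)
print(average)  # 1/2
```

Whatever the individual results are, the conference average comes out to exactly one half as long as every team plays the same conference schedule, so results inside a conference say nothing about how that conference compares with the others.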

In addition, historically 81.9% of games are played within geographic regional groups and 18.1% are non-regional.  As shown on the RPI: Regional Issues page of the RPI website, this is not a big enough proportion of non-regional games for the RPI to be able to properly rate and rank teams from one regional group in relation to teams from other regional groups.  Thus if teams move to more region-based non-conference schedules to reduce travel, it will cause the RPI to discriminate against some teams from some geographic regions -- most notably the West -- and in favor of others even more than it already does.

Usable Ratings and Ranks in the Spring.  As I look at the Spring season, I see some possibilities for scheduling that will make RPI ratings and ranks at least somewhat usable:

1.  The ACC, Big 12, SEC, and Sun Belt are playing their conference schedules in the Fall, so they can play non-conference schedules in the Spring.  This will be especially critical for the ACC, Big 12, and SEC to do since they are playing almost entirely conference-only Fall schedules.

2.  The Spring game playing season is 11 weeks.  This will give conferences and teams not playing in the Fall the opportunity to play close to full conference and non-conference schedules.

Thus it is possible, by the end of the Spring game playing season, that teams will have played close to their normal schedules.  If that happens, then the RPI will work more or less as it normally does.

I see two issues, however, that may cause problems.  One will be if teams significantly reduce the numbers of games they play, in particular reducing their proportions of non-conference games.  The second will be if teams shift to scheduling more regionally close non-conference opponents, significantly reducing their proportions of non-regional games.

NCAA Tournament Bracket Formation.  If either of these problems occurs, the Women’s Soccer Committee will need to be aware of it and do the best it can, with the available data, to compensate for the RPI’s weaknesses.  If either occurs to the extreme, it will create a big problem for the Committee.  Since I believe there will be the opportunity for significant numbers of non-conference games, my biggest concern is about the possibility that we will see almost entirely region-based non-conference scheduling.  If that occurs, then the Committee will need to recognize that although the RPI may be good for rating and ranking teams within a geographic region in relation to each other, it will have little value when it comes to rating and ranking teams from different regions in relation to each other.  In particular, it will tend to underrate teams from the West.  I am not sure whether or how the Committee will be able to deal with this problem.


