Tuesday, February 25, 2025

2025 ARTICLE 2: GRADING THE NCAA RPI AS A RATING SYSTEM, 2025 UPDATE

INTRODUCTION

My "correlator" program evaluates rating systems for Division I women's soccer.  The correlator uses the combined games data from seasons beginning with 2010.  I update the correlator's evaluations each year by adding the just-completed year's data..  The correlator data base now includes just short of 44,000 games.

I have completed the evaluation updates for the NCAA RPI and for the Balanced RPI following the 2024 season.

For games data from 2010 and after, for both rating systems, I use game results as though the "no overtime" rule had been in effect the entire time. 
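
In practice, that adjustment means rescoring any game that was decided in overtime as a tie at the regulation score.  Here is a minimal sketch, assuming each game record carries a hypothetical went_to_overtime flag (the correlator's actual data layout is not shown in this article):

```python
# A sketch of applying the "no overtime" rule retroactively: a game
# decided in overtime is rescored as a tie at the regulation score.
# The field names here are hypothetical, for illustration only.

def apply_no_overtime_rule(game):
    if game["went_to_overtime"]:
        tied_score = min(game["home_score"], game["away_score"])
        game["home_score"] = game["away_score"] = tied_score
    return game
```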

For 2010 and after, for both rating systems, I use ratings computed as though the 2024 NCAA decision to count ties as 1/3 rather than 1/2 of a win for RPI formula Winning Percentage purposes had been in effect the entire time. 
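
For the Winning Percentage element of the formula, the change works like this (a minimal sketch; the full RPI also combines this element with opponents' and opponents' opponents' winning percentages):

```python
def winning_percentage(wins, losses, ties, tie_credit=1/3):
    # 2024 rule: a tie counts as 1/3 of a win; the former rule used 1/2.
    games = wins + losses + ties
    return (wins + tie_credit * ties) / games if games else 0.0

# Example: a 10-4-6 team
print(winning_percentage(10, 4, 6))                  # 0.60 under the 2024 rule
print(winning_percentage(10, 4, 6, tie_credit=0.5))  # 0.65 under the old rule
```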

For 2010 and after, for the NCAA RPI, I use ratings computed as though the 2024 NCAA decision altering the RPI bonus and penalty adjustment structure had been in effect the entire time.  The Balanced RPI does not have a bonus/penalty structure.

Thus the evaluations use the entire data base to show how well the current NCAA RPI formula performs, as compared to the Balanced RPI.

Why use the Balanced RPI for the comparison?  It uses the same data as the NCAA RPI; and the NCAA could implement it as a replacement for the NCAA RPI simply by adjusting and adding to its current NCAA RPI computation program.  Thus it is a realistic measuring stick against which to evaluate the NCAA RPI.  Typically but not always, the Balanced RPI's rankings are similar to Massey's.

DISCUSSION

Ability of Teams to "Trick" the System Through Smart Scheduling



In 2025 Article 1, I showed that under the NCAA RPI there are significant differences between a team's NCAA RPI rank and its rank within the NCAA RPI formula as a Strength of Schedule Contributor to its opponents' ratings.  The above table shows this for the NCAA RPI as compared to the Balanced RPI.  The first three columns with numbers are self-evident.  The five columns on the right show, for each rating system, the percent of teams for which the difference between the rating rank and the SoS Contributor rank is 5, 10, 15, 20, and 25 or fewer positions.

As you can see from the entire table, for the Balanced RPI, teams' full ranks are either identical or almost identical to their SoS Contributor ranks.  For the NCAA RPI, the differences are big.
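
For readers who want the mechanics, here is a minimal sketch of the tabulation behind the table, assuming two hypothetical dicts mapping each team to its rating rank and to its SoS Contributor rank:

```python
def rank_difference_shares(rating_rank, sos_rank, thresholds=(5, 10, 15, 20, 25)):
    # Percent of teams whose rating rank and SoS Contributor rank
    # differ by no more than each threshold number of positions.
    diffs = [abs(rating_rank[team] - sos_rank[team]) for team in rating_rank]
    return {t: 100 * sum(d <= t for d in diffs) / len(diffs) for t in thresholds}
```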

For the NCAA RPI, because of these differences, teams can "trick" the ratings and ranks through smart scheduling.  The following table shows this:


This table is from the 2024 season, so it shows a "real life" example.  Although the first two color-coded columns refer to the ARPI Rank "2015 BPs," their numbers actually are for the 2024 version of the NCAA RPI.  And the two color-coded columns on the right are for the Balanced RPI.  I've included the Massey ranks so you can use them as a "credibility" check for the Balanced RPI ranks.

If I am a coach looking for a non-conference opponent with my NCAA Tournament prospects in mind, I might think that a game against any of the four teams would have about the same effect on my likely result, my NCAA RPI rank, and my NCAA Tournament prospects.  I would be wrong:

It is true that the NCAA RPI ranks will be telling the Women's Soccer Committee that the four teams are about equal.

In terms of the teams' true strength, however, Liberty and James Madison are considerably weaker than the NCAA RPI says and California is significantly stronger.  Both the Balanced RPI and Massey indicate that the true order of strength of the teams is Liberty as the weakest, followed by James Madison, then Oklahoma, then California as the strongest.

In addition, looking at the NCAA RPI formula's ranks of the teams as SoS Contributors to their opponents' NCAA RPI ratings, the order of contributions is Liberty as the best contributor, followed by James Madison, then California, then Oklahoma. 

Thus Liberty is the best team as an opponent.  The NCAA RPI ranks tell the Committee it is equal in strength to the other three teams.  In terms of actual strength (Massey and the Balanced RPI), however, it is the weakest of the teams by a good margin.  And it will make the best contribution under the NCAA RPI formula to its opponents' Strengths of Schedule by a good margin.  And, by a comparable analysis, James Madison is second best as an opponent.  Thus by scheduling Liberty or James Madison and avoiding Oklahoma and California I am able to "trick" the system and the Committee into thinking I'm better than I really am.

In grading the NCAA RPI as a rating system, this ability of teams to "trick" the system through smart non-conference scheduling is a big "fail."  And, as the Balanced RPI shows, it is an unnecessary fail.

Ability of the System to Rate Teams from a Conference Fairly in Relation to Teams from Other Conferences


This chart, based on the NCAA RPI, shows the relationship between conferences' ratings and how their teams perform in relation to their ratings.  The conferences are arranged from left to right in order of strength: The conference with the best average rating is on the left and the one with the poorest is on the right.  For each conference, the correlator determines what its teams' combined winning percentage in non-conference games should be based on the rating differences (as adjusted for home field advantage) between the teams and their opponents and also determines what their actual winning percentage is.  The axis on the left shows the differences between these two numbers.  For example, the conference with the best rating, on the left, wins roughly 5% more games than it should according to its ratings.  The black line is a trend line that shows the relationship between conference strength and conference performance relative to ratings.  The formula on the chart shows what the expected difference is at any point on the conference strength line.

As you can see, stronger conferences perform better than their ratings say they should and weaker conferences perform more poorly.  In other words, stronger conferences are underrated and weaker conferences are overrated.  If you have read 2025 Article 1, this is exactly as expected.
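
To make the mechanics concrete, here is a minimal sketch of the calculation for one conference.  The logistic mapping from rating difference to likely winning percentage is purely an assumption for illustration; the correlator derives the actual relationship empirically from the games database.

```python
import math

def likely_win_pct(adj_rating_diff, scale=0.05):
    # Assumed logistic mapping from a home-field-adjusted rating
    # difference to a likely winning percentage; the correlator's
    # empirically derived mapping is not shown in this article.
    return 1 / (1 + math.exp(-adj_rating_diff / scale))

def conference_over_under(games):
    # games: (adj_rating_diff, result) pairs for one conference's
    # non-conference games, from that conference's perspective, with
    # result 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    likely = sum(likely_win_pct(d) for d, _ in games) / len(games)
    actual = sum(r for _, r in games) / len(games)
    return actual - likely  # e.g. +0.05 means 5% more wins than expected
```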


This table comes from the data underlying the above chart and a comparable chart for the Balanced RPI (see the chart below).  The first three columns with numbers are based on individual conferences' performance in relation to their ratings.  The Conferences Actual Less Likely Winning Percentage, High column shows the performance of the conference that most outperforms its rating: The NCAA RPI's winning percentage for this conference is 4.9% better than it should be according to its rating.  The Conferences Actual Less Likely Winning Percentage, Low column shows the conference that most underperforms its rating: for the NCAA RPI, by -7.1%.  The Conferences Actual Less Likely Winning Percentage, Spread column is the difference between these two numbers: for the NCAA RPI, 12.0%.  This last number is one measure of how good or poor the rating system is at rating conferences' teams in relation to teams from other conferences.

The fourth column with numbers, Conferences Actual Less Likely Winning Percentage, Over and Under Total, is the total amount by which all conferences' teams over- or under-perform what their ratings say they should: for the NCAA RPI, 64.6%.  This is another measure of how good or poor a rating system is at rating conferences' teams.
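
Given each conference's actual-less-likely winning percentage (the per-conference differences just described), these four measures reduce to the following sketch:

```python
def summary_measures(over_unders):
    # over_unders: dict mapping conference -> actual less likely
    # winning percentage, e.g. {"Conference A": 0.049, ...}
    values = list(over_unders.values())
    high, low = max(values), min(values)
    return {"High": high,                   # e.g. +4.9% for the NCAA RPI
            "Low": low,                     # e.g. -7.1%
            "Spread": high - low,           # e.g. 12.0%
            "Over and Under Total": sum(abs(v) for v in values)}  # e.g. 64.6%
```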

The fifth through seventh columns are for discrimination in relation to conference strength and come from the trend line formula in the conferences chart.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, High is what the trend line says is the "expected" performance for the strongest conference on the left of the chart: for the NCAA RPI, 4.3% better than it should be according to its rating.  And the Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Low is the expected performance of the weakest conference on the right: for the NCAA RPI, 5.6% poorer than it should be.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Spread is the difference between the High and the Low: for the NCAA RPI, 9.9%.  This is a measure of the NCAA RPI's discrimination in relation to conference strength.
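
The trend-based measures come from a least-squares line fitted through the conference points, read at the strongest and weakest conferences.  A minimal sketch, assuming parallel lists of conference average ratings and over/under performances:

```python
import numpy as np

def trend_measures(avg_ratings, over_unders):
    # Fit over/under performance against conference average rating,
    # then evaluate the fitted line at the two ends of the chart.
    slope, intercept = np.polyfit(avg_ratings, over_unders, 1)
    high = slope * max(avg_ratings) + intercept   # strongest conference
    low = slope * min(avg_ratings) + intercept    # weakest conference
    return {"Trend High": high, "Trend Low": low, "Trend Spread": high - low}
```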

If you compare the numbers for the Balanced RPI to those for the NCAA RPI, you can see that (1) the NCAA RPI performs significantly more poorly than the Balanced RPI at rating teams from a conference in relation to teams from other conferences and (2) the NCAA RPI discriminates against stronger and in favor of weaker conferences, whereas the Balanced RPI has virtually no discrimination in relation to conference strength.

The following chart confirms that the Balanced RPI does not discriminate in relation to conference strength:


Ability of the System to Rate Teams from a Geographic Region Fairly in Relation to Teams from Other Geographic Regions

I divide teams among four geographic regions based on where the majority or plurality of their opponents are located: Middle, North, South, and West.
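
In code form, that assignment rule is simply a plurality count; a minimal sketch:

```python
from collections import Counter

def assign_region(opponent_regions):
    # opponent_regions: one entry per opponent, e.g. ["West", "Middle", ...].
    # Returns the region where the majority or plurality of opponents are.
    return Counter(opponent_regions).most_common(1)[0][0]

print(assign_region(["West", "West", "South", "Middle", "West"]))  # "West"
```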


This chart, for the NCAA RPI, is like the first "conferences" chart above, but is for the geographic regions.  The regions are in order of average NCAA RPI strength from the strongest on the left to the weakest on the right.  Although the trend line suggests that the NCAA RPI discriminates against stronger and in favor of weaker regions, I do not find the chart particularly persuasive, and the R squared number on the chart supports this.  The R squared number is a measure of how well the data match up with the trend line: An R squared number of 1 is a perfect match and 0 is no match at all.  The R squared number on the chart of 0.4 is a relatively weak match.  Thus although the chart may indicate some discrimination against stronger and in favor of weaker regions, region strength may not be the main driver of the region performance differences.
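
For reference, the R squared number is computed from the trend line's residuals; a minimal sketch:

```python
import numpy as np

def r_squared(x, y):
    # 1 means the trend line matches the data perfectly; 0 means no match.
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * np.asarray(x) + intercept
    ss_residual = np.sum((np.asarray(y) - predicted) ** 2)
    ss_total = np.sum((np.asarray(y) - np.mean(y)) ** 2)
    return 1 - ss_residual / ss_total
```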

Here is a second chart for regions, but rather than relating the regions' performance to their strength, it relates their performance to their levels of internal parity, as measured by the proportion of intra-regional games that end in ties.



This chart suggests that the higher the proportion of a region's intra-regional games that are ties and, by logical extension, the higher the level of parity within the region, the more its teams' actual performance in games against teams from other regions exceeds their expected performance based on their ratings.  And, as you can see from the R squared value, this trend line represents the data much better than the trend line on the chart based on region strength.  What this suggests is that the NCAA RPI has a problem properly rating teams from a region in relation to teams from other regions when the regions have different levels of intra-region parity: It discriminates against regions with high intra-region parity and in favor of regions with less parity.  If you consider the description in 2025 Article 1 of how the NCAA constructs the RPI, this is what one would expect: The NCAA RPI rewards teams that play opponents with good winning percentages, largely without reference to the strength of those opponents' opponents.  If a region has a low level of parity, there are many intra-region opponents to choose from that will have good winning percentages.  But if a region has a high level of parity, there are fewer opponents to choose from that will have good winning percentages.
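
The parity measure on the chart's horizontal axis is just the tie rate among a region's internal games; a minimal sketch, assuming each game record carries the two teams' regions and a tie flag:

```python
def intra_region_tie_rate(games, region):
    # games: (home_region, away_region, is_tie) tuples.
    intra = [is_tie for home, away, is_tie in games if home == away == region]
    return sum(intra) / len(intra) if intra else 0.0
```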

For further confirmation that one would expect the NCAA RPI to underrate regions with higher levels of parity and overrate regions with lower levels, see the "Why Does the NCAA RPI Have a Regions Problem?" section of the RPI: Regional Issues page at the RPI for Division I Women's Soccer website.


This table for regions is like the left-hand side of the table above for conferences.  As you can see, it shows that the NCAA RPI does a poor job of rating teams from a region in relation to teams from the other regions, compared to the job the Balanced RPI does.


This second table is like the right-hand side of the conferences table.  The first three columns with numbers are for the trend in relation to the proportion of ties -- parity -- within the regions, and the next three columns are for the trend in relation to region strength.  As you can see, in relation to both parity and strength, the NCAA RPI has significant discrimination as compared to the Balanced RPI.

Here are the charts for the Balanced RPI, which you can compare to the above charts for the NCAA RPI:





Ability of the System to Produce Ratings That Will Match Overall Game Results


This table is a look at the simple question: How often does the better rated team, after adjustment for home field advantage, win, tie, and lose?  As you can see, compared to the Balanced RPI, the NCAA RPI's better rated team wins 0.6% less often.  This is not a big difference, since a 0.1% difference represents about 3 games per year, so 0.6% represents about 18 games per year out of roughly 3,000 games.  Nevertheless, the Balanced RPI performs better.
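
The tabulation behind this table is straightforward; a minimal sketch, assuming each game's result has been recorded from the side of the team with the better home-field-adjusted rating:

```python
def consistency(results):
    # results: one of "W", "T", "L" per game, from the perspective of
    # the team with the better rating after the home field adjustment.
    n = len(results)
    return {outcome: 100 * results.count(outcome) / n for outcome in "WTL"}
```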


This is like the preceding table except that it covers only games that involve at least one team in the rating system's Top 60.  Since the NCAA RPI and Balanced RPI have different Top 60 teams, their Top 60 teams have different numbers of ties.  This makes it preferable to compare the systems based on how their ratings match with results in games that are not ties.  As you can see, after discounting ties, the Balanced RPI is consistent with results 0.2% more often than the NCAA RPI.  Here too, this is not a big difference, since a 0.1% difference represents 1 game per year out of about 1,000 games involving at least one Top 60 team.
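
A sketch of the Top 60 version of the check, with ties discounted (the game and rank structures here are hypothetical):

```python
def top60_consistency(games, rank):
    # games: (team_a, team_b, result) with result "A", "B", or "T";
    # rank: dict mapping team -> rating rank (1 is best).
    kept = [(a, b, r) for a, b, r in games
            if min(rank[a], rank[b]) <= 60 and r != "T"]
    wins = sum(1 for a, b, r in kept
               if (rank[a] < rank[b]) == (r == "A"))  # better rated side won
    return 100 * wins / len(kept)
```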

What this shows is that the difference between how the NCAA RPI and Balanced RPI perform is not in how consistent their ratings are with game results overall.  Both have similar error rates, with the Balanced RPI performing slightly better.

The difference is in where the systems' ratings miss matching with actual results.  In an ideal system, all "misses" are random, so that the system does not favor or disfavor any identifiable group of teams.  The Balanced RPI appears to accomplish this and shows, as a measuring stick, what one reasonably can expect a rating system to do.  As the conferences and geographic regions analyses show, the NCAA RPI does not accomplish this.

CONCLUSION

Based on the ability of schedulers to "trick" the NCAA RPI and on its conference- and region-based discrimination as compared to what the Balanced RPI shows is achievable, the NCAA RPI continues to get a failing grade as a rating system for Division I women's soccer.
