INTRODUCTION
My "correlator" program evaluates rating systems for Division I women's soccer. The correlator uses the combined games data from seasons beginning with 2010. I update the correlator's evaluations each year by adding the just-completed year's data.. The correlator data base now includes just short of 44,000 games.
I have completed the evaluation updates for the NCAA RPI and for the Balanced RPI following the 2024 season.
For games data from 2010 and after, for both rating systems, I use game results as though the "no overtime" rule had been in effect the entire time.
For 2010 and after, for both rating systems, I use ratings computed as though the 2024 NCAA decision to count ties as 1/3 rather than 1/2 of a win for RPI formula Winning Percentage purposes had been in effect the entire time (see the sketch below).
For 2010 and after, for the NCAA RPI, I use ratings computed as though the 2024 NCAA decision altering the RPI bonus and penalty adjustment structure had been in effect the entire time. The Balanced RPI does not have a bonus/penalty structure.
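To make the tie-counting change concrete, here is a minimal sketch of the Winning Percentage calculation; the function name and structure are mine, not the correlator's, and it ignores any home/away game weighting the full formula may apply:

```python
def rpi_winning_percentage(wins, losses, ties, tie_credit=1/3):
    """RPI-formula Winning Percentage.

    Under the 2024 NCAA decision a tie counts as 1/3 of a win
    (tie_credit=1/3); under the prior rule it counted as 1/2
    (tie_credit=0.5).
    """
    games = wins + losses + ties
    return (wins + tie_credit * ties) / games if games else 0.0

# Example: a 10-5-5 record
print(rpi_winning_percentage(10, 5, 5))       # 2024 rule: (10 + 5/3) / 20 = 0.583...
print(rpi_winning_percentage(10, 5, 5, 0.5))  # prior rule: (10 + 2.5) / 20 = 0.625
```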
Thus the evaluations use the entire data base to show how well the current NCAA RPI formula performs, as compared to the Balanced RPI.
Why use the Balanced RPI for the comparison? It uses the same data as the NCAA RPI; and the NCAA could implement it as a replacement for the NCAA RPI simply by adjusting and adding to its current NCAA RPI computation program. Thus it is a realistic measuring stick against which to evaluate the NCAA RPI. Typically but not always, the Balanced RPI's rankings are similar to Massey's.
DISCUSSION
Ability of Teams to "Trick" the System Through Smart Scheduling
In 2025 Article 1, I showed that there are significant differences between a team's NCAA RPI rank and its rank, within the NCAA RPI formula, as a Strength of Schedule Contributor to its opponents' ratings. The above table shows this for the NCAA RPI as compared to the Balanced RPI. The first three columns with numbers are self-evident. The five columns on the right show, for each rating system, the percentage of teams for which the difference between the system's rank and its rank as an SoS Contributor is 5, 10, 15, 20, or 25 or fewer positions.
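To make the rank comparison concrete: with the standard RPI weights (RPI = 0.25 x WP + 0.50 x OWP + 0.25 x OOWP), a team feeds into an opponent's rating through the OWP element via its own WP and through the OOWP element via its own OWP, so its value as an SoS Contributor is proportional to (2 x WP + OWP) / 3. Here is a minimal sketch of the two rankings; the team names and numbers are illustrative only:

```python
# Compare each team's RPI rank to its rank as an SoS Contributor.
# The (2*WP + OWP)/3 contributor value follows from the RPI element
# weights: 0.50 on OWP (fed by this team's WP) and 0.25 on OOWP
# (fed by this team's OWP).

def ranks(values):
    """Rank teams 1..n, best (highest value) first."""
    order = sorted(values, key=values.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(order)}

teams = {
    # team: (WP, OWP, OOWP) -- illustrative numbers only
    "A": (0.80, 0.45, 0.50),
    "B": (0.60, 0.60, 0.55),
    "C": (0.50, 0.70, 0.60),
}

rpi = {t: 0.25 * wp + 0.50 * owp + 0.25 * oowp
       for t, (wp, owp, oowp) in teams.items()}
sos_contrib = {t: (2 * wp + owp) / 3
               for t, (wp, owp, _) in teams.items()}

rpi_rank, contrib_rank = ranks(rpi), ranks(sos_contrib)
for t in teams:
    print(t, rpi_rank[t], contrib_rank[t], contrib_rank[t] - rpi_rank[t])
# Here team C is ranked 1st by RPI but only 3rd as an SoS Contributor.
```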
For the NCAA RPI, because of these differences, teams can "trick" the ratings and ranks through smart scheduling. The following table shows this:
This table is from the 2024 season, so it shows a "real life" example. Although the first two color-coded columns refer to the ARPI Rank "2015 BPs," their numbers actually are for the 2024 version of the NCAA RPI. And the two color-coded columns on the right are for the Balanced RPI. I've included the Massey ranks so you can use them as a "credibility" check for the Balanced RPI ranks.
If I am a coach looking for a non-conference opponent with my NCAA Tournament prospects in mind, I might think that a game against any of the four teams would have a roughly equal effect on my likely result, my NCAA RPI rank, and my NCAA Tournament prospects. I would be wrong:
It is true that the NCAA RPI ranks will be telling the Women's Soccer Committee that the four teams are about equal.
In terms of the teams' true strength, however, Liberty and James Madison are considerably weaker than the NCAA RPI says and California is significantly stronger. Both the Balanced RPI and Massey indicate that the true order of strength of the teams is Liberty as the weakest, followed by James Madison, then Oklahoma, then California as the strongest.
In addition, looking at the NCAA RPI formula's ranks of the teams as SoS Contributors to their opponents' NCAA RPI ratings, the order of contributions is Liberty as the best contributor, followed by James Madison, then California, then Oklahoma.
Thus Liberty is the best team to schedule as an opponent. The NCAA RPI ranks tell the Committee it is equal in strength to the other three teams. In terms of actual strength (per Massey and the Balanced RPI), however, it is the weakest of the teams by a good margin, and under the NCAA RPI formula it will make the best contribution to its opponents' Strengths of Schedule, also by a good margin. By a comparable analysis, James Madison is the second best opponent. Thus by scheduling Liberty or James Madison and avoiding Oklahoma and California, I am able to "trick" the system and the Committee into thinking I'm better than I really am.
In grading the NCAA RPI as a rating system, this ability of teams to "trick" the system through smart non-conference scheduling is a big "fail." And, as the Balanced RPI shows, it is an unnecessary fail.
Ability of the System to Rate Teams from a Conference Fairly in Relation to Teams from Other Conferences
This chart, based on the NCAA RPI, shows the relationship between conferences' ratings and how their teams perform in relation to their ratings. The conferences are arranged from left to right in order of strength: the conference with the best average rating is on the left and the one with the poorest is on the right. For each conference, the correlator determines what its teams' combined winning percentage in non-conference games should be based on the rating differences (as adjusted for home field advantage) between the teams and their opponents, and also determines what their actual winning percentage is. The axis on the left shows the differences between these two numbers. For example, the conference with the best rating, on the left, wins roughly 5% more games than it should according to its ratings. The black line is a trend line that shows the relationship between conference strength and conference performance relative to ratings. The formula on the chart shows what the expected difference is at any point on the conference strength line.
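Here is a minimal sketch of the calculation as I have described it; the win_probability curve, the home-field advantage value, and the game-record layout are my placeholders, not the correlator's actual internals:

```python
import math
from collections import defaultdict

HFA = 0.0050  # hypothetical home-field advantage, in rating points

def win_probability(rating_diff):
    """Placeholder mapping from an HFA-adjusted rating difference to a
    likely winning percentage; the correlator's actual curve is fit to data."""
    return 1.0 / (1.0 + math.exp(-rating_diff / 0.01))

def conference_over_under(games):
    """games: iterable of (conference, team_rating, opp_rating, home, result),
    where home is +1 home / -1 away / 0 neutral and result is 1 win,
    0.5 tie, 0 loss. Returns each conference's actual-minus-likely
    winning percentage in those (non-conference) games."""
    actual, likely, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for conf, rating, opp_rating, home, result in games:
        diff = (rating - opp_rating) + home * HFA
        actual[conf] += result
        likely[conf] += win_probability(diff)
        count[conf] += 1
    return {c: (actual[c] - likely[c]) / count[c] for c in count}
```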
As you can see, stronger conferences perform better than their ratings say they should and weaker conferences perform more poorly. In other words, stronger conferences are underrated and weaker conferences are overrated. If you have read 2025 Article 1, this is exactly as expected.
This table comes from the data underlying the above chart and a comparable chart for the Balanced RPI (see the chart below). The first three columns with numbers are based on individual conferences' performance in relation to their ratings. The Conferences Actual Less Likely Winning Percentage, High column shows the performance of the conference that most outperforms its rating: for the NCAA RPI, this conference's winning percentage is 4.9% better than it should be according to its rating. The Conferences Actual Less Likely Winning Percentage, Low column shows the conference that most underperforms its rating: for the NCAA RPI, by -7.1%. The Conferences Actual Less Likely Winning Percentage, Spread column is the difference between these two numbers: for the NCAA RPI, 12.0%. This last number is one measure of how good or poor the rating system is at rating conferences' teams in relation to teams from other conferences.
The fourth column with numbers, Conferences Actual Less Likely Winning Percentage, Over and Under Total, is the total amount by which all conferences' teams perform better or poorer than they should based on their ratings: for the NCAA RPI, 64.6%. This is another measure of how good or poor a rating system is at rating conferences' teams.
The fifth through seventh columns are for discrimination in relation to conference strength and come from the trend line formula in the conferences chart. The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, High is what the trend line says is the "expected" performance for the strongest conference on the left of the chart: for the NCAA RPI, 4.3% better than it should be according to its rating. And the Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Low is the expected performance of the weakest conference on the right: for the NCAA RPI, 5.6% poorer than it should be. The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Spread is the difference between the High and the Low: for the NCAA RPI, 9.9%. This is a measure of the NCAA RPI's discrimination in relation to conference strength.
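For readers who want the mechanics, here is a minimal sketch of how all seven of these summary measures can be computed from the per-conference values; the input numbers are illustrative placeholders, not the correlator's data:

```python
import numpy as np

# Illustrative placeholders: each conference's average rating (x) and its
# actual-minus-likely non-conference winning percentage (y).
x = np.array([0.62, 0.58, 0.55, 0.51, 0.48, 0.44])
y = np.array([0.049, 0.025, 0.010, -0.015, -0.038, -0.071])

# Columns 1-4: High, Low, Spread, and Over and Under Total.
high = y.max()                      # most overperforming conference
low = y.min()                       # most underperforming conference
spread = high - low                 # High minus Low
over_under_total = np.abs(y).sum()  # total over- and under-performance

# Columns 5-7: evaluate the chart's trend line at the strongest and
# weakest conferences' average ratings.
slope, intercept = np.polyfit(x, y, 1)     # the trend line formula
trend_high = slope * x.max() + intercept   # expected diff, strongest conference
trend_low = slope * x.min() + intercept    # expected diff, weakest conference
trend_spread = trend_high - trend_low      # the discrimination measure
```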
If you compare the numbers for the Balanced RPI to those for the NCAA RPI, you can see that (1) the NCAA RPI performs significantly more poorly than the Balanced RPI at rating teams from a conference in relation to teams from other conferences and (2) the NCAA RPI discriminates against stronger and in favor of weaker conferences, whereas the Balanced RPI has virtually no discrimination in relation to conference strength.
The following chart confirms that the Balanced RPI does not discriminate in relation to conference strength:
Ability of the System to Rate Teams from a Geographic Region Fairly in Relation to Teams from Other Geographic Regions
This chart, for the NCAA RPI, is like the first "conferences" chart above, but is for the geographic regions. The regions are in order of average NCAA RPI strength from the strongest on the left to the weakest on the right. Although the trend line suggests that the NCAA RPI discriminates against stronger and in favor of weaker regions, I do not find the chart particularly persuasive, and the R squared number on the chart supports this. The R squared number is a measure of how well the data match up with the trend line. An R squared number of 1 is a perfect match and 0 is no match at all. The R squared number on the chart of 0.4 is a relatively weak match. Thus although the chart may indicate some discrimination against stronger and in favor of weaker regions, region strength may not be the main driver of the region performance differences.
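As a reference point for how R squared works here, it can be computed directly from the trend-line fit; a sketch with placeholder region data, not the chart's actual values:

```python
import numpy as np

# Placeholder region data: average rating (x) vs. actual-minus-likely
# winning percentage (y).
x = np.array([0.60, 0.56, 0.53, 0.49, 0.46])
y = np.array([0.020, -0.012, 0.015, -0.025, -0.004])

slope, intercept = np.polyfit(x, y, 1)
y_fit = slope * x + intercept
ss_res = np.sum((y - y_fit) ** 2)     # scatter the trend line misses
ss_tot = np.sum((y - y.mean()) ** 2)  # total scatter in the data
r_squared = 1 - ss_res / ss_tot       # 1 = perfect match, 0 = no match
print(r_squared)
```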
This table for regions is like the left-hand side of the table above for conferences. As you can see, it shows that the NCAA RPI does a poor job of rating teams from a region in relation to teams from the other regions, compared to the job the Balanced RPI does.
This second table is like the right-hand side of the conferences table. The first three columns with numbers are for the trend in relation to the proportion of ties -- parity -- within the regions and the next three columns are for the trend in relation to region strength. As you can see, in relation to both parity and strength, the NCAA RPI has significant discrimination as compared to the Balanced RPI.
This table is a look at the simple question: How often does the better rated team, after adjustment for home field advantage, win, tie, or lose? As you can see, compared to the Balanced RPI, the NCAA RPI's better rated team wins 0.6% fewer times. This is not a big difference: a 0.1% difference represents about 3 games per year, so 0.6% represents about 18 games per year out of roughly 3,000 games. Nevertheless, the Balanced RPI performs better.
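The underlying tally is straightforward; a minimal sketch, where the game-record layout and the home-field advantage value are my assumptions, not the correlator's:

```python
from collections import Counter

HFA = 0.0050  # hypothetical home-field advantage, in rating points

def tally(games):
    """games: iterable of (home_rating, away_rating, result), where result
    is 'H' (home win), 'A' (away win), or 'T' (tie). Returns the shares of
    games the better rated team (after the HFA adjustment) wins, ties,
    and loses."""
    counts = Counter()
    for home_rating, away_rating, result in games:
        better_is_home = (home_rating + HFA) > away_rating
        if result == "T":
            counts["tie"] += 1
        elif (result == "H") == better_is_home:
            counts["win"] += 1
        else:
            counts["loss"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```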
This next table is like the preceding one except that it covers only games that involve at least one team in the rating system's Top 60. Since the NCAA RPI and Balanced RPI have different Top 60 teams, their Top 60 teams have different numbers of ties. This makes it preferable to compare the systems based on how their ratings match with results in games that are not ties. As you can see, after discounting ties, the Balanced RPI is consistent with results 0.2% more of the time than the NCAA RPI. Here too, this is not a big difference, since a 0.1% difference represents 1 game per year out of about 1,000 games involving at least one Top 60 team.
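Discounting ties just means measuring consistency over the non-tie games only; continuing from the tally() sketch above:

```python
def consistency_excluding_ties(counts):
    """counts: the win/tie/loss shares returned by tally() above.
    Returns the share of non-tie games the better rated team won."""
    return counts["win"] / (counts["win"] + counts["loss"])
```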