Wednesday, February 26, 2025

2025 ARTICLE 3: THE NCAA RPI AND ITS EFFECT ON NCAA TOURNAMENT AT LARGE SELECTIONS

In my first two 2025 articles, I showed the NCAA RPI's defects and how they cause the RPI to discriminate against teams from some conferences and regions and in favor of teams from others.  In this post, I discuss how determinative the NCAA RPI is in the NCAA Tournament at large participant selection process.

Suppose the NCAA were to award at large NCAA Tournament positions to teams based strictly on their NCAA RPI ranks.  How much difference would that make compared to having the Women's Soccer Committee make the at large selections?  In other words, how many of the Committee's awards would change?  Before reading further, as a test of your own sense of the NCAA RPI's importance in the at large selection process, write down what you think the average number of changes per year would be if the NCAA simply made at large selections based on teams' NCAA RPI ranks.  Later in this article, you'll be able to compare your guess to the actual number.

NCAA Tournament At Large Selection Factors

The NCAA requires the Committee to consider certain factors when making its NCAA Tournament at large selections.  As those of you who follow my work know, I have converted those factors into a series of individual factors and also have paired them to create an additional series of paired factors in which each individual factor has a 50% weight.  Altogether, this produces a series of 118 factors.  Some of the NCAA's individual factors have numerical scoring systems -- for example, the NCAA RPI and NCAA RPI Ranks -- and some do not -- for example, Head to Head Results.  For those factors that do not have NCAA-created scoring systems, I have created scoring systems.

By comparing the teams to which the Committee has given at large positions with teams' scores for a factor, it is possible to see how closely the Committee's at large selections match the factor scores.  The following table shows the factors that best match the Committee's at large selections over the 17 years from 2007 through 2024 (excluding Covid-affected 2020):



As you can see, the Committee's at large selections match teams' NCAA RPI ranks 92.6% of the time.  The Committee has "overruled" the NCAA RPI 42 times (568 - 526) over the 17-year data period.  The following table shows how the Committee overrules have played out over the years:


As the table shows, over the years, the Committee's selections have differed from the NCAA RPI ranks by 1 to 4 positions per year.  The average difference has been 2.47 positions per year.  (This is the answer to the question at the top of this article.)  The median has been 2.  A way to think about this is that, on average, all the Committee's work has resulted in a change of only 2 to 3 teams per year from what the at large selections would have been if the NCAA RPI made the selections.  This suggests that no matter what the Committee members may think, the NCAA RPI mostly controls the at large selection process, with the Committee's work making differences only at the fringes.
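For readers who want to see the mechanics, here is a minimal sketch, in Python, of the match-rate comparison described above.  The data structures and the paired-factor combination shown are my own illustrative assumptions, not the actual spreadsheet's:

```python
# A minimal sketch of the factor-versus-Committee comparison.  The data
# structures here are hypothetical; the real analysis runs over full
# season-by-season data.

def paired_factor_rank(rank_a, rank_b):
    # Paired factors give each individual factor a 50% weight; averaging
    # the two ranks is one simple way to implement that (an assumption).
    return 0.5 * rank_a + 0.5 * rank_b

def at_large_match_rate(factor_rank, committee_at_larges, num_openings):
    """Percent of a factor's "selections" that match the Committee's.

    factor_rank: dict of team -> factor rank (1 = best), with conference
        champion Automatic Qualifiers already removed.
    committee_at_larges: set of teams the Committee actually selected.
    num_openings: number of at large positions that year.
    """
    # The factor's "selections" are simply its best-ranked eligible teams.
    factor_picks = set(sorted(factor_rank, key=factor_rank.get)[:num_openings])
    matches = len(factor_picks & committee_at_larges)
    return 100.0 * matches / num_openings
```

Summing the matches and differences over the 17 years, per factor, produces the match percentages in the first table above.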

NCAA Tournament At Large Factor Standards

For each of the factors, it is possible to use the 17 years' data to identify what I call "yes" -- or "In" -- and "no" -- or "Out" -- standards.  For an "In" standard, any team that has done better than that standard over the 17-year data period always has gotten an at large selection.  Conversely, any team that has done more poorly than an "Out" standard never has gotten an at large selection.  The following table shows the standards for the NCAA RPI Ratings and NCAA RPI Ranks:


In the table, the At Large column has the "In" standards and the No At Large column has the "Out" standards.  Thus, teams with NCAA RPI ratings better than 0.6045 always have gotten at large selections and teams with ratings poorer than 0.5654 never have gotten selections.  Likewise, teams with NCAA RPI ranks better than 27 always have gotten at large selections and teams with ranks poorer than 57 never have.
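As a sketch of how such standards fall out of the history, consider a rank-style factor, where lower numbers are better (for a rating-style factor, where higher is better, the min and max simply flip).  The input layout below is hypothetical:

```python
# A sketch of deriving "In" and "Out" standards for a rank-style factor
# (lower = better).  team_rows is a hypothetical list of
# (rank, got_at_large) pairs pooled across the 17 years of data.

def derive_rank_standards(team_rows):
    # "In" standard: every team ranked better than this always got an at
    # large selection, i.e. the best rank among non-selected teams.
    in_standard = min(rank for rank, got in team_rows if not got)
    # "Out" standard: every team ranked poorer than this never got an at
    # large selection, i.e. the poorest rank among selected teams.
    out_standard = max(rank for rank, got in team_rows if got)
    return in_standard, out_standard

# For the NCAA RPI Rank factor, this yields 27 and 57: ranks better than
# 27 always have gotten selections, and ranks poorer than 57 never have.
```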

It is possible to match up teams' end-of-season scores for all of the factors with the factor standards to produce a table like the following one for the 2024 season.  There is an explanation following the table.


The table includes all NCAA RPI Top 57 teams that were not conference champion Automatic Qualifiers.  It is limited to the Top 57 since no team ranked poorer than #57 has ever gotten an at large selection.

NCAA Seed or Selection: This column shows the Committee decision for each team:

 1, 2, 3, and 4 are for 1 through 4 seeds

4.5, 4.6, 4.7, and 4.8 are for 5 through 8 seeds

6 is for unseeded teams that got at large selections

7 is for unseeded teams that did not get at large selections

8 is for teams disqualified from at large selection due to winning percentages below 0.500

Green is for at large selections and red is for not getting at large selections.

NCAA RPI Rank for Formation:  This is teams' NCAA RPI ranks.

At Large Status Based on Standards:  This column is based on the two grey columns on the right.  The first grey column shows the number of at large "In" factor standards a team has met and the second shows the number of "Out" standards the team has met.  In 2024, the Tournament had 34 at large openings.  Counting down the "In" and "Out" columns, there were 31 teams that met at least 1 "In" standard and 0 "Out" standards.  In the At Large Status Based on Standards column, these teams are marked "In" and color coded green.  This means the standards identified 31 teams to get at large selections, leaving 3 additional openings to fill.  Counting down further, there were 5 teams that met 0 "In" and 0 "Out" standards.  Those teams could not be definitively ruled "In" but also could not be definitively ruled "Out," which makes them "Candidates" for the 3 remaining openings.  They are marked "Candidate" and color coded yellow.  And counting down further are teams that met 0 "In" standards and at least 1 "Out" standard.  The standards identified those teams as not getting at large selections.  They are marked "Out" and color coded red.
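In code form, the classification logic just described might look like the following sketch, where in_met and out_met are hypothetical per-team counts of standards met across all 118 factors:

```python
# A sketch of the standards-based classification for one season.
# in_met[t] and out_met[t] are the counts of "In" and "Out" standards
# team t meets across the 118 factors (hypothetical inputs).

def classify(teams, in_met, out_met):
    status = {}
    for t in teams:
        if in_met[t] >= 1 and out_met[t] == 0:
            status[t] = "In"          # definitively gets an at large spot
        elif in_met[t] == 0 and out_met[t] == 0:
            status[t] = "Candidate"   # cannot be ruled In or Out
        elif in_met[t] == 0 and out_met[t] >= 1:
            status[t] = "Out"         # definitively does not get a spot
        else:
            # Meets both kinds of standards; the tables in this article
            # show no such case, so flag it rather than guess.
            status[t] = "Review"
    return status
```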

Supplementing the NCAA Tournament At Large Factor Standards with a Tiebreaker

If you look at the first table in this article, you will see that the NCAA RPI Rank and Top 50 Results Rank paired factor is the single best indicator of which teams will get at large selections.  After applying the factor standards method described above, it is possible to use Candidate teams' scores for this factor as a "tiebreaker" to decide which of those teams should fill any remaining at large openings.
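Continuing the classification sketch above, the tiebreaker step might look like this, again with hypothetical inputs; tiebreaker_score holds teams' paired factor scores, where lower is better:

```python
# A sketch of the tiebreaker step: fill the remaining openings from the
# Candidates using the paired NCAA RPI Rank and Top 50 Results Rank
# factor, where a lower score is better.

def apply_tiebreaker(status, tiebreaker_score, num_openings):
    selected = [t for t, s in status.items() if s == "In"]
    remaining = num_openings - len(selected)
    candidates = sorted(
        (t for t, s in status.items() if s == "Candidate"),
        key=lambda t: tiebreaker_score[t],
    )
    return selected + candidates[:remaining]
```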

The following table adds use of this tiebreaker to the factor standards method for the 2024 season:


In the table, the "NCAA RPI Rank and Top 50 Results Rank As At Large Tiebreaker" column shows teams' scores for that factor.  The lower the score, the better.  In the At Large Status Based on Standards and Tiebreaker Combined column, the "In" green cells are for teams that get at large selections based on the Standards plus those teams from the Candidates that get at large selections based on the Tiebreaker.  If you compare these to the actual NCAA Seeds or Selections on the left, you will see that the Standards and Tiebreaker Combined at large selections match the actual selections for all but 1 at large position.

In relation to the power of the RPI in directing Committee decisions, it is important to note that the Tiebreaker is based on teams' NCAA RPI Ranks and their ranks based on their Top 50 Results.  Teams' Top 50 Results scores come from a scoring system I developed based on my observations of Committee decisions.  The scoring system awards points based on good results -- wins and ties -- against opponents ranked in the NCAA RPI Top 50, with the awards depending on the ranks of the opponents and heavily slanted towards good results against very highly ranked opponents.  Since the Top 50 Results scores are based on opponents' NCAA RPI ranks, even this part of the Tiebreaker is NCAA RPI dependent.
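To illustrate the shape of such a scoring system, here is a sketch of a Top 50 Results-style scorer.  The point formula below is invented for illustration only; my actual scoring system's awards differ, but share the property of slanting heavily toward good results against very highly ranked opponents:

```python
# An illustrative Top 50 Results scorer.  The point values are invented
# for illustration; the real scoring system's awards differ but likewise
# depend on opponent rank and slant heavily toward top opponents.

def top50_points(result, opp_rank):
    """result: 'W' (win) or 'T' (tie); opp_rank: opponent's NCAA RPI rank."""
    if opp_rank > 50 or result not in ("W", "T"):
        return 0.0                       # only Top 50 wins and ties score
    base = (51 - opp_rank) ** 2 / 100.0  # rank 1 worth far more than rank 50
    return base if result == "W" else 0.5 * base  # ties earn less than wins

def top50_results_score(games):
    # games: iterable of (result, opp_rank) tuples for a team's season
    return sum(top50_points(result, rank) for result, rank in games)
```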

The following table adds to the preceding tables an At Large Status Based on NCAA RPI Rank column to give an overall picture of how the different "selection methods" compare -- the Committee method, the NCAA RPI rank method, and the Standards and Tiebreaker Combined method.


Summary Data

The following table shows a summary of the data for each year:


In this table, I find the color coded information at the bottom in the High, Low, Average, and Median rows most informative.  The information in the green column shows the difference between what the Committee has decided on at large selections over the years as compared to what the decisions would have been if the NCAA simply used the NCAA RPI.  The information in the salmon column shows what the difference would have been -- about 1 1/3 positions per year, with a median of 1 -- if the NCAA used a more refined method than the NCAA RPI, but one still very heavily influenced by the NCAA RPI.

Altogether, the numbers suggest that the NCAA RPI exerts an almost determinative influence on which teams get NCAA Tournament at large positions.  This does not mean the Committee members think that is the case; they may believe that they are able to value other factors as much as or even more than the NCAA RPI.  But whatever the individual members think, the numbers suggest that the Committee as a whole is largely under the thumb of the NCAA RPI.

Given the fundamental flaws of the NCAA RPI, as discussed in 2025 Articles 1 and 2, the near-determinative power of the NCAA RPI in the NCAA Tournament at large selection process is particularly disturbing.

Tuesday, February 25, 2025

2025 ARTICLE 2: GRADING THE NCAA RPI AS A RATING SYSTEM, 2025 UPDATE

INTRODUCTION

My "correlator" program evaluates rating systems for Division I women's soccer.  The correlator uses the combined games data from seasons beginning with 2010.  I update the correlator's evaluations each year by adding the just-completed year's data.  The correlator database now includes just short of 44,000 games.

I have completed the evaluation updates for the NCAA RPI and for the Balanced RPI following the 2024 season.

For games data from 2010 and after, for both rating systems, I use game results as though the "no overtime" rule had been in effect the entire time. 

For 2010 and after, for both rating systems I use ratings computed as though the 2024 NCAA decision, to count ties as 1/3 rather than 1/2 of a win for RPI formula Winning Percentage purposes, had been in effect the entire time. 
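As a minimal sketch, that change to the RPI's Winning Percentage element works out as follows:

```python
# Winning Percentage with a tie counted as 1/3 of a win (the 2024 rule),
# rather than the former 1/2.

def winning_percentage(wins, losses, ties):
    return (wins + ties / 3.0) / (wins + losses + ties)

# Example for a 10-5-3 team:
#   old rule (tie = 1/2 win): (10 + 1.5) / 18 = 0.639
#   new rule (tie = 1/3 win): (10 + 1.0) / 18 = 0.611
```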

For 2010 and after, for the NCAA RPI, I use ratings computed as though the 2024 NCAA decision altering the RPI bonus and penalty adjustment structure had been in effect the entire time.  The Balanced RPI does not have a bonus/penalty structure.

Thus the evaluations use the entire database to show how well the current NCAA RPI formula performs, as compared to the Balanced RPI.

Why use the Balanced RPI for the comparison?  It uses the same data as the NCAA RPI, and the NCAA could implement it as a replacement for the NCAA RPI simply by adjusting and adding to its current NCAA RPI computation program.  Thus it is a realistic measuring stick against which to evaluate the NCAA RPI.  Typically, but not always, the Balanced RPI's rankings are similar to Massey's.

DISCUSSION

Ability of Teams to "Trick" the System Through Smart Scheduling



In 2025 Article 1, I showed that the NCAA RPI produces significant differences between a team's NCAA RPI rank and its rank within the NCAA RPI formula as a Strength of Schedule (SoS) Contributor to its opponents' ratings.  The above table shows this for the NCAA RPI as compared to the Balanced RPI.  The first three columns with numbers are self-evident.  The five columns on the right show, for each rating system, the percent of teams for which the difference between the team's overall rank and its SoS Contributor rank is 5, 10, 15, 20, and 25 or fewer positions.
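Computing those right-hand columns is straightforward; here is a minimal sketch, assuming hypothetical rank dictionaries for one rating system:

```python
# A sketch of the "within k positions" percentages: the share of teams
# whose overall rank and SoS Contributor rank differ by at most k.

def within_k_percentages(full_rank, sos_rank, ks=(5, 10, 15, 20, 25)):
    # full_rank and sos_rank: hypothetical dicts of team -> rank under
    # one rating system
    diffs = [abs(full_rank[t] - sos_rank[t]) for t in full_rank]
    return {k: 100.0 * sum(d <= k for d in diffs) / len(diffs) for k in ks}
```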

As you can see from the entire table, for the Balanced RPI, teams' full ranks are either identical or almost identical to their SoS Contributor ranks.  For the NCAA RPI, the differences are big.

For the NCAA RPI, because of these differences, teams can "trick" the ratings and ranks through smart scheduling.  The following table shows this:


This table is from the 2024 season, so it shows a "real life" example.  Although the first two color coded columns refer to the ARPI Rank "2015 BPs," their numbers actually are for the 2024 version of the NCAA RPI.  And the two color coded columns on the right are for the Balanced RPI.  I've included the Massey ranks so you can use them as a "credibility" check for the Balanced RPI ranks.

If I am a coach looking for a non-conference opponent with my NCAA Tournament prospects in mind, I might think that a game against any of the four teams would have a roughly equal effect on my likely result, my NCAA RPI rank, and my NCAA Tournament prospects.  I would be wrong:

It is true that the NCAA RPI ranks will be telling the Women's Soccer Committee that the four teams are about equal.

In terms of the teams' true strength, however, Liberty and James Madison are considerably weaker than the NCAA RPI says and California is significantly stronger.  Both the Balanced RPI and Massey indicate that the true order of strength of the teams is Liberty as the weakest, followed by James Madison, then Oklahoma, then California as the strongest.

In addition, looking at the NCAA RPI formula's ranks of the teams as SoS Contributors to their opponents' NCAA RPI ratings, the order of contributions is Liberty as the best contributor, followed by James Madison, then California, then Oklahoma. 

Thus Liberty is the best team as an opponent.  The NCAA ranks tell the Committee it is equal in strength to the other three teams.  In terms of actual strength (Massey and the Balanced RPI), however, it is the weakest of the teams by a good margin.  And it will make the best contribution under the NCAA RPI formula to its opponents' Strengths of Schedule by a good margin.  And, by a comparable analysis, James Madison is second best as an opponent.  Thus by scheduling Liberty or James Madison and avoiding Oklahoma and California I am able to "trick" the system and the Committee into thinking I'm better than I really am.

In grading the NCAA RPI as a rating system, this ability of teams to "trick" the system through smart non-conference scheduling is a big "fail."  And, as the Balanced RPI shows, it is an unnecessary fail.

Ability of the System to Rate Teams from a Conference Fairly in Relation to Teams from Other Conferences


This chart, based on the NCAA RPI, shows the relationship between conferences' ratings and how their teams perform in relation to their ratings.  The conferences are arranged from left to right in order of strength: The conference with the best average rating is on the left and the one with the poorest is on the right.  For each conference, the correlator determines what its teams' combined winning percentage in non-conference games should be based on the rating differences (as adjusted for home field advantage) between the teams and their opponents and also determines what their actual winning percentage is.  The axis on the left shows the differences between these two numbers.  For example, the conference with the best rating, on the left, wins roughly 5% more games than it should according to its ratings.  The black line is a trend line that shows the relationship between conference strength and conference performance relative to ratings.  The formula on the chart shows what the expected difference is at any point on the conference strength line.
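Here is a minimal sketch of that calculation.  The correlator's actual win-probability model is not reproduced here, so the logistic curve, its scale constant, and the home field adjustment below are all assumptions for illustration:

```python
import math

HOME_FIELD = 0.01  # assumed rating bonus for the home team

def win_probability(rating_diff):
    # Assumed mapping from an HFA-adjusted rating difference to a likely
    # winning percentage; the correlator's actual function may differ.
    return 1.0 / (1.0 + math.exp(-rating_diff / 0.05))

def conference_over_under(games):
    """Actual minus likely winning percentage for one conference.

    games: list of (own_rating, opp_rating, at_home, points) over the
    conference's non-conference games, with points 1 for a win, 0.5 for
    a tie, and 0 for a loss (hypothetical layout).
    """
    expected = actual = 0.0
    for own, opp, at_home, points in games:
        diff = own - opp + (HOME_FIELD if at_home else -HOME_FIELD)
        expected += win_probability(diff)
        actual += points
    # Positive: the conference wins more than its ratings say it should.
    return (actual - expected) / len(games)
```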

As you can see, stronger conferences perform better than their ratings say they should and weaker conferences perform more poorly.  In other words, stronger conferences are underrated and weaker conferences are overrated.  If you have read 2025 Article 1, this is exactly as expected.


This table comes from the data underlying the above chart and a comparable chart for the Balanced RPI (see the chart below).  The first three columns with numbers are based on individual conferences' performance in relation to their ratings.  In the Conferences Actual Less Likely Winning Percentage, High column is the performance of the conference that most outperforms its rating: The NCAA RPI's winning percentage for this conference is 4.9% better than it should be according to its rating.  In the Conferences Actual Less Likely Winning Percentage, Low column is the conference that most underperforms its rating: for the NCAA RPI, by -7.1%.  The Conferences Actual Less Likely Winning Percentage, Spread column is the difference between these two numbers: for the NCAA RPI, 12.0%.  This last number is one measure of how good or poor the rating system is at rating conferences' teams in relation to teams from other conferences.

The fourth column with numbers, Conferences Actual Less Likely Winning Percentage, Over and Under Total, is the total amount by which all conferences' teams perform better or poorer than they should based on their ratings: for the NCAA RPI, 64.6%.  This is another measure of how good or poor a rating system is at rating conferences' teams.

The fifth through seventh columns are for discrimination in relation to conference strength and come from the trend line formula in the conferences chart.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, High is what the trend line says is the "expected" performance for the strongest conference on the left of the chart: for the NCAA RPI, 4.3% better than it should be according to its rating.  And the Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Low is the expected performance of the weakest conference on the right: for the NCAA RPI, 5.6% poorer than it should be.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Spread is the difference between the High and the Low: for the NCAA RPI, 9.9%.  This is a measure of the NCAA RPI's discrimination in relation to conference strength.
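Given each conference's average rating and its actual-less-likely winning percentage from the sketch above, the table's summary measures reduce to a few lines (again a sketch, with the data layout assumed):

```python
import numpy as np

def summarize(avg_ratings, over_unders):
    # avg_ratings, over_unders: parallel arrays with one entry per
    # conference (hypothetical layout), in percentage points
    spread = max(over_unders) - min(over_unders)          # e.g. 12.0% (NCAA RPI)
    over_under_total = float(np.abs(over_unders).sum())   # e.g. 64.6% (NCAA RPI)
    # Linear trend of performance against conference strength
    slope, intercept = np.polyfit(avg_ratings, over_unders, 1)
    trend_high = slope * max(avg_ratings) + intercept     # strongest conference
    trend_low = slope * min(avg_ratings) + intercept      # weakest conference
    trend_spread = trend_high - trend_low                 # e.g. 9.9% (NCAA RPI)
    return spread, over_under_total, trend_spread
```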

If you compare the numbers for the Balanced RPI to those for the NCAA RPI, you can see that (1) the NCAA RPI performs significantly more poorly than the Balanced RPI at rating teams from a conference in relation to teams from other conferences and (2) the NCAA RPI discriminates against stronger and in favor of weaker conferences, whereas the Balanced RPI has virtually no discrimination in relation to conference strength.

The following chart confirms that the Balanced RPI does not discriminate in relation to conference strength:


Ability of the System to Rate Teams from a Geographic Region Fairly in Relation to Teams from Other Geographic Regions

I divide teams among four geographic regions based on where the majority or plurality of their opponents are located: Middle, North, South, and West.


This chart, for the NCAA RPI, is like the first "conferences" chart above, but is for the geographic regions.  The regions are in order of average NCAA RPI strength from the strongest on the left to the weakest on the right.  Although the trend line suggests that the NCAA RPI discriminates against stronger and in favor of weaker regions, I do not find the chart particularly persuasive, and the R squared number on the chart supports this.  The R squared number is a measure of how well the data match up with the trend line.  An R squared number of 1 is a perfect match and 0 is no match at all.  The R squared number on the chart of 0.4 is a relatively weak match.  Thus although the chart may indicate some discrimination against stronger and in favor of weaker regions, region strength may not be the main driver of the region performance differences.
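For reference, here is a minimal sketch of how an R squared value for a linear trend line is computed:

```python
import numpy as np

def r_squared(x, y):
    # Fit a linear trend line, then compare its predictions to the data.
    slope, intercept = np.polyfit(x, y, 1)
    y = np.asarray(y, dtype=float)
    predicted = slope * np.asarray(x, dtype=float) + intercept
    residual = np.sum((y - predicted) ** 2)   # unexplained variation
    total = np.sum((y - y.mean()) ** 2)       # total variation
    return 1.0 - residual / total             # 1 = perfect fit, 0 = none
```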

Here is a second chart for regions, but rather than relating the regions' performance to their strength, it relates performance to their levels of internal parity, as measured by the proportion of intra-regional games that end in ties.



This chart suggests that the higher the proportion of a region's intra-regional games that are ties and, by logical extension, the higher the level of parity within the region, the more its teams' actual performance in games against teams from other regions exceeds their expected performance based on their ratings.  And, as you can see from the R squared value, this trend line is much more representative of the data than for the chart based on region strength.  What this suggests is that the NCAA RPI has a problem properly rating teams from a region in relation to teams from other regions when the regions have different levels of intra-region parity.  It discriminates against regions with high intra-region parity and in favor of regions with less parity.  If you consider the description in 2025 Article 1 of how the NCAA constructs the RPI, this is what one would expect: The NCAA RPI rewards teams that play opponents with good winning percentages, largely without reference to the strength of those opponents' opponents.  If a region has a low level of parity, there are many intra-region opponents to choose from that will have good winning percentages.  But if a region has a high level of parity, there are fewer opponents to choose from that will have good winning percentages.
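The parity measure itself is simple; here is a sketch, assuming each intra-regional game appears once in the input:

```python
# A sketch of the parity measure: the proportion of a region's
# intra-regional games that end in ties.

def intra_region_tie_proportion(outcomes):
    # outcomes: list of 'W', 'L', or 'T' results for games between two
    # teams from the same region, each game counted once
    return sum(o == "T" for o in outcomes) / len(outcomes)
```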

For further confirmation that one would expect the NCAA RPI to underrate regions with higher levels of parity and overrate regions with lower levels, see the "Why Does the NCAA RPI Have a Regions Problem?" section of the RPI: Regional Issues page at the RPI for Division I Women's Soccer website.


This table for regions is like the left-hand side of the table above for conferences.  As you can see, it shows that the NCAA RPI does a poor job of rating teams from a region in relation to teams from the other regions, when compared to the job the Balanced RPI does.


This second table is like the right-hand side of the conferences table.  The first three columns with numbers are for the trend in relation to the proportion of ties -- parity -- within the regions and the next three columns are for the trend in relation to region strength.  As you can see, in relation to both parity and strength, the NCAA RPI has significant discrimination as compared to the Balanced RPI.

Here are the charts for the Balanced RPI, which you can compare to the above charts for the NCAA RPI:





Ability of the System to Produce Ratings That Will Match Overall Game Results


This table is a look at the simple question: How often does the better rated team, after adjustment for home field advantage, win, tie, and lose?  As you can see, compared to the Balanced RPI, the NCAA RPI's better rated team wins 0.6% fewer times.  This is not a big difference, since a 0.1% difference represents about 3 games per year, so 0.6% represents 18 games per year out of about 3,000 games.  Nevertheless, the Balanced RPI performs better.


This is like the preceding table except that it covers only games that involve at least one team in the rating system's Top 60.  Since the NCAA RPI and Balanced RPI have different Top 60 teams, their Top 60 teams have different numbers of ties.  This makes it preferable to compare the systems based on how their ratings match with results in games that are not ties.  As you can see, after discounting ties, the Balanced RPI is consistent with results 0.2% more often than the NCAA RPI.  Here too, this is not a big difference, since a 0.1% difference represents 1 game per year out of about 1,000 games involving at least one Top 60 team.
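Here is a minimal sketch of that consistency check.  The home field adjustment value and the game-record layout are assumptions for illustration:

```python
HOME_FIELD = 0.01  # assumed rating bonus for the home team

def consistency_rate(games, skip_ties=False):
    """Percent of games won by the better rated team after the home
    field adjustment.

    games: list of (home_rating, away_rating, outcome) with outcome 'H'
    for a home win, 'A' for an away win, 'T' for a tie (hypothetical
    layout).  Set skip_ties=True for the ties-discounted comparison.
    """
    hits = total = 0
    for home_rating, away_rating, outcome in games:
        if outcome == "T":
            if not skip_ties:
                total += 1       # a tie counts against the win rate
            continue
        total += 1
        home_is_better = (home_rating + HOME_FIELD) > away_rating
        hits += (outcome == "H") == home_is_better
    return 100.0 * hits / total
```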

What this shows is that the difference between how the NCAA RPI and Balanced RPI perform is not in how consistent their ratings are with game results overall.  Both have similar error rates, with the Balanced RPI performing slightly better.

The difference is in where the systems' ratings miss matching with actual results.  In an ideal system, all "misses" are random, so that the system does not favor or disfavor any identifiable group of teams.  The Balanced RPI appears to accomplish this and shows, as a measuring stick, what one reasonably can expect a rating system to do.  As the conferences and geographic regions analyses show, the NCAA RPI does not accomplish this.

CONCLUSION

Based on the ability of schedulers to "trick" the NCAA RPI and on its conference- and region-based discrimination as compared to what the Balanced RPI shows is achievable, the NCAA RPI continues to get a failing grade as a rating system for Division I women's soccer.