Monday, March 10, 2025

2025 ARTICLE 6: THE KP INDEX (KPI): THE COMMITTEE'S MISTAKE AND THE BIG 10'S OWN GOAL

 From Report of the Women's Soccer Committee December 9, 2024 and January 29, 2025 Meetings:

"KPI, and other ranking systems.  The Committee felt the KPI should be used more in selections as a valued tool in the process.  The Committee reviewed the Massey Ratings and decided not to request it as an additional criterion at this point."

Thus, as it has done for the past two years, the Committee intends to supplement the NCAA RPI with the KPI as a system for ranking teams.  It intends to use the KPI more, however, than it has in the past.  This is a change from a year earlier, when the Committee reported that it had found the KPI not useful and proposed use of Massey beginning in 2026.

For a while, the Committee has pushed for use of more than just the NCAA RPI for team ratings and rankings.  Two years ago, it finally received approval to use the KPI.  After it received that approval, I asked a Committee member where the decision to use the KPI, as the particular system approved for use, came from.  The member advised me it did not come from the Committee, so far as he recalled.  My assumption, therefore, is that it came from the NCAA staff.

What Is the KPI? 

The KPI is a product of Kevin Pauga, the Associate Athletic Director for Strategic Initiatives and Conference Planning at Michigan State.  It appears the KPI is something he produces outside his formal work for Michigan State.  Pauga also is regarded as a scheduling expert, employed by the Big 10 and other conferences to produce conference schedules for their teams.  His scheduling system considers many parameters.  I do not know if it considers the relationship between teams' schedules and their NCAA RPI rankings.

According to a New York Times article dated March 26, 2015, Kevin Pauga started using the KPI for college basketball ratings and rankings in 2013.  According to the article, a KPI rating is a

"number that, essentially, correlates to how valuable a team’s wins are versus how damaging its losses are. The ratings run from negative-one (bad) to plus-one (good), and combine variables that might change over the course of a season. His goal is to quantify so-called good wins and bad losses, and to assess how teams compare with one another."

And, according to an NCAA release dated March 5, 2025, for NCAA men's basketball:

"The Kevin Pauga Index metric ranks team resumes by assigning a value to each game played. The best win possible is worth about +1.0, the worst loss about -1.0, and a virtual tie at 0.0.  Adjustments are made to each game's value based on location of the game, opponent quality and percentage of total points scored. Game values are added together and divided by games played to determine a team's KPI ranking."

Beyond these descriptions, it appears the KPI is proprietary, so I do not know exactly what its formulas are.
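Although the formulas are proprietary, the two descriptions above do outline the general structure: each game gets a value between roughly -1.0 and +1.0, adjusted for location, opponent quality, and share of points scored, and a team's rating is the average of its game values.  Here is a minimal Python sketch of that structure.  Every adjustment number in the sketch is my own illustrative assumption, not Pauga's actual formula.

```python
# Hypothetical sketch of the KPI's general structure as described above: each
# game gets a value near +1.0 for the best possible win and near -1.0 for the
# worst possible loss, adjusted for location, opponent quality, and share of
# points scored; a team's rating is the average of its game values.
# Every constant below is an illustrative guess, not Pauga's actual formula.

def game_value(won, opponent_quality, home, points_for, points_against):
    """Value in roughly [-1, 1] for one game.
    opponent_quality runs from 0.0 (weakest opponent) to 1.0 (strongest)."""
    base = 1.0 if won else -1.0
    # A win over a strong opponent is worth more; a loss to a weak one costs more.
    quality_factor = opponent_quality if won else (1.0 - opponent_quality)
    value = base * quality_factor
    # Treat road results a bit more favorably than home results (assumed size).
    value += -0.05 if home else 0.05
    # Nudge the value by the share of total points scored (assumed weight).
    total = points_for + points_against
    if total > 0:
        value += 0.1 * (points_for / total - 0.5)
    return max(-1.0, min(1.0, value))

def kpi_rating(games):
    """Average game value over all games played."""
    return sum(game_value(**g) for g in games) / len(games)

# Example: a road win over a strong opponent and a home loss to a weak one.
games = [
    dict(won=True,  opponent_quality=0.9, home=False, points_for=2, points_against=1),
    dict(won=False, opponent_quality=0.2, home=True,  points_for=0, points_against=1),
]
print(round(kpi_rating(games), 3))
```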

Is the KPI a Good Rating System for Division I Women's Soccer?

I don't know how well the KPI functions as a rating system for other NCAA sports.  For Division I women's soccer, however, it is a poor rating system.  It has the same defects as the NCAA RPI.

In 2025 Article 2, I explained my system for grading Division I women's soccer rating systems and did a detailed review of how the NCAA RPI performs as a rating system.  As I showed:

"Based on the ability of schedulers to "trick" the NCAA RPI and on its conference- and region-based discrimination as compared to what the Balanced RPI shows is achievable, the NCAA RPI continues to get a failing grade as a rating system for Division I women's soccer."

The following review of the KPI is similar to the review of the NCAA RPI in 2025 Article 2.  I will not go through the same detailed explanation of the grading system here as I gave in Article 2, so you might want to review Article 2 before proceeding further.

Ability of the System to Rate Teams from a Conference Fairly in Relation to Teams from Other Conferences



This table has the conferences arranged in order from those with the best KPI average rating from 2017 through 2024 at the top and those with the poorest at the bottom.  (KPI ratings are available only for years since 2017.)

In the table:

The Conference NonConference Actual Winning Percentage column shows the conference's actual winning percentage against non-conference opponents.  In calculating Winning Percentage, I use the NCAA RPI Winning Percentage formula in effect as of 2024.

The Conference NonConference Likelihood Winning Percentage column shows the conference's expected winning percentage against non-conference opponents, based on the differences between opponents' KPI ratings as adjusted for home field advantage.  The expected winning percentage for each game is determined using a Result Probability Table for the KPI.  The table comes from an analysis of the location-adjusted rating differences and the results of all games played since 2017.  This method for determining expected winning percentages is highly precise when applied to large numbers of games as shown by the following table for the KPI:

 

In this table, the Total Win, Tie, and Loss Likelihood columns show the expected wins, ties, and losses by the higher rated team, after adjustment for home field advantage, for all games since 2017.  The columns show these in absolute numbers and as a percentage of all games played.  The Total Actual Wins, Ties, and Losses columns show similar numbers, but based on the actual results.  As you can see, out of the more than 22,000 games played, the difference between the expected wins and actual wins is 11 games, between the expected ties and actual ties is 6 games, and between the expected losses and actual losses is 4 games.  In other words, as I stated, the Result Probability Table method is a highly precise way of determining expected winning percentages for the KPI.
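For readers who want to see the mechanics, here is a minimal sketch of how a Result Probability Table of this kind can be built and used.  The bucket width, the home field adjustment, and the data layout are assumptions for illustration only; the actual table is derived from all games played since 2017.

```python
# Sketch of a Result Probability Table: bucket games by the location-adjusted
# rating difference between the two teams, record the observed win/tie/loss
# frequencies in each bucket, then read a game's expected winning percentage
# from its bucket.  The home field adjustment and bucket width are assumptions.
from collections import defaultdict

HOME_ADJ = 0.010   # assumed rating bump for the home team
BUCKET = 0.005     # assumed bucket width for rating differences

def adjusted_diff(rating_a, rating_b, a_is_home):
    """Location-adjusted rating difference, team A minus team B."""
    return (rating_a + (HOME_ADJ if a_is_home else -HOME_ADJ)) - rating_b

def build_table(games):
    """games: list of (rating_a, rating_b, a_is_home, result), where result is
    'W', 'T', or 'L' from team A's perspective.  Returns bucket -> frequencies."""
    counts = defaultdict(lambda: {'W': 0, 'T': 0, 'L': 0})
    for ra, rb, home, result in games:
        counts[round(adjusted_diff(ra, rb, home) / BUCKET)][result] += 1
    return {b: {k: v / sum(c.values()) for k, v in c.items()}
            for b, c in counts.items()}

def expected_winning_pct(table, ra, rb, a_is_home, tie_value=1/3):
    """Expected winning percentage for team A, counting a tie as 1/3 of a win
    (the winning percentage convention used in the NCAA RPI as of 2024)."""
    probs = table[round(adjusted_diff(ra, rb, a_is_home) / BUCKET)]
    return probs['W'] + tie_value * probs['T']

table = build_table([(0.62, 0.55, True, 'W'), (0.62, 0.55, True, 'T'),
                     (0.60, 0.57, False, 'L')])
print(expected_winning_pct(table, 0.62, 0.55, True))
```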

Returning to the larger table above, the Conference NonConference Actual Less Likely Winning Percentage column shows the difference between the conference teams' actual winning percentages in non-conference games and their expected winning percentages based on their games' location-adjusted KPI rating differences.  A positive difference means the teams' actual winning percentages are better than expected based on the KPI and a negative difference means the teams' actual winning percentages are poorer. 
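And here is a sketch of the next step, computing a conference's Actual Less Likely Winning Percentage from its teams' non-conference games once each game's expected winning percentage has been read from the Result Probability Table.  The game-record layout is an assumption.

```python
# Sketch: for each conference, compare its teams' actual non-conference
# winning percentage to the expected winning percentage the Result Probability
# Table gives for their games.  The game-record layout is an assumption.
def conference_actual_less_expected(nonconf_games, tie_value=1/3):
    """nonconf_games: list of dicts with keys conference, result ('W'/'T'/'L'),
    and expected_wp (each game's expected winning percentage from the table)."""
    totals = {}
    for g in nonconf_games:
        actual = {'W': 1.0, 'T': tie_value, 'L': 0.0}[g['result']]
        a, e, n = totals.get(g['conference'], (0.0, 0.0, 0))
        totals[g['conference']] = (a + actual, e + g['expected_wp'], n + 1)
    # Positive difference: the conference outperforms its ratings (underrated).
    return {conf: (a - e) / n for conf, (a, e, n) in totals.items()}

# Placeholder example: two non-conference games for one conference.
games = [
    dict(conference='Big 10', result='W', expected_wp=0.62),
    dict(conference='Big 10', result='T', expected_wp=0.55),
]
print(conference_actual_less_expected(games))
```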


This chart is based on the conferences table.  In the chart, conferences are arranged in order of those with the highest average KPI ratings on the left to those with the poorest ratings on the right.  The vertical axis is for the difference between a conference's actual and its KPI-expected winning percentage.  As you can see from the chart, stronger conferences (on the left) tend to perform better than their KPI ratings say they should and weaker conferences (on the right) tend to perform more poorly.  In other words, the KPI underrates teams from stronger conferences and overrates teams from weaker conferences.  This is the same pattern the NCAA RPI has.  The downward sloping straight line is a trend line showing the pattern of the data, and the formula on the chart can be used to determine what the data indicate the actual versus expected difference should be for any particular conference average KPI rating.


This table draws from the data underlying the chart and from the chart itself, as well as from similar data and charts for the NCAA RPI and the Balanced RPI.  It thus compares the KPI to the NCAA RPI and also includes the Balanced RPI (which is similar to what the Massey rating system would show), to show what a good rating system can do.

In the first two color coded columns, the "Spread" column shows the performance percentage difference between the conference that most outperforms what the KPI says its performance should be and the conference that most underperforms.  The "Under and Over" column shows the total amounts by which all conferences either outperform or underperform.  Both of these columns are measures of rating system fairness in relation to teams from the different conferences.  As you can see, the KPI and the NCAA RPI both do a poor job, as compared to what a good rating system can do.

The color coded column on the right comes from the chart and shows the rating system pattern in relation to conference strength, using the chart's trend line formula.  It shows the extent of discrimination in relation to conference strength.  As the column shows, the KPI and NCAA RPI have similar discrimination based on conference strength, whereas a good system as exemplified by the Balanced RPI has virtually no discrimination.

Ability of the System to Rate Teams from a Geographic Region Fairly in Relation to Teams from Other Geographic Regions

The following table and chart are similar to those above for conferences, but are for geographic regions:





The table and chart for geographic regions show an overall pattern similar to that for conferences.  The data point pattern, however, is up and down enough that I do not have a high level of confidence in the relationship the chart shows between regions' average ratings and their actual performance as compared to their KPI-expected performance.  This is indicated by the R squared value I have included in the chart below the trend line formula.  The R squared value is a measure of how well the data fit the trend line, with an R squared value of 1 being a perfect fit and of 0 being no fit.  The R squared value on the chart indicates a mediocre fit.
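For readers who want to reproduce the trend line and R squared values shown on these charts, here is a short sketch using a least-squares straight-line fit.  The input numbers are placeholders, not the actual region data.

```python
# Sketch: fit a straight trend line through (average rating, actual less
# expected winning percentage) points and report the R squared goodness of fit.
import numpy as np

def trend_line_and_r2(avg_ratings, performance_diffs):
    x = np.asarray(avg_ratings, dtype=float)
    y = np.asarray(performance_diffs, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares straight line
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot

# Placeholder numbers for four regions' average ratings and performance differences.
slope, intercept, r2 = trend_line_and_r2([0.55, 0.52, 0.50, 0.48],
                                         [0.03, 0.01, -0.01, -0.02])
print(f"y = {slope:.3f}x + {intercept:.3f}, R squared = {r2:.2f}")
```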

Here is another table-chart set.  It is based on the relationship between the proportion of games in each region that are ties, as a measure of the amount of parity within the region.




This chart shows a similar pattern in relation to region average KPI rating, but with a significantly better R squared value.

Altogether, the tables and charts for regions result in the following comparison table:



As the first two highlighted columns show, the KPI and the NCAA RPI have relatively similar levels of discrimination among regions, in terms of general fairness.  And as the Balanced RPI row shows, this could be avoided by using a better rating system.  And, as the two color coded columns to the right show, to the extent that the KPI discriminates in relation to region parity or region strength, its discrimination is similar to that of the NCAA RPI.  Again, the Balanced RPI row shows this could be avoided by using a better rating system.

Ability of the System to Produce Ratings That Will Match Overall Game Results



This table simply shows the extent to which the higher rated team, after adjustment for home field advantage, wins, ties, and loses for the particular rating systems.  Thus it is a gross measure of how well game results correlate with the systems' ratings.  Since the NCAA RPI and Balanced RPI numbers are based on the No Overtime rule having been in effect for all seasons, whereas the KPI numbers for 2017 through 2021 use overtime games' results, the best comparison is in the highlighted column on the right, which disregards tie games.  As a point of reference, a difference of 0.1% represents a difference of roughly 3 games per year.  As you can see, the KPI gets the correct result about 15 fewer times per year (0.5%) than the NCAA RPI and about 39 fewer times per year (1.3%) than the Balanced RPI.


This is similar to the previous table but limited to games involving at least one team in the system's Top 60.  Again, the best comparison is from the color coded column on the right.  Here, a difference of 0.1% represents one game per year.  Again, the KPI performs more poorly than the NCAA RPI and than the Balanced RPI.

Conclusion About the KPI as a Rating System

As the above information shows, the KPI has the same problems as the NCAA RPI, with higher levels of discrimination among conferences and geographic regions than can be achieved by a good rating system.

So why would the NCAA staff agree to the use of the KPI, if it is the NCAA staff that selected the KPI as an additional system as I suspect?  And why would the Women's Soccer Committee double down on the KPI as it intends to do this coming year?  The answer is simple: Because the KPI has the same problems as the NCAA RPI.  It will not rock the NCAA RPI boat, but rather will make the NCAA RPI look legitimate.  If the Committee were to use Massey, on the other hand, its ratings would differ significantly from the NCAA RPI and thus would expose the NCAA RPI as a poor rating system.  Indeed, I suspect that the staff and Committee looked at Massey ratings as compared to the NCAA RPI, saw that there were significant differences, and did not want to have to deal with the fallout from those differences.  I do not know this, and I could be wrong, but it is the best explanation I can come up with for the Committee's reversal of its position on the KPI and Massey over the last year.

The Big 10's Own Goal

Finally, why do I say the Big 10, in all of this, has committed an "own goal" blunder?  Remember, Kevin Pauga is an Associate Athletic Director at the Big 10's Michigan State.  If you look at the first table above, you will see that the Big 10 is the conference second most discriminated against by the KPI, its teams actually winning 4.1% more of their non-conference games than the KPI ratings say they should win.

The following table breaks the numbers down by Big 10 team, based on the teams in the conference before absorption of the Pac 12 teams in 2024:


As you can see from the column on the right, the KPI discriminates against 11 teams (including Pauga's Michigan State) and in favor of 3.  If I were to add the four teams absorbed from the Pac 12 in 2024, there would be 3 more discriminated against and 1 more in favor of.

Thus, so far as Division I women's soccer is concerned, the Women's Soccer Committee now will be using, apparently with added emphasis, a rating system that comes from a Big 10 school administrator yet discriminates against the Big 10 as a whole and even against his own school.  When it comes to the "last in" and "first out" for NCAA Tournament at large selections, that looks like an "own goal" to me.

Tuesday, March 4, 2025

2025 ARTICLE 4: THE NCAA RPI AND ITS EFFECT ON NCAA TOURNAMENT SEEDS

In  2025 Article 3, I showed how powerful the NCAA RPI is in the NCAA Tournament at large participant selection process.  In this article, I extend that discussion to show the power of the NCAA RPI in the NCAA Tournament seeding process.  I also, at the end, will summarize the data in Article 3 and this article.

This article assumes that you have reviewed Article 3, particularly for an explanation of the "Standards" and "Standards Plus Tiebreaker" systems I use as ways to show the relationship between the season's game results data, the factors the Women's Soccer Committee considers when forming the NCAA Tournament bracket, and the Committee's actual decisions.

#1 Seeds

The following table shows the 7 most "powerful" factors when it comes to the Committee's selection of #1 seeds.  I measure a factor's power, for example in relation to #1 seeds, by considering how many of the factor's seeds would match the Committee's seeds if the NCAA simply awarded seeds based on that factor.


As you can see, for #1 seeds, the NCAA RPI, by itself, can "pick," on average, between 3 and 4 #1 seeds per year.  Further, all of the most powerful factors include the NCAA RPI.
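As an illustration, here is a sketch of how this power measure can be computed: rank all teams by the factor, treat the factor's top four each year as its #1 seed "picks," and count how many match the Committee's actual #1 seeds.  The data layout and team names are placeholders.

```python
# Sketch: a factor's "power" for #1 seeds is the average number of the
# Committee's four #1 seeds per year that the factor would pick on its own.
def factor_power(years, factor_scores, committee_picks, picks_per_year=4,
                 lower_is_better=True):
    """factor_scores: {year: {team: score}}; committee_picks: {year: set of teams}."""
    matches = 0
    for year in years:
        ranked = sorted(factor_scores[year],
                        key=factor_scores[year].get,
                        reverse=not lower_is_better)
        matches += len(set(ranked[:picks_per_year]) & committee_picks[year])
    return matches / len(years)   # average matching picks per year

# Placeholder example: one year in which the factor picks three of the four seeds.
scores_2024 = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}        # e.g., RPI ranks
committee_2024 = {'A', 'B', 'C', 'E'}
print(factor_power([2024], {2024: scores_2024}, {2024: committee_2024}))   # 3.0
```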

The following table shows, by year, the number of #1 seeds the NCAA RPI picks correctly and also includes the numbers when applying the Standards and Standards Plus Tiebreaker systems.  Since the NCAA RPI is the most powerful single factor for #1 seeds, that is the Tiebreaker factor in the Standards Plus Tiebreaker system for #1 seeds.


The Standards system (the three white columns) is useful for getting a picture of how the Committee might go through the #1 seed selection process.  The "In" Selections column is teams that are clear #1 seeds based on past history.  The Open Positions column is the number of additional teams the Committee needs to pick to fill out the seed group.  The Candidates for Remaining Open Positions column is the number of teams the Committee has to choose from to fill the open positions -- these are teams that have no aspects of their profiles that say "yes, a team with this profile always has gotten a #1 seed" and also none that say "no, a team with this profile never has gotten a #1 seed."  Thus in 2007, as an example, according to the Standards system 3 #1 seeds were clear, with 1 open position and 2 candidates to fill the position.  This happens to match the Median situation over the years: 3 clear #1 seeds, leaving 1 open position to be filled from 2 candidates (see the Median row at the bottom of the table).

The Standards With Tiebreaker system takes the open position(s) candidates and ranks them using the Tiebreaker, with the best-ranked candidate(s) filling the open position(s).  Since the NCAA RPI is the best factor at picking #1 seeds, that is the tiebreaker for the #1 seed Standards Plus Tiebreaker system.  As you can see from the column on the right, the Standards Plus Tiebreaker system picks match 97.1% of the Committee's picks -- in other words all but 2 of the 68 #1 seeds over the years.

#2 Seeds


As you can see, again the most powerful factor is teams' RPIs.  In the table, CO stands for Common Opponents and HTH for Head to Head, referring to the NCAA factors of results against common opponents and head to head results.


The RPI, on its own, would fill correctly 64.7% of all #2 seeds.  This is considerably fewer than the percentage for #1 seeds.  It means that in filling #2 seeds, the Committee is significantly influenced by the RPI but also by other factors.

As with the #1 seeds, the Standards system fills about 3 positions per year, leaving 1 position to be filled from 2 candidates.  The Tiebreaker again is teams' RPIs.  The Standards Plus Tiebreaker system correctly fills 92.6% of the #2 seeds, having missed 5 of the 68 #2 seeds over the years.

#3 Seeds


The RPI continues to be the most powerful factor.


You can see that the Committee's #3 seed picks are less predictable, for all the systems.

#4 Seeds


The RPI remains the most powerful factor.


As you can see, the #4 seeds show a continuing decline in the systems' ability to predict seeds.

#1 to #4 Seeds as a Group

It will help to see how the systems match up with the Committee's #1 through #4 seed decisions, when considering teams as a group without reference to which of those seeds they get.



As the tables show, the RPI itself and the Standards Plus Tiebreaker system do a quite good job of matching with the Committee's decisions on which 16 teams will share the #1 through #4 seeds, with the Standards Plus Tiebreaker missing a median of only 1 of the 16 positions per year.

#5 to #8 Seeds as a Group

With the NCAA having had #5 through #8 seed pods only since 2022, there are not enough data to do meaningful breakdowns like the above for each of those pods.  The following tables show data for the four pods as a group, although in my opinion there still are not enough data to draw firm conclusions.


It is important to note that this table suggests that the RPI, for the #5 through #8 seeds, by itself is not the most powerful decision factor.  Rather, the paired NCAA RPI Rank and Top 50 Results Score factor is the most powerful.  Significantly, this relates to the information in 2025 Article 3, which shows that the paired NCAA RPI Rank and Top 50 Results Rank factor is the most powerful for at large selections.


As this table shows, the Committee's #5 through #8 seeds as a group appear to be pretty data driven, with the Standards Plus Tiebreaker system being a quite good indicator of which teams will be in the group.  It will take more years' data, however, to reach firmer conclusions on this.

Summary

The following table summarizes the above tables as well as the At Large tables from 2025 Article 3:


The RPI is the most powerful factor the Committee considers when it comes to #1 and #2 seeds and which teams will be in the #1 through #4 seed group as a whole.  From the larger seed group, which teams will get #3 and #4 seeds appears to be somewhat data driven but also appears to be somewhat random.

For the #5 through #8 seeds and particularly for at large selections, the Standards system plus the paired RPI and Top 50 Results factor as a tiebreaker gives a good match to the Committee's decisions.  Since the Top 50 Results scoring system relates to teams' RPI ranks, the RPI plays a significant role in both halves of the paired factor.

From an overall perspective, the RPI appears to be a critical factor in all of the "close" decisions the Committee must make.

Wednesday, February 26, 2025

2025 ARTICLE 3: THE NCAA RPI AND ITS EFFECT ON NCAA TOURNAMENT AT LARGE SELECTIONS

In my first two 2025 articles, I showed the NCAA RPI's defects and how they cause the RPI to discriminate against teams from some conferences and regions and in favor of teams from others.  In this post, I discuss how determinative the NCAA RPI is in the NCAA Tournament at large participant selection process.

Suppose the NCAA were to award at large NCAA Tournament positions to teams based strictly on their NCAA RPI ranks.  How much difference would it make from having the Women's Soccer Committee make the at large selections?  In other words, how many changes would there be from the Committee's awards?  Before reading further, as a test of your own sense of the NCAA RPI's importance in the at large selection process, write down what you think the average number of changes would be per year if the NCAA simply made at large selections based on teams' NCAA RPI ranks.  Later in this article, you'll be able to compare your guess to the actual number.

NCAA Tournament At Large Selection Factors

The NCAA requires the Committee to consider certain factors when making its NCAA Tournament at large selections.  As those of you who follow my work know, I have converted those factors into a series of individual factors and also have paired them to create an additional series of paired factors in which each individual factor has a 50% weight.  Altogether, this produces a series of 118 factors.  Some of the NCAA's individual factors have numerical scoring systems -- for example, the NCAA RPI and NCAA RPI Ranks -- and some do not -- for example, Head to Head Results.  For those factors that do not have NCAA-created scoring systems, I have created scoring systems.
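Here is a simplified sketch of the idea behind a paired factor: give each individual factor a 50% weight, in this sketch by averaging the two factors' ranks.  The sketch is illustrative only; the team names and scores are placeholders.

```python
# Sketch: build a paired factor by giving each individual factor a 50% weight,
# here by averaging the two factors' ranks (1 = best).  Team names and scores
# are placeholders.
def rank_teams(scores, lower_is_better=False):
    """scores: {team: score}.  Returns {team: rank}, with 1 as the best rank."""
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    return {team: i + 1 for i, team in enumerate(ordered)}

def paired_factor(scores_a, scores_b, a_lower_better=False, b_lower_better=False):
    ranks_a = rank_teams(scores_a, a_lower_better)
    ranks_b = rank_teams(scores_b, b_lower_better)
    return {team: 0.5 * ranks_a[team] + 0.5 * ranks_b[team] for team in ranks_a}

# Example: pair NCAA RPI Rank (lower is better) with a Top 50 Results score
# (higher is better); the lower the paired score, the better the team.
print(paired_factor({'A': 3, 'B': 10, 'C': 25},        # RPI ranks
                    {'A': 40, 'B': 55, 'C': 10},        # Top 50 Results scores
                    a_lower_better=True, b_lower_better=False))
```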

It is possible, by comparing the teams to which the Committee has given at large positions to teams' scores for a factor, to see how close the match-up is between the Committee's at large selections and the factor scores.  The following table shows the factors that best match the Committee's at large selections over the 17 years from 2007 through 2024 (excluding Covid-affected 2020):



As you can see, the Committee's at large selections match teams' NCAA RPI ranks 92.6% of the time.  The Committee has "overruled" the NCAA RPI 42 times (568-526) over the 17 year data period.  The following table shows how the Committee overrules have played out over the years:


As the table shows, over the years, the Committee's selections have differed from the NCAA RPI ranks by from 1 to 4 positions.  The average difference has been 2.47 positions per year.  (This is the answer to the question at the top of this article.)  The median has been 2.  A way to think about this is that on average all the Committee's work has resulted in a change of only 2 to 3 teams per year from what the at large selections would have been if the NCAA RPI made the selections.  This suggests that no matter what the Committee members may think, the NCAA RPI mostly controls the at large selection process, with the Committee's work making differences only at the fringes.

NCAA Tournament At Large Factor Standards

For each of the factors, it is possible to use the 17 years' data to identify what I call "yes" --or "In" -- and "no" -- or "Out" -- standards.  For an "In" standard, any team that has done better than that standard over the 17 year data period always has gotten an at large selection.  Conversely, any team that has done more poorly than an "Out" standard never has gotten an at large selection.  The following table shows the standards for the NCAA RPI Ratings and NCAA RPI Ranks:


In the table, the At Large column has the "In" standards and the No At Large column the "Out" standards.  Thus teams with NCAA RPI ratings better than 0.6045 always have gotten at large selections and teams with ratings poorer than 0.5654 never have gotten selections.  Likewise teams with NCAA RPI ranks better than 27 always have gotten at large selections and teams with ranks poorer than 57 never have.
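Here is a sketch of how "In" and "Out" standards of this kind can be derived from the historical data: the "In" standard is the best factor score any non-selected team has achieved, and the "Out" standard is the poorest factor score any selected team has achieved.  The data layout is illustrative.

```python
# Sketch: derive the "In" and "Out" standards for one factor from the 17 years
# of history.  "In" standard: every team with a better score has always gotten
# an at large selection.  "Out" standard: every team with a poorer score never
# has.  The data layout is an assumption.
def standards(history, lower_is_better=True):
    """history: list of (factor_score, got_at_large) pairs over all years."""
    not_selected = [s for s, selected in history if not selected]
    selected = [s for s, selected in history if selected]
    if lower_is_better:                      # e.g., NCAA RPI Rank
        in_standard = min(not_selected)      # better (lower) than this: always in
        out_standard = max(selected)         # poorer (higher) than this: always out
    else:                                    # e.g., NCAA RPI Rating
        in_standard = max(not_selected)
        out_standard = min(selected)
    return in_standard, out_standard

# Illustration with ranks: if the best-ranked team ever passed over was #27 and
# the poorest-ranked team ever selected was #57, then any rank better than 27
# is an "In" and any rank poorer than 57 is an "Out."
history = [(5, True), (27, False), (40, True), (57, True), (70, False)]
print(standards(history, lower_is_better=True))   # (27, 57)
```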

It is possible to match up teams' end-of-season scores for all of the factors with the factor standards to produce a table like the following one for the 2024 season.  There is an explanation following the table.


The table includes all NCAA RPI Top 57 teams that were not conference champion Automatic Qualifiers.  It is limited to the Top 57 since no team ranked poorer than #57 has ever gotten an at large selection.

NCAA Seed or Selection: This column shows the Committee decision for each team:

 1, 2, 3, and 4 are for 1 through 4 seeds

4.5, 4.6, 4.7, and 4.8 are for 5 through 8 seeds

6 is for unseeded teams that got at large selections

7 is for unseeded teams that did not get at large selections

8 is for teams disqualified from at large selection due to winning percentages below 0.500

Green is for at large selections and red is for not getting at large selections.

NCAA RPI Rank for Formation:  This is teams' NCAA RPI ranks.

At Large Status Based on Standards:  This column is based on the two grey columns on the right.  The first grey column shows the number of at large "In" factor standards a team has met and the second shows the number of at large "Out" standards for the team.  In 2024, the Tournament had 34 at large openings.  Counting down the "In" and "Out" columns, there were 31 teams that met at least 1 "In" standard and 0 "Out" standards.  In the At Large Status Based on Standards column, these teams are marked "In" and color coded green.  This means the standards identified 31 teams to get at large selections, leaving 3 additional openings to fill.  Counting down further, there were 5 teams that met 0 "In" and 0 "Out" standards.  Those teams could not be definitively ruled "In" but also could not be definitively ruled "Out," which makes them "Candidates" for the 3 remaining openings.  They are marked "Candidate" and color coded yellow.  And counting down further are teams that met 0 "In" standards and at least 1 "Out" standard.  The standards identified those teams as not getting at large selections.  They are marked "Out" and color coded red.
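The classification step itself reduces to a simple rule, sketched below with placeholder counts.

```python
# Sketch: classify a team from the number of "In" and "Out" standards it meets.
def classify(in_count, out_count):
    if in_count >= 1 and out_count == 0:
        return 'In'          # profile that always has gotten an at large selection
    if in_count == 0 and out_count >= 1:
        return 'Out'         # profile that never has gotten an at large selection
    return 'Candidate'       # 0 and 0 (or a mixed profile the standards cannot resolve)

# 2024-style illustration: met 4 "In" and 0 "Out"; met 0 and 0; met 0 and 2.
print(classify(4, 0), classify(0, 0), classify(0, 2))   # In Candidate Out
```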

Supplementing the NCAA Tournament At Large Factor Standards with a Tiebreaker

If you look at the first table in this article, you will see that the NCAA RPI Rank and Top 50 Results Rank paired factor is the best individual indicator of which teams will get at large selections. After applying the factor standards method described above, it is possible to use Candidate teams' scores for this factor as a "tiebreaker" to decide which of those teams should fill any remaining at large openings.

The following table adds use of this tiebreaker to the factor standards method for the 2024 season:


In the table, the "NCAA RPI Rank and Top 50 Results Rank As At Large Tiebreaker" column shows teams' scores for that factor.  The lower the score, the better.  In the At Large Status Based on Standards and Tiebreaker Combined column, the "In" green cells are for teams that get at large selections based on the Standards plus those teams from the Candidates that get at large selections based on the Tiebreaker.  If you compare these to the actual NCAA Seeds or Selections on the left, you will see that the Standards and Tiebreaker Combined at large selections match the actual selections for all but 1 at large position.

In relation to the power of the RPI in directing Committee decisions, it is important to note that the Tiebreaker is based on teams' NCAA RPI Ranks and their ranks based on their Top 50 Results.  Teams' Top 50 Results scores come from a scoring system I developed based on my observations of Committee decisions.  The scoring system awards points based on good results -- wins and ties -- against opponents ranked in the NCAA RPI Top 50, with the awards depending on the ranks of the opponents and heavily slanted towards good results against very highly ranked opponents.  Since the Top 50 Results scores are based on opponents' NCAA RPI ranks, even this part of the Tiebreaker is NCAA RPI dependent.
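Here is a sketch of the tiebreaker step: sort the Candidate teams by their paired NCAA RPI Rank and Top 50 Results Rank scores, with lower being better, and promote enough of them to fill the remaining openings.  The team names and scores are placeholders.

```python
# Sketch: fill the remaining at large openings from the Candidates using the
# paired NCAA RPI Rank / Top 50 Results Rank score as the tiebreaker (lower is
# better).  Team names and scores are placeholders.
def apply_tiebreaker(candidates, tiebreaker_scores, openings):
    """candidates: list of team names; tiebreaker_scores: {team: paired score}."""
    ordered = sorted(candidates, key=tiebreaker_scores.get)
    return ordered[:openings]    # the Candidates that fill the open positions

picked = apply_tiebreaker(['U', 'V', 'W', 'X', 'Y'],
                          {'U': 44.5, 'V': 39.0, 'W': 51.0, 'X': 41.5, 'Y': 60.0},
                          openings=3)
print(picked)   # the three Candidates with the best (lowest) paired scores
```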

The following table adds to the preceding tables an At Large Status Based on NCAA RPI Rank column to give an overall picture of how the different "selection methods" compare -- the Committee method, the NCAA RPI rank method, and the Standards and Tiebreaker Combined method.


 Summary Data

The following table shows a summary of the data for each year:


In this table, I find the color coded information at the bottom in the High, Low, Average, and Median rows most informative.  The information in the green column shows the difference between what the Committee has decided on at large selections over the years as compared to what the decisions would have been if the NCAA simply used the NCAA RPI.  The information in the salmon column shows what the difference would have been -- about 1 1/3 positions per year, with a median of 1 -- if the NCAA used a more refined method than the NCAA RPI, but one still very heavily influenced by the NCAA RPI.

Altogether, the numbers suggest that the NCAA RPI exerts an almost determinative influence on which teams get NCAA Tournament at large positions.  This does not mean the Committee members think that is the case; they may believe that they are able to value other factors as much as or even more than the NCAA RPI.  But whatever the individual members think, the numbers suggest that the Committee as a whole is largely under the thumb of the NCAA RPI.

Given the fundamental flaws of the NCAA RPI, as discussed in 2025 Articles 1 and 2, the near-determinative power of the NCAA RPI in the NCAA Tournament at large selection process is particularly disturbing.

Tuesday, February 25, 2025

2025 ARTICLE 2: GRADING THE NCAA RPI AS A RATING SYSTEM, 2025 UPDATE

INTRODUCTION

My "correlator" program evaluates rating systems for Division I women's soccer.  The correlator uses the combined games data from seasons beginning with 2010.  I update the correlator's evaluations each year by adding the just-completed year's data.  The correlator data base now includes just short of 44,000 games.

I have completed the evaluation updates for the NCAA RPI and for the Balanced RPI following the 2024 season.

For games data from 2010 and after, for both rating systems, I use game results as though the "no overtime" rule had been in effect the entire time. 

For 2010 and after, for both rating systems I use ratings computed as though the 2024 NCAA decision, to count ties as 1/3 rather than 1/2 of a win for RPI formula Winning Percentage purposes, had been in effect the entire time. 

For 2010 and after, for the NCAA RPI, I use ratings computed as though the 2024 NCAA decision altering the RPI bonus and penalty adjustment structure had been in effect the entire time.  The Balanced RPI does not have a bonus/penalty structure.

Thus the evaluations use the entire data base to show how well the current NCAA RPI formula performs, as compared to the Balanced RPI.

Why use the Balanced RPI for the comparison?  It uses the same data as the NCAA RPI; and the NCAA could implement it as a replacement for the NCAA RPI simply by adjusting and adding to its current NCAA RPI computation program.  Thus it is a realistic measuring stick against which to evaluate the NCAA RPI.  Typically but not always, the Balanced RPI's rankings are similar to Massey's.

DISCUSSION

Ability of Teams to "Trick" the System Through Smart Scheduling



In 2025 Article 1, I showed that the NCAA RPI has significant differences between a team's NCAA RPI rank and its rank within the NCAA RPI formula as a Strength of Schedule Contributor to its opponents' ratings.  The above table shows this for the NCAA RPI as compared to the Balanced RPI.  The first three columns with numbers are self-evident.  The five columns on the right show, for each rating system, the percent of teams for which the NCAA RPI rank versus the RPI formula's SoS Contributor rank difference is 5, 10, 15, 20, and 25 or fewer positions.

As you can see from the entire table, for the Balanced RPI, teams' full ranks are either identical or almost identical to their SoS Contributor ranks.  For the NCAA RPI, the differences are big.

For the NCAA RPI, because of these differences, teams can "trick" the ratings and ranks through smart scheduling.  The following table shows this:


This table is from the 2024 season, so it shows a "real life" example.  Although the first two color coded columns refer to the ARPI Rank "2015 BPs," their numbers actually are for the 2024 version of the NCAA RPI.  And the two color coded columns on the right are for the Balanced RPI.  I've included the Massey ranks so you can use them as a "credibility" check for the Balanced RPI ranks.

If I am a coach looking for a non-conference opponent with my NCAA Tournament prospects in mind, I might think that a game against any of the four teams would have an about equal effect on my likely result, my NCAA RPI rank, and my NCAA Tournament prospects.  I would be wrong:

It is true that the NCAA RPI ranks will be telling the Women's Soccer Committee that the four teams are about equal.

In terms of the teams' true strength, however, Liberty and James Madison are considerably weaker than the NCAA RPI says and California is significantly stronger.  Both the Balanced RPI and Massey indicate that the true order of strength of the teams is Liberty as the weakest, followed by James Madison, then Oklahoma, then California as the strongest.

In addition, looking at the NCAA RPI formula's ranks of the teams as SoS Contributors to their opponents' NCAA RPI ratings, the order of contributions is Liberty as the best contributor, followed by James Madison, then California, then Oklahoma. 

Thus Liberty is the best team as an opponent.  The NCAA ranks tell the Committee it is equal in strength to the other three teams.  In terms of actual strength (Massey and the Balanced RPI), however, it is the weakest of the teams by a good margin.  And it will make the best contribution under the NCAA RPI formula to its opponents' Strengths of Schedule by a good margin.  And, by a comparable analysis, James Madison is second best as an opponent.  Thus by scheduling Liberty or James Madison and avoiding Oklahoma and California I am able to "trick" the system and the Committee into thinking I'm better than I really am.

In grading the NCAA RPI as a rating system, this ability of teams to "trick" the system through smart non-conference scheduling is a big "fail."  And, as the Balanced RPI shows, it is an unnecessary fail.

Ability of the System to Rate Teams from a Conference Fairly in Relation to Teams from Other Conferences


This chart, based on the NCAA RPI, shows the relationship between conferences' ratings and how their teams perform in relation to their ratings.  The conferences are arranged from left to right in order of strength: The conference with the best average rating is on the left and with the poorest on the right.  For each conference, the correlator determines what its teams' combined winning percentage in non-conference games should be based on the rating differences (as adjusted for home field advantage) between the teams and their opponents and also determines what their actual winning percentage is.  The axis on the left shows the differences between these two numbers.  For example, the conference with the best rating, on the left, wins roughly 5% more games than it should according to its ratings.  The black line is a trend line that shows the relationship between conference strength and conference performance relative to ratings.  The formula on the chart shows what the expected difference is at any point on the conference strength line.

As you can see, stronger conferences perform better than their ratings say they should and weaker conferences perform more poorly.  In other words, stronger conferences are underrated and weaker conferences are overrated.  If you have read 2025 Article 1, this is exactly as expected.


This table comes from the data underlying the above chart and a comparable chart for the Balanced RPI (see the chart, below).  The first three columns with numbers are based on individual conferences' performance in relation to their ratings.  In the Conferences Actual Less Likely Winning Percentage, High column is the performance of the conference that most outperforms its rating: The NCAA RPI's winning percentage for this conference is 4.9% better than it should be according to its rating.  In the Conferences Actual Less Likely Winning Percentage, Low column is the conference that most underperforms its rating: for the NCAA RPI by -7.1%.  The Conferences Actual Less Likely Winning Percentage, Spread column is the difference between these two numbers: for the NCAA RPI 12.0%.  This last number is one measure of how good or poor the rating system is at rating conferences' teams in relation to teams from other conferences.

The fourth column with numbers, Conferences Actual Less Likely Winning Percentage, Over and Under Total, is the total amount by which all conferences' teams perform better or poorer than what their performance should be based on their ratings: for the NCAA RPI 64.6%.  This is another measure of how good or poor a rating system is at rating conferences' teams.

The fifth through seventh columns are for discrimination in relation to conference strength and come from the trend line formula in the conferences chart.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, High is what the trend line says is the "expected" performance for the strongest conference on the left of the chart: for the NCAA RPI 4.3% better than it should be according to its rating.  And the Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Low is the expected performance of the weakest conference on the right: for the NCAA RPI 5.6% poorer than it should be.  The Conferences Actual Less Likely Winning Percentage Trend Related to Conference Average Rating, Spread is the difference between the High and the Low: for the NCAA RPI 9.9%.  This is a measure of the NCAA RPI's discrimination in relation to conference strength.

If you compare the numbers for the Balanced RPI to those for the NCAA RPI, you can see that (1) the NCAA RPI performs significantly more poorly than the Balanced RPI at rating teams from a conference in relation to teams from other conferences and (2) the NCAA RPI discriminates against stronger and in favor of weaker conferences whereas the Balanced RPI has virtually no discrimination in relation to conference strength.

The following chart confirms that the Balanced RPI does not discriminate in relation to conference strength:


Ability of the System to Rate Teams from a Geographic Region Fairly in Relation to Teams from Other Geographic Regions

I divide teams among four geographic regions based on where the majority or plurality of their opponents are located: Middle, North, South, and West.


This chart, for the NCAA RPI, is like the first "conferences" chart above, but is for the geographic regions.  The regions are in order of average NCAA RPI strength from the strongest on the left to the weakest on the right.  Although the trend line suggests that the NCAA RPI discriminates against stronger and in favor of weaker regions, I do not find the chart particularly persuasive and the R squared number on the chart supports this.  The R squared number is a measure of how well the data match up with the trend line.  An R squared number of 1 is a perfect match and 0 is no match at all.  The R squared number on the chart of 0.4 is a relatively weak match.  Thus although the chart may indicate some discrimination against stronger and in favor of weaker regions, region strength may not be the main driver of the region performance differences.

Here is a second chart for regions, but rather than relating the regions' performance to their strength it relates performance to their levels of internal parity as measured by the proportion of intra-regional ties.



This chart suggests that the higher the proportion of a region's intra-regional games that are ties and, by logical extension, the higher the level of parity within the region, the more its teams' actual performance in games against teams from other regions exceeds their expected performance based on their ratings.  And, as you can see from the R squared value, this trend line is much more representative of the data than for the chart based on region strength.  What this suggests is that the NCAA RPI has a problem properly rating teams from a region in relation to teams from other regions when the regions have different levels of intra-region parity.  It discriminates against regions with high intra-region parity and in favor of regions with less parity.  If you consider the description in 2025 Article 1 of how the NCAA constructs the RPI, this is what one would expect: The NCAA RPI rewards teams that play opponents with good winning percentages, largely without reference to the strength of those opponents' opponents.  If a region has a low level of parity, there are many intra-region opponents to choose from that will have good winning percentages.  But if a region has a high level of parity, there are fewer opponents to choose from that will have good winning percentages.

For further confirmation that one would expect the NCAA RPI to underrate regions with higher levels of parity and overrate regions with lower levels, see the "Why Does the NCAA RPI Have a Regions Problem?" section of the RPI: Regional Issues page at the RPI for Division I Women's Soccer website.


This table for regions is like the left-hand side of the table above for conferences.  As you can see, it shows that the NCAA RPI does a poor job of rating teams from a region in relation to teams from the other regions, when compared to the job the Balanced RPI does.


This second table is like the right-hand side of the conferences table.  The first three columns with numbers are for the trend in relation to the proportion of ties -- parity -- within the regions and the next three columns are for the trend in relation to region strength.  As you can see, in relation to both parity and strength, the NCAA RPI has significant discrimination as compared to the Balanced RPI.

Here are the charts for the Balanced RPI, which you can compare to the above charts for the NCAA RPI:





Ability of the System to Produce Ratings That Will Match Overall Game Results


This table is a look at the simple question: How often does the better rated team, after adjustment for home field advantage, win, tie, and lose?  As you can see, compared to the Balanced RPI, the NCAA RPI's better rated team wins 0.6% fewer times.  This is not a big difference, since an 0.1% difference represents about 3 games per year, so 0.6% represents 18 games per year out of about 3,000 games.  Nevertheless, the Balanced RPI performs better.


This is like the preceding table except that it covers only games that involve at least one team in the rating system's top 60.  Since the NCAA RPI and Balanced RPI have different Top 60 teams, their Top 60 teams have different numbers of ties.  This makes it preferable to compare the systems based on how their ratings match with results in games that are not ties.  As you can see, after discounting ties, the Balanced RPI is consistent with results 0.2% of the time more than the NCAA RPI.  Here too, this is not a big difference since an 0.1% difference represents 1 game per year out of about 1,000 games involving at least one Top 60 team.

What this shows is that the difference between how the NCAA RPI and Balanced RPI perform is not in how consistent their ratings are with game results overall.  Both have similar error rates, with the Balanced RPI performing slightly better.

The difference is in where the systems' ratings miss matching with actual results.  In an ideal system, all "misses" are random, so that the system does not favor or disfavor any identifiable group of teams.  The Balanced RPI appears to accomplish this and shows, as a measuring stick, what one reasonably can expect a rating system to do.  As the conferences and geographic regions analyses show, the NCAA RPI does not accomplish this.

CONCLUSION

Based on the ability of schedulers to "trick" the NCAA RPI and on its conference- and region-based discrimination as compared to what the Balanced RPI shows is achievable, the NCAA RPI continues to get a failing grade as a rating system for Division I women's soccer.

Wednesday, January 8, 2025

2025 ARTICLE 1: NCAA RPI TEAM RANKS AS COMPARED TO NCAA RPI RANKS OF TEAMS AS CONTRIBUTORS TO OPPONENTS' STRENGTHS OF SCHEDULE

 INTRODUCTION

The NCAA RPI formula assigns values to:

1.  A Team's Winning Percentage (WP); and

2.  A Team's Strength of Schedule (SoS).

The formula is set so that each of these values accounts for 50%, in effective weight, of the Team's NCAA RPI rating.  The NCAA publicly acknowledges these effective weights.

The NCAA SoS value consists of two elements:

1.  The average of a Team's Opponents' Winning Percentages (OWP); and

2.  The average of a Team's Opponents' Opponents' Winning Percentages (OOWP).

The formula is set so that OWP accounts for 80% and OOWP for 20% of SoS, in effective weights.  The NCAA does not publicly acknowledge these effective weights.

Thus overall, the NCAA RPI value for a team consists of, in effective weights:

Winning Percentage 50%

Average of Opponents' Winning Percentages 40%

Average of Opponents' Opponents' Winning Percentages 10%

This compares to the NCAA RPI SoS contributor value for a team which consists of, in effective weights:

 Winning Percentage 80%

Opponents' Winning Percentage 20%

Because of the differences in these calculation methods, a team's NCAA RPI rating and rank can be and often are very different than its NCAA RPI SoS Contributor value and rank.  The NCAA does not publish teams'  SoS Contributor values and ranks and does not discuss that they are different than teams' NCAA RPI ratings and ranks.
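To make the contrast concrete, here is a sketch that computes both numbers for one hypothetical team using the effective weights described above: its own NCAA RPI value (50% WP, 40% OWP, 10% OOWP) and the value it contributes to an opponent's Strength of Schedule (80% its WP, 20% its OWP).  This is the effective-weight form, not the NCAA's nominal published formula.

```python
# Sketch of the two numbers the NCAA RPI produces from the same record, using
# the effective weights described above (not the NCAA's nominal formula).
def effective_rpi(wp, owp, oowp):
    """A team's own NCAA RPI value: 50% WP, 40% OWP, 10% OOWP (effective weights)."""
    return 0.50 * wp + 0.40 * owp + 0.10 * oowp

def sos_contributor_value(wp, owp):
    """The value the team contributes to an opponent's Strength of Schedule:
    80% its WP, 20% its OWP (effective weights)."""
    return 0.80 * wp + 0.20 * owp

# A hypothetical team with a modest record but a strong schedule: its own
# rating gets a large boost from its schedule, while its value as an opponent
# is driven almost entirely by its modest winning percentage.
wp, owp, oowp = 0.55, 0.70, 0.65
print(round(effective_rpi(wp, owp, oowp), 4))      # 0.62
print(round(sos_contributor_value(wp, owp), 4))    # 0.58
```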

How different are these two sets of numbers?  For the formula the NCAA currently uses:

The average difference between a team's NCAA RPI rank and its NCAA RPI SoS Contributor rank is 31.3 positions.

The median difference is 24 positions.

The maximum difference, since 2010, is 177 positions.

EFFECT OF THE NCAA RPI v SoS VALUE DIFFERENCES ON CONFERENCES 

In the Team Histories and Simulated 2025 Balanced RPI Ranks workbook, I have calculated for each team, for each year since 2010, the difference between the average NCAA RPI rank of its conference opponents and the average NCAA RPI SoS Contributor rank of those same opponents.  I then have calculated the average of those numbers over the period from 2010 through 2024.  I have done the same for non-conference opponents.
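Here is a sketch of that calculation for one team in one year, with placeholder opponents and ranks: average the NCAA RPI ranks of the opponents, average the same opponents' NCAA RPI SoS Contributor ranks, and take the difference.

```python
# Sketch: for one team in one year, the difference between its opponents'
# average NCAA RPI SoS Contributor rank and the same opponents' average NCAA
# RPI rank.  Opponent names and ranks are placeholders.
def average_rank_difference(opponents, rpi_rank, sos_contributor_rank):
    """opponents: list of opponent names; the two dicts map team -> rank."""
    avg_rpi = sum(rpi_rank[o] for o in opponents) / len(opponents)
    avg_sos = sum(sos_contributor_rank[o] for o in opponents) / len(opponents)
    # Positive result: the opponents count for less as SoS contributors than
    # their NCAA RPI ranks say they should, so the team's strength of schedule
    # is understated (the pattern shown for Clemson below).
    return avg_sos - avg_rpi

rpi_rank = {'Opp1': 20, 'Opp2': 35, 'Opp3': 50}
sos_rank = {'Opp1': 55, 'Opp2': 70, 'Opp3': 95}
print(round(average_rank_difference(['Opp1', 'Opp2', 'Opp3'], rpi_rank, sos_rank), 1))  # 38.3
```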

To show the effect of the NCAA RPI v SoS Value differences on conferences, I will start with the ACC as an example.  Here is a chart for Clemson, for whom the effect is typical for an ACC team.  Scroll to the right, if necessary, to see the entire chart:


In the chart, the blue lines are for Clemson's ACC conference opponents.  The dark blue line shows its conference opponents' average NCAA RPI ranks, year by year.  The light blue line shows their average NCAA formula ranks as SoS contributors.  As the chart shows, Clemson's ACC opponents' ranks as SoS contributors are consistently and significantly poorer than their actual NCAA RPI ranks, to the tune of 39 positions poorer on average.

The red and orange lines are for Clemson's non-conference opponents.  There is more variability here since Clemson's non-conference schedule can change significantly from year to year.  Nevertheless, in general Clemson's non-conference opponents' ranks as SoS contributors are poorer than their actual NCAA RPI ranks, to the tune of 16 positions on average.

The net effect on Clemson is that the NCAA RPI formula seriously underrates its conference opponents and also underrates its non-conference opponents.  This causes the formula as a whole to underrate Clemson.

Here is what the numbers show for the Atlantic Coast Conference as a whole.  (SMU, at the bottom of the table, is new to the conference and from a significantly weaker conference and thus is not representative of the ACC's teams.  Stanford and California, at the top, are new but from the relatively equivalent Pac 12 and are relatively representative for the ACC.)

If you look at the third column of numbers for the teams, you will get a clear picture of how the NCAA RPI treats the ACC teams for SoS Contributor purposes.  Each team's average conference opponents' SoS Contribution to the team's NCAA RPI is 35 to 45 positions poorer than it should be according to the full NCAA RPI.  In addition, all of the teams' non-conference opponents' SoS Contributions are poorer than they should be, though to a lesser and more varying degree.

Next, I will show the information for a conference in the middle where the differences between teams' conference opponents' NCAA RPI ranks and NCAA formula ranks as SoS contributors are similar.  The Atlantic 10 is a good example, with Richmond as a representative team:


As you can see, for Richmond, its conference opponents' NCAA RPI ranks and their NCAA formula RPI SoS contributor ranks are quite similar.  On average, its conference opponents' NCAA formula SoS contributor ranks are only 2 positions poorer than their NCAA RPI ranks.  Although for Richmond's non-conference opponents there is more variability from year to year, overall on average there is no difference between the opponents' NCAA formula SoS contributor ranks and the NCAA RPI ranks.

Here is the table for the Atlantic 10 as a whole:


As you can see, for the Atlantic 10, the NCAA RPI formula gets their ratings about right.  (Note: Loyola Chicago joined the Atlantic 10 in 2022 and its numbers are not representative for the Atlantic 10.)

However, because the RPI significantly underrates teams from conferences at the level of the ACC, the NCAA RPI cannot get the Atlantic 10 rankings right: teams from conferences at the level of the ACC might pass them in the rankings if properly rated.

And, here is information for a conference at the bottom of the spectrum, where teams' conference opponents' NCAA formula SoS contributor ranks are significantly better than their NCAA RPI ranks.  The Southland is the example, with Northwestern State as a representative team:


You can see that Northwestern State's conference opponents' NCAA formula SoS contributor ranks are better than their NCAA RPI ranks, on average 25 positions better.  Likewise its non-conference opponents' NCAA formula SoS contributor ranks are better, although less so, on average 13 positions better.

Here is the table for the Southland as a whole:


As you can see, the NCAA formula consistently over-ranks the Southland teams as NCAA formula SoS contributors and thus consistently overrates its teams.  For a conference like this, where it matters from an NCAA Tournament perspective is when a team from the conference has an unusually good year and achieves an NCAA RPI rank that puts it in consideration for an NCAA Tournament at large selection or even for a seed.  In that case, the team will be over-ranked and thus may have bumped out of consideration a team from a strong conference, especially since teams from strong conferences are underrated.

TABLE OF ALL TEAMS, BY CONFERENCE

Below is a table of all the teams, arranged by conference so you can see the full NCAA RPI to NCAA RPI SoS rankings contrast for each conference.  It provides as clear and stark a demonstration as possible of the NCAA RPI's problem rating teams from a conference in relation to teams from other conferences.  The way the NCAA RPI is constructed, as discussed above in the Introduction, it can't do this properly.

When is this a problem from an NCAA Tournament perspective?  It is a problem whenever teams from under-ranked and over-ranked conferences are in the same NCAA RPI rank area for seeding or for at large selections.  And, it is a problem when teams from under-ranked conferences are outside the historic range for consideration for at large selections but really should be inside the range; and when teams from over-ranked conferences are inside the historic range for consideration for at large selections but should be outside the range.  Does the NCAA give the Women's Soccer Committee information about the NCAA RPI rank versus NCAA formula SoS Contributor rank differences so that the Committee can adjust its evaulations of teams to take the differences into account?  No.  Even if the NCAA were to give the Committee that information, would Committee members have the sophistication to properly take the differences into account?  Unlikely.  The solution?  Stop using the NCAA RPI and replace it with a better system.