There are four statistics-based rating systems for Division I women's soccer:
NCAA RPI
KPI (sometimes referred to as the Kevin Pauga Index, although I also have seen the term Key Parameter Index)
Balanced RPI (my system, a modification of the NCAA RPI)
Massey Ratings
I evaluate how each rating system performs, using a series of performance measures.
For the NCAA RPI and Balanced RPI, I evaluate their performance since 2010, considering game results and calculating ratings as if there had been no overtimes throughout the period. For the NCAA RPI, I calculate ratings based on the formula the NCAA currently uses.
For the KPI, I evaluate its performance since 2017, which is the first year it produced ratings for Division I women's soccer.
For Massey, I evaluate its performance since 2022. Although Massey has produced ratings for many years, his rating scale changed in 2022 (and also another time a few years earlier). His rating system itself also may have changed, but I do not know, since it is proprietary. So, to be sure the evaluation is of his current system, I use only performance since 2022. Since the Massey evaluation includes such a limited number of years, I consider it only a rough evaluation of the system.
Below, I will show a comparison table for each performance measure and then will explain what the table represents.
OVERALL
OVERALL PERFORMANCE
In each table, the systems are in order of NCAA RPI, KPI, Balanced RPI, Massey. This is intentional. For NCAA Tournament at large selection and seeding purposes, the Women's Soccer Committee uses the NCAA RPI as its rating system. For the past few years, the NCAA has allowed it to supplement the NCAA RPI with the KPI. That is why those systems are first in order: they are what the Committee sees during its decision process. The Committee is not allowed to use any other rating system. As I go through the measures, you will see that the NCAA RPI's and the KPI's performances are relatively similar. Although I do not know it for a fact, I suspect this is why the NCAA permits the Committee to use the KPI as a backup rating system.
In looking at the tables, you also will see that the Balanced RPI's and Massey's performances are quite similar to each other, but are dissimilar from the NCAA RPI's and the KPI's. This is why the Balanced RPI and Massey are last in order. Here too, although I do not know it for a fact, I suspect the dissimilarity between the NCAA RPI on the one hand and the Balanced RPI and Massey on the other is why the NCAA did not choose either of those systems as the Committee's backup to the NCAA RPI.
The above table is based on looking at the opposing teams' ratings in each game, adjusting those ratings to take home field advantage into account, and determining the location-adjusted rating difference between the opponents. It then looks to see whether the better rated team won, tied, or lost the game. It does this for all games, then tallies the total wins, ties, and losses of the better rated teams, and then converts the totals to percentages of all games played.
Thus, using the NCAA RPI as an example, the better rated teams have won 65.1%, tied 21.1%, and lost 13.8% of their games. Since the KPI and Massey use different time periods of data, they show different percentages of ties than each other and than the NCAA RPI and the Balanced RPI. To accommodate this, I have added the green highlighted column, which shows the better rated teams' win and loss percentages without reference to tie games. I consider this the best basis for comparing how the rating systems perform.
As the table shows in the green highlighted column, the Balanced RPI's ratings are the most consistent with game results -- the better rated team wins 83.5% of the time, followed by the KPI (82.8%), Massey (82.6%), and the NCAA RPI (82.5%).
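For readers who want to see the mechanics, here is a minimal sketch, in Python, of the tally behind this table. The games, the rating values, and the size of the home field advantage adjustment are hypothetical placeholders, not my actual data or constants.

```python
HOME_BONUS = 0.01  # hypothetical home-field adjustment added to the home team's rating

games = [
    # (home_rating, away_rating, result from the home team's perspective: "W", "T", or "L")
    (0.62, 0.55, "W"),
    (0.58, 0.60, "T"),
    (0.51, 0.57, "W"),  # lower rated home team wins: a "better rated team lost" game
]

wins = ties = losses = 0
for home_rating, away_rating, result in games:
    home_better = (home_rating + HOME_BONUS) > away_rating  # location-adjusted comparison
    if result == "T":
        ties += 1
    elif (result == "W") == home_better:
        wins += 1    # the better rated team won
    else:
        losses += 1  # the better rated team lost

total = wins + ties + losses
print(f"better rated team: won {wins/total:.1%}, tied {ties/total:.1%}, lost {losses/total:.1%}")
# The green highlighted column ignores ties: wins as a share of decided games only.
print(f"excluding ties: won {wins/(wins+losses):.1%}")
```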
PERFORMANCE IN GAMES INVOLVING AT LEAST ONE TOP 60 TEAM
This table is similar to the first one above, but includes only games involving at least one Top 60 team. This shows how the rating systems perform in games involving teams that might be candidates for NCAA Tournament at large positions.
You can see that the Balanced RPI performs the best, followed by the KPI, the NCAA RPI, and Massey.
CONFERENCES
CONFERENCE TEAMS' ACTUAL RESULTS AS COMPARED TO THEIR EXPECTED RESULTS
This and the next two tables show how the rating systems perform at rating teams from a conference in relation to teams from other conferences.
The first step in this process is to determine, for each game, the exact location-adjusted rating difference between the two opponents. Based on that difference, the next step is to calculate each team's win likelihood, tie likelihood, and loss likelihood -- its expected results -- using a Result Probability Table for the rating system I am evaluating. The third step is to compare a team's expected results to its actual results. The last step is to combine the expected and actual results for each conference's teams to see how the conference's actual win-tie-loss results compare to its expected win-tie-loss results.
The Result Probability Tables for the rating systems are extremely accurate when applied to all games played. In other words, the sums of the expected win-tie-loss results in all games match almost exactly the sums of the actual win-tie-loss results. This makes it possible, by breaking games into identifiable groups, such as a conference's teams' games against non-conference opponents, to see whether here too the expected results match actual results or whether they don't match. If they don't match for a group of teams, it means the rating system has trouble rating the group's teams properly in relation to teams from other groups, resulting in the system overrating some groups of teams and underrating others.
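Here is a minimal sketch of the expected-versus-actual comparison, assuming a hypothetical Result Probability Table with made-up rating-difference bands and probabilities; the real tables are built from the historical data. Note that, consistent with NCAA practice, a tie counts as half a win in winning percentage.

```python
from collections import defaultdict

# Hypothetical Result Probability Table: for the better rated team, the win,
# tie, and loss likelihoods within bands of location-adjusted rating difference.
RESULT_PROBABILITIES = [
    # (band's maximum rating difference, P(win), P(tie), P(loss))
    (0.01, 0.45, 0.30, 0.25),
    (0.05, 0.60, 0.25, 0.15),
    (999.0, 0.85, 0.10, 0.05),
]

def expected_result(rating_diff):
    """Win/tie/loss likelihoods for the better rated team at this rating difference."""
    for max_diff, p_win, p_tie, p_loss in RESULT_PROBABILITIES:
        if rating_diff <= max_diff:
            return p_win, p_tie, p_loss

# Hypothetical non-conference games: (conference, location-adjusted rating
# difference from the conference team's perspective, actual result).
games = [
    ("Conference A", +0.03, "W"),
    ("Conference A", -0.02, "T"),
    ("Conference B", +0.10, "W"),
    ("Conference B", -0.04, "L"),
]

expected = defaultdict(float)
actual = defaultdict(float)
played = defaultdict(int)

for conference, diff, result in games:
    p_win, p_tie, p_loss = expected_result(abs(diff))
    # The team's own win likelihood, counting a tie as half a win (NCAA practice).
    team_p_win = (p_win if diff >= 0 else p_loss) + 0.5 * p_tie
    expected[conference] += team_p_win
    actual[conference] += {"W": 1.0, "T": 0.5, "L": 0.0}[result]
    played[conference] += 1

for conference in played:
    gap = (actual[conference] - expected[conference]) / played[conference]
    print(f"{conference}: actual less expected winning percentage = {gap:+.1%}")
```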
Applying this evaluation method to how each conference's teams do in non-conference games produces the following table for the NCAA RPI:
This table has the conferences arranged in order of their teams' average NCAA RPI ratings. It shows, in the right-hand column, the amount by which each conference's teams' actual winning percentages are greater or less than their expected winning percentages.
The NCAA RPI row in the table at the top of this section draws from the above table. In the top table, under Conferences Non-Conference Actual Less Likely Winning Percentage, the High column shows the difference amount for the conference whose actual winning percentage most exceeds its expected winning percentage (i.e., the most underrated conference). The Low column shows the difference amount for the conference whose expected winning percentage most exceeds its actual winning percentage (i.e., the most overrated conference). The Spread column shows the difference between the High and Low numbers. This Spread is a measure of how well the rating system does at rating teams from a conference in relation to teams from other conferences: It represents the amount of the rating system's discrimination against the best performing (most underrated) conference as compared to the poorest performing (most overrated) conference. The Over and Under column shows the total amount by which all conferences perform better and worse than the Result Probability Table says they should -- in other words, the rating system's overall discrimination among conferences.
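To make those summary columns concrete, here is a minimal sketch of how they can be derived from per-conference actual less expected winning percentage figures. The conference names and numbers are hypothetical.

```python
# Hypothetical per-conference actual less expected winning percentages.
actual_less_expected = {
    "Conference A": +0.031,
    "Conference B": +0.010,
    "Conference C": -0.004,
    "Conference D": -0.025,
}

high = max(actual_less_expected.values())  # most underrated conference
low = min(actual_less_expected.values())   # most overrated conference
spread = high - low                        # discrimination between the two extremes
over_and_under = sum(abs(v) for v in actual_less_expected.values())  # overall discrimination

print(f"High {high:+.1%}  Low {low:+.1%}  Spread {spread:.1%}  Over and Under {over_and_under:.1%}")
```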
There are similar conference-by-conference tables for the KPI, the Balanced RPI, and Massey. They all feed into the table at the top of this section.
As you can see from the top table, the Balanced RPI does the best job of rating teams from a conference properly in relation to teams from other conferences, followed by Massey. The NCAA RPI does the poorest job, with the KPI next but quite similar to the NCAA RPI's performance.
ACTUAL LESS EXPECTED PERFORMANCE IN RELATION TO CONFERENCE STRENGTH

The preceding section covered whether and, if so, by how much a rating system discriminates among conferences. It does not, however, cover whether there are patterns to the discrimination.
This section looks to see whether there are such patterns. Specifically, it compares conferences' actual less expected winning percentages to their strength as represented by their average ratings, to see whether there are discrimination patterns related to conference strength.
In the above table, the data for a rating system come from a chart for the system. Each system's chart draws from a table like the long table in the preceding section that shows conferences' NCAA RPI ratings and their actual/expected results differences.
The following are the charts for the four systems. Below the first chart, for the NCAA RPI, there is an explanation of what the chart shows and of how it relates to the above table. Scroll to the right to see the entire chart.
In the chart, the vertical axis is for the difference between a conference's teams' actual and expected winning percentages. At the top of the chart, actual winning percentages are greater than expected winning percentages, meaning conferences' results against teams from other conferences are better than the ratings say they should be -- in other words, the ratings underrate the conferences. At the bottom, actual winning percentages are less than expected winning percentages, meaning conferences' results against teams from other conferences are poorer than the ratings say they should be -- in other words, the ratings overrate the conferences.
The horizontal axis is for the average of the conference teams' ratings. The conferences are arranged in order from the highest rated (strongest) conference on the left to the lowest rated (weakest) on the right.
The solid black line is a computer generated straight trend line that shows the relationship between conference strength and conference teams' actual performance against teams from other conferences as compared to their rating-based expected performance. The downward slope of the trend line indicates that stronger conferences, on the left, tend to perform better than their ratings say they should -- in other words tend to be underrated -- and weaker conferences, on the right, tend to perform more poorly than their ratings say they should -- in other words tend to be overrated.
On the chart, you can see the trend line's formula, which tells you what you can expect the actual/expected results difference to be at any point in the conference average NCAA RPI spectrum. You also can see an R squared value, in this case 0.6949. The R squared value is a measure of the strength of the relationship (consistency) between conferences' actual/expected results differences and conferences' strength. An R squared value of 1 means perfect consistency and of 0 means no consistency. For the NCAA RPI, the R squared value suggests that there is a relatively strong relationship between the NCAA RPI's underrating and overrating of conferences, on the one hand, and conference strength, on the other hand.
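For those curious about the underlying computation, here is a minimal sketch of fitting a straight trend line and computing its R squared value, using numpy. The conference ratings and actual/expected differences below are hypothetical. Note that because the charts run from strongest conference on the left to weakest on the right, a positive relationship between strength and over-performance appears on the charts as a downward-sloping line.

```python
import numpy as np

# Hypothetical data: x is each conference's average rating, y its actual less
# expected winning percentage.
x = np.array([0.62, 0.58, 0.54, 0.50, 0.46])
y = np.array([0.040, 0.021, 0.006, -0.024, -0.049])

slope, intercept = np.polyfit(x, y, 1)  # straight trend line y = slope*x + intercept
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"trend: y = {slope:.3f}x + {intercept:.3f}, R squared = {r_squared:.4f}")
```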
In the table at the beginning of this section, the NCAA RPI's row summarizes the NCAA RPI's overrating and underrating pattern in relation to conference strength. The first gold highlighted column (High) shows the amount by which actual performance is better than expected performance at the top left of the trend line: 4.3%. The second gold highlighted column (Low) shows the amount by which actual performance is poorer than expected performance at the bottom right of the trend line: -5.6%. The fully gold highlighted column shows the difference (Spread) between these two numbers: 9.9%. This Spread is a measure of how much the NCAA RPI discriminates in relation to conference strength.
Below are comparable charts for the KPI, the Balanced RPI, and Massey. As you can see from the charts, the KPI shows the same discriminatory pattern as the NCAA RPI and has a similar R squared value. The Balanced RPI and Massey, on the other hand, show minimal discrimination among conferences. Further, the Balanced RPI's R squared value is much lower than for the NCAA RPI and the KPI, suggesting there is at most a weak relationship between the Balanced RPI's minimal discrimination among conferences and conference strength. And, Massey's R squared value is even lower, suggesting virtually no relationship between Massey's minimal discrimination and conference strength.
The above table summarizes the four charts and allows us to compare how the rating systems perform. The table shows that the NCAA RPI and the KPI discriminate significantly against stronger and in favor of weaker conferences, whereas the Balanced RPI and Massey show little discrimination.
ACTUAL LESS EXPECTED PERFORMANCE IN RELATION TO THE DIFFERENCE BETWEEN CONFERENCE TEAMS' NON-CONFERENCE OPPONENTS' RATINGS AND THOSE OPPONENTS' RATINGS AS STRENGTH OF SCHEDULE CONTRIBUTORS
The preceding section shows that the NCAA RPI discriminates against stronger conferences and in favor of weaker ones, whereas the Balanced RPI doesn't. But it doesn't show why. This section shows why.
Both the NCAA RPI and the Balanced RPI, before computing teams' ratings, compute teams' strengths of schedule, which they then combine with the teams' winning percentages to produce their RPI ratings. In order to compute teams' strengths of schedule, each system assigns to each team a rating as a strength of schedule contributor to its opponents. This makes it possible to compute, for each team, an opponents' average RPI rating and rank and also an opponents' average rating and rank as strength of schedule contributors. Although the KPI may use strength of schedule calculations and Massey does, I am not able to determine what a team's strength of schedule contribution to its opponents is for either system, so I am not able to compute teams' opponents' average ratings and ranks as strength of schedule contributors.
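Here is a minimal sketch of the two averages just described, for a single team with hypothetical opponents. Each opponent carries an overall rating and a separate rating as a strength of schedule contributor.

```python
opponents = [
    # (overall rating, strength of schedule contributor rating), hypothetical values
    (0.61, 0.55),
    (0.48, 0.57),
    (0.55, 0.40),
]

avg_overall = sum(overall for overall, _ in opponents) / len(opponents)
avg_contributor = sum(contrib for _, contrib in opponents) / len(opponents)
print(f"opponents' average overall rating:     {avg_overall:.3f}")
print(f"opponents' average contributor rating: {avg_contributor:.3f}")
```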
For the NCAA RPI, as I will show more explicitly later in this article, there are big differences between teams' NCAA RPI ratings and ranks and their NCAA RPI strength of schedule contributor ratings and ranks. The main difference between the Balanced RPI and the NCAA RPI is that the Balanced RPI involves a series of additional calculations designed to eliminate those differences.
The data in the above table come from the following two charts (which in turn come from other tables), the first for the NCAA RPI and the second for the Balanced RPI. There is an explanation below the NCAA RPI chart.
In this NCAA RPI chart, the vertical axis again is for the difference between a conference's teams' actual and expected winning percentages: conferences the ratings underrate are at the top, and conferences the ratings overrate are at the bottom, just as in the preceding charts.
The horizontal axis shows the amount by which conference teams' opponents' NCAA RPI ratings exceed those opponents' ratings as strength of schedule contributors under the NCAA RPI formula. On the left is the conference whose teams' opponents' NCAA RPI ratings have the greatest excess over their strength of schedule contributor ratings. On the right is the conference whose teams' opponents' NCAA RPI ratings have the greatest deficit below their strength of schedule contributor ratings.
As the trend line shows, the greater the excess of conference teams' opponents' NCAA RPI ratings over their NCAA RPI ratings as strength of schedule contributors, the greater the excess of conference teams' actual performance over their rating-based expected performance. You also can see that the trend line's R squared value is 0.8565. This relatively high value suggests that the NCAA RPI's difference between conference teams' opponents' ratings and those opponents' ratings as strength of schedule contributors is the cause, or at least the primary cause, of the actual/expected results differences -- in other words, of the NCAA RPI's discrimination among conferences.
Looking at this table together with the preceding one leads to the following conclusion: The NCAA RPI, because of its disconnect between teams' NCAA RPI ratings and their NCAA RPI ratings as strength of schedule contributors, discriminates against teams from stronger conferences and in favor of teams from weaker conferences.
A case study from the 2025 season illustrates how the mechanics of the NCAA RPI formula cause this:
Liberty, from Conference USA, had an NCAA RPI rank of 45. Cal, from the Big Ten, had an NCAA RPI rank of 46. In other words, the NCAA RPI rated them essentially the same.
But, Liberty had an NCAA RPI strength of schedule contributor rank of 26. Cal's rank was 92.
Thus all the Conference USA teams Liberty played during the season got credit within their RPI ratings for having played the #26 team. On the other hand, Cal's Big Ten opponents got credit for playing only the #92 team. This caused Liberty's Conference USA opponents to be overrated and Cal's Big Ten opponents to be underrated.
Why the difference between Liberty's and Cal's NCAA RPI ranks as strength of schedule contributors, when their NCAA RPI ranks were essentially the same? The difference is due to the way the NCAA RPI formula computes a team's strength of schedule contribution to its opponents. Under the formula, a team's strength of schedule contribution to its opponents consists of 80% the team's winning percentage and only 20% the team's opponents' strength. Liberty, being a top team in a mid-major conference, had a higher winning percentage than Cal but played weaker opponents due to being in a weaker conference. Cal, being a mid-level team in a strong conference, had a lower winning percentage but played stronger opponents. Liberty's situation is representative of teams at or near the top of mid-major and weaker conferences' standings. Cal's is representative of strong conferences' teams that are not at or near the top of their conferences. The result is that the RPI formula treats teams from stronger conferences as having played weaker opponents (like Cal) than they actually played, and teams from weaker conferences as having played stronger opponents (like Liberty) than they actually played. The end result, as the above table shows, is that the NCAA RPI underrates teams from stronger conferences and overrates teams from weaker ones.
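Here is a minimal sketch of this mechanic, using the 80%/20% effective weighting described above. The winning percentages and opponent strengths are hypothetical, chosen only to reproduce the Liberty/Cal pattern.

```python
def sos_contribution(winning_pct, opponents_strength):
    """A team's effective strength of schedule contribution to its opponents,
    per the 80%/20% weighting described in the text."""
    return 0.80 * winning_pct + 0.20 * opponents_strength

# Top team in a weaker conference: high winning percentage, weak opponents.
liberty_like = sos_contribution(winning_pct=0.85, opponents_strength=0.45)
# Mid-level team in a strong conference: lower winning percentage, strong opponents.
cal_like = sos_contribution(winning_pct=0.55, opponents_strength=0.65)

print(f"Liberty-like contributor value: {liberty_like:.3f}")  # 0.770
print(f"Cal-like contributor value:     {cal_like:.3f}")      # 0.570
# The 80% weight on winning percentage dominates, so the weak-conference team
# contributes far more to its opponents' strength of schedule even though the
# two teams' overall ranks are essentially the same.
```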
Here is the chart for the Balanced RPI:
Look at the Balanced RPI's differences between conference teams' opponents' ratings and their ratings as strength of schedule contributors to their opponents, as shown across the bottom of this chart. Then look at the same differences at the bottom of the preceding NCAA RPI chart. You will see that for the Balanced RPI, any differences are minimal as compared to the NCAA RPI.
The Balanced RPI chart shows that the minimal differences under the Balanced RPI bear no relationship to the differences between conferences' actual performance and their expected performance. Simply eyeballing the chart, you can see this: the data points appear to be distributed randomly around the trend line, suggesting there is no relationship between conference teams' opponents' rating/strength of schedule contributor rating differences and conferences' actual/expected results differences. The trend line's R squared value of virtually 0 confirms this.
The table at the top of this section summarizes what you see in the two charts. For the NCAA RPI, the actual/expected performance spread is 9.6%. This is the level of discrimination among conferences due to the NCAA RPI rating/strength of schedule contributor rating difference. By comparison, the Balanced RPI has no discrimination among conferences due to Balanced RPI rating/strength of schedule contributor rating differences.
REGIONS
REGION TEAMS' ACTUAL RESULTS AS COMPARED TO THEIR EXPECTED RESULTS
This is like the first conference table above, but is for the four geographic regions within which teams tend to play their games: Middle, North, South, and West.
As the table shows, the NCAA RPI and the KPI have trouble rating teams from a geographic region properly in relation to teams from other geographic regions. Some regions' teams' actual results against out-of-region opponents are better than their NCAA RPI and KPI ratings say they should be and other regions' results are poorer. For the Balanced RPI and Massey, however, results are about as they should be.
ACTUAL LESS EXPECTED PERFORMANCE IN RELATION TO REGION STRENGTH

This table -- based on trend charts like those for conferences -- shows that the NCAA RPI and KPI both discriminate against stronger regions and in favor of weaker ones whereas the Balanced RPI and Massey have almost entirely eliminated the discrimination.
The above table and its underlying trend charts are based on the following data:
For the NCAA RPI:
NOTE: For the NCAA RPI's regions trend chart, the R squared value is 0.4142. This suggests that although there may be a relationship between the NCAA RPI's over- and under-rating of regions' teams and region strength, the relationship is not as strong as for conferences.
For the KPI:
NOTE: For the KPI's regions trend chart, the R squared value is 0.4170, very similar to the value for the NCAA RPI.
For the Balanced RPI:
NOTE: For the Balanced RPI's regions trend chart, the R squared value is 0.5940. This might suggest the Balanced RPI retains a very small amount of the NCAA RPI's discrimination among regions in relation to region strength.
For Massey:
NOTE: For Massey's regions trend chart, the R squared value is 0.4132, similar to the values for the NCAA RPI and the KPI.
ACTUAL LESS EXPECTED PERFORMANCE IN RELATION TO THE DIFFERENCE BETWEEN REGION TEAMS' NON-REGION OPPONENTS' RATINGS AND THOSE OPPONENTS' RATINGS AS STRENGTH OF SCHEDULE CONTRIBUTORS
This table -- again based on trend charts and underlying tables like those for conferences -- shows that for the NCAA RPI, the disconnect between region teams' opponents' overall ratings and their ratings as strength of schedule contributors results in some regions having actual winning percentages in non-region games that are better than their expected winning percentages, and other regions having actual winning percentages that are poorer than their expected winning percentages. The Balanced RPI, on the other hand, minimizes this problem.
NOTE: For the NCAA RPI, the trend chart's R squared value is 0.8458. For the Balanced RPI's trend chart, the value is 0.7637.
RANK/STRENGTH OF SCHEDULE CONTRIBUTOR RANK DIFFERENCES FOR TEAMS
As the above information shows, the differences between how the NCAA RPI formula rates and ranks teams overall and how it rates and ranks them as strength of schedule contributors cause it to discriminate against stronger conferences and in favor of weaker ones. And they cause it to discriminate against stronger regions and in favor of weaker ones.
In that context, it is worth comparing the NCAA RPI and the Balanced RPI in terms of the size of the differences between teams' overall ranks and their ranks as strength of schedule contributors. The following table shows this comparison.
As you can see, for the NCAA RPI the average difference between a team's overall rank and its rank as a strength of schedule contributor is 31.3 positions, the median is 24 positions, and the greatest difference is 177 positions. The Balanced RPI, on the other hand, essentially has eliminated these differences with an average difference of 0.3, a median of 0, and a maximum difference of 7 positions.
In the portion of the table to the right, I have highlighted the column that shows the percentage of teams for which the difference between RPI rank and strength of schedule contributor rank is 15 or fewer positions: 37% for the NCAA RPI and 100% for the Balanced RPI.
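These summary statistics are straightforward to compute from the per-team rank differences. Here is a minimal sketch, with hypothetical rank differences rather than the actual ones.

```python
from statistics import mean, median

# Absolute difference between each team's overall rank and its rank as a
# strength of schedule contributor, one entry per team (hypothetical values).
rank_differences = [2, 45, 17, 0, 88, 9, 31, 12, 64, 5]

print(f"average: {mean(rank_differences):.1f}")
print(f"median:  {median(rank_differences):.1f}")
print(f"maximum: {max(rank_differences)}")
within_15 = sum(1 for d in rank_differences if d <= 15) / len(rank_differences)
print(f"15 or fewer positions: {within_15:.0%}")
```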
I have highlighted the 15 or fewer positions column because it relates to non-conference scheduling. I consult with teams on scheduling. In general, if a team is choosing between potential opponents likely to have similar overall ranks, it should play the opponent likely to have the better strength of schedule contributor rank. Teams' ranks from year to year, however, including their ranks as strength of schedule contributors, are variable. I selected the 15 or fewer positions column because, given that variability, a team whose expected strength of schedule contributor rank is more than 15 positions better than its overall rank is likely, in the future, actually to have a contributor rank better than its overall rank; and a team whose expected contributor rank is more than 15 positions poorer than its overall rank is likely actually to have a contributor rank poorer than its overall rank. Thus with the NCAA RPI having a 15 or fewer rank position difference for only 37% of teams, it has 63% of teams with differences greater than 15 positions. That is enough teams to make it possible, in non-conference scheduling, to trick the NCAA RPI by playing non-conference opponents whose contributions to your strength of schedule will be significantly better than their overall NCAA RPI ranks say they should be. This is not possible, however, for the Balanced RPI.
Using the Liberty and Cal example from above: although the two have essentially the same NCAA RPI ranks, if the Committee is using the NCAA RPI and a team has the ability to choose between Liberty and Cal as a non-conference opponent, the team always should choose Liberty. The NCAA RPI formula will treat the team as having played the #26 ranked team if it plays Liberty, rather than the #92 team if it plays Cal.
Thus it is possible, under the NCAA RPI, for teams to "game" the ratings through "smart" scheduling. This is not possible under the Balanced RPI.
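As a minimal sketch of the scheduling logic this implies (with hypothetical candidates patterned on the Liberty and Cal example), the heuristic is: among potential opponents with similar projected overall ranks, pick the one with the better projected strength of schedule contributor rank.

```python
candidates = [
    # Hypothetical projected ranks for two potential non-conference opponents.
    {"team": "Liberty-like", "overall_rank": 45, "sos_contributor_rank": 26},
    {"team": "Cal-like", "overall_rank": 46, "sos_contributor_rank": 92},
]

SIMILAR = 15  # treat overall ranks within 15 positions of the best as interchangeable

best_overall = min(c["overall_rank"] for c in candidates)
similar = [c for c in candidates if c["overall_rank"] - best_overall <= SIMILAR]
pick = min(similar, key=lambda c: c["sos_contributor_rank"])
print(f"schedule {pick['team']} (contributes as #{pick['sos_contributor_rank']})")
```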
TEAM ACTUAL/EXPECTED WINNING PERCENTAGE DIFFERENCES
This table is like ones above for conferences and regions, but shows, for individual teams, how the systems' ratings perform at matching teams' actual results.
As you can see, in terms of performance as measured by the Spread between the team that historically most out-performs its rating and the team that most underperforms, the KPI does the poorest job, followed by Massey and the NCAA RPI, with those three being relatively close. The Balanced RPI does much better. And when one looks at the Over and Under Total -- the amount by which all teams deviate from having their actual and expected performances match -- the Balanced RPI is far superior to the other systems.
Finally, for those who want to see the individual team details, the following tables show, for the different rating systems, all the teams and their actual/expected result differences. The teams are arranged by conference and, within each conference, in order of the extent by which their actual performance exceeded their expected performance. Teams with positive percentages had actual winning percentages that exceeded their ratings-based expected winning percentages. Teams with negative percentages had actual winning percentages that were poorer than their expected winning percentages. If you spend time examining the NCAA RPI table, you can get an excellent picture of the NCAA RPI's discriminatory patterns in relation to both conferences and regions.
NCAA RPI TABLE
BALANCED RPI TABLE
KPI TABLE
MASSEY TABLE