RPI and Bracketology for D1 Women's Soccer Blogspace: September 2021

Wednesday, September 29, 2021

SIMULATED NCAA 2021 TOURNAMENT BRACKET, ALTERNATE METHOD: 9.26.21

In past years, I have done simulated NCAA Tournament brackets using a much more complex system than the one I have used in my preceding posts this year. I will continue reporting what the bracket looks like using the simpler system, as it emphasizes the two key factors related to the NCAA Tournament: RPI Rank and Top 50 Results Rank, which I combine into a single factor with each weighted 50 percent.
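As a rough illustration of how a 50/50 combined factor can work, here is a minimal Python sketch. It assumes the two ranks are simply averaged and that ties go to the team with the better RPI rank; the post does not spell out either detail, and the team names and ranks are made up for the example.

```python
def combined_ranking(rpi_rank, top50_rank):
    """Order teams by the 50/50 average of two ranks (lower is better)."""
    teams = rpi_rank.keys() & top50_rank.keys()
    score = {t: 0.5 * rpi_rank[t] + 0.5 * top50_rank[t] for t in teams}
    # Tie-break on RPI rank: an assumption made only for this sketch.
    return sorted(teams, key=lambda t: (score[t], rpi_rank[t]))


# Hypothetical ranks, for illustration only.
rpi_rank = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4}
top50_rank = {"Team A": 3, "Team B": 1, "Team C": 6, "Team D": 2}

order = combined_ranking(rpi_rank, top50_rank)
print(order)
```

In this made-up case Team B moves ahead of Team A because its better Top 50 Results rank outweighs its slightly poorer RPI rank.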

In addition, however, starting this week I will report on the results using the more complex system. It looks at a total of 92 factors and, based on how team data compare to those factors, identifies teams that historically always have gotten a positive decision or a negative decision for each Committee decision category: #1 through #4 seeds and at large selections. If applying the factors leaves more open positions to fill, the system also identifies candidate (bubble) teams that might fill those positions. In the past, I have made my own educated guesses as to whom the Committee would select from the bubble teams, based on the data. Last summer, however, I did a study that identified the most successful factor at picking from the bubble teams for each Committee decision.

Thus, when there are bubble teams as to a decision, here is the factor that best matches the Committee decision for each decision category:

Teams to get a seed: ARPI Rank and Common Opponent Score Rank, a combined factor with each element weighted at 50 percent. With this factor, applied to the RPI top 26 teams, I identify the 16 teams to be seeded. Over the years from 2007 through 2019, this factor correctly identifies all but 14 teams getting seeds, thus missing about 1 per year. (Since 2007, no team ranked poorer than 26 has been seeded.)

#1 seeds: Adjusted Non-Conference RPI. I first apply the 92 factor system to the RPI top 7 teams, to identify teams that history says must get #1 seeds and must not get them. To the remaining top 7 teams, I apply the Adjusted Non-Conference RPI to identify the ones to fill any remaining #1 seed positions. Over time, this system correctly identifies all but 1 team getting #1 seeds, thus correctly identifying virtually all of them. (No team ranked poorer than 7 has gotten a #1 seed.)

#2 seeds: ARPI Rating and Conference Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 14 teams (those to which I have not already assigned #1 seeds), to identify teams that history says must get #2 seeds and must not get them. To the remaining top 14 teams, I apply this combined factor to identify the ones to fill any remaining #2 seed positions. Over time, these steps for the #1 and 2 seeds correctly identify all but 4 teams getting #1 and 2 seeds combined, thus missing one about every three years. (No team ranked poorer than 14 has gotten a #2 seed.)

#3 seeds: ARPI Rank and Conference Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 23 teams (those to which I have not already assigned #1 or #2 seeds), to identify teams that history says must get #3 seeds and must not get them. To the remaining top 23 teams, I apply this combined factor to identify the ones to fill any remaining #3 seed positions. Over time, these steps for the #1 through #3 seeds correctly identify all but 15 teams getting #1 through 3 seeds, thus missing a little over one per year. (No team ranked poorer than 23 has gotten a #3 seed.)

#4 seeds: Since I identify the 16 teams to be seeded in the first step above and have just seeded 12 of them, the remaining 4 get the #4 seeds. Over time, these steps for the #1 through #4 seeds correctly identify all but 11 teams getting #1 through 4 seeds, thus missing a little under one per year.
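The seeding steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the 92 factor system is represented by hypothetical `locked_in` and `locked_out` sets, each seed line is drawn only from the 16 teams already chosen to be seeded, the bubble score is a stand-in for the combined factor named for that line, and the teams (`T1` through `T26`) are invented.

```python
def fill_positions(pool, locked_in, locked_out, bubble_score, slots):
    """Take history's 'must get' teams first, then the best-scoring bubble teams."""
    chosen = [t for t in pool if t in locked_in][:slots]
    bubble = sorted(
        (t for t in pool if t not in locked_in and t not in locked_out),
        key=lambda t: bubble_score[t],  # lower score is better
    )
    return chosen + bubble[: slots - len(chosen)]


def assign_seed_lines(rpi_order, seeded, lines):
    """Assign #1 to #3 seeds line by line; leftover seeded teams get #4."""
    assigned = {}
    for line, cutoff, locked_in, locked_out, score in lines:
        # Only teams inside the RPI cutoff, seeded, and not yet placed.
        pool = [t for t in rpi_order[:cutoff] if t in seeded and t not in assigned]
        for team in fill_positions(pool, locked_in, locked_out, score, 4):
            assigned[team] = line
    for team in rpi_order:
        if team in seeded and team not in assigned:
            assigned[team] = 4
    return assigned


# Hypothetical data: 26 teams in RPI order, the first 16 chosen to be seeded.
rpi_order = [f"T{i}" for i in range(1, 27)]
rank = {t: i for i, t in enumerate(rpi_order, start=1)}
seeded = set(rpi_order[:16])

# (seed line, RPI cutoff, must get, must not get, bubble score)
lines = [
    (1, 7, {"T6"}, {"T2"}, rank),
    (2, 14, set(), set(), rank),
    (3, 23, set(), set(), rank),
]
seeds = assign_seed_lines(rpi_order, seeded, lines)
print(sorted(t for t, s in seeds.items() if s == 1))
```

The RPI cutoffs of 7, 14, and 23 come from the descriptions above; everything else in the example data is hypothetical.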

At large selections: ARPI Rank and Top 50 Results Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 57 teams (those to which I have not already assigned seeds and that are not Automatic Qualifiers), to identify teams that history says must get at large selections and must not get them. To the remaining top 57 teams, I apply this combined factor to identify the ones to fill the still open unseeded at large positions. Over time, these steps for the at large selections correctly identify all but 14 teams getting at large positions, thus missing a little over one per year. (No team ranked poorer than 57 has gotten an at large selection.)
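The at large step follows the same pattern, and a minimal Python sketch of it might look like this. The function name, the `locked_in` and `locked_out` sets standing in for the 92 factor system, the number of open positions, and the scores are all hypothetical; only the cutoff of 57 comes from the text above, and the example uses a smaller cutoff to stay short.

```python
def pick_at_large(rpi_order, excluded, locked_in, locked_out, score,
                  open_slots, cutoff=57):
    """Return (selected teams, next two teams in line)."""
    # Seeded teams and Automatic Qualifiers are passed in as `excluded`.
    pool = [t for t in rpi_order[:cutoff] if t not in excluded]
    chosen = [t for t in pool if t in locked_in]
    bubble = sorted(
        (t for t in pool if t not in locked_in and t not in locked_out),
        key=lambda t: score[t],  # lower combined score is better
    )
    need = max(open_slots - len(chosen), 0)
    return chosen + bubble[:need], bubble[need:need + 2]


# Hypothetical example with a cutoff of 6 instead of 57.
rpi_order = ["A", "B", "C", "D", "E", "F", "G", "H"]
score = {"D": 5.0, "E": 3.5, "F": 4.0}  # combined factor for bubble teams
selected, next_in_line = pick_at_large(
    rpi_order,
    excluded={"A"},    # already seeded or an Automatic Qualifier
    locked_in={"B"},   # history says must get a selection
    locked_out={"C"},  # history says must not get one
    score=score,
    open_slots=2,
    cutoff=6,
)
print(selected, next_in_line)
```

Teams G and H sit outside the cutoff, so they are never considered, which mirrors the observation that no team ranked poorer than 57 has gotten an at large selection.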

My simulated end-of-year results combine actual results of games played through September 26 with simulated results of games not yet played, including simulated conference tournaments. The simulated results are based on teams’ current actual RPI ratings. Applied to those results, this system produces the following simulated NCAA Tournament bracket. The four #1 through #4 seed pods are identified in the left-hand column as 1 through 4. The unseeded Automatic Qualifiers are 5. The unseeded at large selections are 6. (The teams not getting at large selections but next in line are Georgia and Stanford.)

Tuesday, September 28, 2021

2021 RPI: 9.26.21 ACTUAL RPI RATINGS, SIMULATED END OF REGULAR SEASON RPI RATINGS, NCAA TOURNAMENT AT LARGE SELECTION AND SEED RANGES, AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week, I am adding an additional report to the report I published in preceding weeks. Thus you will see here two reports:

Actual RPI Report with At Large Selection and Seed Ranges. This report includes:

(1) the above link to an Excel workbook that has all teams ranked in RPI order, with detailed actual current RPI-related information on each team, based on games played through Sunday, September 26. It includes, to the left, color coding that shows the ranges within which teams historically, at this point in the season, are potential bubble teams for at large selections and #1, 2, 3, and 4 seeds. Teams ranked better than the at large bubble historically always have gotten at large selections (if not conference automatic qualifiers). Teams ranked poorer than the at large bubble never have gotten at large selections. The workbook also has two other pages, showing conference ranks and ranks of regional playing pools.

(2) below, a table drawn from the workbook showing the teams from RPI #1 through those in the current at large bubble.

Simulated RPI Report with Simulated NCAA Tournament Automatic Qualifiers, At Large Selections, and Seeds. This report includes:

(1) the above link to an Excel workbook that shows (a) full season data for teams based on the actual results of games played through Sunday, September 26 and simulated results of games not yet played and (b) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data.

(2) below, a table drawn from the workbook showing the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank. Simulated results of games not yet played are based on teams’ current actual RPI ratings.

The earlier post, 2021 Season: Background for Upcoming Reports, has a full explanation of the simulation process, its limitations, and how to use it to consider the prospects for a team. If you are a coach using the information to analyze your team’s prospects or otherwise have a serious interest in following the simulations, I strongly recommend you review in advance the Background post.

Both the actual RPI ratings and the simulated ratings remain primitive at this stage of the season, so bear that in mind. As the season progresses, each week’s current ratings and simulated end-of-season results will be closer to what the actual end-of-season results will be.

As an additional note this week, the NCAA staff has tweaked the RPI bonus and penalty formulas very slightly this year, in order to keep them consistent with past Committee instructions. The tweaking is of no practical consequence. It also appears that the penalty tiers are slightly in error due to the NCAA not having adjusted them to reflect the addition of new teams to the field of schools with soccer. This too is of no practical consequence.

Here is the 9.26.21 actual RPI table, with historic seed and at large selection ranges. You will have to scroll right to see the entire table.

Here is the 9.26.21 Simulated RPI Top 100 table. You will have to scroll right to see the entire table.

Tuesday, September 21, 2021

2021 RPI: 9.19.21 SIMULATED RPI RATINGS AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

Below is a table showing (1) full season data for teams based on the actual results of games played through Sunday, September 19 and simulated results of games not yet played and (2) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data. These cover the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank. Starting with this report, simulated results of games not yet played are based on teams’ current actual RPI ratings rather than on pre-season simulated ratings.

In addition, here is a link that shows the same data for all teams: 2021 Simulated RPI Report 9.19.21.

The earlier post, 2021 Season: Background for Upcoming Reports, has a full explanation of the simulation process, its limitations, and how to use it to consider the prospects for a team. If you are a coach using the information to analyze your team’s prospects or otherwise have a serious interest in following the simulations, I strongly recommend you review in advance the Background post.

Although I now am using teams’ current actual RPI ratings to simulate the results of games not yet played, the simulated end-of-season results remain pretty primitive at this stage of the season, so bear that in mind. As the season progresses, each week’s simulated end-of-season results will be closer to what the actual end-of-season results will be.

Also, I see a potential problem with how the RPI will work this year. In a typical season, teams play 18.1% of their games out-of-region (regions being primarily geographic playing pools: middle, northeast, south, and west). This is not a high enough percentage to allow the RPI to have high accuracy in rating teams from different regions in relation to each other, with the main result being that the RPI on average underrates teams from the west. (In other words, teams from the west on average have out-of-region game results that are better than their RPI ratings say they should be.) This year, the current rate of out-of-region games is only 15.0%. This is likely to create an even bigger problem this year in terms of the RPI’s ability to rate teams from different regions properly in relation to each other.

Here are the 9.19.21 Simulated RPI Top 100. You will have to scroll right to see the entire table.

Tuesday, September 14, 2021

2021 RPI: 9.12.21 SIMULATED RPI RATINGS AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

Below is a table showing (1) full season data for teams based on the actual results of games played through Sunday, September 12 and simulated results of games not yet played and (2) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data. These cover the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank.

In addition, here is a link that shows the same data for all teams: 2021 Simulated RPI Report 9.12.21.

As you may notice, the simulation overrates or underrates a fair number of teams. This is a result of the way the simulation works and the fact that it does not use the 2020 season results due to their being unreliable. This problem will not be a factor next week, as for that simulation I will start using actual current RPI ratings as the basis for simulating the results of games not yet played. On the other hand, even the actual current RPI ratings next week will be pretty primitive, so as always you must bear that in mind.

Here are the 9.12.21 Simulated RPI Top 100. You will have to scroll right to see the entire table.

Wednesday, September 8, 2021

2021 RPI: 9.6.21 SIMULATED RPI RATINGS AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

Below is a table showing (1) full season data for teams based on the actual results of games played through September 6 and simulated results of games not yet played and (2) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data. These cover the Top 100 teams in my simulated RPI Ranks.

In addition, here is a link that shows the same data for all teams: 2021 Simulated RPI Report 9.6.21.

For purposes of evaluating the reliability of what you are seeing:

In terms of consistency of actual game results with simulated game results, so far the simulated higher rated team, after adjustments for home field advantage, has won 66.3% of the time, tied 10.7%, and lost 23.0%. Once the season is over and we have actual RPI ratings based on the actual game results, history says that the consistency of actual game results with actual RPI ratings, adjusted for home field, will be that the higher rated team won 72.7%, tied 10.7%, and lost 16.6%.

This reflects that the simulation overrates or underrates a fair number of teams. On the other hand, if you look at significant groups of teams, as a whole they will perform about as the simulation indicates they will. Thus if you look at the entire group of home teams plus the first team in the alphabet for neutral site games and if you use the simulation to predict entire group results based on the win-tie-loss likelihood in each game, the simulation says that for the games actually played so far, the entire group would win 478.8 games, tie 88.0, and lose 290.2. In fact, the entire group has won 481, tied 92, and lost 284.
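The group comparison above amounts to adding up each game's win, tie, and loss likelihoods and setting the totals beside the actual counts. Here is a minimal Python sketch of that arithmetic. The three per-game likelihoods are hypothetical; the predicted and actual group totals are the figures reported in this post.

```python
def expected_totals(games):
    """Sum per-game (win, tie, loss) likelihoods into expected group totals."""
    wins = sum(g[0] for g in games)
    ties = sum(g[1] for g in games)
    losses = sum(g[2] for g in games)
    return wins, ties, losses


# Hypothetical likelihoods for three games, from the reference team's side
# (the home team, or the first team in the alphabet at a neutral site).
games = [(0.60, 0.20, 0.20), (0.50, 0.10, 0.40), (0.30, 0.20, 0.50)]
exp_w, exp_t, exp_l = expected_totals(games)

# Group totals reported above for games played through September 6.
predicted = (478.8, 88.0, 290.2)
actual = (481, 92, 284)

# Both sets of totals cover the same number of games.
total_games = sum(actual)
# Actual minus predicted for wins, ties, and losses.
gaps = tuple(round(a - p, 1) for a, p in zip(actual, predicted))
print(total_games, gaps)
```

On the reported figures, both sets of totals cover 857 games, and the group finished 2.2 wins and 4.0 ties above the simulation's expectation and 6.2 losses below it.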

Here are the 9.6.21 Simulated RPI Top 100. You will have to scroll right to see the entire table.