RPI and Bracketology for D1 Women's Soccer Blogspace: 2021

Thursday, December 30, 2021

NCAA TOURNAMENT: ANALYZING THE COMMITTEE SEED AND AT LARGE DECISIONS

Each year, I analyze the Committee seed and at large selection decisions in relation to the Committee’s historic decision patterns. This year, I have set up a series of tables for the analysis. I will start with the #1 seed table, giving a detailed explanation of its data. Then, I will show the tables for each of the #2, #3, and #4 seeds and for the at large selections, with a few comments.

For each table, you will need to scroll to the right to see the entire table.

#1 Seeds

Here is the #1 seed table:

First, some general background to help with the table.

For at large selections, the NCAA has set specific data factors from the season that the Committee must use and is limited to using. The NCAA leaves it up to each Committee member to decide how much weight to assign to each factor. The Committee also uses the factors in the seeding process, but for seeding they are not mandatory and the Committee is not limited to the factors in evaluating teams.

I break the NCAA factors down into 13 individual factors:

RPI (adjusted)
RPI Rank
Non-Conference RPI (adjusted)
NCRPI Rank
Top 50 Results (my modification of an NCAA factor)
Top 50 Results Rank
Conference Standing (I use average of regular season standing and conference tournament finishing position)
Conference RPI
Conference RPI Rank
Head to Head Results Against Top 60 Opponents (my modification of an NCAA factor)
Results Against Common Opponents with Other Top 60 Teams (my modification of an NCAA factor)
Common Opponent Results Rank (my modification of an NCAA factor)
Poor Results (my modification of an NCAA factor)

There are NCAA data available that allow an evaluation of a team for each factor. For some of the factors, the NCAA has a scoring system. For example, an NCAA formula assigns a value for the RPI. Where the NCAA does not have a scoring system for a factor, I have created my own scoring system.

In addition, to mimic how a Committee member might think, I pair each factor with each other factor. For each factor pair, I have a scoring system that gives each factor a 50% weight.

In addition, I have one other factor, Number of Games Against Top 60 Opponents. This is not an NCAA mandated factor but rather is one I use as an aid to teams wishing to do their non-conference scheduling with a view towards the NCAA Tournament.

Altogether this produces a total of 92 individual and paired factors.
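To make that count concrete, here is a minimal Python sketch of the arithmetic. It assumes the 92 comes from the 13 individual factors, each factor paired once with each other factor, and the one extra Top 60 games factor. The short labels are illustrative shorthand, not names taken from the system.

```python
from itertools import combinations

# Shorthand labels for the 13 individual factors listed above (illustrative only).
FACTORS = [
    "RPI", "RPI Rank", "NCRPI", "NCRPI Rank",
    "Top 50 Results", "Top 50 Results Rank", "Conference Standing",
    "Conference RPI", "Conference RPI Rank", "Head to Head",
    "Common Opponents", "Common Opponents Rank", "Poor Results",
]

# Each factor paired once with each other factor (order does not matter).
pairs = list(combinations(FACTORS, 2))

# 13 individual factors + 78 pairs + 1 extra factor
# (Number of Games Against Top 60 Opponents).
total = len(FACTORS) + len(pairs) + 1
print(len(FACTORS), len(pairs), total)
```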

For each Top 60 team for each year since 2007 (the first year I began collecting data), my computer program has computed a score for each of the 92 factors. (Hereafter, I will refer to my computer program and process as my "system.") My system then compares the scores of all of the Top 60 teams since 2007 to the Committee seeding and at large selection decisions. From this comparison, the system identifies two scores for each factor:

A "Yes" score, which means that for a particular Committee decision, if a team has had the Yes score or better for that factor, the team always has gotten a favorable decision from the Committee. For example, if a team has had an RPI Rank of 1, it always has gotten a #1 seed. I call such a Yes score the Yes standard for that factor. Thus <=1 is the RPI Rank Yes standard for a #1 seed.

A "No" score, which means that for a particular Committee decision, if a team has had the No score or poorer for that factor, the team never has gotten a favorable decision from the Committee. For example, if a team has had an RPI Rank of 8 or poorer, it never has gotten a #1 seed. Thus >=8 is the RPI Rank No standard for a #1 seed.

Note: As the data turn out, there are some factors, for some Committee decisions, that do not have a Yes standard or that do not have a No standard. For example, a team’s Conference Standing does not have a Yes standard for any Committee decision. In other words, your conference standing, all by itself and without reference to what conference you are in, will not assure you of any seed position or of an at large selection.
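As an illustration of how Yes and No standards could be read off historical data, here is a minimal Python sketch for a rank-type factor, where a lower number is better. The function names and the sample history are hypothetical, and the sketch is only one plausible way to do it, not a description of the actual program.

```python
def yes_standard(history):
    """Poorest rank R such that every team ranked R or better got a Yes.

    history: list of (rank, got_yes) tuples; lower rank is better.
    Returns None when no such rank exists (no Yes standard).
    """
    no_ranks = [rank for rank, got_yes in history if not got_yes]
    best_no = min(no_ranks) if no_ranks else None
    # Only Yes teams ranked better than every No team can define the standard.
    candidates = [rank for rank, got_yes in history
                  if got_yes and (best_no is None or rank < best_no)]
    return max(candidates) if candidates else None


def no_standard(history):
    """Best rank R such that every team ranked R or poorer got a No."""
    yes_ranks = [rank for rank, got_yes in history if got_yes]
    worst_yes = max(yes_ranks) if yes_ranks else None
    # Only No teams ranked poorer than every Yes team can define the standard.
    candidates = [rank for rank, got_yes in history
                  if not got_yes and (worst_yes is None or rank > worst_yes)]
    return min(candidates) if candidates else None


# Hypothetical two-year history of ranks 1-10 and which ranks got the decision.
history = []
for seeded in ({1, 2, 3, 5}, {1, 2, 4, 7}):
    for rank in range(1, 11):
        history.append((rank, rank in seeded))

print("Yes standard: <=", yes_standard(history))
print("No standard:  >=", no_standard(history))
```

A factor with no Yes standard, such as Conference Standing, would simply return None from the first function.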

On completion of the regular season including conference tournaments, my system tallies up all of the Yes and No standards a team has met, for each Committee decision -- #1, 2, 3, and 4 seeds and at large selections. For each team, for each Committee decision, there are four possible outcomes:

The team meets one or more Yes standards and no No standards. This means that if the Committee follows its historic pattern, the team will get a Yes decision from the Committee.

The team meets no Yes standards and one or more No standards. This means that if the Committee follows its historic pattern, the team will get a No decision from the Committee.

The team meets no Yes standards and no No standards. This means that either a Yes or a No decision from the Committee will be consistent with its historic pattern.

The team meets one or more Yes standards and one or more No standards. This means that the team has a profile the Committee has not seen historically. Whatever decision the Committee makes cannot be fully consistent with its historic pattern.
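The four outcomes above reduce to a small decision table. Here is a minimal sketch; the function name and the outcome labels are my own wording for illustration.

```python
def classify(yes_met: int, no_met: int) -> str:
    """Map counts of Yes and No standards met to one of the four outcomes."""
    if yes_met > 0 and no_met == 0:
        return "Yes expected"            # historic pattern says favorable decision
    if yes_met == 0 and no_met > 0:
        return "No expected"             # historic pattern says unfavorable decision
    if yes_met == 0 and no_met == 0:
        return "Either decision fits"    # both outcomes consistent with history
    return "Unprecedented profile"       # meets both kinds of standards


for counts in [(3, 0), (0, 5), (0, 0), (2, 1)]:
    print(counts, "->", classify(*counts))
```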

With that background, the above table for #1 seeds shows data related to each of the teams with RPI ranks #1 through 7. This is the candidate group for #1 seeds, since the No standard for a #1 seed is >=8.

Committee Decision: Green means the Committee gave the team a #1 seed. Red means it did not.

RPI Rank

Top 50 Results Rank: I have included this in the table because historically the factor pair of RPI Rank and Top 50 Results Rank has proved to be a good predictor of Committee decisions, especially for at large selections.

Yes Standards Met and No Standards Met: This is the number of Yes and No standards for #1 seeds that the team has met.

Yes Standard: If a team has met one or more Yes standards but does not get a Yes decision from the Committee, it is useful to know what the Yes standard is, so I list it here. For example, Florida State had an RPI Rank of #1. Suppose the Committee had not given it a #1 seed. Then in this column you would have seen 2 RPI Rank (the 2 preceding RPI Rank simply is a number I have assigned to that standard).

Yes Value: If I have listed a Yes Standard, I will state the standard score here. For example, for a #1 seed, the RPI Rank Yes standard is <=1, which means that teams with RPI ranks of #1 always have gotten #1 seeds. If the Committee had not given Florida State a #1 seed, you would have seen <=1 in this column.

Yes Actual: If I have listed a Yes standard, I also state the team’s actual score for the standard. So, if the Committee had not given Florida State a #1 seed, you would have seen 1 in this column representing its RPI rank.

No Standard: If a team has met one or more No standards but gets a Yes decision from the Committee, it is useful to know what the No standard is, so I list it here. For example, the RPI Rank No standard for a #1 seed is >=8. If the Committee had given a #1 seed to the #8 RPI team, I would have listed 2 RPI Rank here.

No Value: If I have listed a No standard, I will state the standard score here. If the Committee had given the RPI #8 team a #1 seed, you would have seen >=8 in this column.

No Actual: If I have listed a No standard, I also state the team’s actual score for the standard. So, if the Committee had given the #8 team a #1 seed, you would have seen 8 in this column.

Teams Affected: If the Committee has given a No to a team that has met a Yes standard or a Yes to a team that has met a No standard, this column will give an indication of how significant a change the Committee has made from its historic pattern for that factor. For example, from 2007 through 2019 there were 13 teams with #1 RPI ranks and they all received #1 seeds. If the Committee had not given Florida State a #1 seed this year, then you would have seen 13 in the Teams Affected column. This would mean that there are 13 teams that, based on past history, we would have thought assured of #1 seeds but that, based on the Committee decision this year, no longer could be considered as having been assured of #1 seeds. The lower the Teams Affected number, the smaller the Committee change from its historic pattern. If the Teams Affected number is 0, it means the team’s score for the factor is just next to the historic standard and is one the Committee has not seen before for that decision, so the Committee decision simply represents a refinement of the previous standard.

As a further note about Teams Affected, to give the numbers some context:

The #1 seed candidate range is teams with RPI Ranks of #7 or better, so the number of candidates for #1 seeds since 2007 has been 13 x 7 = 91. So when you are looking at a Teams Affected number for #1 seeds, it is that number of teams out of a total of 91.

The #2 seed candidate range is RPI Ranks of #14 or better, so the number of #2 seed candidates has been 13 x 14, less the 52 teams that got #1 seeds, which amounts to a #2 seed candidate pool of 130 teams.

The #3 seed candidate range is RPI Ranks of #23 or better, so the number of #3 seed candidates has been 13 x 23, less the 104 teams that got #1 and 2 seeds, which amounts to a #3 seed candidate pool of 195 teams.

The #4 seed candidate range is RPI Ranks of #26 or better, so the number of #4 seed candidates has been 13 x 26, less the 156 teams that got #1 through 3 seeds, which amounts to a #4 seed candidate pool of 182 teams.

The At Large candidate range is RPI Ranks of #57 or better, less the 208 seeded teams, the Automatic Qualifiers in the Top 57, and teams in the Top 57 that failed to meet the 0.500 minimum winning record requirement. Altogether since 2007, this has amounted to 447 teams.
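The seed candidate pool sizes above follow from simple arithmetic, sketched below. The sketch assumes 13 seasons (2007 through 2019) and four teams per seed line each year, which is what the 52, 104, and 156 figures imply. The At Large pool of 447 is not reproduced, since it depends on year-by-year counts of Automatic Qualifiers and sub-0.500 teams that are not given here.

```python
YEARS = 13           # 2007 through 2019
SEEDS_PER_LINE = 4   # four #1 seeds, four #2 seeds, etc. each year

# Poorest RPI rank in each seed's candidate range, from the text above.
cutoffs = {1: 7, 2: 14, 3: 23, 4: 26}

pools = {}
for seed, cutoff in cutoffs.items():
    # Teams that already received a better seed are no longer candidates.
    already_seeded = YEARS * SEEDS_PER_LINE * (seed - 1)
    pools[seed] = YEARS * cutoff - already_seeded

for seed, size in pools.items():
    print(f"#{seed} seed candidate pool: {size}")
```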

Round Eliminated: This column shows the NCAA Tournament round this year in which the team was eliminated. It lets you look at the position the NCAA assigned the team and see how the team did in relation to that assigned position. For example, Virginia’s #1 seed means that according to the Committee it should have made it at least to the semifinals, but instead it made it only to the 3rd round. This lets you evaluate how the Committee decisions worked out. (For first round matchups between unseeded teams, I treat the home team as the stronger team according to the Committee.)

With the above explanation, I leave it to you to go up to the #1 Seeds table and see how the Committee decisions match up with its historic pattern. My comment is that the Committee did not deviate from its historic pattern except where it had to, with Duke and Virginia, because each met both Yes and No standards. For those two teams, the Committee deviation was small.

#2 Seeds

Here is the #2 seed table:

You can review the table and reach your own conclusions. My comment is that in giving UCLA a #2 seed, the Committee deviated from its historic pattern. The deviation was not large but also was not insignificant. As an alternative, Tennessee would have been an easy #2 seed.

#3 Seeds

Here is the #3 seed table:

My comment is that there is nothing of major import in the Committee decisions.

#4 Seeds

Here is the #4 seed table:

There is only one significant Committee deviation from its historic patterns here, and it is the #4 seed given to BYU. This was a pretty large deviation. Ironically, BYU reached the championship game.

At Large

Here is the At Large table:

My comment is that the only deviation here from Committee historic patterns is in its giving St. Johns an at large position rather than West Virginia, Colorado, Oregon, or Houston. In looking at St. Johns’ Teams Affected numbers, however, the deviation was extremely small.

Summary and Two Additional Pieces of Information

My evaluation of the Committee decisions in relation to historic Committee patterns suggests that:

1. The Committee At Large selections were quite consistent with historic patterns and, where it varied with St. Johns, the variation was very small.

2. The Committee seeds were largely consistent with historic patterns. Where the Committee varied from historic patterns, most of the variations were small. The greatest variation was BYU getting seeded, which is ironic since BYU made it to the championship game.

In addition to the above, I have looked at two other aspects of the Committee decisions.

First, I looked at geographic regions based on the states where teams are located. As it turned out this year, during the season teams from states in the West played roughly 90% of their games against other teams from the West and only 10% against teams from other regions. This created a big problem for the RPI since 10% is not enough games for the RPI to properly rank teams from a region in relation to teams from other regions. Because of this, I wanted to see if teams from the West performed differently in the Tournament than the Committee had evaluated them. The following table addresses this question:

In this table, the Wins Difference column shows, for each region, the difference between (1) the number of games the Committee seeds and bracket placements for unseeded teams indicated teams should win and (2) the number of games teams actually won. The numbers in this table represent how teams from a region did against teams from other regions, since all within-region games cancel each other out. Although the numbers in the table are not large, they suggest that the Committee may have over-evaluated teams from the South and under-evaluated teams from the other regions and particularly the West.
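For readers who want to see the mechanics, here is a minimal sketch of a Wins Difference tally by region. The sign convention (expected wins minus actual wins, so a positive number means a region won fewer games than its placements implied), the "Other" region label, and all sample numbers are assumptions for illustration only, not the figures from the table.

```python
from collections import defaultdict


def wins_difference(teams):
    """Sum (expected wins - actual wins) per region.

    teams: list of (region, expected_wins, actual_wins) tuples, where
    expected wins come from seeds and bracket placement.
    """
    diff = defaultdict(int)
    for region, expected, actual in teams:
        diff[region] += expected - actual
    return dict(diff)


# Hypothetical sample data, not actual Tournament results.
sample = [
    ("South", 3, 1),
    ("South", 1, 1),
    ("West", 0, 2),
    ("West", 2, 2),
    ("Other", 1, 1),
]
print(wins_difference(sample))
```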

Second, I took a similar look, but this time by conference.

This suggests that the Committee may have undervalued the West Coast Conference and, to a much lesser extent, the Big 10 and overvalued the ACC and Pac 12.

Since both of these tables are based on only one year’s results, I do not take them too seriously. They suggest, however, that it might be worthwhile to do a study that considers more years of NCAA Tournaments, to see if there are any Committee region- or conference-based overvaluation and undervaluation patterns.

Sunday, November 7, 2021

END OF REGULAR SEASON NCAA TOURNAMENT BRACKET SIMULATION 11.7.21

[Note: In the below table, Xavier should be a #4 seed rather than Auburn. Also, Santa Clara should be a 5 AQ rather than a 6. And Further Note: On relooking at why I originally had Auburn as a #4 seed, I realized the system I use has it as a clear Yes for a #4 seed, so it indeed should be a #4 seed and not Xavier.]

Here are my simulated NCAA Tournament seeds and at large selections, based on my more complex system described here. The table also includes the Automatic Qualifiers.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Below the table, I will indicate next teams in line for seeds and at large selections.

Other possible seeds:

#1 seed: Virginia, Arkansas

#2 seed: Georgetown

#3 seed: Princeton

#4 seed: Mississippi, Hofstra, Harvard

At Large:

Last in: NC State, Santa Clara, Providence, Wisconsin, West Virginia, Oregon

Next in line: Alabama, Colorado, Houston, St. Johns, Indiana

Monday, November 1, 2021

2021 RPI: 10.31.21 RPI RATINGS (ACTUAL CURRENT AND SIMULATED END OF SEASON), AND SIMULATED NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week’s reports use actual results of games played through Sunday, October 31. They are:

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for potential seeds and at large selections;

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played. The simulated results are based on opponents’ actual current RPI ratings. This report includes simulated NCAA Tournament at large selections and seeds based on the simple system described here.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system described here.

For the tables below, you may need to scroll to the right to see the entire table.

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections:

Here is a link to an Excel workbook that shows current RPI and other information for all teams.

In addition, here is a table from the workbook. On the left, it shows which teams, based on past history, are in the current seed and at large selection ranges as of this stage of the season. It also includes the next group of teams that appear to be close to but outside the range for an at large selection. If you look at the At Large Bubble column, the highest ranked teams at the top of the list that are not color coded likely are assured of getting at large selections even if not conference automatic qualifiers, based on past history.

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played:

The simulated results of games not yet played are based on opponents’ actual current RPI ratings, as adjusted for home field advantage. This report includes simple-system simulated NCAA Tournament at large selections and seeds. [NOTE: I have more confidence in the more complex system simulated Tournament selections and seeds described in part 3 below.]

Here is a link to an Excel workbook that shows the information for all teams.

In addition, below is a table from the workbook that shows simulated simple-system NCAA Tournament at large selections and seeds.

I have put the table in RPI rank order this week for a particular reason. The table includes (1) the Top 57 RPI teams, since all at large selections since 2007 have come from the Top 57, plus (2) the Automatic Qualifiers. The simple system selects unseeded at large teams based on a formula that combines team RPI ranks and their ranks based on their good results against Top 50 opponents, with those two factors weighted equally. The Top 50 results ranks are based on a scoring system that is very highly skewed towards good results (wins or ties) against very highly ranked opponents. The simple system uses this dual factor because on average, over the years since 2007 (excluding 2020), the dual factor rank correctly matches all but 2 per year of the Committee unseeded at large selections.
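Here is a minimal sketch of an equally weighted dual factor ordering. It assumes the combination is a simple average of the two ranks, and it breaks ties by RPI rank, which is my own assumption. The team names and ranks are hypothetical.

```python
def dual_factor_order(teams):
    """Order teams by the average of RPI rank and Top 50 Results rank.

    teams: dict of name -> (rpi_rank, top50_results_rank).
    Lower combined score is better.
    """
    combined = {name: 0.5 * rpi + 0.5 * top50
                for name, (rpi, top50) in teams.items()}
    # Ties on the combined score are broken by RPI rank (an assumption).
    return sorted(teams, key=lambda name: (combined[name], teams[name][0]))


# Hypothetical teams: Team A has a strong RPI rank but poor Top 50 results.
sample = {
    "Team A": (11, 60),
    "Team B": (30, 20),
    "Team C": (40, 35),
    "Team D": (25, 45),
}
print(dual_factor_order(sample))
```

In this made-up example, Team A has the best RPI rank of the four but falls to third in the dual factor order because of its Top 50 Results rank.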

In the At Large Selections column, the color coding shows the teams to which the simple system assigns at large selections. If a team is neither an Automatic Qualifier nor color coded in the At Large Selections column, it is a team that is among the Top 57 but that the simple system does not assign an at large selection. In the Seed columns, the color coding shows the teams that the system assigns seeds (with the seeds based on RPI ranks).

The table shows an interesting anomaly: The simple system assigns Harvard a #3 seed (as the #11 RPI team -- which has dropped to #12 as of November 2), but does not assign it an at large selection since it falls too far down on the dual factor list due to having no good Top 50 results (wins or ties). In response to this and because it will have educational value, here is a detailed analysis of Harvard’s record.

First, I looked to see whether the differences between Harvard’s RPI rating and its opponents’ ratings seemed appropriate based on game results. One of the things my system does is compute the RPI rating difference between opponents as adjusted for home field advantage. Then, based on that difference and historic data, it computes the likelihood of the higher rated team winning, tying, and losing the game. If I then combine all of those likelihoods for a team’s schedule, I can tell what the ratings say the team’s win-tie-loss results should be for those games if the ratings for the teams are correct in relation to each other. I did this first for Harvard’s non-conference games and the RPI ratings say that Harvard’s record over the course of those games should have been 7 wins, 1 tie, and 0 losses (or possibly 7-0-1). In fact, its actual record was 7-1-0, exactly what it should have been if it and its opponents are rated correctly in relation to each other. I did this next for Harvard’s conference games, where the ratings say its record (so far) should be 5-0-1, whereas it actually is 4-0-2. What this suggests is that Harvard is rated appropriately in relation to the non-conference teams it played but is overrated in relation to its Ivy League partners Brown and Princeton.
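The idea of combining per-game likelihoods into an expected record can be sketched as follows. The sketch assumes the combination is a simple sum of each game's win, tie, and loss probabilities. The probabilities themselves are hypothetical inputs, since the mapping from rating difference to likelihood comes from historic data not shown here.

```python
def expected_record(game_probs):
    """Sum per-game probabilities into an expected win-tie-loss record.

    game_probs: list of (p_win, p_tie, p_loss) tuples from the team's
    point of view; each tuple should sum to 1.
    """
    wins = sum(p[0] for p in game_probs)
    ties = sum(p[1] for p in game_probs)
    losses = sum(p[2] for p in game_probs)
    return wins, ties, losses


# Hypothetical probabilities for an eight-game schedule.
schedule = [(0.85, 0.10, 0.05)] * 6 + [(0.70, 0.15, 0.15)] * 2

w, t, l = expected_record(schedule)
print(f"Expected win-tie-loss: {w:.1f}-{t:.1f}-{l:.1f}")
```

Comparing a record computed this way with the actual record is what shows whether a team looks rated correctly relative to its opponents.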

Second, I looked at Harvard’s results against opponents it had in common with other current Top 60 teams. This showed the following, based on current RPI ranks as of November 2:

Harvard: win home v #60 St Johns who had a tie home v #30 Butler and a win home v #34 Providence

Harvard: win away v #106 Northeastern who had a win away v #25 Hofstra

Harvard: win home v #91 Kansas who had a win home v #45 West Virginia

Harvard: win home v #111 Penn who had a win home v #54 Rice

Harvard: loss home v #13 Brown who had a loss away v #25 Hofstra, a loss away v #15 Notre Dame, and a loss away v #34 Providence

Harvard: loss home v #18 Princeton, who had a tie away v #14 Georgetown and a loss home v #25 Hofstra

Harvard: win home v #128 Dartmouth who had a tie away v #14 Georgetown

Looking at the common opponent results as a whole, they suggest a Harvard rank in the vicinity of #25 Hofstra to #34 Providence, possibly closer to the Hofstra side of that range.

Putting these two detailed looks at Harvard together, it appears that Harvard’s current RPI rating and rank are too high. Realistically, it probably should be outside the range for a seed. On the other hand, it seems well inside the range for an at large selection.

The above analysis can give some insight into the process the Committee must go through.

Interestingly, my more complex system, as shown in the last table below, does not seed Harvard but gives it an at large selection. An at large selection certainly would be consistent with the history of Committee decisions, which always have given at large selections to teams with RPI ranks of #30 or better.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system:

Finally, below is a table that shows the simulated more-complex-system NCAA Tournament at large selections and seeds.

Of the at large teams, the last teams in are Butler, Wisconsin, Houston, and West Virginia. The next teams in line are Oregon, Alabama, and Colorado.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Monday, October 25, 2021

2021 RPI: 10.24.21 RPI RATINGS (ACTUAL CURRENT AND SIMULATED END OF SEASON), AND SIMULATED NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week’s reports use actual results of games played through Sunday, October 24. They are:

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for potential seeds and at large selections;

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played. The simulated results are based on opponents’ actual current RPI ratings. This report includes simulated NCAA Tournament at large selections and seeds based on the simple system described here.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system described here.

For the tables below, you may need to scroll to the right to see the entire table.

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections:

Here is a link to an Excel workbook that shows RPI and other information for all teams.

NOTE: If you closely compare my ratings to those the NCAA has published, you will notice some minor differences. This is because the October 24 game between Alcorn State and Grambling somehow dropped out of the NCAA database. Hopefully, that game will find its way back in. In addition, the NCAA still has not adjusted the ranges for its two penalty adjustment tiers. This has no significant effect, but it also is causing some rating and ranking differences from mine, at the poorer end of the RPI.

Also, if you compare my ratings and ranks to the AllWhiteKit ratings, you may notice some very small differences. The rating differences are due to our systems following different rounding conventions. (For these differences, AllWhiteKit always will have a team rated 0.0001 higher than my rating.) The ranking differences are due to the fact that when the AllWhiteKit system has teams with equal ratings when rounded to four decimal places, it puts them in alphabetical order. My system puts teams in order based on calculations to 15 decimal places. Ordinarily, these differences are inconsequential.
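The ranking difference described above is easy to reproduce. Here is a minimal sketch with hypothetical team names and ratings: one ordering uses the full-precision ratings, the other rounds to four decimal places and falls back to alphabetical order for ties.

```python
def order_full_precision(ratings):
    """Rank teams by their unrounded ratings, highest first."""
    return sorted(ratings, key=lambda team: -ratings[team])


def order_rounded_then_alpha(ratings, places=4):
    """Rank by rating rounded to `places` decimals, ties in alphabetical order."""
    return sorted(ratings, key=lambda team: (-round(ratings[team], places), team))


# Hypothetical ratings: Zeta is slightly ahead of Alpha, but both round to 0.6500.
ratings = {"Zeta": 0.650049, "Alpha": 0.650012, "Mid": 0.612300}

print(order_full_precision(ratings))
print(order_rounded_then_alpha(ratings))
```

The two orderings swap Zeta and Alpha even though the underlying ratings are identical in both cases.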

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played:

Here is a link to an Excel workbook that shows the information for all teams. (NOTE: Due to a programming error (by me), the originally linked workbook had the wrong information. The currently linked workbook has the right information.)

In addition, here is a table from the workbook that shows simulated simple-system NCAA Tournament at large selections and seeds, plus the next teams in the RPI rankings down to #80 (some of which would not meet the NCAA Tournament 0.500 winning percentage requirement for at large selection). (NOTE: This is a corrected version of what I posted yesterday.)

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system:

Finally, below is a table that shows the simulated more-complex-system NCAA Tournament at large selections and seeds. It is worth noting that since I started keeping data in 2007, no team ranked poorer than #57 (using the current RPI formula) has gotten an at large selection.

Of the at large teams, the last teams in are Washington State, Santa Clara, and South Carolina. The next teams in line are Colorado, Butler, and Michigan State, followed by Oregon State, Georgia, Indiana, and Clemson.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Tuesday, October 19, 2021

2021 RPI: 10.17.21 RPI RATINGS (ACTUAL CURRENT AND SIMULATED END OF SEASON), AND SIMULATED NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week’s reports use actual results of games played through Sunday, October 17. They are:

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for potential seeds and at large selections;

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played. The simulated results are based on opponents’ actual current RPI ratings. This report includes simulated NCAA Tournament at large selections and seeds based on the simple system described here.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system described here.

For the tables below, you may need to scroll to the right to see the entire table.

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections:

Here is a link to an Excel workbook that shows RPI and other information for all teams.

In addition, here is a table from the workbook. On the left, it shows which teams, based on past history, are in the current seed and at large selection ranges as of this stage of the season. If you look at the At Large Bubble column, the highest ranked teams at the top of the list that are not color coded likely are assured of getting at large selections, based on past history.

NOTE: If you compare these ranks to those the NCAA has published, you will note that I have Duke and Arkansas ranked #2 and #3 respectively, whereas the NCAA has them in the reverse order. Those two teams have nearly identical ratings and the difference in their order is due only to the NCAA and me using different rounding conventions. This is an unusual occurrence in this area of the ratings and I expect it will disappear in next week’s ratings and ranks.

In addition, if you get into comparing my ratings to the NCAA’s, you might notice that I have #70 Minnesota with an RPI rating of 0.5676 whereas the NCAA has them at 0.5670. The reason for this is that Minnesota has accrued an RPI penalty for a tie against Illinois-Chicago. There are two tiers of penalties, the higher penalties being for poor results against the bottom 40 teams and the lower penalties for poor results against the next-to-bottom 40. As new schools sponsor soccer each year, it is necessary to adjust the penalty tiers in the RPI calculation system to match the total number of teams sponsoring soccer. I have done that, but this year the NCAA has not yet done it. As a result, the NCAA is treating Minnesota as having accrued a penalty as though its poor result was against a bottom 40 team whereas it actually accrued the result against a next-to-bottom 40 team. The NCAA has been aware for a couple of weeks that it needs to adjust its penalty tiers and has said it will do so, but it has not done so yet. There are 15 other teams, farther down in the rankings, that this likewise affects. Ordinarily, it would not be a significant issue, but since Minnesota still is within at large selection range as of this stage of the season, it actually may become important that the NCAA make the needed correction.
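To show why an out-of-date team count shifts the tiers, here is a minimal sketch. It assumes the tiers are defined by an opponent's rank counted up from the bottom of the full list of teams. The team totals and the opponent rank are hypothetical numbers chosen only to show the effect.

```python
def penalty_tier(opponent_rank, total_teams, tier_size=40):
    """Return which penalty tier an opponent falls in, by rank from the bottom."""
    if opponent_rank > total_teams - tier_size:
        return "bottom 40"
    if opponent_rank > total_teams - 2 * tier_size:
        return "next-to-bottom 40"
    return "no penalty tier"


# Hypothetical numbers: 340 teams this year, tiers still set for last year's 335.
opponent_rank = 298
correct = penalty_tier(opponent_rank, total_teams=340)
stale = penalty_tier(opponent_rank, total_teams=335)
print("With updated tiers:", correct)
print("With stale tiers:  ", stale)
```

With the stale count, the same opponent lands in the bottom 40 and draws the larger penalty, which is the kind of mismatch described above.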

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played:

Here is a link to an Excel workbook that shows the information for all teams.

In addition, here is a table from the workbook that shows simulated simple-system NCAA Tournament at large selections and seeds, plus the next ten teams:

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system:

Finally, below is a table that shows the simulated more-complex-system NCAA Tournament at large selections and seeds, plus the next 9 teams. It is worth noting that since I started keeping data in 2007, no team ranked poorer than #57 (using the current RPI formula) has gotten an at large selection.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Monday, October 11, 2021

2021 RPI: 10.10.21 RPI RATINGS (ACTUAL CURRENT AND SIMULATED END OF SEASON), AND SIMULATED NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week’s reports use actual results of games played through Sunday, October 10. They are:

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections;

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played. The simulated results are based on opponents’ actual current RPI ratings. This report includes simulated NCAA Tournament at large selections and seeds based on the simple system described here.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system described here.

For the tables below, you may need to scroll to the right to see the entire table.

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections:

Here is a link to an Excel workbook that shows RPI and other information for all teams.

In addition, here is a table from the workbook. On the left, it shows which teams, based on past history, are in the current seed and at large selection ranges as of this stage of the season. If you look at the At Large Bubble column, the highest ranked teams at the top of the list that are not color coded likely are assured of getting at large selections, based on past history.

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played:

Here is a link to an Excel workbook that shows the information for all teams.

In addition, here is a table from the workbook that shows simulated simple-system NCAA Tournament at large selections and seeds, plus the next five teams.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system:

Finally, below is a table that shows the simulated more-complex-system NCAA Tournament at large selections and seeds. The next five teams in line for at large selections would be West Virginia, Old Dominion, South Florida, Butler, and Clemson.

It is worth noting that this system has Harvard ending up ranked #4 by the RPI but going unseeded. If Harvard ends up ranked #4, it is unlikely to go unseeded.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Sunday, October 10, 2021

ON LINE RESOURCES: DATA, RATINGS, AND BRACKETOLOGY

There are some excellent on line resources for data, ratings, and bracketology information about Division I women’s soccer. Here are resources I recommend.

Data

I use three data sources:

NCAA Statistics This link will take you to the daily Division I women’s soccer Scoreboard page of the NCAA Statistics system. On that page, there are links to a number of other data resources.

At the beginning of each season, schools enter their schedules into the NCAA system. Then, as they play games, they enter the results into the system, including box scores. Once entered, the results appear on the Scoreboard page. Links to the box scores also appear on the Scoreboard page. And further, by clicking on a school name on the Scoreboard page, the system takes you to a page for that team that shows its entire schedule and results to date and has links to other data for the team and its players and coach.

At the top of each page are a number of tabs that take you to other areas of the NCAA statistics system. Of particular note, the RPI/NET Rankings tab will take you to a page with links to what are called the Nitty Gritties for sports. These are reports the NCAA sports committees use in making their decisions about NCAA Tournament seeding and at large selections. From the RPI/NET Rankings page, you can navigate to the Women’s Soccer Nitty Gritty reports, which contain information about each team including its RPI rating and rank. The NCAA publishes these reports weekly, starting with the sixth week of the season.

In addition, at the top of the Scoreboard page (and other system pages), you can use the Player Search tab to find data about a particular player and the Team Search box to go to a particular team’s page.

If you navigate around the NCAA Statistics system, you can find virtually all of the data within the system about teams, players, and coaches.

College Women’s Soccer Schedule (presented by All White Kit) This link will take you to the Composite Schedule page of the College Women’s Soccer Schedule website. This page gets updated with game results daily through the course of the season.

At the top of the Composite Schedule page, if you click on the Information tab, you will get a drop down menu from which you can navigate to other pages, such as the Adjusted RPI page. As game results are entered over the course of the day, the system automatically updates, so if you go to the Adjusted RPI page you will get real time RPI ratings for teams. And, if you click twice on the Adjusted RPI column header, it will place the teams in rank order from best to worst.

In addition, on the Adjusted RPI page (and some of the other pages), if you click on a team name, it will take you to a team page with the team schedule and results to date plus some information about the team’s opponents and some team past history information. And, if you click on the box to the left of a team name, it will take you to the team schedule page at the school’s website.

If you navigate around this system, you will find it has lots of information about each team and a number of nifty bells and whistles.

School Athletics Websites The women’s soccer pages at school athletics websites are the other place I go for data about teams.

Ratings

NCAA Statistics This link will take you directly to the NCAA Nitty Gritty reports for Division I women’s soccer, referred to above. As stated above, these include team RPI ratings and ranks. You can use the Thru Games box at the top of the page to go to the weekly reports the NCAA has published over the course of the season, including the most recent report.

College Women’s Soccer Schedule (presented by All White Kit) This link will take you directly to the Adjusted RPI page for the College Women’s Soccer Schedule website. As stated above, this page has RPI ratings adjusted automatically as results are entered into the system over the course of each day.

Massey Ratings This link will take you to ratings by a system developed and maintained by Kenneth Massey. These are different than the RPI ratings, but are the best non-RPI on line ratings I know of. In particular, they do a good job of rating teams from different conferences and geographic regions in relation to each other (which is an area of weakness for the RPI).

RPI, Bracketology, and Scheduling

Here are the best resources for information about (1) the structure of the RPI and how it works, (2) the process for forming the NCAA Tournament bracket, and (3) the factors teams must consider, in relation to the NCAA Tournament, when developing their non-conference schedules.

RPI for Division I Women’s Soccer This link will take you to my website. It includes a detailed explanation of the RPI formula and how it works, of the rules and process the Women’s Soccer Committee follows in filling out the NCAA Tournament bracket, of the Committee’s historic patterns in filling out the bracket, and of the factors coaches must consider when doing non-conference scheduling with a view to NCAA Tournament seeding and at large selections.

RPI and Bracketology for DI Women’s Soccer Blogspace This link will take you to my blog. Here, during the course of the season, I file weekly reports with information such as current RPI ratings and related information, simulated end-of-season ratings, and simulated NCAA Tournament brackets. In addition, over the course of the year I publish other articles about ratings, the patterns of the Women’s Soccer Committee in filling out the NCAA Tournament bracket, scheduling, revisions that would improve the RPI, and similar topics.

Chris Henderson on Twitter Chris Henderson maintains the College Women’s Soccer Schedule website referred to above. In addition, he maintains a comprehensive data base and system for evaluating Division I women’s soccer teams, players, and coaches. He publishes information ordinarily multiple times daily on Twitter and is one of the few people I follow there.

I hope you find these resources helpful. If you are aware of other high level resources, please let me know by email: cpthomas@q.com

Tuesday, October 5, 2021

2021 RPI: 10.3.21 RPI RATINGS (ACTUAL CURRENT AND SIMULATED END OF SEASON), AND SIMULATED NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week’s reports, using actual results of games played through Sunday, October 3, are:

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections;

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played. The simulated results are based on opponents’ actual current RPI ratings. This includes simulated NCAA Tournament at large selections and seeds based on the simple system described here.

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system described here.

For the tables below, you may need to scroll to the right to see the entire table.

1. Actual current RPI ratings and ranks, showing which teams are in the current ranges for seeds and at large selections:

Here is a link to an Excel workbook that shows RPI and other information for all teams.

In addition, here is a table from the workbook that shows the Top 100 RPI teams. On the left, it shows which teams are in the current seed and at large selection ranges. If you look at the at large selection column, the highest ranked teams at the top of the list that are not color coded likely are assured of getting at large selections, based on past history.

2. Simulated RPI ratings and ranks based on the actual results of games played and simulated results of games not yet played:

The simulated results are based on opponents’ actual current RPI ratings. This includes simulated simple-system NCAA Tournament at large selections and seeds.

Here is a link to an Excel workbook that shows the information for all teams.

In addition, here is a table from the workbook that shows the simulated simple-system NCAA Tournament at large selections and seeds, plus the next six teams:

3. Simulated NCAA Tournament bracket based on the simulated RPI ratings and ranks, using the more complex system:

Below is a table that shows the simulated more-complex-system NCAA Tournament at large selections and seeds. The next two teams in line for at large selections would be Washington State and UNC Wilmington.

1 = #1 seed, 2 = #2 seed, 3 = #3 seed, 4 = #4 seed, 5 = unseeded automatic qualifier, and 6 = unseeded at large selection.

Wednesday, September 29, 2021

SIMULATED NCAA 2021 TOURNAMENT BRACKET, ALTERNATE METHOD: 9.26.21

In past years, I have done simulated NCAA Tournament brackets using a much more complex system than the simpler system I have used in my preceding posts this year. I will continue reporting what the bracket looks like using the simpler system, as it emphasizes the two key factors related to the NCAA Tournament: RPI Rank and Top 50 Results rank, which I combine together into a single factor with each weighted 50 percent.
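As a rough illustration of the simpler system's combined factor, here is a minimal sketch. It assumes the two ranks are combined by a straight 50/50 weighted average and that a lower combined value is better; the team names and ranks are hypothetical, not data from the reports.

```python
def combined_factor(rpi_rank: int, top50_rank: int, weight: float = 0.5) -> float:
    """Weighted average of RPI Rank and Top 50 Results Rank (lower is better).

    Averaging the two ranks directly is an assumption made for illustration.
    """
    return weight * rpi_rank + (1.0 - weight) * top50_rank


# Hypothetical teams: (RPI Rank, Top 50 Results Rank)
teams = {
    "Team A": (3, 10),
    "Team B": (5, 4),
    "Team C": (8, 9),
}

# Order teams from best to worst on the combined factor.
ordered = sorted(teams, key=lambda name: combined_factor(*teams[name]))
```

In this made-up example, Team B moves ahead of Team A despite a poorer RPI Rank, because its Top 50 Results Rank is much better.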

In addition, however, starting this week I will report on the results using the more complex system. It looks at a total of 92 factors and, based on how team data compares to those factors, identifies teams that historically always have gotten a positive decision or a negative decision for each Committee decision category: #1 through #4 seeds and at large selections. If applying the factors leaves more open positions to fill, the system also identifies candidate (bubble) teams that might fill those positions. In the past, I have made my own educated guesses as to whom the Committee would select from the bubble teams, based on the data. Last summer, however, I did a study that identified the most successful factor at picking from the bubble teams for each Committee decision.
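The sorting of teams into "always yes," "always no," and bubble groups can be sketched roughly as follows. This is only an illustration, not the actual 92 factor system: the one "yes" standard shown is hypothetical, and treating a team that meets both a "yes" and a "no" standard as a bubble team is my assumption here. The "no" standard (no team ranked poorer than 7 has gotten a #1 seed) is a historical pattern from the data.

```python
from typing import Callable, Dict, List

Team = Dict[str, float]
Standard = Callable[[Team], bool]


def classify(team: Team, yes_standards: List[Standard], no_standards: List[Standard]) -> str:
    """Return 'assured', 'excluded', or 'bubble' for one decision category."""
    meets_yes = any(standard(team) for standard in yes_standards)
    meets_no = any(standard(team) for standard in no_standards)
    if meets_yes and not meets_no:
        return "assured"
    if meets_no and not meets_yes:
        return "excluded"
    # Neither kind of standard applies, or both do (the latter is an assumption).
    return "bubble"


# Standards for the #1 seed decision.
# Hypothetical "yes" standard, for illustration only.
yes_standards: List[Standard] = [lambda t: t["rpi_rank"] <= 2]
# "No" standard based on history: no team ranked poorer than 7 has gotten a #1 seed.
no_standards: List[Standard] = [lambda t: t["rpi_rank"] > 7]

statuses = [classify({"rpi_rank": r}, yes_standards, no_standards) for r in (1, 5, 9)]
```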

Thus, when there are bubble teams as to a decision, here is the factor that best matches the Committee decision for each decision category:

Teams to get a seed: ARPI Rank and Common Opponent Score Rank, a combined factor with each element weighted at 50 percent. With this factor, applied to the RPI top 26 teams, I identify the 16 teams to be seeded. Over the years from 2007 through 2019, this factor correctly identifies all but 14 teams getting seeds, thus missing about 1 per year. (Since 2007, no team ranked poorer than 26 has been seeded.)

#1 seeds: Adjusted Non-Conference RPI. I first apply the 92 factor system to the RPI top 7 teams, to identify teams that history says must get #1 seeds and must not get them. To the remaining top 7 teams, I apply the Adjusted Non-Conference RPI to identify the ones to fill any remaining #1 seed positions. Over time, this system correctly identifies all but 1 team getting #1 seeds, thus correctly identifying virtually all of them. (No team ranked poorer than 7 has gotten a #1 seed.)

#2 seeds: ARPI Rating and Conference Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 14 teams (those to which I have not already assigned #1 seeds), to identify teams that history says must get #2 seeds and must not get them. To the remaining top 14 teams, I apply this combined factor to identify the ones to fill any remaining #2 seed positions. Over time, these steps for the #1 and 2 seeds correctly identify all but 4 teams getting #1 and 2 seeds combined, thus missing one about every three years. (No team ranked poorer than 14 has gotten a #2 seed.)

#3 seeds: ARPI Rank and Conference Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 23 teams (those to which I have not already assigned #1 or #2 seeds), to identify teams that history says must get #3 seeds and must not get them. To the remaining top 23 teams, I apply this combined factor to identify the ones to fill any remaining #3 seed positions. Over time, these steps for the #1 through #3 seeds correctly identify all but 15 teams getting #1 through 3 seeds, thus missing a little over one per year. (No team ranked poorer than 23 has gotten a #3 seed.)

#4 seeds: Since I identify the 16 teams to be seeded in the first step above and have just seeded 12 of them, the remaining 4 get the #4 seeds. Over time, these steps for the #1 through #4 seeds correctly identify all but 11 teams getting #1 through 4 seeds, thus missing a little under one per year.

At large selections: ARPI Rank and Top 50 Results Rank, a combined factor with each element weighted at 50 percent. I first apply the 92 factor system to the RPI top 57 teams (those to which I have not already assigned seeds and that are not Automatic Qualifiers), to identify teams that history says must get at large selections and must not get them. To the remaining top 57 teams, I apply this combined factor to identify the ones to fill the still open unseeded at large positions. Over time, these steps for the at large selections correctly identify all but 14 teams getting at large positions, thus missing a little over one per year. (No team ranked poorer than 57 has gotten an at large selection.)
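The steps above share one shape: limit the pool by the historical RPI rank cutoff, keep the teams history says must be chosen, drop the teams history says must not be, and fill what is left from the bubble using the best-matching factor. Here is a minimal sketch of that shape. The cutoffs are the ones stated above; the function name, the data layout, and the convention that a lower tiebreak value is better are assumptions for illustration (a rating-based factor such as Adjusted Non-Conference RPI would need to be negated or converted to a rank first). The sketch does not handle the case of more assured teams than positions.

```python
from typing import Dict, List

# Poorest RPI rank that has received each decision since 2007, per the text above.
POOREST_RANK = {
    "#1 seed": 7,
    "#2 seed": 14,
    "#3 seed": 23,
    "any seed": 26,
    "at large": 57,
}


def fill_positions(candidates: List[Dict], decision: str, slots: int) -> List[str]:
    """Pick teams for one decision category.

    Each candidate has 'name', 'rpi_rank', 'status' ('assured', 'excluded',
    or 'bubble'), and 'tiebreak' (lower is better, an assumed convention).
    """
    pool = [
        c for c in candidates
        if c["rpi_rank"] <= POOREST_RANK[decision] and c["status"] != "excluded"
    ]
    chosen = [c for c in pool if c["status"] == "assured"]
    bubble = sorted(
        (c for c in pool if c["status"] == "bubble"),
        key=lambda c: c["tiebreak"],
    )
    open_slots = max(slots - len(chosen), 0)
    chosen.extend(bubble[:open_slots])
    return [c["name"] for c in chosen]


# Hypothetical candidates for two open #1 seed positions.
candidates = [
    {"name": "A", "rpi_rank": 1, "status": "assured", "tiebreak": 0.0},
    {"name": "B", "rpi_rank": 2, "status": "bubble", "tiebreak": 3.0},
    {"name": "C", "rpi_rank": 4, "status": "bubble", "tiebreak": 1.5},
    {"name": "D", "rpi_rank": 5, "status": "excluded", "tiebreak": 0.5},
    {"name": "E", "rpi_rank": 9, "status": "bubble", "tiebreak": 0.1},
]
picked = fill_positions(candidates, "#1 seed", 2)
```

In this made-up example, E has the best tiebreak value but is ranked poorer than 7, so it never enters the #1 seed pool.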

This system produces the simulated NCAA Tournament bracket below. It uses my simulated end of year results, which combine actual results of games played through September 26 with simulated results of games not yet played, including simulated conference tournaments; the simulated results are based on teams’ actual current RPI ratings. The four #1 through #4 seed pods are identified in the left-hand column as 1 through 4. The unseeded Automatic Qualifiers are 5. The unseeded at large selections are 6. (The teams not getting at large selections but next in line are Georgia and Stanford.)

Tuesday, September 28, 2021

2021 RPI: 9.26.21 ACTUAL RPI RATINGS, SIMULATED END OF REGULAR SEASON RPI RATINGS, NCAA TOURNAMENT AT LARGE SELECTION AND SEED RANGES, AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

This week, I am adding an additional report to the report I published in preceding weeks. Thus you will see here two reports:

Actual RPI Report with At Large Selection and Seed Ranges. This report includes:

(1) the above link to an Excel workbook that has all teams ranked in RPI order, with detailed actual current RPI-related information on each team, based on games played through Sunday, September 26. It includes, to the left, color coding that shows the ranges within which teams historically, at this point in the season, are potential bubble teams for at large selections and #1, 2, 3, and 4 seeds. Teams ranked better than the at large bubble historically always have gotten at large selections (if not conference automatic qualifiers). Teams ranked poorer than the at large bubble never have gotten at large selections. The workbook also has two other pages, showing conference ranks and ranks of regional playing pools.

(2) below, a table drawn from the workbook showing the teams from RPI #1 through those in the current at large bubble.
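The at large range logic described in item (1) can be sketched as a simple three-way check. The bubble boundaries change from week to week and are shown in the workbook, so the values used below are hypothetical placeholders, as is the function name.

```python
def at_large_outlook(rpi_rank: int, bubble_best: int, bubble_poorest: int) -> str:
    """Place a team relative to the current at large bubble.

    bubble_best and bubble_poorest are the best and poorest RPI ranks inside
    the bubble for this point in the season (hypothetical inputs here).
    """
    if rpi_rank < bubble_best:
        # Applies to teams that are not conference automatic qualifiers.
        return "historically always selected"
    if rpi_rank > bubble_poorest:
        return "historically never selected"
    return "bubble"


# Hypothetical bubble running from rank 30 through rank 60.
outlooks = [at_large_outlook(rank, 30, 60) for rank in (12, 45, 75)]
```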

Simulated RPI Report with Simulated NCAA Tournament Automatic Qualifiers, At Large Selections, and Seeds. This report includes:

(1) the above link to an Excel workbook that shows (a) full season data for teams based on the actual results of games played through Sunday, September 26 and simulated results of games not yet played and (b) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data.

(2) below, a table drawn from the workbook showing the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank. Simulated results of games not yet played are based on teams’ current actual RPI ratings.

The earlier post, 2021 Season: Background for Upcoming Reports, has a full explanation of the simulation process, its limitations, and how to use it to consider the prospects for a team. If you are a coach using the information to analyze your team’s prospects or otherwise have a serious interest in following the simulations, I strongly recommend you review in advance the Background post.

Both the actual RPI ratings and the simulated ratings remain primitive at this stage of the season, so bear that in mind. As the season progresses, each week’s current ratings and simulated end-of-season results will be closer to what the actual end-of-season results will be.

As an additional note this week, the NCAA staff has tweaked the RPI bonus and penalty formulas very slightly this year, in order to make them continue to be consistent with past Committee instructions. The tweaking is of no practical consequence. It also appears that the penalty tiers are slightly in error due to the NCAA not having adjusted them to reflect the addition of new teams to the field of schools with soccer. This too is of no practical consequence.

Here is the 9.26.21 actual RPI table, with historic seed and at large selection ranges. You will have to scroll right to see the entire table.

Here is the 9.26.21 Simulated RPI Top 100 table. You will have to scroll right to see the entire table.

Tuesday, September 21, 2021

2021 RPI: 9.19.21 SIMULATED RPI RATINGS AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

Below is a table showing (1) full season data for teams based on the actual results of games played through Sunday, September 19 and simulated results of games not yet played and (2) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data. These cover the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank. Starting with this report, simulated results of games not yet played are based on teams’ current actual RPI ratings rather than on pre-season simulated ratings.

In addition, here is a link that shows the same data for all teams: 2021 Simulated RPI Report 9.19.21.

The earlier post, 2021 Season: Background for Upcoming Reports, has a full explanation of the simulation process, its limitations, and how to use it to consider the prospects for a team. If you are a coach using the information to analyze your team’s prospects or otherwise have a serious interest in following the simulations, I strongly recommend you review in advance the Background post.

Although I now am using teams’ current actual RPI ratings to simulate the results of games not yet played, the simulated end-of-season results remain pretty primitive at this stage of the season, so bear that in mind. As the season progresses, each week’s simulated end-of-season results will be closer to what the actual end-of-season results will be.

Also, I see a potential problem with how the RPI will work this year. In a typical season, teams play 18.1% of their games out-of-region (regions being primarily geographic playing pools: middle, northeast, south, and west). This is not a high enough percentage to allow the RPI to have high accuracy in rating teams from different regions in relation to each other, with the main result being that the RPI on average underrates teams from the west. (In other words, teams from the west on average have out-of-region game results that are better than their RPI ratings say they should be.) This year, the current rate of out-of-region games is only 15.0%. This is likely to create an even bigger problem this year in terms of the RPI’s ability to rate teams from different regions properly in relation to each other.
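For anyone who wants to check the out-of-region rate against their own game list, here is a minimal sketch of the calculation. It assumes each game is recorded as a pair of region labels, one per team, using the four regions named above; the sample games are made up to reproduce a 15.0% rate and are not actual data.

```python
from typing import List, Tuple

REGIONS = {"middle", "northeast", "south", "west"}


def out_of_region_share(games: List[Tuple[str, str]]) -> float:
    """Fraction of games in which the two teams come from different regions."""
    if not games:
        return 0.0
    for region_a, region_b in games:
        if region_a not in REGIONS or region_b not in REGIONS:
            raise ValueError(f"unknown region in game: {(region_a, region_b)}")
    cross_region = sum(1 for region_a, region_b in games if region_a != region_b)
    return cross_region / len(games)


# Hypothetical sample: 17 in-region games and 3 out-of-region games.
sample_games = [("west", "west")] * 17 + [("west", "south")] * 3
share = out_of_region_share(sample_games)
```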

Here are the 9.19.21 Simulated RPI Top 100. You will have to scroll right to see the entire table.

Tuesday, September 14, 2021

2021 RPI: 9.12.21 SIMULATED RPI RATINGS AND NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS

Below is a table showing (1) full season data for teams based on the actual results of games played through Sunday, September 12 and simulated results of games not yet played and (2) simulated automatic qualifiers and at large selections for the NCAA Tournament, based on those data. These cover the Top 100 teams in the simulation based on combined RPI Rank and Top 50 Results Rank.

In addition, here is a link that shows the same data for all teams: 2021 Simulated RPI Report 9.12.21.

As you may notice, the simulation overrates or underrates a fair number of teams. This is a result of the way the simulation works and the fact that it does not use the 2020 season results due to their being unreliable. This problem will not be a factor next week, as for that simulation I will start using actual current RPI ratings as the basis for simulating the results of games not yet played. On the other hand, even the actual current RPI ratings next week will be pretty primitive, so as always you must bear that in mind.

Here are the 9.12.21 Simulated RPI Top 100. You will have to scroll right to see the entire table.