Friday, January 7, 2022

GRADING THE COMMITTEE ON ITS NCAA TOURNAMENT SEED AND HOME FIELD DECISIONS

Each year, in forming the NCAA Tournament bracket, the Women’s Soccer Committee selects 4 #1 seeds, 4 #2s, 4 #3s, and 4 #4s and places them in the bracket accordingly.

If we think of the #1 seeds as 1.1, 1.2, 1.3, and 1.4, it seems clear the Committee places 1.1 at the top left of the bracket.  After that, although it is not always clear, it appears to place 1.2 at the top right, 1.3 at the bottom right, and 1.4 at the bottom left.  (Over the last 11 seasons, excluding 2020, teams in the 1.1 top-left position have won 44 Tournament games, those at 1.2 top right 41, at 1.3 bottom right 41, and at 1.4 bottom left 36.  It is possible the 1.2 and 1.3 positions are reversed from what I have indicated.)

As for the #2, #3, and #4 seeds, it is not clear to me how the Committee places them.  For purposes of this article, it does not matter.

In addition, in the Tournament first round, there are 16 games between unseeded opponents.  In these games, the Committee appears to give home field to the teams it considers stronger.

When the Committee makes these decisions, it is supposed to be free from bias, both in terms of the regions teams come from and the conferences they play in.  But is it really free from bias?  When I looked at the results of the 2021 Tournament, it occurred to me that this might be a good question to research.  So I did.

My approach was to see how many rounds regions’ and conferences’ teams actually won in each of the last ten Tournaments as compared to how many the Committee, based on its seed and host decisions, thought they would win.  In terms of how many games the Committee decisions say teams should win, here are the numbers:

#1.1: 6 games (champion)

#1.2: 5 games (runner-up)

#1.3 and #1.4: 4 games (semi-finalists)

#2s: 3 games (quarterfinalists)

Unseeded pairs, 1st round host: 1 game (reach the second round)

Unseeded pairs, 1st round visitor: 0 games (lose in the first round)

Unseeded, playing a seed in the 1st round: 0 games (lose in the first round)
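The expected-wins baseline above can be captured in a small lookup table.  This is just a sketch of the bookkeeping, using position labels of my own choosing for the categories listed above:

```python
# Tournament wins implied by the Committee's bracket decisions, keyed by
# bracket position.  The position labels here are shorthand I have chosen;
# the win counts are the ones listed above.
EXPECTED_WINS = {
    "1.1": 6,               # champion
    "1.2": 5,               # runner-up
    "1.3": 4,               # semi-finalist
    "1.4": 4,               # semi-finalist
    "2": 3,                 # quarterfinalist
    "unseeded_host": 1,     # wins its first round game at home
    "unseeded_visitor": 0,  # loses in the first round on the road
    "unseeded_vs_seed": 0,  # loses in the first round to a seed
}

def performance(position, actual_wins):
    """Actual wins minus the wins the Committee's positioning implies."""
    return actual_wins - EXPECTED_WINS[position]
```

So, for example, a #2 seed that reached the final (5 actual wins) gets a +2: it outperformed its bracket position by two rounds.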

Regions

This year, I am changing my approach to regions by using four strictly geographic regions, based on where a state's teams play the greatest number of their games.  (Given how often teams shift conferences, this seems a better long-term approach for studying regions than using conferences as the building blocks.)

Middle: Illinois, Indiana, Iowa, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin (64 teams)

North: Connecticut, Delaware, Maine, Maryland, Massachusetts, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Washington DC (79 teams)

South: Alabama, Arkansas, Florida, Georgia, Kansas, Kentucky, Louisiana, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia (139 teams)

West: Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, Wyoming (62 teams)

For regions, I look at each region team’s bracket position to see how many games the positioning says it should win.  Then I look at actual results to see how many games it actually won.  From the games it actually won, I subtract the number the Committee positioning says it should have won.  If it did better than expected, it will have a positive number result.  If it did as expected, it will have a 0 result.  If it did poorer than expected, it will have a negative number result.

If two teams from the same region play each other, then either each performs as expected and both get 0 results from the game, or neither performs as expected and one gets a +1 and the other a -1, the two canceling each other out from a region perspective.  Thus when I add up all the numbers for a region's teams, the total reflects how the region did in games against other regions.
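As a sketch of that bookkeeping (the region assignments and win counts below are illustrative, not the article's data), summing each team's actual-minus-expected wins by region gives the region score, and an intra-region upset contributes a +1 and a -1 that cancel:

```python
# Each record: (region, expected_wins, actual_wins).  Summing
# actual - expected per region gives the over/under-performance score.
# These three records are made-up examples, not the article's data.
results = [
    ("South", 3, 4),   # beat its positioning by one round: +1
    ("South", 1, 0),   # lost a game it was positioned to win: -1
    ("North", 4, 4),   # performed exactly as positioned: 0
]

def region_scores(records):
    """Total actual-minus-expected wins for each region."""
    scores = {}
    for region, expected, actual in records:
        scores[region] = scores.get(region, 0) + (actual - expected)
    return scores

print(region_scores(results))  # {'South': 0, 'North': 0}
```

Note how the two South results cancel: a positive total for a region can only come from games against other regions.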

The following table summarizes how the regions actually did as compared to how their Tournament positioning says they should have done:

The table gives one kind of look at the data.  The totals suggest that the Committee decisions, for whatever reason, have been wrongly biased in favor of the South region and against the North and Middle regions and, to a lesser extent, the West region.

A different kind of look considers how the numbers have trended over time:


In this chart, 2011 is on the left and 2021 on the right.  The circle markers are the region data points, connected by the solid lines.  The dotted lines are straight-line trend fits.  The trend lines suggest that the Committee has slightly underrated the Middle region (blue) over time, but been pretty consistent about it.  In the past the Committee slightly overrated the North region (red), but that has trended downward and now is right about where it should be.  The Committee started the decade rating the South (grey) about right, but is trending towards significantly overrating it.  Conversely, the Committee started the decade overrating the West (gold), but is trending towards fairly significantly underrating it.
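A straight-line trend of this kind is an ordinary least-squares fit of score against year.  Here is a minimal sketch with illustrative numbers (not the article's data); the sign of the slope says whether a region is trending toward being overrated or underrated:

```python
# Yearly over/under-performance scores for one hypothetical region,
# 2011-2021 with 2020 omitted, as in the article.  Values are made up.
years  = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2021]
scores = [   1,    0,    1,   -1,    0,   -2,   -1,   -3,   -2,   -4]

# Ordinary least-squares slope: covariance(year, score) / variance(year).
n = len(years)
mean_x = sum(years) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
         / sum((x - mean_x) ** 2 for x in years))

print(f"trend: {slope:+.2f} per year")  # negative slope: increasingly underrated
```

A negative slope means the region is winning fewer and fewer games relative to what its bracket positions imply, i.e. the Committee is trending toward overrating it; a positive slope means the opposite.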

You can decide for yourself how seriously to take what the table and chart show.  The numbers, however, are correct.

Conferences

For conferences, the process is the same.  A problem, however, is that teams have changed conferences during the 2011 to 2021 period.  I did not want to be moving teams from one conference to another, so I have treated teams as though they were in their current conference throughout the period.
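One way to implement that choice is a single team-to-current-conference map applied to every season in the period.  A sketch, with hypothetical team names standing in for real ones:

```python
# Map every team to its CURRENT conference and apply that mapping to all
# seasons 2011-2021, even where a team changed conferences mid-period.
# Team names here are placeholders, not the article's data.
CURRENT_CONFERENCE = {
    "Team A": "Big 10",
    "Team B": "ACC",
    "Team C": "West Coast",
}

def conference_scores(records, conference_of):
    """records: iterable of (team, expected_wins, actual_wins)."""
    scores = {}
    for team, expected, actual in records:
        conf = conference_of[team]
        scores[conf] = scores.get(conf, 0) + (actual - expected)
    return scores

print(conference_scores([("Team A", 2, 3), ("Team B", 3, 3)],
                        CURRENT_CONFERENCE))  # {'Big 10': 1, 'ACC': 0}
```

The trade-off of this design is that a team's early-decade results get credited to a conference it may not have belonged to at the time, but it avoids having to track every realignment year by year.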


Here, the totals make it look like the Committee has been biased in favor of the American, Big 12, Pac 12, and SEC conferences, and against the Big 10, the Colonial, and especially the West Coast conference.

The trends, however, give a different look:


The chart covers only the conferences that have had + or - results in at least half of the years: the Power 5 plus the American, Big East, and West Coast.  I have eliminated the lines connecting the data points to make it easier to see the trend lines.

Since the colors are a little hard to read, I will start at the top left of the chart and move down through the trend lines and what they suggest:

  • For the ACC (blue), the Committee has gone from significantly underrating it to significantly overrating it.
  • For the Big 10 (black), the Committee has slightly underrated it pretty consistently.
  • For the Pac 12 (gold), the Committee started out rating it about right and is trending towards slightly overrating it.
  • For the Big East (green), the Committee consistently has rated it about right.
  • For the American (grey), the Committee started out rating it about right and is trending towards slightly overrating it.
  • For the West Coast (dark blue), the Committee has gone from overrating it to significantly underrating it.
  • For the SEC (dark green), the Committee has gone from underrating it to rating it about right.
  • For the Big 12 (purple-brown), the Committee has gone from underrating it to only slightly underrating it.
Here again, you can decide for yourself how seriously to take what the table and chart show.  The numbers, however, are correct.

Conclusion

From both region and conference perspectives, the data suggest the possibility that, in seeding and in assigning home field to first round unseeded pairings, the Committee may have region and conference biases that are wrongly influencing its decisions.  Although this is just a possibility, perhaps it is something the Committee should bear in mind in its future bracket formation decisions.