When making NCAA tournament at large selections and seeds, the Women’s Soccer Committee considers a number of data-based factors. For the at large selections, the Committee must base its decisions on these factors. For the seeds, the Committee considers these factors but is not bound by them.
What factors have the greatest effect on the Committee decisions? What is the thinking process of the Committee in getting to its decisions? For coaches whose teams are contenders for at large selections and seeds, these are important questions.
What the individual Committee members think, what the Committee discusses, and the reasoning by which the Committee makes its decisions are all confidential. In today’s world, however, it is possible, with the aid of a computer and programming, to analyze the relationship between the data the Committee uses and the decisions it makes. This lets us say that the Committee makes decisions "as if" it thinks in a particular way.
At the RPI for Division I Women’s Soccer website, starting with the NCAA Tournament: Predicting the Bracket, At Large Selections page and the page that follows it, I have described a computer-driven system I have used for the last few years to analyze the Committee’s decisions. The system is set up based on matching data for the factors the Committee considers to the Committee’s actual decisions. It lets me say that although I do not know how the Committee members think they make their decisions, it is "as if" they make them this way.
The Factors. Here are the primary factors the Committee is supposed to consider. For factors where there is not an obvious "score" associated with the factor, I have set up a system that assigns a score to each team based on its game results. For a more detailed description of the factors and the scoring systems I have set up for them, go to the page linked above at the RPI website.
RPI rating
RPI rank
Non-conference RPI rating
Non-conference RPI rank
Results against Top 50 opponents
Top 50 results rank
Combined conference tournament and conference regular season standing
Conference average ARPI
Conference average ARPI rank
Top 60 head to head results
Top 60 common opponent results
Top 60 common opponent results rank
Poor results
Number of Top 60 opponents: This is not a factor set by the NCAA, although it could fall under the heading Strength of Schedule. Nevertheless, it is something my system calculates as an aid to coaches with NCAA tournament aspirations. It gives them a picture of how many Top 60 opponents they should think about scheduling.
Paired factors: In addition to those single factors, my system also uses paired factors. For each single factor, I pair it with each other factor weighted 50-50 for each factor in a pair. (I do not use the Number of Top 60 Opponents in setting up the paired factors.)
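As a rough illustration of how many paired factors this produces, here is a minimal Python sketch. It assumes each pair is unordered (pairing factor A with factor B is the same as pairing B with A), which the description above does not state outright. The variable names are hypothetical, and the sketch only enumerates the pairs; it does not attempt the 50-50 weighting itself.

```python
from itertools import combinations

# The 13 single factors listed above (Number of Top 60 Opponents is excluded
# from pairing, as noted in the text).
SINGLE_FACTORS = [
    "RPI rating",
    "RPI rank",
    "Non-conference RPI rating",
    "Non-conference RPI rank",
    "Results against Top 50 opponents",
    "Top 50 results rank",
    "Combined conference tournament and conference regular season standing",
    "Conference average ARPI",
    "Conference average ARPI rank",
    "Top 60 head to head results",
    "Top 60 common opponent results",
    "Top 60 common opponent results rank",
    "Poor results",
]

# Assumption: pairs are unordered, so each combination appears exactly once.
paired_factors = list(combinations(SINGLE_FACTORS, 2))

print(len(SINGLE_FACTORS), "single factors")
print(len(paired_factors), "paired factors")
```

Under that assumption, the 13 single factors yield 78 paired factors, so the system would be working with 91 factors in all.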
The Standards. My system compares all of the Committee’s at large and seeding decisions over the last 13 years with the Top 60 teams’ factor scores. This is for teams that did and did not get at large selections, #1 seeds, #2 seeds, and so on. The system tells me that teams scoring x or better on a particular factor always got a 👍 decision from the Committee, whereas teams scoring y or poorer always got a 👎. As examples, teams ranked #1 by the RPI always have gotten a #1 seed. Teams ranked #8 or poorer never have gotten a #1 seed.
This process, in most cases, gives me a "yes" and a "no" standard for each factor (whether single or paired) for each of the at large, #1 seed, #2 seed, and so on, decisions.
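The idea of a "yes" and a "no" standard can be sketched in Python for a rank-type factor, where a lower number is better. This is only an illustration under stated assumptions: the function name `derive_rank_standards` and the sample history are hypothetical, and the actual system may derive its standards differently.

```python
from typing import List, Optional, Tuple


def derive_rank_standards(
    history: List[Tuple[int, bool]],
) -> Tuple[Optional[int], Optional[int]]:
    """Derive (yes_rank, no_rank) from past (rank, got_decision) records.

    yes_rank: the poorest rank at which every past team at that rank or
              better got the decision.
    no_rank:  the best rank at which every past team at that rank or
              poorer did not get the decision.
    Either value is None when the history supports no such standard.
    """
    selected = [rank for rank, got in history if got]
    rejected = [rank for rank, got in history if not got]

    best_rejected = min(rejected, default=None)
    worst_selected = max(selected, default=None)

    # Selected teams ranked strictly better than every rejected team.
    yes_candidates = [
        r for r in selected if best_rejected is None or r < best_rejected
    ]
    # Rejected teams ranked strictly poorer than every selected team.
    no_candidates = [
        r for r in rejected if worst_selected is None or r > worst_selected
    ]

    return max(yes_candidates, default=None), min(no_candidates, default=None)


# Hypothetical history for a #1 seed decision based on RPI rank.
sample_history = [
    (1, True), (1, True), (2, True), (2, False),
    (3, True), (7, True), (8, False), (9, False),
]
yes_rank, no_rank = derive_rank_standards(sample_history)
print(yes_rank, no_rank)
```

With this made-up history, the "yes" standard is rank 1 and the "no" standard is rank 8, the same shape as the #1 seed example given above.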
To see the standards, go to the RPI website page linked above for at large selections (bottom of the page) and the page that follows it for seeds.
The Bracket. At the end of the season, my system applies the standards to the factor scores of the Top 60 teams. Teams that meet only "yes" standards for a decision, for example an at large selection, get a 👍 for that decision. Teams that meet only "no" standards get a 👎. Ordinarily, this leaves a number of teams in the middle that meet no "yes" and no "no" standards. If not all the positions (at large selections or particular seeds) are filled yet, these teams are candidates to fill the vacant positions. Also, in a new year, there occasionally are teams that meet both some "yes" and some "no" standards. These are teams with profiles the Committee has not seen over the last 13 years (and they will require me to update the standards for future years to incorporate what the Committee decided as to these teams). They also are candidates to fill the vacant positions.
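The sorting of teams described in this step can be sketched in Python as follows. This is a simplified illustration, not the actual program: it assumes every factor is a rank (lower is better), and the function name `classify`, the factor keys, and the standards shown are all hypothetical values chosen for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical standards: factor -> (yes_rank, no_rank). A team at yes_rank or
# better meets the "yes" standard; at no_rank or poorer, the "no" standard.
Standards = Dict[str, Tuple[Optional[int], Optional[int]]]


def classify(team_ranks: Dict[str, int], standards: Standards) -> str:
    """Return "in", "out", or "candidate" for one team and one decision."""
    meets_yes = False
    meets_no = False
    for factor, (yes_rank, no_rank) in standards.items():
        rank = team_ranks.get(factor)
        if rank is None:
            continue  # no data for this factor
        if yes_rank is not None and rank <= yes_rank:
            meets_yes = True
        if no_rank is not None and rank >= no_rank:
            meets_no = True

    if meets_yes and not meets_no:
        return "in"
    if meets_no and not meets_yes:
        return "out"
    # Either no standards met, or both "yes" and "no" standards met.
    return "candidate"


example_standards: Standards = {
    "rpi_rank": (30, 57),    # made-up numbers for illustration only
    "top50_rank": (25, 60),  # made-up numbers for illustration only
}

print(classify({"rpi_rank": 10, "top50_rank": 12}, example_standards))
print(classify({"rpi_rank": 58, "top50_rank": 61}, example_standards))
print(classify({"rpi_rank": 40, "top50_rank": 40}, example_standards))
print(classify({"rpi_rank": 20, "top50_rank": 61}, example_standards))
```

The last two calls show the two kinds of candidates: a team that meets no standards at all, and a team with a profile that meets both a "yes" and a "no" standard.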
New Addition to the System. The above describes the system I have used for several years. To improve the system, I have worked this year to see if there is an additional step I can add to the system so it will come closer to exactly matching the Committee’s decisions. The additional step would select teams from among the candidates for as yet unfilled at large selection and seed positions based on the question, "All other things being equal, what should be the ‘decider’ for selection purposes?"
To do this, I looked at each of the single factors and the most "powerful" of the paired factors to see, if I used one of them as the "decider," how many of that "decider’s" decisions would match the Committee decisions.
My work produced quite clear results: Decisions based on the paired RPI rank and Top 50 results rank factor produce the best match with the Committee decisions. In other words, if my system fills vacant slots by choosing from the candidate teams those with the best scores for that paired factor, my system comes closer to the teams the Committee actually selected than it would using any other factor. The only exception is for the #1 seeds. There, Poor Results is the only factor that picks the correct teams to fill the #1 seed vacancies.
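The "decider" step can be sketched in Python like this. The sketch assumes the paired score is simply the average of a team's RPI rank and its Top 50 results rank (one reading of "weighted 50-50"), with a lower score being better. The function names and the sample teams are hypothetical and are not taken from the actual system.

```python
from typing import Dict, List, Union

Team = Dict[str, Union[str, int]]


def paired_score(team: Team) -> float:
    """Assumed 50-50 pairing: the average of the two ranks (lower is better)."""
    return 0.5 * team["rpi_rank"] + 0.5 * team["top50_rank"]


def fill_vacancies(candidates: List[Team], vacancies: int) -> List[str]:
    """Pick the candidates with the best paired scores for the open positions."""
    ranked = sorted(candidates, key=paired_score)
    return [str(team["name"]) for team in ranked[:vacancies]]


# Made-up candidate teams for illustration only.
candidates = [
    {"name": "Team A", "rpi_rank": 40, "top50_rank": 30},  # paired score 35.0
    {"name": "Team B", "rpi_rank": 35, "top50_rank": 50},  # paired score 42.5
    {"name": "Team C", "rpi_rank": 45, "top50_rank": 20},  # paired score 32.5
]

print(fill_vacancies(candidates, 2))
```

Note that in this made-up example Team B has the best RPI rank of the three but is the one left out, because its Top 50 results rank drags down its paired score.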
Results. Here is how well the updated system works when applied retrospectively to the last 13 years:
At large selections:
435 positions to be filled
369 positions filled by standards
66 positions left to be filled
51 of the still open positions correctly filled by decider
15 positions incorrectly filled
In five of the years, the system matches all of the Committee at large selections; in four of the years, it misses 1; in one year, it misses 2; and in three years, it misses 3.
#1 seeds:
52 positions to be filled
50 positions filled by standards
2 positions left to be filled
2 positions correctly filled by decider
0 positions incorrectly filled
#2 seeds:
52 positions to be filled
40 positions filled by standards
12 positions left to be filled
8 positions correctly filled by decider
4 positions incorrectly filled
In ten of the years, the system matches all of the Committee #2 seeds; in two of the years, it misses 1; in one year, it misses 2.
#3 seeds:
52 positions to be filled
13 positions filled by standards
39 positions left to be filled
23 positions correctly filled by decider
16 positions incorrectly filled
In three of the years, the system matches all of the Committee #3 seeds; in five of the years, it misses 1; in four of the years, it misses 2; in one of the years, it misses 3.
#4 seeds:
52 positions to be filled
26 positions filled by standards
26 positions left to be filled
18 positions correctly filled by decider
8 positions incorrectly filled
In seven of the years, the system matches all of the Committee #4 seeds; in four of the years, it misses 1; in two of the years, it misses 2.
Finally, regarding seeds, if we disregard the seed positions and look only at which teams got seeded, in six of the years the teams the system seeds and the teams the Committee seeded are identical; in four of the years, the system misses 1; and in three of the years, it misses 2. In other words, of 208 teams getting a seed over the last 13 years, the system matches 198 of them and misses 10.
Thus, although the system has difficulty with seeds, especially #3 seeds, its issues ordinarily relate not to whether a team should be seeded but to whether it should be seeded #3 or #4.
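For readers who want the counts above as percentages, here is a small Python tally. It uses only the numbers reported in this section; the variable and function names are hypothetical, and the sketch simply adds the positions filled by the standards to those correctly filled by the decider.

```python
# (positions to fill, filled by standards, decider correct, decider incorrect)
results = {
    "at large": (435, 369, 51, 15),
    "#1 seeds": (52, 50, 2, 0),
    "#2 seeds": (52, 40, 8, 4),
    "#3 seeds": (52, 13, 23, 16),
    "#4 seeds": (52, 26, 18, 8),
}


def match_rate(total, by_standards, decider_right, decider_wrong):
    """Share of positions where the system matches the Committee."""
    # Sanity check: the three parts must add up to the total.
    assert by_standards + decider_right + decider_wrong == total
    return (by_standards + decider_right) / total


rates = {name: match_rate(*counts) for name, counts in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} matched")

# Seeded teams regardless of seed position: 198 of 208 matched.
seeded_rate = 198 / 208
print(f"seeded at any position: {seeded_rate:.1%} matched")
```

Run this way, the numbers make the pattern concrete: the match rate is lowest for #3 seeds (36 of 52 positions), yet much higher when seed position is disregarded (198 of 208 teams).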
For illustration, I will use the 2019 Committee decisions as compared to the system decisions:
At large selections: The Committee decisions and system decisions match except that the Committee gave Utah an at large selection and the system would have given it to Alabama.
#1 seeds: The Committee and system decisions are the same.
#2 seeds: The Committee and system decisions are the same.
#3 seeds: The Committee and system decisions are the same except that the system gives Duke a #3 seed, whereas the Committee gave Wisconsin that #3 seed position.
#4 seeds: The Committee and system decisions are the same except that the system gives Wisconsin a #4 seed (the Committee gave it a #3). The Committee gave Penn State the remaining #4 seed, leaving Duke with no seed.
Final Thoughts. Here are some final thoughts about what this shows:
First, it is remarkable how close the system comes to the Committee’s actual decisions. Although we do not know exactly how the Committee members and the Committee as a whole think, it is "as if" their thinking and the system’s are almost the same.
Second, the results of this work show that the Committee’s decision-making has been very consistent over time.