Monday, January 8, 2018

NCAA TOURNAMENT BRACKETOLOGY: WHAT MATTERS MOST TO THE COMMITTEE? POST-2017 UPDATE

What factors have the most influence on the Women's Soccer Committee's decisions on NCAA Tournament seeds and at large selections?  That's the subject of this post.

Each year, I compare the Committee's decisions to 13 basic factors I've drawn from the at large decision-making rules the Committee must follow.  I also compare the decisions to 78 additional factors, each of which pairs two of the 13 basic factors weighted at 50% each.  I review the Committee's decisions for each of six decision groups: #1 seeds, #2 seeds, #3 seeds, #4 seeds, all 16 seeded teams as a single group, and at large selections.
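As a quick check on the factor count, here is a minimal Python sketch showing that pairing every two of 13 basic factors yields 78 paired factors.  The factor names are hypothetical placeholders, since this post does not list all 13, and the `paired_value` helper assumes the two values are already on comparable scales, which is an assumption rather than a description of the actual calculation.

```python
from itertools import combinations

# Hypothetical placeholder names; the post does not list all 13 basic factors.
BASIC_FACTORS = [f"factor_{i:02d}" for i in range(1, 14)]

# Every unordered pair of two distinct basic factors.
PAIRED_FACTORS = list(combinations(BASIC_FACTORS, 2))


def paired_value(value_a: float, value_b: float) -> float:
    """Combine two factor values at 50% each (assumes comparable scales)."""
    return 0.5 * value_a + 0.5 * value_b


print(len(BASIC_FACTORS), len(PAIRED_FACTORS))  # 13 78
```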

I then combine the data from all the years in my database and, looking at the Top 60 teams, identify "yes" and "no" standards for each factor, for each decision group.  Using the Adjusted RPI (ARPI) Rank factor and the #1 seed group as an example:  "yes," a team with an ARPI Rank of #1 always has received a #1 seed; and "no," a team with an ARPI Rank of #8 or poorer never has received a #1 seed.
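The way "yes" and "no" standards fall out of historical data can be sketched in Python as follows.  This is only an illustration for a rank-type factor, where lower is better: the function name `rank_standards` is hypothetical, and the sample history is made up to mirror the #1 seed example above, not taken from the actual database.

```python
from typing import List, Optional, Tuple


def rank_standards(history: List[Tuple[int, bool]]) -> Tuple[Optional[int], int]:
    """Derive (yes_max, no_min) from (rank, got_the_decision) records.

    yes_max: poorest rank at which every team at that rank or better got it
             (None if even the best-ranked team was once left out).
    no_min:  best rank at which no team at that rank or poorer ever got it.
    """
    selected = [rank for rank, got_it in history if got_it]
    missed = [rank for rank, got_it in history if not got_it]
    best_miss = min(missed) if missed else None
    always_in = [r for r in selected if best_miss is None or r < best_miss]
    yes_max = max(always_in) if always_in else None
    no_min = max(selected) + 1 if selected else 1
    return yes_max, no_min


# Made-up records pooled across seasons: (ARPI Rank, received a #1 seed?)
history = [(1, True), (1, True), (2, True), (2, False), (3, True),
           (5, False), (7, True), (8, False), (12, False)]

print(rank_standards(history))  # (1, 8)
```

For a score-type factor, where higher is better, the comparisons would run in the opposite direction.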

I've just updated the standards after adding the 2017 data and Committee decisions to the database.  And, I've done a review to see which factors appear to have been the most powerful in the Committee's decision process.

But first, for context, here's a table that summarizes how well the factor standards match with the Committee's decisions over the last 11 years:

For each year, the "... Final +/0 and 0/0" column shows the number of positions in each group that the standards "fill" with no decision required.  The "Percent Filled by Standards" row shows the percent of available positions the standards fill for each group.  The "Undecided Positions to Fill per Year" column shows, for an average year, the number of positions in each group that the standards by themselves cannot fill (in other words, the remaining openings).  And the "Candidates for Undecided Positions" column shows the number of candidates per year the standards identify for the remaining openings.  All of these candidates meet no "yes" and no "no" standards for the group.  (All other teams are excluded because they meet no "yes" standards and 1 or more "no" standards.)
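The sorting of teams described above can be sketched in Python.  This is a minimal illustration, not the actual procedure: the function name, category labels, and team data are hypothetical, and the case of a team meeting both "yes" and "no" standards is not described in this post, so it is simply flagged.

```python
from collections import Counter


def classify(yes_count: int, no_count: int) -> str:
    """Sort a team by how many "yes" and "no" standards it meets for a group."""
    if yes_count > 0 and no_count == 0:
        return "filled"      # the standards make the decision
    if yes_count == 0 and no_count == 0:
        return "candidate"   # eligible for a remaining opening
    if yes_count == 0 and no_count > 0:
        return "excluded"    # ruled out by at least one "no" standard
    return "conflict"        # both kinds met; handling not described in the post


# Hypothetical teams: (number of "yes" standards met, number of "no" standards met)
teams = {"Team A": (5, 0), "Team B": (0, 0), "Team C": (0, 3), "Team D": (0, 0)}

tally = Counter(classify(yes, no) for yes, no in teams.values())
print(tally["filled"], tally["candidate"], tally["excluded"])  # 1 2 1
```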

As the table shows, the standards do a great job with the #1 seeds and a good job with the #2 seeds.  They do less well with the #3 and #4 seeds, but when looking at all the seeds as a single group, the standards again do a good job.  This suggests that the Committee is pretty consistent in its selection of seeds as a whole but is less consistent in how it assigns teams #9 through #16 between the #3 and #4 seed groups.  In addition, for each position the standards by themselves cannot fill, the standards identify roughly 2 candidates for the position.

For at large selections, combining both the at large seeded and at large unseeded teams, the standards by themselves fill 87.8% of the available positions.  Each year, there are about 4 positions the standards by themselves cannot fill.  And there are 6 to 7 candidates to fill those positions.

The bottom line is that the standards do a good job of reflecting how the Committee makes its decisions.

With that in mind, which standards are the most powerful?  To evaluate that, for each factor and decision group, looking at the Top 60 teams, I count the number of teams that meet "yes" and "no" standards -- the teams for which those particular standards make a decision.  And, I count the number of teams that meet neither a "yes" nor a "no" standard -- the teams for which those particular standards don't make a decision.  Once I've done that, I look to see which standards account for the most "yes" decisions, the most "no" decisions, and the fewest "don't make a decision" -- the fewer teams in the "don't make a decision" group, the more powerful the factor.
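Here is a rough Python sketch of that counting, assuming a rank-type factor whose standards are expressed as a pair: the poorest rank that always gets a "yes" (`yes_max`) and the best rank that always gets a "no" (`no_min`).  The factor names and threshold values are hypothetical, chosen only to show how a factor with fewer undecided teams comes out as more powerful.

```python
from typing import Dict, Iterable, Optional


def decision_counts(ranks: Iterable[int], yes_max: Optional[int],
                    no_min: int) -> Dict[str, int]:
    """Count Top 60 teams a factor's standards decide "yes", "no", or not at all."""
    ranks = list(ranks)
    yes = sum(1 for r in ranks if yes_max is not None and r <= yes_max)
    no = sum(1 for r in ranks if r >= no_min)
    return {"yes": yes, "no": no, "undecided": len(ranks) - yes - no}


# One season's Top 60, ranked 1 through 60 under each factor (no ties assumed).
top60 = range(1, 61)

# Hypothetical (yes_max, no_min) standards for two factors in one decision group.
standards = {"factor_a": (1, 8), "factor_b": (2, 20)}

counts = {name: decision_counts(top60, *pair) for name, pair in standards.items()}
most_powerful = min(counts, key=lambda name: counts[name]["undecided"])

print(counts["factor_a"])  # {'yes': 1, 'no': 53, 'undecided': 6}
print(most_powerful)       # factor_a
```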

Here are the factors that produce the most "yes" decisions, the most "no" decisions, and the fewest "don't make a decision," for each decision group.  Where I list two factors together, I'm referring to paired individual factors with each weighted 50%:

#1 Yes:  ARPI Rank and Top 60 Common Opponents Rank
#1 No:  ARPI Rank
#1 Don't make a decision:  ARPI Rank and Top 60 Common Opponents Rank

#2 Yes:  ARPI and Top 60 Common Opponents Score
#2 No:  ARPI Rank and Top 60 Common Opponents Rank
#2 Don't make a decision:  ARPI Rank and Top 60 Common Opponents Rank

#3 Yes:  ARPI Rank and Conference Rank
#3 No:  ARPI and Top 50 Results Rank
#3 Don't make a decision:  ARPI and Top 60 Common Opponents Rank

#4 Yes:  ARPI and Top 60 Common Opponents Score
#4 No:  ARPI and Top 60 Common Opponents Rank
#4 Don't make a decision:  ARPI and Top 60 Common Opponents Rank

At Large Yes:  ARPI and Top 50 Results Rank
At Large No:  ARPI Rank and Top 50 Results Score
At Large Don't make a decision:  ARPI and Top 50 Results Rank

Here are some observations:

First, considering at large selections and seeds together, the Adjusted RPI has the most powerful influence.  This is consistent with past observations.  Essentially, the Committee starts with the ARPI and then looks to see how other factors suggest a change from the ARPI rankings.

Second, for at large selections, Top 50 Results (Score or Rank) combined with the ARPI has the most powerful influence.  This is consistent with past observations.  Top 50 Results Score looks at good results (wins or ties) against Top 50 teams, and assigns values to those results based on the ranks of the opponents and the game locations.  The higher ranked the Top 50 opponent, the higher the score assigned for a good result, with the assigned scores very much higher for very highly ranked opponents than for lower ranked opponents.

Third, for seeds, Top 60 Common Opponents results (Score or Rank) combined with the ARPI has the most powerful influence.  This confirms, quite clearly, past observations.  The Top 60 Common Opponents factor is based on the requirement that the Committee consider results against common opponents in making at large selections.  Essentially, this is a mini-rating system for only the Top 60 teams based on common opponent results.  For each Top 60 team, it compares that team's results to the results of each other Top 60 team against the opponents the two teams had in common.  Then, it assigns a score and ranking to each Top 60 team based on its cumulative common opponent comparisons with the other Top 60 teams.

Fourth, for #3 seeds, teams' Conference Ranks come into play.  This confirms something I've suggested previously, which is that in distinguishing among teams ranked #9 through #16, where making distinctions is difficult, the Committee tends to defer to teams from stronger conferences.

Fifth, teams' Non-Conference RPIs (Ratings and Ranks), Head to Head results against Top 60 opponents, and poor results have some influence, but it is limited compared to the other factors.

All of these observations are consistent with past observations, so the Committee did not do anything radically different in 2017.


HOW MANY TOP 60 OPPONENTS SHOULD WE SCHEDULE, FOR NCAA TOURNAMENT PURPOSES? POST-2017 UPDATE

Last year, I published information on the number of Top 60 opponents that the NCAA Tournament #1, #2, #3, and #4 seeds played and that the unseeded teams that did and did not get at large selections played.  That information was based on 10 years' data, from 2007 through 2016.  I've added the data from the 2017 season, so here is a table with updated numbers:


And below are charts showing a little more detail (but not the Average and Median numbers) for the #1, #2, #3, and #4 seeds and for the unseeded teams that did (yes) and did not (no) get at large selections: