Saturday, January 21, 2017

NCAA Tournament Bracket Formation: Personal Thoughts on the Factors Most Important to Decision-Making

In the previous group of posts, I've provided data on, and some observations about, what appear to be the most important factors influencing the Women's Soccer Committee's decisions on NCAA Tournament at large selections and seeds.  I always like to provide the data so that others who are interested can review the data and reach their own conclusions on what the data mean.  I also have my own thoughts on what the data mean, so here they are:

1.  Not surprisingly and certainly not a new observation, teams' ARPI Ratings and Ranks are key factors in the decision-making process.  For one thing, they create a superstructure within which all decisions are made:
  • Teams with ARPI rankings of #58 or poorer do not get at large selections;
  • Teams ranked #30 or better appear secure in getting at large selections, although over the last few years it has become less clear where the "inner boundary" for protected teams is;
  • Teams must be ranked #26 or better to get a seed;
  • Teams must be ranked #19 or better to get a #3 seed;
  • Teams must be ranked #13 or better to get a #2 seed;
  • Teams must be ranked #7 or better to get a #1 seed.
In addition, many of the other factors that appear powerful in the Committee's decision-making are paired factors, with one element of the pair being a team's ARPI Rating or its ARPI Rank.
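
As a rough illustration, the rank boundaries in the list above can be written as a small Python lookup.  This is only a sketch: the function and constant names are hypothetical, and the cutoffs are simply the historical boundaries from the list, not rules the Committee has adopted.

```python
# Poorest ARPI rank that has received each seed, taken from the list above.
# These are historical boundaries, not official Committee rules.
SEED_RANK_LIMITS = {1: 7, 2: 13, 3: 19, 4: 26}

# Teams ranked #58 or poorer have not received at large selections.
AT_LARGE_RANK_LIMIT = 57


def best_possible_seed(arpi_rank):
    """Return the best seed historically open to a team at this ARPI rank, or None."""
    for seed in sorted(SEED_RANK_LIMITS):
        if arpi_rank <= SEED_RANK_LIMITS[seed]:
            return seed
    return None


def at_large_possible(arpi_rank):
    """True if teams at this ARPI rank have historically remained eligible for an at large selection."""
    return arpi_rank <= AT_LARGE_RANK_LIMIT


for rank in (5, 15, 40, 60):
    print(rank, best_possible_seed(rank), at_large_possible(rank))
```

The lookup only says what a rank leaves possible.  It says nothing about which of the eligible teams actually get the seed or selection, which is where the other factors come in.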

2.  For at large selections, teams' Top 50 Results Scores and/or Ranks, which use a scoring system heavily weighted towards good results against very good teams, are the most important factor when paired with the ARPI.  This is not new information, but my current update work confirms it.  This factor is important not only for identifying the teams that do get at large selections but also for identifying the teams that don't.

3.  For at large selections, a second important factor is teams' Top 60 Common Opponent Results Scores and/or Ranks.  This year's update is the first time it's become clear to me how important this factor is in contributing to what teams do get at large selections.  It's not as important for what teams don't get selections.

4.  For at large selections, the Non-Conference ARPI (ANCRPI) Ratings and Ranks also are important.  This is the first time I've seen this so clearly.

5.  For at large selections, although a team's Conference Rating and/or Conference Rank do not appear to be highly important for identifying teams that do get at large selections, when paired with the ARPI they are an important factor for identifying teams that don't get at large selections.  There are a couple of possible explanations for this.  The more cynical explanation, which I doubt is the case, is that the Committee is biased in favor of the strongest conferences.  The other explanation, which I believe is more likely, is that the Conference Rating and/or Rank factor pattern reflects a difficult reality for teams from all but the strongest conferences.  Since their conference schedules are weaker than the conference schedules of teams from the strongest conferences, the teams from the weaker conferences tend to play weaker schedules overall.  With good results against Top 50 teams, represented by the Top 50 Results Score and Rank factors, as a major part of the decision-making, followed by Top 60 Common Opponent Results Scores and/or Ranks, most teams from weaker conferences have fewer opportunities to score well on those factors.  Thus their Conference ARPI/Rank probably is an indicator of that problem.  This is why, for a long time, I have emphasized how important it is for mid-major teams that want to be successful in the competition for at large selections to schedule a lot of very strong opponents for the non-conference parts of their schedules.  The West Coast Conference is an example of a mid-major that does this and has been successful in getting at large selections.  The Ivy League, unfortunately, is an example of a mid-major that doesn't do it and has been pretty unsuccessful in getting at large selections.

6.  For #1, #2, and #4 seeds, teams' Top 60 Common Opponent Scores/Ranks paired with teams' ARPI Ratings/Ranks are the most important in determining which teams do and don't get those seed positions.  This is not that surprising, as the teams in contention for seeds typically have significant numbers of games against each other, so a good measure for seeding purposes is how a team has done against the whole pool of seed competitors.

7.  For #3 seeds, teams' Conference ARPI/Rank is important.  It appears difficult for the Committee to make decisions, for teams they think should be seeded, between #3 and #4 seeds.  The importance of Conference ARPI/Rank suggests that the Committee, in deciding which should receive #3 seeds, has a tendency to default to teams from the strongest conferences.

8.  For the #4 seeds, teams' ANCRPI Ratings/Ranks also are important.

9.  Although teams' Top 50 Results Scores/Ranks are not particularly important in deciding which teams get seeds, they are important in deciding the teams that do not get #2, #3, and #4 seeds.

Overall, my conclusion is that fans and coaches, if they want to know their teams' NCAA Tournament seed and at large selection prospects, should be paying particular attention to (1) their ARPI Ratings and Ranks, (2) their Top 50 Results Scores and Ranks, and (3) their Top 60 Common Opponent Scores and Ranks.  Secondarily, they should pay attention to teams' Conference ARPIs and Ranks and their ANCRPI Ratings and Ranks.

On the other hand, the data suggest that teams' Head to Head Results and Last 8 Games Results are not as important.  It is possible that the relative unimportance of Head to Head Results represents a judgment that using one game's result is not a reliable basis for decision-making.

Friday, January 20, 2017

NCAA Tournament Bracket Formation: Most Important Factors for #4 Seeds

Continuing with my reports on the most important factors in the Women's Soccer Committee's decision-making on NCAA Tournament at large selections and seeds, here is a table showing the most important factors in deciding that "yes," a team gets a #4 seed:

Factor Yes 4 Seed
ARPI Rating and Top 60 CO 9
ANCRPI Rating and CO Score 8
ARPI Rating and Top 60 CO Rank 7
ARPI Rank and ANCRPI Rank 7
ANCRPI Rating and Last 8 Games (Poor Results) 5
CO Score and Last 8 Games (Poor Results) 5
ARPI Rank and Top 60 CO Score 5
ARPI 4
ARPI Rating and ANCRPI Rating 4
ARPI Rank and Top 60 CO Rank 4
ANCRPI Rank and CO Rank 4
ANCRPI Rating and CO Rank 4
CO Rank and Last 8 Games (Poor Results) 4
Conference Rank and CO Score 3
ANCRPI Rank and CO Score 3
ARPI Rank and ANCRPI Rating 3
ARPI Rating and ANCRPI Rank 2
ARPI Rank and Conference Rank 2
ARPI Rating and Last 8 Games (Poor Results) 2
Conference ARPI and HTH Score 2
CO Score and CO Rank 2
ANCRPI Rating and Conference Standing 2
Top 60 CO Score 2
Top 60 CO Rank 2
Conference Standing and Top 60 CO Rank 2
ANCRPI Rank and Conference Standing 2

This table shows the 26 most important factors in determining which teams get #4 seeds.  Here, various pairings of the ARPI, Top 60 Common Opponent results, and the Adjusted Non-Conference ARPI are the most important factors.  Looking through the list, this is an area in which the ANCRPI is at its most relevant.  Since the NCAA considers the ANCRPI to be more indicative of conference strength than the ARPI (if nevertheless less accurate), it may be that the ANCRPI comes into play here as one way of including conference strength in the consideration of which teams should receive at least some seed.

Also of note here, the factor patterns are more able to identify teams to receive #4 seeds than they are #3s.  As mentioned in my post on the #3 seeds, this is consistent with my personal observations that the Committee appears to have trouble distinguishing which teams should receive #3 seeds as compared to #4s.

The following table shows the 25 most important factors in determining which teams do not get #4 seeds:

Factor No 4 Seed
ARPI Rating and Top 60 CO Rank 367
ARPI Rating and Top 50 Results Rank 357
ARPI Rating 339
ARPI Rank and Rating 339
ARPI Rank 337
ARPI Rating and Top 50 Results Score 324
ARPI Rating and ANCRPI Rank 316
ARPI Rating and Top 60 CO 311
ARPI Rank and Top 50 Results Rank 273
ARPI Rating and Top 60 HTH 267
ARPI Rank and Top 60 CO Rank 260
ARPI Rating and Last 8 Games (Poor Results) 260
Top 50 Results Rank and CO Rank 258
HTH Score and Last 8 Games (Poor Results) 247
CO Score and Last 8 Games (Poor Results) 232
ARPI Rating and Conference Rank 231
ARPI Rank and ANCRPI Rank 223
ARPI Rank and Conference Standing 223
ANCRPI Rating and HTH Score 217
ANCRPI Rank and CO Score 216
Top 50 Results Score and Last 8 Games (Poor Results) 211
ANCRPI Rank and CO Rank 204
Top 50 Results Score and CO Rank 202
ANCRPI Rating and CO Score 201
ARPI Rating and Conference Standing 184

Here, ARPI Rating paired with Top 60 Common Opponents Rank is the most powerful factor, with ARPI Rating paired with Top 50 Results Rank being the next most powerful factor.  ARPI Rating and Rank alone as primary factors are near the top of the list.  By itself, ARPI Rank excludes all teams ranked #27 or poorer from receiving #4 seeds.  And clearly, the ARPI is the most important factor in determining #4 seeds, when paired with other factors.

NCAA Tournament Bracket Formation: Most Important Factors for #3 Seeds

Continuing with my reports on the most important factors in the Women's Soccer Committee's decision-making on NCAA Tournament at large selections and seeds, here is a table showing the most important factors in deciding that "yes," a team gets a #3 seed:

Factor Yes 3 Seed
ARPI Rank and Conference Rank 8
Conference Rank and CO Rank 6
ARPI Rank 4
Conference ARPI and HTH Score 4
Conference Rank and HTH Score 4
ANCRPI Rank and CO Score 3
ANCRPI Rating and CO Score 3
ARPI Rating and Conference ARPI 3
Top 50 Results Rank and CO Rank 3
ANCRPI Rating and CO Rank 3
ARPI Rating and Top 50 Results Score 2
ARPI Rating and ANCRPI Rank 2
ARPI Rank and Conference ARPI 2
ANCRPI Rank and Top 50 Results Score 2
Conference ARPI and CO Rank 2


This table shows the 15 most important factors in determining which teams get #3 seeds.  Here, the Conference Rank/ARPI Rank paired factor is the most powerful, followed by the Conference Rank/Common Opponents Rank paired factor.  What this suggests is that when the Committee gets to the #3 seeds, the conference a team is in has grown in importance.  Also, here Head to Head Results becomes an influential factor, paired with Conference ARPI and Conference Rank.

Also of note here, the factor patterns are less able to identify teams to receive #3 seeds than they are for #1 and #2 seeds.  This suggests that the Committee has a harder time identifying teams for #3 seeds and needs to do more "guessing" about which teams should receive #3s.  This is consistent with my analyses of the Committee's specific #3 seed decisions.

The following table shows the 25 most important factors in determining which teams do not get #3 seeds:

Factor No 3 Seed
ARPI Rating and Top 50 Results Rank 423
ARPI Rank 416
ARPI Rank and Rating 404
ARPI 397
ARPI Rating and Top 60 CO Rank 389
ARPI Rating and Top 50 Results Score 382
ARPI Rank and ANCRPI Rank 377
ARPI Rank and Top 50 Results Rank 373
ARPI Rating and ANCRPI Rank 368
ARPI Rank and Conference Rank 353
ARPI Rank and Top 50 Results Score 344
ARPI Rating and ANCRPI Rating 316
ARPI Rating and Top 60 CO 311
ARPI Rating and Top 60 HTH 310
Conference Rank and CO Score 310
ANCRPI Rating and CO Score 306
Top 50 Results Score and CO Rank 297
HTH Score and Last 8 Games (Poor Results) 277
Conference ARPI and CO Score 275
ANCRPI Rating and HTH Score 273
Top 50 Results Score and Last 8 Games (Poor Results) 263
ANCRPI Rank and Top 50 Results Score 261
ANCRPI Rating and Last 8 Games (Poor Results) 261
ARPI Rank and Top 60 CO Rank 260
ARPI Rating and Last 8 Games (Poor Results) 260

Here, ARPI Rating paired with Top 50 Results Rank is the most powerful factor, with ARPI Rating paired with Top 60 Common Opponents Rank also being a powerful paired factor.  ARPI Rank alone as a primary factor is just next to the top of the list.  By itself, it excludes all teams ranked #20 or poorer from receiving #3 seeds.  Indeed, the ARPI clearly is the most important factor in determining #3 seeds, when paired with other factors.  I suspect that, in trying to distinguish among potential #3 and #4 seeds, the Committee has a hard time finding persuasive data one way or the other.  In that circumstance, perhaps it tends to default to the ARPI for purposes of eliminating teams from consideration.  I believe there's support for this possibility in the factors' being pretty good at identifying which teams will receive some seed, but not being so good at identifying which seed they will receive when it comes to the #3s and #4s.

NCAA Tournament Bracket Formation: Most Important Factors for #2 Seeds

Continuing with my reports on the most important factors in the Women's Soccer Committee's decision-making on NCAA Tournament at large selections and seeds, here is a table showing the most important factors in deciding that "yes," a team gets a #2 seed:

Factor Yes 2 Seed
ARPI Rating and Top 60 CO 12
ARPI Rank and Top 60 CO Score 10
Conference ARPI and CO Score 9
Top 60 CO Score 9
ANCRPI Rating and CO Score 8
Conference Standing and Top 60 CO Score 8
ARPI Rank and Top 60 CO Rank 8
ARPI Rating and Last 8 Games (Poor Results) 8
CO Score and Last 8 Games (Poor Results) 8
ARPI Rating 7
ARPI Rating and Top 60 CO Rank 7
Conference Standing and Top 60 CO Rank 7
Top 50 Results Rank and CO Score 7
Conference Rank and CO Rank 7
ARPI Rank and Top 50 Results Rank 6
ARPI Rating and Top 50 Results Rank 6
ARPI Rating and Conference Standing 6
ARPI Rank and ANCRPI Rating 6
ANCRPI Rank and Top 50 Results Rank 5
ARPI Rank and Conference Rank 5
ARPI Rank and Rating 5
ANCRPI Rank and Top 50 Results Score 5
ARPI Rank and ANCRPI Rank 5
Top 50 Results Rank and CO Rank 5
ANCRPI Rating and Top 50 Results Rank 5
ARPI Rank and Last 8 Games (Poor Results) 5

This table shows the 26 most important factors in determining which teams get #2 seeds.  As with the #1 seeds, teams' ARPI Ratings and Ranks, paired with their Top 60 Common Opponent Results, are the most powerful factors.  (And, once again, Head to Head Results do not appear on the list of the most powerful factors.)

The following table shows the 25 most important factors in determining which teams do not get #2 seeds:

Factor No 2 Seed
ARPI Rank and Top 60 CO Rank 477
ARPI Rating and Top 50 Results Rank 472
ARPI Rank and Top 60 CO Score 467
ARPI Rank and Conference Rank 460
ARPI Rank 457
ANCRPI Rank and CO Rank 455
ANCRPI Rank and CO Score 452
ARPI Rating and Top 50 Results Score 450
ANCRPI Rating and CO Score 443
ARPI Rating and Top 60 CO Rank 437
ARPI Rating and Conference ARPI 436
ARPI Rating and Last 8 Games (Poor Results) 434
ARPI Rank and Rating 434
ARPI Rank and ANCRPI Rank 431
Conference Rank and CO Rank 423
Top 50 Results Rank and CO Score 422
ARPI Rating 421
HTH Score and Last 8 Games (Poor Results) 411
Top 50 Results Rank and CO Rank 409
ARPI Rank and Top 50 Results Rank 408
CO Score and Last 8 Games (Poor Results) 407
Top 60 CO Rank 407
ARPI Rating and Conference Rank 406
ARPI Rating and Top 60 HTH 405
ARPI Rating and ANCRPI Rank 403

Again, ARPI Rank paired with Top 60 Common Opponents Rank is the most powerful factor, with ARPI Rating paired with Top 50 Results Rank and ARPI Rank paired with Conference Rank also being powerful factors.

ARPI Rank, by itself, is the most powerful single factor.  By itself, this factor excludes all but the top 14 ARPI teams each year from getting a #2 seed, just as it excludes all but the top 7 ARPI teams from getting a #1 seed.

NCAA Tournament Bracket Formation: Most Important Factors for At Large Selections

Following up on the preceding post, here is a table showing the 26 most important factors for the Committee's NCAA Tournament at large selections:

Factor Yes at Large
ARPI Rating and Top 50 Results Rank 144
ARPI Rank and Conference Rank 128
ARPI Rank and Rating 115
ARPI Rating 115
ARPI Rating and Top 60 CO Rank 109
ARPI Rank and Top 50 Results Rank 109
ARPI Rank 107
ARPI Rating and ANCRPI Rank 107
ARPI Rating and Top 60 CO 89
ARPI Rank and ANCRPI Rank 89
ARPI Rank and Top 60 CO Rank 87
ARPI Rank and Top 60 CO Score 69
ANCRPI Rank and CO Score 61
Top 50 Results Rank and CO Score 60
ARPI Rating and ANCRPI Rating 59
ARPI Rating and Top 50 Results Score 58
Top 60 CO Score 57
CO Score and CO Rank 54
ARPI Rank and Conference Standing 49
ANCRPI Rank and CO Rank 46
ARPI Rating and Last 8 Games (Poor Results) 46
Top 60 CO Rank 45
Conference ARPI and CO Score 43
Top 50 Results Rank 43
Conference Rank and CO Rank 41
ANCRPI Rank and Top 50 Results Rank 41

To provide context for this table, as stated in my previous post, realistically the pool of teams the Committee looks at for seeding and at large selection purposes is the top 60 teams in the ARPI rankings, thus a total of 600 teams over the last 10 years (less the 3 teams in the top 60 over the last 10 years with records below 0.500).  Over the last 10 years, there have been 155 Automatic Qualifiers in the top 60.  Since the AQs are not in competition for at large selections, the actual pool of teams the Committee is looking at for at large selections is roughly 445 teams, in other words roughly 44.5 teams per year.  From this pool, the Committee had to pick 34 teams per year from 2007 through 2013 and 33 teams per year since then (due to the split between the Big East and American Athletic Conferences).

The above table shows that the ARPI Rating/Top 50 Results Rank paired factor is the most important factor for purposes of saying "yes" a team gets an at large selection.  The Top 50 Results Score and Rank factors use a scoring system that rewards teams for wins and ties against teams ranked #50 or better by the ARPI.  The scoring system is weighted strongly in favor of good results against very highly ranked teams.  This paired factor is significantly more powerful than any other factor.

Next in terms of power is the ARPI Rank/Conference Rank paired factor, which is significantly more powerful than the next factors on the list.  Then come the ARPI Rating factor by itself and the ARPI Rank/ARPI Rating paired factor, which are significantly more powerful than the ones below them.  And then come four factors, one of which is the ARPI Rank factor by itself, the other three of which are either ARPI Rating or ARPI Rank paired with one of Top 60 Common Opponents Rank, Top 50 Results Rank, or Adjusted Non-Conference RPI (ANCRPI) Rank.  The other factors on the list are significantly weaker than any of these.

Altogether, the list demonstrates the importance of teams' ARPI ratings/ranks in the at large selection process and, of the other primary factors, the importance of the Top 50 Results factor.  It also is noteworthy that the Head to Head Results factor does not show up on the "top 26" list, at all.

What about the "no" at large selection factors?  Here's a list of the top 26 factors:

Factor No at Large
ARPI Rank and Top 50 Results Rank 41
ARPI Rank and Top 50 Results Score 40
ANCRPI Rank and Top 50 Results Rank 32
ARPI Rating and Conference Rank 32
ARPI Rank and Conference Rank 26
ARPI Rating and Top 50 Results Score 25
ARPI Rating and Top 50 Results Rank 24
ARPI Rating and ANCRPI Rank 24
ANCRPI Rating and HTH Score 24
ARPI Rank 23
ARPI Rating and Top 60 HTH 21
ARPI 20
Top 50 Results Score and HTH Score 20
Top 50 Results Rank and HTH Score 20
ARPI Rank and Rating 19
ARPI Rank and Top 60 HTH Score 16
ANCRPI Rank and HTH Score 16
HTH Score and CO Rank 16
Top 60 HTH 16
Conference Rank and HTH Score 15
HTH Score and CO Score 15
ANCRPI Rating and Conference ARPI 15
ARPI Rating and Top 60 CO Rank 14
ANCRPI Rating and Conference Rank 14
ANCRPI Rank and Top 50 Results Score 12
Conference ARPI and HTH Score 12

Once again, teams' ARPI Ratings and Ranks, coupled with their Top 50 Results Ranks, are the most powerful factors, by a good margin.  Of interest, ARPI Ratings and Ranks, coupled with a team's Conference Rank, also are powerful factors.

It's worth noting that the most powerful factors, by themselves, are not close to sufficient to do the entire job.  The Committee currently needs to select 33 at large teams from a pool of roughly 44.5 teams per year.  The most powerful "yes" factor can select 14.4 teams per year.  The most powerful "no" factor can exclude 4.1 teams.  This leaves the Committee with 26 teams still in the pool, from which to fill 18.6 remaining at large slots.  Thus the other factors are needed to whittle this number down to a few slots each year to be filled from the remaining pool of a few teams that meet no "yes" and likewise no "no" factor patterns.  Nevertheless, the most likely factors to get a team a "yes" and also to get a team a "no" are the paired ARPI/Top 50 Results factors.
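
The arithmetic in the paragraph above can be checked with a few lines of Python.  This is just a sketch using the numbers already given in this post (600 top 60 teams, 155 Automatic Qualifiers, the 144 and 41 counts at the top of the two tables, and 33 at large slots).  The variable names are hypothetical, and like the "roughly 445" figure it ignores the 3 teams with records below 0.500.

```python
YEARS = 10

# Pool of at large candidates: top 60 teams per year, less the Automatic Qualifiers.
top_60_pool = 60 * YEARS
automatic_qualifiers = 155
at_large_pool_per_year = (top_60_pool - automatic_qualifiers) / YEARS

# Top rows of the "Yes at Large" and "No at Large" tables, converted to per-year averages.
yes_per_year = 144 / YEARS   # ARPI Rating and Top 50 Results Rank
no_per_year = 41 / YEARS     # ARPI Rank and Top 50 Results Rank

slots_per_year = 33          # current number of at large selections

# Teams not decided by either of the two most powerful patterns, and slots still open.
undecided_pool = at_large_pool_per_year - yes_per_year - no_per_year
open_slots = slots_per_year - yes_per_year

print(round(at_large_pool_per_year, 1), round(undecided_pool, 1), round(open_slots, 1))
```

Subtracting the two counts this way works because a team cannot meet both a "yes" pattern and a "no" pattern: the first means every matching team was selected and the second means none was.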

Thursday, January 19, 2017

NCAA Tournament Bracket Formation: Which Factors Are Most Important to the Committee?

The purpose of this and some following posts is to provide information on which factors are most important to the Women's Soccer Committee in its at large selections and seeds for the NCAA Tournament.  The information isn't necessarily about which factors the Committee members individually think are the most important.  Rather, it's about which factors, when matched up with the Committee's decisions over the last 10 years, correlate best with the Committee's actual decisions and therefore are the most important, whether individual Committee members know it or not.

By "factors," I'm referring to 13 factors that the Committee is required by rule to use in deciding on at large selections for the NCAA Tournament.  The Committee isn't limited to those factors when it comes to seeding, but I'm satisfied that those factors are very important to seeding, too.  In addition, as I've covered in prior posts, I'm referring to each individual factor plus each factor combined in a pair with each other factor.

For each factor, whether individual or paired, I have identified patterns that match with the Committee's decisions.  A "yes" pattern means that over the last 10 years every team meeting that pattern has received a "yes" decision on a particular seed or at large selection.  A "no" pattern means that no team meeting the pattern has received a "yes" decision.
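
To make the "yes" and "no" pattern idea concrete, here is a minimal Python sketch for a single factor where a lower value is better (a rank, for example).  It is an illustration under stated assumptions, not the actual method behind the tables: the function name and the toy data are hypothetical, and the patterns for paired factors combine two values in a way this sketch does not attempt.

```python
def yes_no_patterns(history):
    """Find simple "yes" and "no" cutoffs for one factor.

    history is a list of (value, selected) pairs, where a lower value is better
    and selected is True if the team received the decision in question.
    """
    selected = [value for value, got_it in history if got_it]
    rejected = [value for value, got_it in history if not got_it]

    # Every team strictly better than the best rejected team was selected.
    best_rejected = min(rejected, default=float("inf"))
    yes_matches = [value for value in selected if value < best_rejected]

    # Every team strictly poorer than the poorest selected team was rejected.
    poorest_selected = max(selected, default=float("-inf"))
    no_matches = [value for value in rejected if value > poorest_selected]

    return {
        "yes_cutoff": max(yes_matches) if yes_matches else None,
        "no_cutoff": min(no_matches) if no_matches else None,
        "yes_count": len(yes_matches),
        "no_count": len(no_matches),
    }


# Toy data only: ranks 1-2 always selected, 5-6 never, 3-4 mixed.
toy_history = [(1, True), (2, True), (3, False), (4, True), (5, False), (6, False)]
print(yes_no_patterns(toy_history))
```

In the toy data, the teams at 3 and 4 meet neither pattern.  Those are the teams the single factor cannot decide.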

So, what I'll be covering here is which of the 13 individual factors' patterns and 78 paired factors' patterns match up the most frequently with the Committee's seeding and at large selection decisions.  The factors associated with these patterns are the most important factors, for seeding and at large selection purposes.
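
For anyone wondering where the 78 comes from, it is the number of ways to pick 2 factors out of 13.  The short Python sketch below enumerates them.  The factor labels are an assumption: they are inferred from the labels that appear in the tables in these posts, not taken from the NCAA rule itself.

```python
from itertools import combinations

# Labels inferred from the tables in these posts (an assumption, not the NCAA's wording).
FACTORS = [
    "ARPI Rating",
    "ARPI Rank",
    "ANCRPI Rating",
    "ANCRPI Rank",
    "Conference ARPI",
    "Conference Rank",
    "Conference Standing",
    "Top 50 Results Score",
    "Top 50 Results Rank",
    "Top 60 HTH Score",
    "Top 60 CO Score",
    "Top 60 CO Rank",
    "Last 8 Games (Poor Results)",
]

# Each unordered pair of distinct factors is one paired factor.
paired_factors = list(combinations(FACTORS, 2))

print(len(FACTORS), len(paired_factors), len(FACTORS) + len(paired_factors))
```

That gives 13 individual factors plus 78 pairs, or 91 factor patterns to compare against each type of decision.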

I'll start with the #1 seeds.  In the following post I'll cover the at large selections, and then I'll go through the #2, #3, and #4 seeds.

Here's a table for the #1 seeds, followed by an explanation:

Factor Yes 1 Seed
ARPI Rating and Top 60 CO Rank 27
ARPI Rank and Top 60 CO Rank 27
CO Score and CO Rank 23
ARPI Rank and Top 60 CO Score 22
Conference ARPI and CO Rank 22
HTH Score and CO Rank 22
Top 60 CO Score 21
ANCRPI Rating and CO Rank 21
Conference Rank and CO Rank 21
Top 60 CO Rank 20
ARPI Rank and Conference Rank 20
CO Score and Last 8 Games (Poor Results) 20
CO Rank and Last 8 Games (Poor Results) 20
ARPI Rank and Conference ARPI 19
ARPI Rating and Conference ARPI 18
ARPI Rating and Top 60 CO 18
ARPI Rank and Top 50 Results Score 18
Conference ARPI and CO Score 18
ANCRPI Rank and CO Rank 17
Top 50 Results Score and CO Rank 17
Conference ARPI and HTH Score 17
HTH Score and CO Score 17
ARPI Rank and Rating 16
ARPI Rating and Conference Rank 16
ARPI Rank and Top 50 Results Rank 16
ARPI Rank and Top 60 HTH Score 16

This table shows the top 26 factors, whether individual or paired, in terms of which factors' patterns match the most often with the Committee's decisions.  Thus, for example, at the top of the table the pattern for the ARPI Rating/Top 60 Common Opponents Rank factor pair matches with actual #1 seeds 27 times over the last 10 years.  Or, to put it differently, if in 2006 I had had the factor patterns I now have, this particular paired factor pattern, by itself, would have "predicted" 27 of the 40 #1 seeds over the next 10 years.  The same is true for the pairing of ARPI Rank/Top 60 Common Opponents Rank.

The patterns for the factors that don't show up in the table would have "predicted" from 15 to 0 #1 seeds over the last 10 years.

What is significant to me about this table is how important teams' common opponent results are when it comes to #1 seeds.  As a reminder, and without going into a full explanation, the common opponent factor is something I developed to meet the NCAA requirement that the Committee consider results against common opponents.  For each top 60 team, I come up with a Common Opponent Score and Rank, based on how each team compares to each other top 60 team in results against opponents that the two teams had in common.

Also of interest, these top 26 factors in terms of #1 seeds include only two individual factors, with all the others being paired.  The two individual factors are the Top 60 Common Opponent Score and the Top 60 Common Opponent Rank, able by themselves to "predict" 21 and 20 of the #1 seeds, respectively, in other words half of the #1 seeds.

Looking over the balance of the list, it looks like teams' ARPI Ratings and ARPI Ranks are the next most important factors.

The "yes" patterns, however, are only part of the equation.  Although the "yes" patterns can identify a major portion of the #1 seeds, they can't identify all of them.  The "no" patterns also are important, since they exclude teams from #1 seeds.  This then leaves a few teams meeting no "yes" but also no "no" patterns, and these are the teams to choose from to fill any remaining #1 seed slots.

Here's a table showing the 25 factors that are most important in excluding teams from receiving #1 seeds:

Factor No 1 Seed
ARPI Rank 527
ARPI Rating and Top 60 CO Rank 526
ARPI Rating and Top 60 CO 523
ARPI Rank and Last 8 Games (Poor Results) 522
ANCRPI Rank and CO Rank 522
ARPI Rank and Top 60 CO Rank 517
ARPI Rank and Rating 516
ANCRPI Rank and CO Score 516
ARPI Rank and Top 60 CO Score 515
ARPI Rating and Top 50 Results Rank 513
ARPI Rating and Conference ARPI 513
ARPI Rating and Last 8 Games (Poor Results) 513
ARPI Rating and Top 60 HTH 509
ARPI Rating 507
ARPI Rank and ANCRPI Rank 504
CO Score and Last 8 Games (Poor Results) 502
Top 50 Results Rank and CO Score 500
Top 60 CO Rank 497
HTH Score and Last 8 Games (Poor Results) 494
CO Score and CO Rank 493
ARPI Rank and Top 60 HTH Score 492
ANCRPI Rating and CO Score 491
Top 60 CO Score 489
ANCRPI Rank and HTH Score 487
ANCRPI Rating and CO Rank 486

To get these numbers in the right context, realistically speaking the pool of teams the Committee looks at in the at large selection and seeding process is the top 60 teams in the ARPI rankings.  So the above table is based on a pool of 600 teams over the last 10 years.  The "No 1 Seed" column then tells us how many of those 600 teams a particular factor's pattern has excluded from getting a #1 seed.

[NOTE:  The NCAA has a rule that no team with a record below 0.500 can receive an at large selection.  Over the last 10 years, 3 teams in the top 60 have had records below 0.500.]

This "No 1 Seed" table shows the most important factor patterns in excluding teams from getting #1 seeds are teams' ARPI Ranks and Ratings.  Indeed, each year the ARPI Rank factor pattern alone excludes 53 of the 60 possible teams from getting #1 seeds, limiting those seeds to coming from among the top 7 teams in the ARPI rankings.  Next in importance come teams' Common Opponent Scores and Ranks.
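
The conversion from a 10 year exclusion count to "teams still eligible per year" is simple enough to sketch in Python.  The helper name is hypothetical, and the 60 team pool and 10 year period are the figures used in this post.

```python
POOL_PER_YEAR = 60   # top 60 teams in the ARPI rankings
YEARS = 10


def teams_left_per_year(excluded_over_period):
    """Average number of teams per year that a "no" pattern leaves eligible."""
    return POOL_PER_YEAR - excluded_over_period / YEARS


# ARPI Rank alone excluded 527 of the 600 teams from #1 seeds.
print(round(teams_left_per_year(527), 1))
```

The same helper can be applied to any row of the "No" tables to see how much of the pool that factor leaves open.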


Tuesday, January 17, 2017

NCAA Tournament Bracket Formation: The Role of Standing Within Conference and Conference ARPI Rank

Over the years, some of the most adamant criticisms of the Women's Soccer Committee's at large selections have come when the Committee has denied an at large position to a team that has finished high in the regular season standings of a highly ranked conference.  Two recent examples of this are the Committee's decisions in 2015 and 2016 to not give at large positions to Wisconsin and DePaul, respectively.

The purpose of this post is to show the role of a team's standing within its conference and the conference's ARPI rank in the Committee's decision-making process, as evidenced by the Committee's decisions over the last 10 years.  This does not mean that the Committee members necessarily think about this the way I will write about it.  It does mean, however, that the Committee's decisions always have come out the way I will write about it.  So, you could take this as evidence of how the mind of the "Committee as a whole" works, even if the individual members don't think that way.

Within a conference (except for the Pac 12, Ivy, and West Coast), there are two sub-competitions.  There is the conference regular season competition and there is the conference tournament competition.  Having studied the Committee's decisions, I've concluded that the Committee, in evaluating where a team fits within its conference, does not look just at one of these or at the other.  Rather, it looks at a combination of the two.  Although there could be other ways to combine the two, one way is simply to take the average of where a team finishes in the conference regular season competition and where it finishes in the conference tournament to identify the team's ultimate conference standing.  This is a logical way to do it, it seems like how the Committee as a whole might do it, and it is how I do it.

To determine ultimate conference standing, the questions then are (1) what regular season standing positions do teams have if they are tied in points in the conference regular season competition and (2) what conference tournament standings do teams have that exit the tournament in the same round?  Here again, I look at this in terms of what seems logical and how the Committee as a whole might approach it:

  • For the regular season conference competition, if two teams are tied in points, then their standing is the average of the positions they occupy.  Thus if two teams are tied for 1st in the competition, they occupy positions 1 and 2 in the conference, so each one has a conference regular season standing of 1.5.
  • For the conference tournament, the tournament champion occupies position 1 and the runner up occupies position 2.  The losing semi-finalists occupy positions 3 and 4 in the tournament, so each one has a tournament standing of 3.5.  The losing quarter-finalists, if the conference has four quarter-final matches, occupy positions 5 through 8, so each one has a tournament standing of 6.5.  And so on, for other tournament formats.

Conference standing in isolation, however, has a very limited meaning.  Rather, what matters is a team's standing in its conference combined with the strength of its conference.  In order to have a measure of these two factors together, I use the following formula:

          2.8 x Conference Standing  +  Conference Rank

This formula has the effect, when applied to teams within the Top 60 of the ARPI rankings, of assigning equal weight to Conference Standing and to Conference Rank.
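
Here is a minimal Python sketch of the steps described above: averaging tied positions, averaging regular season and tournament standing, and applying the formula.  The function names are hypothetical, and the sketch assumes a conference that plays both a regular season and a tournament.

```python
def tied_standing(first_position, teams_tied):
    """Average of the positions occupied by a group of tied teams.

    Two teams tied for 1st occupy positions 1 and 2, so each gets 1.5.
    """
    last_position = first_position + teams_tied - 1
    return (first_position + last_position) / 2


def conference_standing(regular_season_standing, tournament_standing):
    """Average of regular season standing and conference tournament standing."""
    return (regular_season_standing + tournament_standing) / 2


def standing_and_rank_value(standing, conference_rank):
    """Combined value from the formula: 2.8 x Conference Standing + Conference Rank."""
    return 2.8 * standing + conference_rank


# Example: tied for 1st in the regular season, lost in the tournament semi-finals,
# playing in the #3 ranked conference.
regular_season = tied_standing(1, 2)      # positions 1 and 2
tournament = tied_standing(3, 2)          # losing semi-finalists, positions 3 and 4
standing = conference_standing(regular_season, tournament)
print(regular_season, tournament, standing, round(standing_and_rank_value(standing, 3), 1))
```

The losing quarter-finalists in a four match quarter-final round would be `tied_standing(5, 4)`, which gives the 6.5 mentioned above.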

I need to be clear here that what I've described above is only one way to do it, and is the way I do it.  For purposes of my work identifying patterns the Committee has followed, however, it really doesn't matter so long as my method is reasonable.  All the Conference Standing and Conference Rank pattern will say is, "Using this method for assigning a value to a team's conference standing and conference rank, this is the pattern for the Committee's decisions."

So, using this method for assigning a value to a team's conference standing and conference rank combined, here are the Committee's patterns over the last 10 years for NCAA Tournament seeds and at large selections:


In this table, the lower the value the better.  Using at large selections as an example, the table says that a team with a conference standing and conference rank combined value of 12.0 or less always has received an at large selection.  And, a team with a value of 31.2 or more never has received an at large selection.  Thus, in terms of expectations, one reasonably can expect that a team with a value of 12.0 or lower will receive an at large selection.  A team with a value of 31.2 or higher won't receive an at large selection.  And for teams in the middle, the patterns don't indicate one way or the other: the team might or might not receive an at large selection.
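The way to read those two at large thresholds can be sketched in a few lines of Python.  This is only an illustration of the reading described above, not a prediction tool; the function and constant names are my own, and the only numbers taken from the table are 12.0 and 31.2:

```python
YES_AT_OR_BELOW = 12.0  # values this low always have received an at large selection
NO_AT_OR_ABOVE = 31.2   # values this high never have received an at large selection


def at_large_expectation(value: float) -> str:
    """Classify a combined Conference Standing and Conference Rank value."""
    if value <= YES_AT_OR_BELOW:
        return "yes"
    if value >= NO_AT_OR_ABOVE:
        return "no"
    # In the middle, this pattern by itself does not indicate one way or the other.
    return "undetermined"


for sample in (10.8, 13.2, 31.2):
    print(sample, at_large_expectation(sample))
```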

Another table elaborates on this:


This table, which is a little hard to see, is part of a larger table.  (To see this table in a larger and clearer format: right click on it, then click on "Open link in new tab," and then at the top of your computer screen click on the new browser tab that just appeared.)  The row across the top is a team's conference standing determined as I described above.  The column down the left is the conference's ARPI rank.  Thus the yellow box at the upper left of the table is the value for Conference Standing and Conference Rank assigned to the #1 team in the #1 conference.  The yellow fill for that box indicates that a team with that value always has received a #1 seed.  Thus, over the last 10 years, the #1 team in the #1 conference always has received a #1 seed.

For a table like this, if there were orange infills, it would mean that teams with those values always have received at least #2 seeds.  For this particular set of patterns, there is no value at which teams always have received at least #2 seeds.  The bright red means teams meeting those values always have received at least #3 seeds; and the darker red means they always have received at least #4 seeds.  The gold boxes on the left simply denote that those teams are unseeded automatic qualifiers.

In the table, the grey infill boxes represent Conference Standing and Conference Rank values that always have resulted in at large selections.  The "stair step" nature of both the grey area and the seed areas shows how the Committee looks at Conference Standing and Conference Rank.  The better the Conference Rank, the poorer the Conference Standing can be and still earn a particular seed or an at large selection.  If I were to post the entire table, which is too big for this webpage, you'd see that at the bottom right of the table, there is another shaded area showing Conference Standing and Conference Rank values that never have received at large selections, with a similar stair step look.

And finally, the table has green infill boxes.  These are for the six teams, over the last 10 years, that finished #2.00 or better in their conference regular season standings and that came closest to the "yes" at large selection area but were not within it and did not get at large selections.  These are the ones that have generated controversy:
  • Wisconsin, Big 10 ranked #2; finished at 1.5 in regular season, at 6.5 in conference tournament, for average of 4.0; Conference Standing and Conference Rank value of 13.2
  • DePaul, Big East ranked #6; finished at 1.5 in regular season, at 3.5 in conference tournament, for average of 2.5; Conference Standing and Conference Rank value of 13.0
  • Santa Clara, West Coast ranked #7; finished at 2.0 in conference; Conference Standing and Conference Rank value of 12.6
  • Penn, Ivy League ranked #8; finished at 2.0 in conference; Conference Standing and Conference Rank value of 13.6
  • Long Beach State, Big West ranked #8; finished at 2.0 in regular season and 2.0 in conference tournament, for average of 2.0; Conference Standing and Conference Rank value of 13.6
  • Missouri, SEC ranked #3; finished at 2.0 in regular season and 6.5 in conference tournament, for average of 4.25; Conference Standing and Conference Rank value of 14.9
All of these were close to the "yes" at large selection area, but none within it, and for whatever reasons applied in their particular cases, they did not get at large selections.

What the table demonstrates is that the Committee's decisions, in relation to teams' conference standings and ranks, follow a logical pattern.  One might argue, as a matter of policy, that the "yes" at large selection area should extend farther down on the table, which essentially is what fans of the above teams have argued.  The Committee, however, over the last 10 years, has drawn the line higher up.  Thus, using the table, a team is not assured of an at large selection if:

  • It is in the #8 ranked conference and finishes at #1.75 or poorer in the combined conference regular season/tournament standings;
  • It is in the #7 ranked conference and finishes at #2.00 or poorer;
  • It is in the #6 ranked conference and finishes at #2.50 or poorer;
  • It is in the #5 ranked conference and finishes at #2.75 or poorer;
  • It is in the #4 ranked conference and finishes at #3.00 or poorer;
  • It is in the #3 ranked conference and finishes at #3.25 or poorer;
  • It is in the #2 ranked conference and finishes at #3.75 or poorer;
  • It is in the #1 ranked conference and finishes at #4.00 or poorer.
And further, this pattern demonstrates that it does not matter to the Committee where a team finishes in the conference regular season standings; it matters only what the team's combined conference regular season/conference tournament standing is.
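For anyone who wants to check a team against the list above, here is a minimal Python sketch that stores the eight boundaries in a lookup table.  The names are hypothetical, and the sketch deliberately says nothing about conferences ranked below #8, since the list above does not cover them:

```python
from typing import Optional

# Conference ARPI rank -> combined standing at which (or poorer than which)
# a team is not assured of an at large selection, per the list above.
NOT_ASSURED_FROM = {
    1: 4.00, 2: 3.75, 3: 3.25, 4: 3.00,
    5: 2.75, 6: 2.50, 7: 2.00, 8: 1.75,
}


def is_not_assured(conference_rank: int, combined_standing: float) -> Optional[bool]:
    """True if the team is at or beyond the boundary for its conference.

    Returns None for conference ranks the list does not cover.
    """
    boundary = NOT_ASSURED_FROM.get(conference_rank)
    if boundary is None:
        return None
    return combined_standing >= boundary


# The six "green box" teams: (conference rank, combined standing).
green_box_teams = {
    "Wisconsin": (2, 4.0),
    "DePaul": (6, 2.5),
    "Santa Clara": (7, 2.0),
    "Penn": (8, 2.0),
    "Long Beach State": (8, 2.0),
    "Missouri": (3, 4.25),
}
for team, (rank, standing) in green_box_teams.items():
    print(team, is_not_assured(rank, standing))
```

Each of the six teams comes out as not assured, which matches their position just outside the "yes" area.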

And, to add a few more details:

  • Over the last 10 years, looking at teams that were in the Top 60 of the ARPI rankings, 4 teams tied for first in their conference regular season standings did not get at large selections, 18 teams that were #2 in their conference regular season standings did not get at large selections, and 4 teams tied for #2 in their conference regular season standings did not get at large selections.
  • Over the last 10 years, this pattern, by itself, identifies 5 of the 40 #1 seeds, 0 #2 seeds, 1 #3 seed, and 1 #4 seed.
  • This pattern, by itself, identifies 33 unseeded at large selections and excludes 3 teams from at large selections over the last 10 years.  In other words, 3.3 "yes" and 0.3 "no" per year.




Monday, January 16, 2017

NCAA Tournament Bracket Simulations: Post-2016 "Standards" Update

As I've written previously, in order to do my NCAA tournament bracket simulations, I use "standards" I've identified that say either "yes," a team gets a particular seed or an at large selection, or "no," a team doesn't get a particular seed or an at large selection.  The "standards" are based on what the Committee has done in the past:

  • If a team meets a "yes" standard for a particular decision, it means that every team that met that standard, over the period for which I have data, received a "yes" Committee decision -- for that particular seed or for an at large selection.
  • If a team meets a "no" standard for a particular decision, it means that every team that met that standard received a "no" Committee decision.
Since what I've called "standards" really are just the patterns the Committee has followed, they really aren't standards.  So, for future purposes, I'm going to call them "patterns," which is more accurate.

Each year, following the NCAA Tournament, I have to update the "patterns" to take the current year's Committee decisions into account.  I've done that this year, and while doing it I've made some changes to my method that will make my annual updates easier to do, will provide some very good information about how the Committee appears to make its decisions, and still will provide a good basis for my next year's bracket simulations.  At this point, my database covers 10 years (2007 through 2016), so the patterns are consistent with 10 years of Committee decisions.

My updated system has, for each Committee decision -- #1 seeds, #2 seeds, #3 seeds, #4 seeds, and at large selections -- 91 patterns.  The patterns are based on the NCAA-mandated criteria the Committee is to apply in making its at large selections.  I use 13 basic criteria and have a pattern for each criterion by itself and then a pattern for each paired set of criteria.  There are 78 "paired criteria" patterns -- the number generated based on 13 criteria taken two at a time.
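The count of 91 is easy to verify: 13 single-criterion patterns plus the number of ways to choose 2 criteria out of 13.  Here is a short Python check.  The criterion names are placeholders, not the actual criteria:

```python
from itertools import combinations
from math import comb

N_CRITERIA = 13

# Placeholder labels standing in for the 13 basic criteria.
criteria = [f"criterion_{i}" for i in range(1, N_CRITERIA + 1)]

single_patterns = len(criteria)
paired_patterns = list(combinations(criteria, 2))  # every unordered pair, no repeats
total_patterns = single_patterns + len(paired_patterns)

print(single_patterns, len(paired_patterns), total_patterns)  # prints 13 78 91
print(comb(N_CRITERIA, 2))  # prints 78, the same pair count from the closed form
```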

I've explained this in detail, and have set out the patterns, at the RPI for Division I Women's Soccer website.  At the website, I have one page for the at large selection patterns and a separate page for the seed selection patterns.  On those pages, I've provided tables showing what the patterns are.  I've also provided information on exactly how the Committee's decisions over the last 10 years match with the patterns.  Here are links to those two pages:
As we go through the season, week by week, I will match up the individual teams' data with the patterns to create simulated brackets.  The deeper we get into the season, the closer a simulated bracket will come to what the ultimate bracket will be and to showing where the Committee will have to make tough decisions.

Over the coming weeks, I'll be writing a series of posts on the patterns.  I hope the posts will be helpful to coaches and fans in evaluating their teams' prospects from an NCAA tournament seed and at large selection perspective.  In the meantime, if you're interested in this, take some time to look over the patterns at the linked webpages.

And, as always, feel free to ask questions!