Sunday, November 5, 2017

2017 NCAA Tournament Bracket Simulation: End of Season

This is my end-of-season, pre-decision day bracket simulation.  Teams have played all their games, both regular season and conference tournaments.  My simulated bracket now uses actual game results for all games and it uses teams' actual ratings, rather than using some simulated game results and simulated ratings.  Since this is the final bracket simulation, I'm going to provide more detail than in my "interim" bracket simulations.

The way the simulation process works is this:  I have a data base that covers the last 10 years.  It includes data for each of the individual factors the Women's Soccer Committee is required to consider in making at large selections for the tournament.  The Committee also considers these factors in seeding, but for seeding it has the ability to consider other factors, too, if it wants.  The data base also includes all of the Committee's decisions on seeds and at large selections over the last 10 years.  The RPI ratings all are those generated for each year using the current RPI formula.

For each factor, I have reviewed each Committee decision group (#1, #2, #3, and #4 seed groups and at large selection group) to see how all members of that group "scored" under that factor.  For most of the factors, for each group there is a score level at which every team at or above that level got a "yes" decision from the Committee; and conversely, there is a score level at which every team at or below that level got a "no" decision from the Committee.  For example, looking at #1 seeds and the Adjusted RPI Rank factor, a team with an ARPI rank of #1 always has received a #1 seed; and teams with ARPI ranks of #8 or poorer never have received #1 seeds.
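To make the idea of "yes" and "no" score levels concrete, here is a minimal Python sketch of how such levels could be derived for one factor and one decision group. The function name `find_levels`, the `higher_is_better` flag, and the sample history are all made up for illustration. The sample is arranged only so that it reproduces the #1 seed and ARPI rank example above; it is not the actual data.

```python
def find_levels(history, higher_is_better=True):
    """Return (yes_level, no_level) for one factor and one decision group.

    history: list of (score, got_yes) pairs from past seasons (hypothetical).
    yes_level: the poorest score at which every team at or better than it got "yes".
    no_level: the best score at which every team at or poorer than it got "no".
    Either value is None when no such level exists.
    """
    # Flip the sign for rank-style factors so that "bigger is better" internally.
    sign = 1 if higher_is_better else -1
    yes_scores = [sign * s for s, ok in history if ok]
    no_scores = [sign * s for s, ok in history if not ok]

    best_no = max(no_scores, default=None)     # best score that still got "no"
    worst_yes = min(yes_scores, default=None)  # poorest score that still got "yes"

    # "Yes" scores strictly better than every "no" team.
    clean_yes = [s for s in yes_scores if best_no is None or s > best_no]
    # "No" scores strictly poorer than every "yes" team.
    clean_no = [s for s in no_scores if worst_yes is None or s < worst_yes]

    yes_level = sign * min(clean_yes) if clean_yes else None
    no_level = sign * max(clean_no) if clean_no else None
    return yes_level, no_level


# Invented (ARPI rank, got a #1 seed) history; lower rank is better.
rank_history = [
    (1, True), (1, True), (2, True), (2, False), (3, True),
    (5, False), (7, True), (8, False), (9, False), (12, False),
]
levels = find_levels(rank_history, higher_is_better=False)
print(levels)
```

With this invented history, rank #1 is the "yes" level and rank #8 is the "no" level; ranks in between have gone both ways, so they produce neither a "yes" nor a "no" score.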

Based on the NCAA's description of the factors the Committee uses for at large selections, I have a list of 14 individual factors.  In addition, I have a list of 78 additional factors, each of which is a pair of the individual factors.  For each pair, I use a formula that weights them 50-50.  For each of the 92 factors, I look at each decision group to see what is the score where all teams that equaled or did better than that score always got a "yes" decision and what is the score where all teams that equaled or did more poorly than that score always got a "no" decision.  Then, for at least each of the top 60 RPI teams, for each decision group, I tally up the number of "yes" scores it had and the number of "no" scores it had.
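The pairing and tallying steps can be sketched roughly as follows. This is only an illustration under assumptions: it supposes every factor has already been put on a comparable scale where higher is better, it uses a plain 50-50 average as the pairing formula, and it pairs every factor it is given, which may not match the specific 78 pairs used here. All names and numbers in it are hypothetical.

```python
from itertools import combinations


def paired_score(a, b):
    """Weight two factor scores 50-50 (assumed formula)."""
    return 0.5 * a + 0.5 * b


def build_factor_scores(individual):
    """Add one paired factor for every pair of the individual factors given."""
    scores = dict(individual)
    for f1, f2 in combinations(sorted(individual), 2):
        scores[f"{f1}+{f2}"] = paired_score(individual[f1], individual[f2])
    return scores


def tally(scores, levels):
    """Count a team's "yes" and "no" scores for one decision group.

    levels maps a factor name to (yes_level, no_level); either may be None.
    """
    yes = no = 0
    for name, score in scores.items():
        yes_level, no_level = levels.get(name, (None, None))
        if yes_level is not None and score >= yes_level:
            yes += 1
        elif no_level is not None and score <= no_level:
            no += 1
    return yes, no


# Hypothetical team with three made-up factors, already on a 0-to-1 scale.
team = {"a": 1.0, "b": 0.5, "c": 0.0}
scores = build_factor_scores(team)

# Hypothetical (yes_level, no_level) pairs for some of the factors.
levels = {"a": (0.9, 0.1), "a+b": (0.7, 0.2), "b+c": (0.8, 0.3), "c": (0.5, 0.0)}
result = tally(scores, levels)
print(result)
```

A factor with no entry in `levels`, or a score that falls between the two levels, adds to neither count, which is how a team can end up with no "yes" and no "no" scores at all.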

By going through this process, looking back over the last 10 years, for each decision group I've identified a series of up to 92 "yes" scores and "no" scores that are consistent with every decision the Committee has made.  Thus although the criteria and their scores for each decision group don't necessarily represent how Committee members think or even the Committee's "group think," they do represent what the Committee actually has done over the last 10 years.

In my bracket simulations, I apply the criteria and scores to this year's data, for each decision group, to see how many "yes" and "no" scores each team has.  Then based on the "yes" and "no" results, I determine what decisions the Committee will make if its decisions are going to be consistent with what it's decided over the last 10 years.

Each year, it's likely the Committee will see some teams with profiles it hasn't seen over the last 10 years.  In some of those cases, the teams' profiles may be such that the teams meet both "yes" and "no" factor scores for a decision group.  As an example, this year, LSU, Mississippi State, and St. Louis meet significant numbers of "yes" and "no" scores for at large selection.  In these cases, the Committee can't be 100% consistent with what it's done in prior years; rather, it has a hard decision to make.  The decision the Committee actually makes then may provide some insight into how the Committee thinks.  In any event, once a season is over, I incorporate the Committee's decisions into my system and make whatever factor score changes are necessary so that the new factor scores are consistent with all of the Committee decisions in my data base.  Over time, this means that the factor scores become a little "looser" because they are accommodating more decisions, although over time it's likely that the number of changes required each year will diminish.

Besides giving a picture of what one might reasonably expect from the Committee (this assumes that one can expect a reasonable amount of consistency from the Committee from year to year), this simulation process is a way to see if the Committee is departing from past practice and embarking in a new direction with how it applies the factors.

With that background, below I've provided information for each decision group.  The information shows the key candidates for a "yes" decision for each group and how many "yes" and "no" factor scores each candidate has.  The exact number of "yes" or "no" scores has some meaning, but I don't attribute too much meaning to the exact numbers.  You'll see some teams with only "yes" scores, some with only "no" scores, some with some of each (new profiles this year), and some with no "yes" and no "no" scores.  I assign "yes" decisions to teams with only "yes" scores, "no" decisions to teams with only "no" scores, and treat teams with some "yes" and some "no" scores or with no "yes" and no "no" scores as candidates for any vacant slots.
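The assignment rule in the last sentence above can be written as a small Python sketch. The function name `classify` and the labels it returns are just illustrative choices; the tallies in the usage lines are taken from the at large table later in this post.

```python
def classify(yes_total, no_total):
    """Turn a team's "yes"/"no" tallies for one decision group into a status."""
    if yes_total > 0 and no_total == 0:
        return "yes"        # only "yes" scores
    if no_total > 0 and yes_total == 0:
        return "no"         # only "no" scores
    return "candidate"      # some of each, or none of either


# (At Large In Total, At Large Out Total) for four teams in the at large table.
tallies = {
    "Rutgers": (32, 0),
    "SanJoseState": (0, 30),
    "LSU": (8, 18),
    "Memphis": (0, 0),
}
statuses = {team: classify(yes, no) for team, (yes, no) in tallies.items()}
print(statuses)
```

The "candidate" teams are the ones that still need a closer look at their individual factor scores before any remaining slots are filled.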

As always, the numbers in the NCAA Seed or Selection column mean:

1 = #1 seed
2 = #2 seed
3 = #3 seed
4 = #4 seed
5 = unseeded automatic qualifier
6 = unseeded at large selection
7 = top 60 team not getting an at large selection

#1 Seeds:

NCAA Seed or Selection | ARPI Rank for Formation | Team for Formation | 1 Seed Total | No 1 Seed Total
1 | 1 | Stanford | 36 | 0
1 | 2 | NorthCarolinaU | 15 | 0
1 | 6 | TexasA&M | 10 | 1
2 | 3 | SouthCarolinaU | 4 | 3
1 | 4 | Duke | 2 | 0
2 | 5 | UCF | 0 | 5

This list is based on a sort of teams first in order of those with the most "yes" factor scores for #1 seeds and second in order of those with the fewest "no" scores.  Thus teams below UCF on the list have no "yes" scores and have 5 or more "no" scores.
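For anyone who wants to reproduce the ordering, here is a short Python sketch of that two-level sort applied to the six rows of the #1 seed list. The tuple layout is just a convenient assumption for the example.

```python
# (team, "yes" total for #1 seed, "no" total for #1 seed), from the list above.
rows = [
    ("UCF", 0, 5),
    ("Duke", 2, 0),
    ("Stanford", 36, 0),
    ("SouthCarolinaU", 4, 3),
    ("TexasA&M", 10, 1),
    ("NorthCarolinaU", 15, 0),
]

# Most "yes" scores first; among equal "yes" totals, fewest "no" scores first.
ordered = sorted(rows, key=lambda row: (-row[1], row[2]))

for team, yes_total, no_total in ordered:
    print(team, yes_total, no_total)
```

Because the "yes" total is the primary key, a team like Duke with no "no" scores can still appear below teams that have some.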

As you can see from the list, the candidate group for a #1 seed is the first five teams.  I included UCF only to show that the first five are the only candidates.  From the list of 5, I have identified the three with only "yes" scores as #1 seeds -- Stanford, North Carolina, and Duke.  Then, from the remaining two, I picked Texas A&M as the fourth #1 seed.

I have a table I can go to for each team that shows which "yes" and "no" factor scores it met, if I want to do a more detailed analysis.  Typically, after the Committee has made its decisions on decision Monday, I use the table in evaluating why the Committee might have made a particular decision.  For example, if the Committee were to give a #1 seed to South Carolina rather than Texas A&M, I'd look to see which factors those two teams' "yes" and "no" numbers related to, as a way to try to gain insight into what the Committee might have been thinking and whether the Committee's decision seems reasonable or seems more like an error.  In this particular case, I've already looked at the individual factor scores for the two teams and concluded Texas A&M is the more likely #1 seed, although South Carolina would be a reasonable #1, too.

#2 Seeds
NCAA Seed or Selection | ARPI Rank for Formation | Team for Formation | 2 Seed Total | No 2 Seed Total
1 | 1 | Stanford | 51 | 0
1 | 2 | NorthCarolinaU | 33 | 0
2 | 3 | SouthCarolinaU | 28 | 0
1 | 6 | TexasA&M | 21 | 0
1 | 4 | Duke | 18 | 0
2 | 5 | UCF | 8 | 0
3 | 15 | SouthFlorida | 1 | 20
2 | 10 | PennState | 0 | 0
3 | 8 | UCLA | 0 | 1
2 | 7 | WestVirginiaU | 0 | 1
3 | 17 | OhioState | 0 | 11

Here, South Carolina and UCF are clear #2 seeds.  And actually, although Penn State meets no "yes" and no "no" factor scores, it's a clear #2 seed, too.  This is because every other remaining team has 1 or more "no" scores: South Florida just above it on the list, UCLA and West Virginia just below it, and all teams farther down.  Among South Florida, UCLA, and West Virginia, I ruled South Florida out, notwithstanding its 1 "yes" score, because of its great number of "no" scores.  Between West Virginia and UCLA, after reviewing their detailed profiles, it looked to me like the Committee would be more likely to give the remaining #2 seed to West Virginia, so that's what I went with.

#3 Seeds
NCAA Seed or Selection | ARPI Rank for Formation | Team for Formation | 3 Seed Total | No 3 Seed Total
1 | 1 | Stanford | 53 | 0
1 | 2 | NorthCarolinaU | 38 | 0
2 | 3 | SouthCarolinaU | 32 | 0
1 | 6 | TexasA&M | 25 | 0
1 | 4 | Duke | 23 | 0
2 | 5 | UCF | 16 | 0
3 | 15 | SouthFlorida | 3 | 0
2 | 10 | PennState | 1 | 0
3 | 8 | UCLA | 1 | 0
2 | 7 | WestVirginiaU | 1 | 0
4 | 18 | SouthernCalifornia | 1 | 8
3 | 17 | OhioState | 0 | 0
3 | 19 | FloridaU | 0 | 0
4 | 14 | NotreDame | 0 | 0
6 | 11 | Rutgers | 0 | 1
5 | 23 | VirginiaU | 0 | 4
4 | 12 | TexasU | 0 | 4

South Florida and UCLA are clear #3 seeds.  There are four other potential #3 seeds to fill the remaining 2 positions: Southern California, Ohio State, Florida, and Notre Dame.  I looked at each of the four teams, including how close it came to getting a #2 seed and how strong a candidate it would be for a #4 seed, and went with Ohio State and Florida.  Selections of Southern California and/or Notre Dame also would be reasonable.

#4 Seeds
NCAA Seed or Selection | ARPI Rank for Formation | Team for Formation | 4 Seed Total | No Seed Total
1 | 1 | Stanford | 56 | 0
1 | 2 | NorthCarolinaU | 52 | 0
2 | 3 | SouthCarolinaU | 42 | 0
1 | 4 | Duke | 39 | 0
1 | 6 | TexasA&M | 35 | 0
2 | 5 | UCF | 25 | 0
3 | 8 | UCLA | 18 | 0
2 | 7 | WestVirginiaU | 11 | 0
2 | 10 | PennState | 9 | 0
4 | 9 | Princeton | 8 | 1
3 | 15 | SouthFlorida | 5 | 0
4 | 12 | TexasU | 2 | 0
4 | 18 | SouthernCalifornia | 1 | 0
3 | 19 | FloridaU | 1 | 0
6 | 45 | Butler | 1 | 21
3 | 17 | OhioState | 0 | 0
4 | 14 | NotreDame | 0 | 0
6 | 11 | Rutgers | 0 | 0
5 | 23 | VirginiaU | 0 | 0
6 | 26 | ArizonaU | 0 | 2
6 | 16 | FloridaState | 0 | 4
6 | 20 | TennesseeU | 0 | 4
6 | 22 | Georgetown | 0 | 4

Here, Texas and Southern California are clear #4 seeds.  The candidates for the remaining two positions are Princeton, Ohio State, Notre Dame, Rutgers, and Virginia.  I did a detailed review of Princeton's profile.  For the others, I looked to see how close they were to getting a #2 or #3 seed.  After that review, I went with Princeton and Notre Dame for the two remaining #4 seeds.  Rutgers and/or Virginia, however, also would be reasonable choices for the Committee.

At Large Selections

In the following list, I've included the unseeded automatic qualifiers, immediately following the seeded teams.  In the NCAA Seed or Selection column, they have the "5" code.  Following those teams, I have the teams I designated as the unseeded at large selections with code "6" and the teams from the top 60 I designated as not getting at large selections with code "7".  I'll give some explanation below the table.
NCAA Seed or Selection | ARPI Rank for Formation | Team for Formation | At Large In Total | At Large Out Total
1 | 2 | NorthCarolinaU | 70 | 0
1 | 1 | Stanford | 69 | 0
1 | 4 | Duke | 67 | 0
1 | 6 | TexasA&M | 56 | 0
2 | 3 | SouthCarolinaU | 71 | 0
2 | 5 | UCF | 63 | 0
2 | 10 | PennState | 56 | 0
2 | 7 | WestVirginiaU | 46 | 0
3 | 15 | SouthFlorida | 52 | 0
3 | 8 | UCLA | 45 | 0
3 | 19 | FloridaU | 44 | 0
3 | 17 | OhioState | 31 | 0
4 | 9 | Princeton | 44 | 0
4 | 14 | NotreDame | 39 | 0
4 | 18 | SouthernCalifornia | 28 | 0
4 | 12 | TexasU | 27 | 0
5 | 13 | Pepperdine | 34 | 0
5 | 23 | VirginiaU | 24 | 0
5 | 21 | MurrayState | 21 | 10
5 | 24 | Hofstra | 18 | 0
5 | 37 | Monmouth | 7 | 8
5 | 47 | LaSalle | 5 | 22
5 | 35 | FloridaGulfCoast | 4 | 0
5 | 72 | Lamar | 3 | 21
5 | 34 | Baylor | 2 | 0
5 | 59 | WashingtonU | 0 | 12
5 | 94 | CalStateFullerton | 0 | 14
5 | 81 | Bucknell | 0 | 16
5 | 118 | William&Mary | 0 | 16
5 | 65 | NorthTexas | 0 | 17
5 | 87 | IndianaU | 0 | 17
5 | 123 | KentuckyU | 0 | 18
5 | 80 | SanDiegoState | 0 | 20
5 | 104 | UNCGreensboro | 0 | 20
5 | 98 | HighPoint | 0 | 22
5 | 124 | Toledo | 0 | 22
5 | 158 | StonyBrook |  |
5 | 177 | AustinPeay |  |
5 | 213 | JacksonvilleState |  |
5 | 225 | Denver |  |
5 | 238 | AlabamaState |  |
6 | 11 | Rutgers | 32 | 0
6 | 20 | TennesseeU | 32 | 0
6 | 32 | OklahomaState | 30 | 0
6 | 45 | Butler | 24 | 2
6 | 16 | FloridaState | 23 | 0
6 | 22 | Georgetown | 22 | 0
6 | 27 | Auburn | 18 | 0
6 | 26 | ArizonaU | 17 | 0
6 | 41 | ArkansasU | 17 | 1
6 | 30 | NCState | 16 | 0
6 | 42 | WakeForest | 16 | 0
6 | 28 | AlabamaU | 15 | 0
6 | 29 | WisconsinU | 14 | 0
6 | 25 | CaliforniaU | 10 | 0
6 | 31 | SantaClara | 7 | 0
6 | 39 | Vanderbilt | 6 | 0
6 | 33 | NorthwesternU | 3 | 0
6 | 40 | MississippiU | 3 | 1
6 | 46 | TCU | 2 | 0
6 | 38 | Clemson | 1 | 0
6 | 52 | Cincinnati | 1 | 1
6 | 43 | ColoradoU | 0 | 0
6 | 48 | MinnesotaU | 0 | 0
7 | 44 | MississippiState | 9 | 9
7 | 56 | LSU | 8 | 18
7 | 58 | StLouis | 6 | 12
7 | 49 | WashingtonState | 0 | 0
7 | 50 | Memphis | 0 | 0
7 | 53 | Northeastern | 0 | 7
7 | 54 | BostonCollege | 0 | 7
7 | 36 | Rice | 0 | 10
7 | 60 | Drexel | 0 | 13
7 | 51 | Marquette | 0 | 14
7 | 57 | VirginiaTech | 0 | 14
7 | 55 | SanJoseState | 0 | 30

In the table, I've designated as at large selections all teams that have at least 1 "yes" factor score and no "no" scores.  I consider these the easy selections.  After that, I added Butler with 24 "yes" and 2 "no" scores, after reviewing the factors for which it got "yes" and "no" scores, as a relatively easy selection.  I did the same with Arkansas (17 and 1).  And I added Mississippi with 3 "yes" and 1 "no" score, after a very careful review, as a less easy selection.  After that, it gets much more difficult.

Mississippi State and LSU clearly present the Committee with profiles it hasn't seen over the last 10 years.  When I look closely at which criteria the "yes" and "no" scores relate to, they primarily involve excellent non-conference results but poor finishing positions in the SEC.  Thus for them, the Committee needs to decide which is more important.  In that consideration, the Committee also might look to see that the SEC is the strongest conference this year in average ARPIs, by a fairly good margin.  My ultimate decision on these was that the Committee would not give at large selections to teams ranked as far down in their conference standings as those two teams.  If the Committee did go that far down, however, I could understand it.

For St. Louis, I did a similar detailed review and went with a "no" decision.  It wouldn't surprise me at all, however, if the Committee were to give them a "yes."

This left me with Cincinnati, Minnesota, Colorado, Washington State, and Memphis to fill three remaining positions.  I looked at Cincinnati's "yes" and "no" factors and concluded it was likely to get a selection.  I looked at the remaining teams to see which had the fewest negatives for #3 and #4 seeds and went with Minnesota and Colorado.

But really, I would consider any of Mississippi, Mississippi State, LSU, St. Louis, Cincinnati, Minnesota, Colorado, Washington State, and Memphis to be either a reasonable at large selection or reasonably left out.

