Sunday, December 27, 2020

NCAA TOURNAMENT: TEAM PROFILES AND AT LARGE SELECTIONS

Team Profiles.  The NCAA requires the Women’s Soccer Committee to use a number of factors in picking the at large participants in the NCAA Tournament.  I break these down into 13 factors:

    RPI Rating

    RPI Rank

    Non-Conference RPI Rating

    Non-Conference RPI Rank

    Results against teams already selected for the bracket, including automatic qualifiers ranked #75 or better.  Based on my observations of Committee decisions over time, I use a surrogate for this factor: Results Against Top 50 Opponents

    Results Against Top 50 Opponents Rank

    Head to Head Results, for which I use the surrogate of Head to Head Results Against Top 60 Opponents

    Results Against Common Opponents, for which I use the surrogate of Results Against Common Opponents with Top 60 Teams

    Results Against Common Opponents Rank

    Conference Regular Season Standing and Conference Tournament Results, for which I use the surrogate of Conference Standing (Combined)

    Conference Average RPI

    Conference Average RPI Rank

    Results Over the Last 8 Games, for which I use the surrogate of Poor Results

The NCAA has a scoring system for some of these factors, such as the RPI.  For other factors it does not: Results Against Top 50 Opponents, Head to Head Results Against Top 60 Opponents, Results Against Common Opponents with Top 60 Teams, Conference Standing (Combined), Poor Results.  For each of these, I have my own scoring system.

Together, teams’ scores for the above factors make up their profiles.

The Results Against Top 50 Opponents factor is important and relates to scheduling, so it is worth describing the scoring system.  In looking at the Committee’s decisions among bubble teams, it appears that a few good results against highly ranked teams count a whole lot more than a greater number of good results against moderately ranked teams.  Further, good results against teams ranked below #50 appear not to be helpful at all (apart from their influence on teams’ RPI ratings and ranks).  This suggests that the Committee asks (whether consciously or not), "At how high a level have you shown you can compete?" and tends to select teams based on the answer to that question.  With that in mind, I developed this scoring system for Results Against Top 50 Opponents:


As you can see, this scoring system is very highly skewed towards good results against very highly ranked opponents.

In addition to the 13 individual factors, I use paired factors.  A paired factor puts two of the individual factors together using a formula that weights the two factors equally.  I do this because of how I have imagined a Committee member might think:  Yes, Team A’s RPI Rank is poorer than Team B’s, but when you look at their Results Against Top 50 Opponents, Team A’s are better, and when you look at the two factors together, Team A looks better overall. 

I pair each individual factor with each other individual factor.  After doing this, I end up with 78 paired factors, which when added to the 13 individual factors gives me a total of 91 factors.
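As a minimal sketch of the pairing scheme: the post specifies only that the two factors are weighted equally, so the min-max normalization step and the function names below are my own assumptions about how that might be done.

```python
from itertools import combinations

def normalize(scores):
    """Rescale one factor's scores to the 0-1 range so two factors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    return {team: (v - lo) / (hi - lo) for team, v in scores.items()}

def paired_factor(factor_a, factor_b):
    """Equal-weight combination of two factors: average the normalized scores."""
    a, b = normalize(factor_a), normalize(factor_b)
    return {team: 0.5 * a[team] + 0.5 * b[team] for team in a}

# Pairing each of the 13 individual factors with each other factor
# yields C(13, 2) = 78 distinct pairs, for 91 factors in total.
pairs = list(combinations(range(13), 2))
assert len(pairs) == 78
```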

Team Profiles and the Committee’s At Large Selections.  Using data from the last 13 years, covering both team factor scores and the Committee’s at large selections, I ask two key questions for each factor:

1.  Is there a factor score (a "yes" standard) where, for a team with that score or better, the team always has gotten an at large selection?

2.  Is there a factor score (a "no" standard) where, for a team with that score or poorer, the team never has gotten an at large selection?

Using RPI Rank as an example factor, teams with RPI Ranks of #30 or better always have gotten at large selections.  And, teams with RPI Ranks of #58 or poorer never have gotten at large selections.  Thus the RPI Rank "yes" standard is 30 and the "no" standard is 58.

Asking these two questions for each of the 91 factors produces a "yes" and a "no" at large selection standard for most of them.
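Deriving the two standards for a rank-type factor (lower rank is better) might be sketched as follows; the function and data layout are my own, not the author’s actual tooling.

```python
def rank_standards(records):
    """
    records: list of (rank, selected) pairs for historical teams, where
    selected is True if the team got an at large selection.
    For a rank-type factor (lower rank = better), returns:
      yes_std: the poorest rank at or better than which every team was selected
      no_std:  the best rank at or poorer than which no team was selected
    Either can be None if no such threshold exists in the data.
    """
    ranks = sorted({r for r, _ in records})
    yes_std = None
    for r in ranks:  # walk from the best rank outward
        if all(sel for rk, sel in records if rk <= r):
            yes_std = r
        else:
            break
    no_std = None
    for r in reversed(ranks):  # walk from the worst rank inward
        if all(not sel for rk, sel in records if rk >= r):
            no_std = r
        else:
            break
    return yes_std, no_std
```

With the RPI Rank example above, a history in which every team ranked #30 or better was selected and no team ranked #58 or poorer was selected yields the standards (30, 58).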

At the end of a season, I then can apply the standards to the end-of-season data and, based on which teams meet "yes" and "no" at large selection standards, project what the Committee will decide if it follows its historic patterns.  (And, after the season is over and if the Committee has made decisions that do not match the standards, I can revise the standards to be sure they are consistent with all past Committee decisions.)

When I apply this process for a season that just has ended, I end up with some teams that meet only "yes" at large standards, some that meet only "no" standards, a few that meet some "yes" and some "no" standards (which means they have profiles the Committee has not seen in the past), and some that meet no "yes" or "no" standards.  So far, in every year there have not been enough teams that meet only "yes" standards to fill all the at large slots, so there always have been some open slots -- ranging from 2 to 8 open slots since 2007.  In my system, the teams that meet only "no" standards are out of the picture.  Thus the remaining open slots are to be filled by the teams that meet no "yes" or "no" standards or that meet some of each.
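Applying the standards to a season’s profiles then sorts teams into these four groups.  A minimal sketch, using my own function names and rank-type factors only:

```python
def classify(team_ranks, standards):
    """
    team_ranks: {factor: rank} for one team's profile.
    standards:  {factor: (yes_std, no_std)} derived from historical data.
    Returns the team's group under the standards system.
    """
    meets_yes = meets_no = False
    for factor, rank in team_ranks.items():
        yes_std, no_std = standards.get(factor, (None, None))
        if yes_std is not None and rank <= yes_std:
            meets_yes = True
        if no_std is not None and rank >= no_std:
            meets_no = True
    if meets_yes and meets_no:
        return "conflict"    # a profile the Committee has not seen before
    if meets_yes:
        return "in"          # meets only "yes" standards: gets a slot
    if meets_no:
        return "out"         # meets only "no" standards: out of the picture
    return "candidate"       # meets neither: competes for the open slots
```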

How well does this process work?  I run tests to answer this question, applying the current standards retroactively to each of the 13 years in my database.  This tells me: If I had had the standards at the beginning of the 2007 season and had applied them to each year’s end-of-season data, how many of the teams actually getting at large selections would the standards have picked?  Here is the answer:

Since 2007, there have been 435 at large positions to fill.  My standards, if applied to the Top 60 teams in each of those seasons (that were not Automatic Qualifiers), would have filled 374 of the 435 positions.  This would have left 61 positions to fill, with 90 candidate teams to fill them (each of which met no "yes" and no "no" standards).  This amounts to a little under 5 positions per year that the standards by themselves cannot fill with a pool of roughly 7 teams from which to fill them.

The next question is: Which of the factors was the most powerful at correctly identifying at large selections from among the Top 60 teams that were not Automatic Qualifiers?  As it turns out, these are the most powerful, in order:

RPI Rank and Top 50 Results Score (factor pair): correctly identified 313 at large selections

RPI Rank and Conference Rank: 301

RPI Rank and Top 50 Results Rank: 274

RPI Rank: 270

RPI Rating and Conference Rank: 255

RPI Rank and Common Opponents Rating: 253

RPI Rank and Conference ARPI: 251

RPI Rank and Poor Results: 249 

RPI Rating and Common Opponents Rating: 248

RPI Rating: 245

RPI Rank and Common Opponents Rank: 244

RPI Rating and Common Opponents Rank: 236

RPI Rating and Top 50 Results Rank: 212

RPI Rating and Top 50 Results Score: 207

After this, there is a big drop in the power of the factors.

As stated above, after applying all of the factor standards to the profiles of each year’s Top 60 teams that were not Automatic Qualifiers, in order to fill at large positions via the standards, I was left with 61 at large positions to fill over the 13 years for which I have data and 90 candidates from which to fill them.  I then took each of the above factors, as well as some others that are powerful for seeds but not so much for at large selections, and asked: From the candidate teams each year, what if I gave the remaining at large selections to the ones scoring the best on this factor?  How many correct selections would this factor make?  Here are the results:

RPI Rank and Top 50 Results Rank: 47 correct selections (for 61 positions)

RPI Rating and Top 50 Results Rank: 46

RPI Rank and Conference Rank: 46

RPI Rating and Conference Rank: 46

RPI Rank and Conference RPI Rating: 45

RPI Rating and Common Opponents Rating: 45

RPI Rating and Common Opponents Rank: 45

RPI Rank and Common Opponents Rating: 45

RPI Rank and Common Opponents Rank: 45

Head to Head Results: 45

Finally, I have asked one more question: What if I use only one factor to pick all of the at large teams (rather than using the factor standards system)?  How would that compare to the Committee’s actual selections?  When I run that test, here are the results:

RPI Rank and Top 50 Results Rank: correctly picks 408 of the Committee’s 435 at large selections

RPI Rating and Top 50 Results Score: 406

RPI Rank and Top 50 Results Score: 405

RPI Rank and Conference Rank: 405

RPI Rank and Conference RPI: 403

RPI Rating: 402

RPI Rank: 401

RPI Rating and Top 50 Results Rank: 399

RPI Rank and Poor Results: 397 

RPI Rank and Common Opponents Results: 394 

In other words, the RPI Rank and Top 50 Results Rank factor correctly matches 408 out of the 435 at large selections the Committee has made over the last 13 years, or all but roughly 2 per year.

A way to think about this is that if a team is in the Top 60 and scores well on the RPI Rank and Top 50 Results Rank factor, then the rest of its profile necessarily is going to be very good.  Thus whatever the Committee members think about and discuss, they are highly likely to make at large selections as if they consider teams’ RPIs and their Top 50 Results, paired together, as the most important -- indeed almost decisive -- aspect of their profiles.

Saturday, December 26, 2020

NCAA TOURNAMENT: WHAT THE SPRING 2021 BRACKET MIGHT LOOK LIKE

Background.  Ordinarily, the NCAA Tournament is a 64 team bracket of 31 conference champion Automatic Qualifiers plus 33 at large selections.  For the Spring 2021 Tournament, it will be a 48 team bracket of 29 conference champions (assuming only the Ivy League and Big West will not play during the 20-21 season) plus 19 at large selections.

For the Committee, making at large selections will be a challenge.  While we will not know until we have the full Spring schedule, there is a good possibility the RPI will be either not useful or significantly impaired.  In order for it to work, there must be enough games per team, a big enough proportion of non-conference games, and a big enough proportion of out-of-region games.  Although ordinarily there really is not a big enough proportion of non-conference games, there are enough that the Committee can work around it by considering factors other than the RPI.  This year, however, there is a good chance there will be fewer, and possibly significantly fewer, non-conference games, making any work-around difficult if not impossible.  In addition, even ordinarily there is not a big enough proportion of out-of-region games, with no obvious way for the Committee to work around it.  This year, it is possible there will be far fewer out-of-region games; and if so, the RPI will not work at all for comparing teams from one geographic region to teams from other geographic regions.  Thus overall, if the RPI has any usefulness, it may be greatly diminished.

If that happens, how is the Committee going to decide on which 19 at large teams should fill out the bracket?  There are many ways to answer this question.  Here are a few of them.

Possible Overall Approach.  First, past bracket history may have to be a resource for what the overall bracket should look like (notwithstanding that NCAA policy opposes using past history).  Second, in filling out the bracket, it may be necessary to look more than usual at where teams have finished within their conference regular season and conference tournament competitions.  Third, when deciding among bubble teams it may be necessary to rely heavily on results against other bubble teams and teams that clearly will be in the bracket.

Using Past Bracket History: Possible Approaches.

Proportionality Approach:  The number of at large selections this year will be 19/33 or 57.6% of the usual number.

The following NCAA Tournament at large selection table provides data related to all 33 of the at large selections in each of the last five years:

This table shows, for each conference:

For each of the last five years, the number of at large teams the conference had in the NCAA Tournament 

Over the five years, the minimum number of teams the conference had in the Tournament, the maximum number, and the difference between the minimum and maximum 

Over the five years, the same minimum and maximum numbers, but reduced to 57.6% of those numbers to reflect the reduction of at large positions from 33 to 19

In the last three columns, using the reduced minimum and maximum numbers: the minimum rounded down,  the maximum rounded up, and the difference between the minimum and maximum 

A way to look at the columns on the right is that for each conference, one would expect to see it have at least the minimum rounded down number of at large selections this Spring.  The sum of all these minimum numbers is 9, as the total at the bottom of the column shows.  Thus this fills 9 at large positions, leaving 10 yet to be filled.  The potential number of additional teams, on the right, is what one would expect to be the maximum number of teams that a conference also might get as at large selections.  Thus, one would expect the ACC to have at least 3 at large selections, plus the possibility of 3 more for an outside limit of 6.  And, one would expect the SEC to have at least 2 at large selections, plus the possibility of 3 more for an outside limit of 5.  The sum of all the potential additional selections is 25, as the total at the bottom of the column shows.  One of those 25 teams, however, would be from the Big West, which will not be playing, so the number comes down to 24.  Thus one would expect there to be up to 24 teams competing for the last 10 at large positions.
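The arithmetic behind the reduced minimum and maximum columns can be sketched as follows; the five-year counts in the example are hypothetical, and the scaling is the 19/33 ratio described above.

```python
import math

SCALE = 19 / 33  # this year's 19 at large slots as a share of the usual 33, about 57.6%

def scaled_range(five_year_counts):
    """Scale a conference's five-year at large range to the 19-slot bracket."""
    lo, hi = min(five_year_counts), max(five_year_counts)
    expected = math.floor(lo * SCALE)   # minimum, rounded down
    outside = math.ceil(hi * SCALE)     # maximum, rounded up
    return expected, outside, outside - expected
```

For example, a conference whose at large counts over five years ranged from 6 to 10 would get an expected minimum of 3 and an outside limit of 6 under this scaling.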

A problem with this approach, however, is that it assumes that each conference’s teams getting at large selections are distributed evenly across the Committee’s rankings.  It does not take into account the possibility that one conference may have had its at large teams near the top of the Committee’s rankings and another conference may have had them near the bottom.

Committee’s Top 19 Approaches:  A better approach would be to look at the Committee’s rankings of the at large selections for each year and simply identify the conferences of the Committee’s top 19 selections.  The difficulty with this, however, is that we do not know the rankings the Committee gave its selections.  Nevertheless, a possible similar approach is to use educated guesses about the Committee’s top 19.  Here are three different educated guess approaches:

Factor Standards Approach:  This approach uses my factor standards system, which is based on the Committee’s decisions over the last 13 years.  The approach, for each of the last five years, looks at each team that got an at large selection and at how many factor standards that team met saying "yes" it gets an at large selection.  It then ranks those teams based on the number of standards they met (the more standards met, the better) and selects the top 19 teams based on their ranks.

This approach produces the following conference results for the top 19 selected each year:


 Based on this table, one would expect a conference to receive at least the number of at large selections in the Minimum column and to have the potential to receive up to the additional number of selections in the Difference column.  Overall, 11 teams would be in the expected at large selection group, leaving 8 open slots with 19 teams in the potential at large selection group to fill those slots.

First Round Home Approach Combined with Factor Standards.  A problem with the Factor Standards approach is that it is not completely consistent with the Committee’s home field assignments in the first round of the NCAA Tournament.

Each year, we know the teams in the top 16 of the Committee’s rankings, since the Committee seeds the top 16.  The Committee, however, does not explicitly rank any of the other teams in the bracket.  But, we do know the other 16 teams to which the Committee gives home field advantage in the first round games.  And, although there is no published NCAA policy on how the Committee assigns home field other than for the seeded teams, there are indications that the Committee ordinarily gives home field to the teams it considers to fill the next 16 positions in the rankings after the seeded teams.

If we assume that the Committee assigns home field based on its own rankings of teams, this means that it ranks the teams it seeds in the #1 through #16 group and ranks the other teams with home field in the #17 through #32 group.  With this assumption, another approach to the upcoming Tournament bracket is to start with the teams that have had home field in each of the last five years and then rank those teams based on my factor standards approach.

This approach produces the following conference results for the top 19 selected each year:

Overall using this approach, 12 teams would be in the expected at large selection group, leaving 7 open slots with 17 teams in the potential at large selection group to fill those slots.

 First Round Home Approach Combined with the RPI Rank and Top 50 Results Rank Factor.  This approach uses the Committee home field assignments but then ranks the teams getting home field based on their scores for my combined RPI Rank and Top 50 Results Rank factor.  This is the most powerful of the factors and, by itself, is able on average to correctly pick all but 2 of the Committee at large selections per year over the last 13 years.

This approach produces the following conference results for the top 19 selected each year:


 Overall using this approach, 12 teams would be in the expected at large selection group, leaving 7 open slots with 15 teams in the potential at large selection group to fill those slots.

Comparing all four of the above tables, there are differences, but they are not extreme, being mainly in the area of expected numbers of at large positions rather than in the area of maximum potential positions.

Picking Teams to Fill Conference Expected At Large Slots and Identifying Conference Candidates for the Remaining at Large Slots.  How then would the Committee pick the teams that should get the positions expected for particular conferences?  And how would it identify the candidates from which the remaining open positions should be filled?  For these tasks, I think the Committee may be forced to rely heavily on team finishing places within their conference regular season standings and tournaments, since those may be the best indicators the Committee will have of how teams in each conference rank in relation to other teams in that conference.

Picking Teams to Fill the Remaining At Large Slots.  If the Committee were to allocate the expected at large positions per conference based on conference regular season standings and tournaments, how would the Committee fill the remaining at large slots from among the candidates?  For this, it seems like the Committee would ask the question:  Which teams have shown they are the most deserving of at large positions as compared to others in the candidate group?

How would candidates in the candidate group show they are the most deserving of at large positions?  The obvious way for them to do this would be through good results against the other candidates and, even better, good results against the expected at large selections and against the Automatic Qualifiers from the potential multi-team conferences.

From a scheduling perspective, this means that a candidate team would have to have played a significant number of games against the group comprised of the other candidates, the expected at large selections, and the Automatic Qualifiers from the potential multi-team conferences.  The number of games would have to be significant because the candidate cannot expect to get a good result in every game against teams from this group and so, needs to play enough games that it might get good results in some of them.

Forming the Bracket.  Here is an example of how the above at large selection process would work, excluding the final step of filling the last open slots from the candidate group.  I will use the last table above as the basis for numbers of expected and potential at large selections from each conference.  For conferences that played their conference regular season schedules and tournaments (if they had one) this Fall, I have assigned conference rankings using teams’ combined regular season and tournament standings from the Fall (weighting the regular season and tournament finishing positions equally).  For conferences that have deferred their conference competitions to the Spring, I have assigned positions using their 2019 conference regular season and tournament results.  Based on their rankings within their conferences, I then have assigned teams as expected at large selections or potential additional at large selections according to the expected and potential slots for each conference in the above table.

(NOTE:  In ranking teams within a conference, I use the classic 3 points for a win and 1 for a tie method.  Where teams finish tied on points in the conference regular season competition, I assign them the average of the finishing positions they occupy.  For example, if two teams are tied for 1st, I treat them as occupying the 1 and 2 positions and assign each the average rank of 1.5.  For the conference tournament, I assign the winner the rank of 1, the runner-up 2, the losing semi-finalists 3.5, the losing quarterfinalists 6.5, and so on.)
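The ranking method in the note above can be sketched as follows; the function names are mine, and the tie handling follows the averaging rule just described.

```python
def regular_season_ranks(points):
    """
    points: {team: season points}, using 3 points per win and 1 per tie.
    Teams tied on points share the average of the positions they occupy,
    e.g. two teams tied for 1st each get (1 + 2) / 2 = 1.5.
    """
    ordered = sorted(points, key=points.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and points[ordered[j]] == points[ordered[i]]:
            j += 1
        shared = (i + 1 + j) / 2  # average of positions i+1 through j
        for team in ordered[i:j]:
            ranks[team] = shared
        i = j
    return ranks

# Tournament finishes follow the same averaging: winner 1, runner-up 2,
# losing semifinalists (3 + 4) / 2 = 3.5, losing quarterfinalists 6.5.

def combined_rank(regular_season_rank, tournament_rank):
    """Conference Standing (Combined): equal weight to the two finishes."""
    return (regular_season_rank + tournament_rank) / 2
```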

Here are the results of that process (with the first number for each team being its conference regular season finishing position and the second number its conference tournament finishing position).  For the teams that did not have their conference competitions in the Fall, the teams I list are speculative -- after completion of their Spring conference competitions, one would have to replace the 2019 teams identified below with the actual Spring 2021 teams:

ACC:  5 expected at large selections plus 1 potential selection 

Automatic Qualifier:  Florida State (1.5, 1)

At large expected:  North Carolina (1.5, 2); Virginia (3, 3.5); Duke (5, 3.5); Clemson (4, 6.5); plus one of Louisville, Virginia Tech, or Notre Dame

Potential at large:  Louisville (7, 6.5); Virginia Tech (7, 6.5); or Notre Dame (7, 6.5)

SEC:  3 expected at large selections plus 2 potential selections

AQ:  Vanderbilt (5.5, 1)

At large expected: Arkansas (1.5, 2); Texas A&M (1.5, 3.5); South Carolina (3, 3.5)

Potential at large: Tennessee (4, 6.5); Auburn (7.5, 6.5); or Missouri (7.5, 6.5)

Pac 12:  2 expected at large selections plus 2 potential selections

AQ:  Stanford (1)

At large expected:  UCLA (2); Southern California (3.5)

Potential at large:  Washington (3.5); California (5)

Big 10:  1 expected at large selection plus 3 potential selections

AQ:  Penn State (4, 1)

At large expected:  Michigan (2.5, 2)

Potential at large:  Rutgers (2.5, 3.5); Wisconsin (1, 6.5); Iowa (5, 6.5)

Big 12:  1 expected at large selection plus 2 potential selections

AQ:  TCU (1)

At large expected:  West Virginia (2)

Potential at large:  Oklahoma State (3); Kansas (4)

American:  0 expected at large selections plus 2 potential selections

AQ:  South Florida (2, 1)

At large expected:  none

Potential at large:  Memphis (1, 2); UCF (3, 3.5)

West Coast:  0 expected at large selections plus 2 potential selections

AQ:  BYU (1)

At large expected:  none

Potential at large:  Santa Clara (2); Pepperdine (3)

Big East:  0 expected at large selections plus 1 potential selection

AQ:  Xavier (1, 1)

At large expected:  none

Potential at large:  Georgetown (2, 2)

Colonial:  0 expected at large selections plus 0 potential selections

Conference USA:  0 expected at large selections plus 0 potential selections