Wednesday, April 14, 2021

NCAA TOURNAMENT: SEEDING THE TEAMS, PART 2

In part 1 of this series, I showed a history-based process for seeding teams into the NCAA Tournament bracket:  pods of 4 #1 seeds, 4 #2 seeds, 4 #3 seeds, and 4 #4 seeds; a pod of additional automatic qualifiers and at large selections to bring the total at large selections to 19; and the remaining automatic qualifiers.  Please refer to part 1 for how the process works.

Using that process, based on games played through April 11, I come up with the following initial array of teams (not in order within their pods).  (To rank teams within their conferences, for the conferences that played in the Fall, I used the average of their conference regular season and conference tournament finishing positions; and for the conferences that did not play in the Fall, I used their current conference ranks based on average points per game played, which means I have disregarded what has happened so far in their conference tournaments.  Once the season is done, I will re-rank them using the system I used for the Fall-playing conferences.  Bold face = actual or most likely automatic qualifier.)

#1 seeds:

ACC, Florida State

ACC, North Carolina

Pac 12, UCLA

SEC, Arkansas

#2 seeds:

ACC, Virginia

Big 10, Penn State

Big 12, TCU

Pac 12, Southern California

#3 seeds:

ACC, Duke

SEC, Texas A&M

SEC, South Carolina

West Coast, Santa Clara

#4 seeds:

ACC, Clemson

Big 10, Ohio State

Big 12, West Virginia

Pac 12, Washington

Pod of most likely at large selections (from the non-automatic qualifiers, three will have to be removed to get the total number of at large selections down to 19):

ACC, Louisville

ACC, Notre Dame

American, South Florida

American, Memphis

Big East, Georgetown 

Big 10, Illinois

Big 10, Indiana

Big 12, Oklahoma State

Big 12, Kansas

Pac 12, Arizona State

Pac 12, Stanford

SEC, Vanderbilt

SEC, Tennessee

West Coast, BYU

Pod of longer shots to displace other potential at large selections:

ACC, Virginia Tech

ACC, Wake Forest

American, SMU

Big East, Butler

Big East, Connecticut

Big 10, Wisconsin

Big 10, Minnesota

Big 10, Rutgers

Big 10, Michigan

Big 12, Baylor

Big 12, Texas

Pac 12, Colorado

Pac 12, Washington State

SEC, Missouri

SEC, Auburn

SEC, Mississippi

West Coast, Pepperdine

Atlantic 10, St. Louis  (included on basis of win over Arkansas)

Conference USA, Rice (included on basis of win over Texas A&M)

The above is my history-based initial array of teams.  Starting there, I need to remove three of the non-automatic qualifiers from the most likely at large selection group and, after that, I need to review the longer shot group to see if I should pick some of them to replace teams remaining in the most likely at large selection group.  To do this, I use the following table.  This table, and the next one, likely will not be readable as you first see them.  To see a table clearly, simply click on it.  That should bring up a readable version.  Then, to return to this article, simply click on the black area to either side of the table.

In this table, for each team on the list, I show its numbers of wins, ties, and losses against the initial #1, #2, #3, and #4 seeds, against the most likely at large selection teams (group 5), against the longer shot at large selection teams (group 6), and against other teams it played (group 7).  Then, on the table, I have highlighted in green good results and in orange poor results, in relation to the position the team is competing for.
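As a rough illustration of how such a table could be built, here is a minimal Python sketch that tallies one team's wins, ties, and losses by opponent group.  The function name, the input layout, and the team names in the usage example are all hypothetical; this is not the spreadsheet actually used for the table.

```python
from collections import defaultdict

# Group numbers follow the article: 1-4 = seed pods, 5 = likely at large,
# 6 = longer shots, 7 = every other opponent.
OTHER_GROUP = 7

def tally_by_group(team, games, group_of):
    """Return {group: [wins, ties, losses]} for one team.

    games: iterable of (team_a, score_a, team_b, score_b) tuples (assumed layout).
    group_of: dict mapping a team name to its group number (1-6);
              opponents missing from the dict count as group 7.
    """
    record = defaultdict(lambda: [0, 0, 0])
    for team_a, score_a, team_b, score_b in games:
        if team == team_a:
            opponent, goals_for, goals_against = team_b, score_a, score_b
        elif team == team_b:
            opponent, goals_for, goals_against = team_a, score_b, score_a
        else:
            continue  # game does not involve this team
        group = group_of.get(opponent, OTHER_GROUP)
        if goals_for > goals_against:
            record[group][0] += 1   # win
        elif goals_for == goals_against:
            record[group][1] += 1   # tie
        else:
            record[group][2] += 1   # loss
    return dict(record)

# Made-up results, purely to show the shape of the output.
example_games = [
    ("Team A", 2, "Team B", 1),
    ("Team C", 0, "Team A", 0),
    ("Team A", 0, "Team D", 3),
]
example_groups = {"Team B": 1, "Team C": 5}
example_record = tally_by_group("Team A", example_games, example_groups)
```

The green and orange highlighting is a judgment call made after looking at these counts, so the sketch deliberately stops at the tally.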

Using the table as highlighted, I then decided which three teams to remove from the likely at large selection group in order to bring the number of at large selections to 19.  I removed Louisville, Indiana, and Illinois.

Next, I compared teams on the longer shot list to the remaining teams on the likely at large selection list and made two more changes:  I removed Memphis and Kansas from the likely list and added Rutgers and Colorado from the longer shot list.

I then re-created the above table, but with the changes I just made:

I reviewed this table to see if I should make further changes.  Although Notre Dame is a close call to retain as an at large selection, I decided to leave it with a selection.  However, if St. Louis turns out not to be an automatic qualifier, I probably would give it an at large selection and would remove Notre Dame from that group.  I also would consider Rice, although its chances would be smaller.

Once the selections are set, I could use the above table to place in exact order some or possibly even all of the 16 #1 to #4 seeded teams and the additional 11 teams.  Some of the ordering of teams would be clear and some of it would be speculative.  Since the season is not yet over, I am not going to try to do that this week.  I may try to do it once the coming weekend’s final results are in.

For the remaining automatic qualifiers that are not covered by the above tables but that will be in the Tournament, I have shown in Part 1 how I would order them for seeding purposes.

Observation.  The point of this is not to show what the Tournament at large selections and seeding should be.  Rather, it is to show a workable system for making the at large selections and doing the seeding.  The system is based on historic and current-year data, allowing for the exercise of some discretion but with pretty clearly defined boundaries.  I believe the system would produce a relatively credible and defensible bracket given the unique circumstances of this year.

Monday, April 5, 2021

NCAA TOURNAMENT: SEEDING THE TEAMS, PART 1

In previous articles, I wrote about how the Committee might make at large selections for the NCAA Tournament, if the RPI is not usable.  In those articles, I identified three history-based tiers of teams:

Tier 1 for the most likely Top 16 seeded teams, consisting of automatic qualifiers and at large selections;

Tier 2 for the 14 most likely next teams, consisting of automatic qualifiers and potential competitors for at large selections; and

Tier 3 for next level automatic qualifiers and longer-shot competitors for at large selections.

Based on bracket history since 2013, I assigned tier positions to conferences as shown in this table:

In this article, I will write about how the Committee might seed teams.  And, since the Tournament will be at one site this year, I will cover how the Committee might seed all of the 48 teams in the bracket (with some in pods).

Tier 1 - Top 16 Teams

The above table, in Tier 1, shows the history-based distribution of the 16 seeded teams among conferences:  5 ACC teams, 3 Pac 12 teams, 3 SEC teams, 2 Big 10 teams, 2 Big 12 teams, and 1 West Coast team.  The table, however, does not put the teams in seed order.  The Committee practice has been to have four pods of seeds: four #1s, four #2s, four #3s, and four #4s.  Is there a legitimate way to do that this year?

Here is a table with the history of the #1 seeds, by conference, for the period since 2013:

The table shows the distribution of the #1 seeds each year since 2013, by conference.  The Average column shows the average per year.  Using rounding of the averages, the Awarded column shows the distribution of the #1 seeds most consistent with history:  2 to the ACC, 1 to the Pac 12, and 1 to the SEC.  Thus under a history-based distribution the tentative #1 seeds go to the #1 and #2 ACC teams and the #1 Pac 12 and SEC teams.

Here is a mostly similar table for the #2 seeds:

This table has the same format as the #1 seed table, but adds the two next-to-last columns on the right.  In the Carry Forward column, I have carried forward from the #1 seed table the difference between a conference’s Average #1 seeds and its Awarded seeds.  Thus, for example, the Big East had an Average of 0.1 #1 seeds but did not get a #1 seed, so I carried forward 0.1 to add to its #2 seed Average.  Conversely, the SEC had an Average of 0.6 #1 seeds and got 1 #1 seed, so I carried forward -0.4 to subtract from its #2 seed Average.  The Net column shows the total of the #2 seed Average and the Carry Forward amount.  Based on the Net amounts, the Awarded column shows the history-based distribution of the tentative #2 seeds:  1 each to the ACC, Big 10, Big 12, and Pac 12.  Given the #1 seed assignments, these would be the #3 ACC team, the #2 Pac 12 team, and the #1 Big 10 and Big 12 teams.
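The Average, Carry Forward, Net, and Awarded columns can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names are hypothetical, the rounding is assumed to be ordinary round-half-up, and the carry passed to the next seed level is assumed to be Net minus Awarded.  The only figures taken from the tables are the SEC and Big East #1 seed Averages quoted above.

```python
import math

def round_half_up(x):
    """Round to the nearest whole number, with .5 rounding up (assumed rule)."""
    return math.floor(x + 0.5)

def award_seeds(averages, carry_in=None):
    """Compute Net, Awarded, and Carry Forward for one seed level.

    averages: {conference: average seeds per year at this seed level}
    carry_in: {conference: amount carried forward from the previous level}
    Returns (net, awarded, carry_out), each a dict keyed by conference.
    """
    carry_in = carry_in or {}
    net, awarded, carry_out = {}, {}, {}
    for conf in sorted(set(averages) | set(carry_in)):
        net[conf] = averages.get(conf, 0.0) + carry_in.get(conf, 0.0)
        awarded[conf] = max(0, round_half_up(net[conf]))
        # Rounded to 2 places only to keep floating-point noise out of the output.
        carry_out[conf] = round(net[conf] - awarded[conf], 2)
    return net, awarded, carry_out

# #1 seed level, using the two Averages quoted in the text.
net_1, awarded_1, carry_1 = award_seeds({"SEC": 0.6, "Big East": 0.1})
# awarded_1 == {"Big East": 0, "SEC": 1}
# carry_1   == {"Big East": 0.1, "SEC": -0.4}
```

The sketch does not force the Awarded numbers to add up to exactly four per pod.  Where rounding alone leaves a pod short or over, a tie-break on the Net amounts would still be needed.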

Here is a similar table for the #3 seeds:

The Awarded column shows the history-based distribution of the tentative #3 seeds:  2 to the SEC and 1 each to the ACC and West Coast.  Given the #1 and #2 seed assignments, these would be the #2 and #3 SEC teams, the #4 ACC team, and the #1 West Coast team.

Here is a similar table for the #4 seeds:

The Awarded column shows the history-based distribution of the tentative #4 seeds:  1 each to the ACC, Big 10, Big 12, and Pac 12.  Given the #1 through #3 seed assignments, these would be the #5 ACC team, the #2 Big 10 team, the #2 Big 12 team, and the #3 Pac 12 team.

The following table summarizes the tentative assignments of pod positions, by conference (not in order within the pods):

I do not see a legitimate way to put all four of the teams in a pod in order.  The reason for this is simple.  The Big 10 teams have played only conference games.  Therefore, there is no way based on actual game results this season to compare Big 10 teams to the other teams in their seed pods.  Further, the Pac 12 and West Coast teams combined have played a grand total of 4 games outside of the west region, with only 1 against a non-west conference that has teams in the seed group (BYU defeated Missouri).  Therefore, there is no way based on actual game results to compare Pac 12 and West Coast teams to the other teams in their seed pods.  Perhaps conference history and actual game results will support putting some teams in each pod in order, but that probably is the best that will be possible from a strict data-based perspective.

Tier 2 - Most Likely Candidates for Unseeded At Large Positions

Of the 16 Tier 1 seeded teams, 5 are likely to be automatic qualifiers and 11 at large selections.  Of the 14 Tier 2 teams, 3 are likely to be automatic qualifiers and 11 are potential at large selections.  With Tier 1 having 11 at large selections, this means it will be necessary to reduce Tier 2’s 11 potential at large selections down to 8, to get to the total of 19 at large selections for the Tournament.
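The counts in this paragraph, together with the 48-team bracket size mentioned earlier, fit together as simple arithmetic.  The short Python sketch below just restates those numbers with descriptive (hypothetical) variable names so the totals can be checked.

```python
# Counts stated in the article.
BRACKET_SIZE = 48
TOTAL_AT_LARGE = 19
TIER1_TEAMS = 16
TIER1_AT_LARGE = 11
TIER2_AUTO_QUALIFIERS = 3
TIER2_AT_LARGE_CANDIDATES = 11

# At large slots left for Tier 2 once Tier 1 has taken its share.
tier2_at_large_slots = TOTAL_AT_LARGE - TIER1_AT_LARGE                 # 8

# Tier 2 at large candidates that have to be removed.
cuts_needed = TIER2_AT_LARGE_CANDIDATES - tier2_at_large_slots         # 3

# Tier 1 and Tier 2 teams that end up in the bracket.
tier1_and_2_in_bracket = (
    TIER1_TEAMS + TIER2_AUTO_QUALIFIERS + tier2_at_large_slots
)                                                                      # 27

# Bracket positions left for the remaining automatic qualifiers.
remaining_auto_qualifiers = BRACKET_SIZE - tier1_and_2_in_bracket      # 21
```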

To get from 11 Tier 2 potential at large selections down to 8 when the RPI is not usable, the selection method must rely on the other NCAA-mandated factors: head to head results, results against common opponents, results against highly ranked opponents (meaning, this year, against Tier 1 through Tier 3 opponents), and recent results.  It will be a matter of doing the best one can with a limited set of data.

After eliminating three Tier 2 teams, what about trying to put the remaining Tier 2 automatic qualifiers and at large selection teams in seed order?  For the same reasons that the Committee will not be able to put all the teams within the seed pods in order, it will not be able to put all of the Tier 2 automatic qualifiers and selected teams in order.  It may be able to put some in order in relation to each other, but not all of them.  This also will be true if the Committee drops some teams from Tier 2 and replaces them with Tier 3 teams.

Thus there would be pods of teams in seed order as follows:

#1 seeds: 4 teams, in order only to the extent justified by actual data

#2 seeds: 4 teams, in order only to the extent justified by actual data

#3 seeds: 4 teams, in order only to the extent justified by actual data

#4 seeds: 4 teams, in order only to the extent justified by actual data

Tier 2 automatic qualifiers and unseeded at large selections: 11 teams, in order only to the extent justified by actual data

Remaining Automatic Qualifiers

Finally, after identifying and seeding either individually or in pods the 27 Tier 1 and Tier 2 teams that will be in the bracket, there will be 21 automatic qualifiers to place in the bracket (assuming none of them have moved up into Tier 1 or 2 as part of the process).  What about assigning seed positions to these teams?  A reasonable way to do it is to base the order on the historic ranks of the automatic qualifiers from each conference.  Here is a table that shows what I mean:

This table shows the rank of each conference’s automatic qualifier for each year since 2013.  The Average column shows the average ranks of the conferences’ automatic qualifiers.  The Rank by AQ column shows the rank of each conference in terms of the average rank of its automatic qualifiers.  (Interestingly, as you can see, there is a big drop from the first eight conferences - the highlighted ones in earlier articles - to the remaining conferences.)
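The Average and Rank by AQ columns amount to averaging each conference's yearly automatic qualifier ranks and sorting on the result.  Here is a minimal Python sketch of that calculation.  The function names are hypothetical and the conference names and ranks in the example are made up; they are not the values from the table.

```python
from statistics import mean

def rank_conferences_by_aq(aq_ranks_by_year):
    """Order conferences by the average rank of their automatic qualifiers.

    aq_ranks_by_year: {conference: [rank of its automatic qualifier in each year]}
    Returns a list of (conference, average_rank), best (lowest) average first.
    """
    averages = {conf: mean(ranks) for conf, ranks in aq_ranks_by_year.items() if ranks}
    return sorted(averages.items(), key=lambda item: item[1])

def assign_positions(ordered, first_position):
    """Give consecutive bracket positions to the ordered conferences."""
    return {
        conf: position
        for position, (conf, _avg) in enumerate(ordered, start=first_position)
    }

# Made-up ranks, purely for illustration.
example_ranks = {
    "Conference X": [40, 50, 60],
    "Conference Y": [100, 120],
    "Conference Z": [70, 90],
}
example_order = rank_conferences_by_aq(example_ranks)
example_positions = assign_positions(example_order, first_position=28)
```

Starting the numbering at 28 reflects the 27 Tier 1 and Tier 2 teams placed ahead of the remaining automatic qualifiers.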

For the automatic qualifiers other than those from the first eight conferences (all of which are in Tier 1 or 2), a history-based way to rank them would be to follow the order in the table.  This would assign tentative rank positions to the 21 remaining conference automatic qualifiers as follows:

#28  Conference USA

#29  Colonial

#30  Patriot

#31  Sun Belt

#32  Atlantic 10

#33  Southern

#34  Atlantic Sun

#35  Metro Atlantic

#36  Mid American

#37  Mountain West

#38  Big South

#39  Horizon

#40  Missouri Valley

#41  Southland

#42  WAC

#43  Ohio Valley

#44  Big Sky

#45  America East

#46  Summit

#47  Northeast

#48  Southwestern

Final Steps

The above creates an overall history-based assigned conference position framework for the bracket.  This season’s actual in-conference results then would be the basis for selecting specific teams to fill the assigned conference positions.  This would create a tentative bracket with teams seeded either individually or in pods from top to bottom.

Actual game results data from this season, however, might justify altering any of the tentative seed assignments.  Looking for and making results-based adjustments thus would be the final step of the process.

As one example of a possible results-based adjustment, St. Louis, which may be the Atlantic 10 conference champion, has a win over Arkansas.  If Arkansas nevertheless gets a #1 seed, the Committee might elevate St. Louis a significant distance on the seed list.  Indeed, if St. Louis by chance is not the Atlantic 10 automatic qualifier, the Arkansas win might justify the Committee moving St. Louis all the way from not in any of the three tiers into consideration for an at large selection.

This week, I am not discussing which teams will fill assigned conference positions (I will do that elsewhere strictly for fun).  I likewise am not looking at actual game results to see what changes they might justify to the tentative bracket.  Next week, I will add those steps to show how the process, in its entirety, will work.

NCAA TOURNAMENT: KEEPING THE RPI HONEST, PART 7

In Parts 1 and 2 of this series, I described a test to see if this year’s RPI will be usable.  The test compares the Top 60 and Top 30 in the RPI rankings to baselines for the Top 60 and Top 30 derived from 2013 to the present.  It looks at two groups of conferences: a highlighted group consisting of the eight conferences that have had at least one team in the Top 60 every year since 2013 (ACC, American, Big East, Big 10, Big 12, Pac 12, SEC, West Coast) and a not-highlighted group consisting of all the other conferences.  The test shows the average number of teams and the high and low number of teams each group has had in the Top 60 and Top 30 since 2013.  It asks the question of how the numbers for the RPI ranks this year compare to the test period numbers.

Top 60 Test:  The RPI Top 60 should include roughly 49 teams from the highlighted conferences and 11 teams from the not-highlighted conferences.  The actual numbers can range on either side of these test numbers, but 45 teams should be the minimum from the highlighted group and 15 the maximum from the not-highlighted group.

Top 30 Test:  The RPI Top 30 should include roughly 28 teams from the highlighted conferences and 2 teams from the not-highlighted conferences.  The actual numbers can range on either side of these test numbers, but 27 teams should be the minimum from the highlighted group and 3 the maximum from the not-highlighted group.
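The two tests reduce to counting how many of the top-ranked teams come from the highlighted conferences and comparing that count to the minimum.  The Python sketch below shows one way to express that.  The function names and the input format (a list holding each team's conference, in RPI rank order) are assumptions, and the sample ranking is invented just to exercise the code.

```python
# The eight highlighted conferences named in the article.
HIGHLIGHTED = {
    "ACC", "American", "Big East", "Big 10",
    "Big 12", "Pac 12", "SEC", "West Coast",
}

# Baselines from the article: typical and minimum highlighted-conference counts.
BASELINES = {
    60: {"typical": 49, "minimum": 45},
    30: {"typical": 28, "minimum": 27},
}

def highlighted_count(ranked_conferences, top_n):
    """Count highlighted-conference teams among the top_n ranked teams.

    ranked_conferences: list of conference names, one per team, in RPI order.
    """
    if len(ranked_conferences) < top_n:
        raise ValueError("ranking is shorter than the requested cutoff")
    return sum(1 for conf in ranked_conferences[:top_n] if conf in HIGHLIGHTED)

def passes_test(ranked_conferences, top_n):
    """True if the highlighted group meets the historic minimum for top_n."""
    return highlighted_count(ranked_conferences, top_n) >= BASELINES[top_n]["minimum"]

# Invented ranking: 26 highlighted teams, then 4 others, then 20 more
# highlighted teams, then 10 others.
sample = ["ACC"] * 26 + ["Other"] * 4 + ["SEC"] * 20 + ["Other"] * 10
top60_ok = passes_test(sample, 60)   # 46 highlighted teams, minimum is 45
top30_ok = passes_test(sample, 30)   # 26 highlighted teams, minimum is 27
```

Because the two groups always add up to the cutoff, checking the highlighted minimum is the same as checking the not-highlighted maximum (45 + 15 = 60 and 27 + 3 = 30).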

Here are three different looks at how these tests apply to the current season, for games played through Sunday, April 4.

ACTUAL RPI, TO DATE

The following table shows conference representation in the RPI Top 60 and Top 30, plus totals for the highlighted and not-highlighted conferences at the bottom.  The left portion of the table is based on actual RPI ranks, to date, and the right portion of the table has the historic baseline test numbers.


The following table is similar but for regional playing pools:


ACTUAL RPI, TO DATE, CONFERENCES PLAYING SOME NON-CONFERENCE GAMES

Here are similar tables, but for this year’s RPI Top 60 and Top 30 if I consider only conferences playing at least some non-conference games.  I include these tables because it is indisputable that for conferences that play no non-conference games, the RPI cannot rank their teams in relation to teams from other conferences.



SIMULATED RPI, USING ACTUAL RESULTS TO DATE

Here are similar tables, but based on the entire season, including conference tournaments.  Their underlying data are the actual results of games played through April 4 plus simulated results of future games.  These should give a pretty reliable picture of what the end-of-season numbers will look like.



COMMENT

As the above tables show, the RPI is going to greatly underrate teams from stronger conferences and regions and greatly overrate teams from weaker conferences and regions.  This is due to teams not playing enough total games, not playing enough non-conference games, and not playing enough out-of-region games.  Further, it is true even if one considers only teams from conferences that are allowing members to play non-conference games.

As we get closer to the end of the season, this does not seem to be changing.