Thursday, October 15, 2020

NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS: IT’S "AS IF" THIS IS HOW THE COMMITTEE THINKS

 When making NCAA tournament at large selections and seeds, the Women’s Soccer Committee considers a number of data-based factors.  For the at large selections, the Committee must base its decisions on these factors.  For the seeds, the Committee considers these factors but is not bound by them.

What factors have the greatest effect on the Committee decisions?  What is the thinking process of the Committee in getting to its decisions?  For coaches whose teams are contenders for at large selections and seeds, these are important questions.

The details of what the individual Committee members think, what the Committee discussions are, and the reasoning by which the Committee makes its decisions are confidential.  In today’s world, however, it is possible, with the aid of a computer and programming, to analyze the relationship between the data the Committee uses and the decisions it makes.  This can let us say that the Committee makes decisions "as if" it thinks in a particular way.

At the RPI for Division I Women’s Soccer website, starting with the NCAA Tournament: Predicting the Bracket, At Large Selections page and the page that follows it, I have described a computer-driven system I have used for the last few years to analyze the Committee’s decisions.  The system is set up based on matching data for the factors the Committee considers to the Committee’s actual decisions. It lets me say that although I do not know how the Committee members think they make their decisions, it is "as if" they make them this way.

The Factors.  Here are the primary factors the Committee is supposed to consider.  For factors where there is not an obvious "score" associated with the factor, I have set up a system that assigns a score to each team based on its game results.  For a more detailed description of the factors and the scoring systems I have set up for them, go to the page linked above at the RPI website.

RPI rating

RPI rank

Non-conference RPI rating

Non-conference RPI rank

Results against Top 50 opponents

Top 50 results rank

Combined conference tournament and conference regular season standing

Conference average ARPI

Conference average ARPI rank

Top 60 head to head results

Top 60 common opponent results

 Top 60 common opponent results rank

Poor results

Number of Top 60 opponents:  This is not a factor set by the NCAA, although it could fall under the heading Strength of Schedule.  Nevertheless, it is something my system calculates as an aid to coaches with NCAA tournament aspirations.  It gives them a picture of how many Top 60 opponents they should think about scheduling.

Paired factors:  In addition to those single factors, my system also uses paired factors.  I pair each single factor with each other single factor, with the two factors in a pair weighted 50-50.  (I do not use the Number of Top 60 Opponents in setting up the paired factors.)
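As a minimal sketch of the pairing step, with hypothetical teams and scores, and assuming each single factor's scores already are normalized to a common 0-1 scale so that a 50-50 blend is meaningful:

```python
from itertools import combinations

# Hypothetical per-team scores for three of the single factors, already
# normalized to a common 0-1 scale (higher is better).
single_factors = {
    "rpi_rank": {"Team A": 0.98, "Team B": 0.95, "Team C": 0.90},
    "top50_results_rank": {"Team A": 0.93, "Team B": 0.97, "Team C": 0.88},
    "poor_results": {"Team A": 1.00, "Team B": 0.92, "Team C": 0.96},
}

# Every pair of single factors becomes one paired factor, weighted 50-50.
paired_factors = {
    f"{name_a} + {name_b}": {
        team: 0.5 * single_factors[name_a][team] + 0.5 * single_factors[name_b][team]
        for team in single_factors[name_a]
    }
    for name_a, name_b in combinations(single_factors, 2)
}
```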

The Standards.  My system compares all of the Committee’s at large and seeding decisions over the last 13 years with the Top 60 teams’ factor scores.  This is for teams that did and did not get at large selections, #1 seeds, #2 seeds, and so on.  The system tells me that teams scoring x or better on a particular factor always got a 👍 decision from the Committee, whereas teams scoring y or poorer always got a 👎.  As examples, teams ranked #1 by the RPI always have gotten a #1 seed.  Teams ranked #8 or poorer never have gotten a #1 seed.

This process, in most cases, gives me a "yes" and a "no" standard for each factor (whether single or paired) for each of the at large, #1 seed, #2 seed, and so on, decisions.
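One plausible way to derive such standards, sketched here with hypothetical data and assuming higher scores are better: the "yes" standard sits just above the best score that ever failed to get the decision (anything better has always succeeded), and the "no" standard sits just below the poorest score that ever got it (anything poorer has always failed).

```python
# Hypothetical 13 years of records for one factor and one decision type,
# as (factor_score, got_decision) pairs. Higher scores are better.
history = [(0.91, True), (0.83, True), (0.80, False), (0.74, False)]

def derive_standards(history):
    selected = [score for score, got in history if got]
    denied = [score for score, got in history if not got]
    yes_standard = max(denied)    # scoring above this has always meant "yes"
    no_standard = min(selected)   # scoring below this has always meant "no"
    return yes_standard, no_standard

yes_std, no_std = derive_standards(history)   # (0.80, 0.83)
```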

To see the standards, go to the RPI website page linked above for at large selections (bottom of the page) and the page that follows it for seeds.

The Bracket.  At the end of the season, my system applies the standards to the factor scores of the Top 60 teams.  Teams that only meet "yes" standards for a decision get a 👍 for that decision, for example an at large selection.  Teams that only meet "no" standards get a 👎 decision.  Ordinarily, this will leave a number of teams in the middle that meet no "yes" and no "no" standards.  If all the positions -- at large selections or particular seeds -- are not filled yet, these teams are candidates to fill the vacant positions.  Also, for a new year, occasionally there are teams that meet both some "yes" and some "no" standards.  These are teams with profiles the Committee has not seen over the last 13 years (and that will require me to update the standards for future years to incorporate what the Committee decided as to these teams).  They also are candidates to fill the vacant positions.
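Applying the standards then amounts to a three-way classification of each Top 60 team. A sketch of that step for one decision, with illustrative structures: `standards` maps each factor to its (yes, no) pair derived as in the previous sketch, and higher scores are better.

```python
def classify(team, factor_scores, standards):
    """Classify one team for one decision (e.g., an at large selection)."""
    meets_yes = meets_no = False
    for factor, (yes_std, no_std) in standards.items():
        score = factor_scores[factor][team]
        if score > yes_std:
            meets_yes = True
        if score < no_std:
            meets_no = True
    if meets_yes and not meets_no:
        return "in"         # meets only "yes" standards: gets the decision
    if meets_no and not meets_yes:
        return "out"        # meets only "no" standards: does not get it
    return "candidate"      # middle group, or a new profile meeting both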

New Addition to the System.  The above describes the system I have used for several years.  To improve the system, I have worked this year to see if there is an additional step I can add to the system so it will come closer to exactly matching the Committee’s decisions.  The additional step would select teams from among the candidates for as yet unfilled at large selection and seed positions based on the question, "All other things being equal, what should be the ‘decider’ for selection purposes?"

To do this, I looked at each of the single factors and the most "powerful" of the paired factors to see, for each one used as the "decider," how many of its decisions would match the Committee decisions.

My work produced quite clear results: Decisions based on the paired RPI rank and Top 50 results rank factor produce the best match with the Committee decisions.  In other words, if my system fills vacant slots by choosing from the candidate teams those with the best scores for that paired factor, my system comes closer to the teams the Committee actually selected than it would using any other factor.  The only exception is for the #1 seeds. There, Poor Results is the only factor that picks the correct teams to fill the #1 seed vacancies.
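A sketch of the retrospective test, with illustrative structures (and simplifying by using one score table across years): for each year, fill that year's still-open positions with the highest-scoring candidate teams on the factor being tested, then count matches against the Committee's actual picks.

```python
def decider_matches(seasons, factor_scores, factor):
    """seasons: list of (candidates, open_slots, committee_picks) per year."""
    matches = 0
    for candidates, open_slots, committee_picks in seasons:
        picks = sorted(candidates,
                       key=lambda team: factor_scores[factor][team],
                       reverse=True)[:open_slots]
        matches += sum(1 for team in picks if team in committee_picks)
    return matches

# The best decider is the factor whose picks match the most Committee picks:
# best = max(all_factors, key=lambda f: decider_matches(seasons, factor_scores, f))
```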

Results.  Here is how well the updated system works when applied retrospectively to the last 13 years:

At large selections:

435 positions to be filled

369 positions filled by standards

66 positions left to be filled

 51 of the still open positions correctly filled by decider

15 positions incorrectly filled

In five of the years, the system matches all of the Committee at large selections; in four of the years, it misses 1; in one year, it misses 2; and in three years, it misses 3. 

#1 seeds:

52 positions to be filled

50 positions filled by standards

2 positions left to be filled

2 positions correctly filled by decider

0 decisions incorrectly filled 

#2 seeds:

52 positions to be filled

40 positions filled by standards

12 positions left to be filled

 8 positions correctly filled by decider

 4 positions incorrectly filled

In ten of the years, the system matches all of the Committee #2 seeds; in two of the years, it misses 1; in one year, it misses 2.  

  #3 seeds:

52 positions to be filled

13 positions filled by standards

39 positions left to be filled

23 positions correctly filled by decider

16 positions incorrectly filled

In three of the years, the system matches all of the Committee #3 seeds; in five of the years, it misses 1; in four of the years, it misses 2; in one of the years, it misses 3.   

#4 seeds:

52 positions to be filled

26 positions filled by standards

26 positions left to be filled

18 positions correctly filled by decider

8 positions incorrectly filled

In seven of the years, the system matches all of the Committee #4 seeds; in four of the years, it misses 1; in two of the years, it misses 2.

Finally, regarding seeds, if we disregard the seed positions and look only at which teams got seeded, in six of the years the system seeds and the Committee seeds are identical; in four of the years, the system misses 1; and in three of the years, it misses 2.  In other words, of the 208 teams getting a seed over the last 13 years, the system matches 198 of them and misses 10.

Thus for seeds, although the system has difficulty especially with #3 seeds, its issues with seeds ordinarily relate not to whether a team should be seeded but whether it should be seeded #3 or #4.

For illustration, I will use the 2019 Committee decisions as compared to the system decisions:

At large selections: The Committee decisions and system decisions match except that the Committee gave Utah an at large selection and the system would have given it to Alabama.

#1 seeds:  The Committee and system decisions are the same.

#2 seeds:  The Committee and system decisions are the same.

#3 seeds:  The Committee and system decisions are the same except that the system gives Duke a #3 seed, whereas the Committee gave Wisconsin that #3 seed position.

#4 seeds:  The Committee and system decisions are the same except that the system gave Wisconsin a #4 seed (the Committee gave Wisconsin a #3).  The Committee gave Penn State the remaining #4 seed, leaving Duke with no seed.

Final Thoughts.  Here are some final thoughts about what this shows:

First, it is remarkable how close the system comes to the Committee’s actual decisions.  Although we do not know exactly how the Committee members and the Committee as a whole think, it is "as if" their thinking and the system’s are almost the same.

Second, the results of this work show that the Committee’s decision-making has been very consistent over time. 

Monday, October 12, 2020

THE 2020-21 SEASON: PRE-SEASON CONFERENCE RANKINGS

 Each conference’s coaches do pre-season rankings of teams within their conference.  Women’s soccer expert Chris Henderson (https://twitter.com/chris_awk) likewise does pre-season conference rankings.  And I do pre-season conference rankings.

Chris Henderson incorporates a number of factors into his rankings.  There is a point system for each factor:

Returning starters -- Starters are defined as last year’s top 10 minutes getters for field players and the goalkeeper with the most minutes.

Returning award winners from last year -- This rewards a team for top returning talent.  There is a sliding scale with higher value awards getting more points.

Returning award winners from the two years before last year -- Again, this rewards a team for top returning talent and has a sliding scale, allowing the system to better factor in superstar level players.

CoachRank -- This is based on Henderson’s system for ranking coaches.  A team whose coach has a high CoachRank score receives bonus points and a team with a low CoachRank score receives penalty points.

 Recruiting -- This is based on the TopDrawer Soccer player ratings, since there is not a better system available right now.  For transfers and international players, Henderson assigns his own player ratings.

Experience -- If a team has fewer than six returning starters, it gets penalized with the penalty increasing for each decrease in returning starters.

"Bust" potential -- If the value of last year’s talent is much higher or lower than the two preceding years, the team gets penalized.  This is to factor in teams that had a "fluke" recruiting year in 2019, for better or for worse.

Talent gap -- This penalizes teams that were far worse than much of the rest of the conference last year.  It assigns penalty points for teams with a league goal differential of worse than -1.0 per conference game, with the penalty escalating as the negative goal differential increases to greater whole numbers.

The scores for all of these factors are combined using a weighted formula and teams are ranked based on their overall scores.
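Henderson has not published the weights, so the following is purely illustrative of the shape of a weighted-formula ranking, not his actual numbers:

```python
# Invented weights, purely to illustrate the shape of the computation.
HYPOTHETICAL_WEIGHTS = {
    "returning_starters": 3.0,
    "returning_awards_last_year": 2.0,
    "returning_awards_prior_years": 1.0,
    "coach_rank": 1.5,
    "recruiting": 2.5,
    "experience_penalty": -1.0,
    "bust_penalty": -1.0,
    "talent_gap_penalty": -1.0,
}

def overall_score(factor_scores):
    """Weighted sum of a team's factor scores (missing factors count as 0)."""
    return sum(weight * factor_scores.get(factor, 0.0)
               for factor, weight in HYPOTHETICAL_WEIGHTS.items())

def rank_conference(teams):
    """teams: dict of team name -> dict of factor scores. Best team first."""
    return sorted(teams, key=lambda team: overall_score(teams[team]), reverse=True)
```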

My rankings likewise use a mathematical system, but a completely different one.  It looks at teams’ historic rankings and determines their ranking trends.  Based on trend formulas, it projects rankings for the coming year and converts those rankings to RPI ratings.  Using the conference schedule for the coming year, including game locations, it then compares the two teams’ ratings in each game, adjusted for home field advantage, and based on the comparison assigns a game result (either a win-loss result or a tie).  With the results of all the conference games, it assigns 3 points for a win and 1 for a tie, computes the conference points scored by each team, and ranks the teams accordingly.
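Here is a minimal sketch of the simulation step. The home field adjustment and the margin within which a game counts as a tie are hypothetical values for illustration, not the numbers my actual system uses:

```python
HOME_EDGE = 0.0075   # hypothetical home field rating adjustment
TIE_MARGIN = 0.0050  # hypothetical margin below which a game is called a tie

def simulate_conference(projected, schedule):
    """projected: dict of team -> projected RPI rating.
    schedule: list of (home, away) pairings for the conference season."""
    points = {team: 0 for team in projected}
    for home, away in schedule:
        diff = (projected[home] + HOME_EDGE) - projected[away]
        if abs(diff) < TIE_MARGIN:
            points[home] += 1          # a tie gives each team 1 point
            points[away] += 1
        elif diff > 0:
            points[home] += 3          # a win gives 3 points
        else:
            points[away] += 3
    # Projected conference standings, best points total first.
    return sorted(points, key=points.get, reverse=True)
```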

I do not know how each coach ranks his or her conference’s teams, but I assume the coaches do it based on all of the knowledge they have, including from direct in-game observations, about each team in the conference.

The following table shows how our three sets of rankings compare for the four conferences having their conference regular season competition this Fall.  For each conference, the teams are in the order the coaches ranked them:

[Table: coach, Henderson, and my pre-season rankings for the ACC, Big 12, SEC, and Sun Belt]

The three right-hand columns show the ranking difference, for each team, between the Henderson ranks and the coach ranks, between my ranks and the coach ranks, and between the Henderson ranks and my ranks.

At the end of the conference regular seasons, when we have the actual conference regular season rankings, I look at how well each set of pre-season rankings matches the actual rankings.  2020-21 will be my third year doing these rankings, so I have 2018 and 2019 for which I have been able to compare pre-season rankings to actual end-of-regular-season rankings.  (Both Henderson and I have tweaked our ranking systems over this period.)  Here is how the ranking systems compared in 2018 and 2019 to actual conference regular season rankings for all conferences:

2018:  Coaches, on average, were within 2.03 positions of actual ranks

           Henderson was within 2.20

           I was within 2.24

2019:  Coaches were within 2.16

           Henderson was within 2.22

           I was within 2.30
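Read as a formula, "within 2.03 positions on average" is a mean absolute rank difference; a minimal sketch of my reconstruction of the metric:

```python
def avg_rank_error(predicted, actual):
    """Mean absolute difference between predicted and actual conference ranks.
    predicted/actual: dicts mapping team name -> rank within its conference."""
    return sum(abs(predicted[t] - actual[t]) for t in actual) / len(actual)
```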

As you can see, it is very hard to do pre-season rankings of teams within a conference with a high degree of accuracy.  Although the coaches, with their direct experience and detailed knowledge of the other conference teams sometimes over a significant number of years, do the best with the pre-season rankings, they really do little better than the two mathematical systems.

This illustrates that there is not just one way to do reasonable advance rankings of teams.  Each of the above three systems brings its own perspective and contributes something to the picture of how teams are likely to do.  And none is able to give a highly accurate picture of where teams will end up.  At least when it comes to Division I women’s soccer, even very good predictive models leave a pretty good degree of uncertainty.  Given that the three models here -- each of which is quite sophisticated in its own way -- are quite close in their degree of accuracy, there is a good possibility that they are approaching the limits of how close predictive models can come for the rankings of teams within their conferences.

Friday, October 9, 2020

THE 2020-2021 SEASON: RPI RATING AND RANKING ISSUES

 This will be the first of three posts.  In this one, I will summarize publicly available information about the 2020-21 season structure.  I also will describe issues that will make it impossible to have reliable RPI ratings and ranks based on the planned Fall schedules. And, I will give some thoughts on how we might end up having at least somewhat usable ratings and ranks come the Spring.

In my second post, I will compare three sets of pre-season rankings for the conferences that are playing conference schedules in the Fall.  The rankings are those of the coaches of each conference, Chris Henderson’s, and mine.

In my third post, I will describe some new work I have done on the patterns the Women’s Soccer Committee has followed over the last 13 years in its at large selections and seeding for the NCAA tournament.

Current Information About the 20-21 Season.  Of the 31 conferences, 4 are playing Fall conference schedules:  the ACC, Big 12, SEC, and Sun Belt.  Of the other conferences, none are playing in the Fall except that a few will have some non-conference games:  CUSA, Missouri Valley, Patriot, Southern, and Southland.  Sixty teams are playing at least 1 Fall game, with 281 not playing in the Fall.

Two conferences, the Southland and the Southwestern, have announced their conference formats for Spring soccer (and the SWAC has announced the dates of all its conference games).  The Southland definitely is allowing non-conference games.  The Southwestern has not made a statement about non-conference games, but its teams will have some dates on which they will be able to play non-conference games if allowed.

There presently are 243 Fall games scheduled, not counting conference tournaments and also not counting 8 postponed games likely to be re-scheduled.  This compares to about 3,000 games during a normal season.

The teams that are playing at least some games in the Fall have scheduled an average of 9.4 games each (including conference tournaments):  8.3 conference and 1.1 non-conference.  That comes out to 88.3% conference and 11.7% non-conference.  With one exception (Pittsburgh), the ACC, Big 12, and SEC are playing conference only, so the bulk of the non-conference games involve the Sun Belt or the other conferences that are playing a few Fall non-conference games.

Over the last 7 years (since the last major conference realignment), teams played an average of 18.2 games per year.  Of these, 56% were conference games and 44% non-conference.

For the Spring, according to the schedule adopted by the Division I Board of Directors, the regular season competition season will be February 3 to April 17.  The Committee will make its NCAA Tournament at large selections and seeds on Sunday, April 18.  (Ending the season on Saturday with the Committee making its decisions on Sunday, which appears to be what has been decided, is a little different than normal, where the season ends on a Sunday and the NCAA announces the bracket on Monday.) The bracket will consist of 31 conference automatic qualifiers plus 17 at large selections for a total of 48 teams.  It seems reasonable to assume that if some conferences do not play in the Spring, then the tournament bracket still will be 48 teams and the number of at large teams will increase.  Games teams play in the Fall will count as part of their season schedules for purposes of NCAA tournament bracket formation.  The NCAA tournament will run from Saturday, April 24 to Sunday, May 16.  The exact overall scheduling and game locations for the tournament are yet to be announced, but the NCAA Board of Governors has directed that all tournament game sites will be pre-determined and the number of sites will be reduced, for health and safety and operational management purposes.

Ratings and Ranks Reliability.  So far as RPI ratings and ranks are concerned, they will not be meaningful during the Fall.

As a general rule, the more games teams play the more reliable are mathematical rating systems like the RPI.  For most rating systems to function well, teams must play between 25 and 30 games.  As the numbers of games decrease, the ratings’ reliability decreases.

The NCAA has not published information on the minimum number of games it believes teams need to play in order for the RPI to have an acceptable level of reliability.  Regarding football, however, it has said that "a [Football Championship Subdivision] RPI would be very difficult to use" because of the small number of games FCS teams play.  This is why the NCAA does not use the RPI in selecting at large teams for the FCS NCAA Tournament.  FCS teams play up to 12 pre-NCAA Tournament games per year.  Given this, we know the NCAA believes 12 games per year is not enough for the RPI to be reliable.

To determine how many games teams need to play for the RPI to be reliable, I used the 2019 season as a case study.  In 2019, teams played an average of 18.74 games.  I started with teams’ actual end of season RPI ranks.  I then deleted all games from the first weekend of the season, which reduced the number of games teams played to 17.04 games.  All of the deleted games were non-conference.  I then recalculated the RPI and determined teams’ revised ranks.  After that, I repeated the process for successive weeks, for each week determining the new number of games per team and teams’ revised RPI ranks.  At the bottom of this post is a table that shows the results of this project.  The table shows the Top 250 teams, in order, with their actual end-of-season RPI ranks and their ranks after each successive deletion of games.  It also shows the teams’ NCAA tournament seeds and which unseeded teams got at large selections.  Using this table, I looked to see the number of games at which the RPI no longer was able to produce rankings that were reasonably close to teams’ actual end-of-season ranks.  I highlighted in green team RPI ranks that seemed unreasonably poor compared to their actual final ranks and in orange team ranks that seemed unreasonably good.  My conclusion from the table is that even slightly over 15 games per team produces too many unreasonable rankings for the RPI to be usable.  (In considering the table, it is important to know that historically, teams ranked 58 and poorer have not gotten at large selections; and the poorest rank getting a #1 seed has been 8, #2 has been 14, #3 has been 23, and #4 has been 26.)
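For readers who want to replicate the experiment, here is a minimal sketch of the deletion-and-recompute loop.  It assumes the basic unadjusted RPI formula (0.25 x winning percentage + 0.50 x opponents' winning percentage + 0.25 x opponents' opponents' winning percentage, with a tie counted as half a win); the ARPI I actually use adds bonus and penalty adjustments on top of this, and the data structures here are illustrative.

```python
from collections import defaultdict

def win_pct(games, team, exclude=None):
    """Winning percentage for team; a tie counts as half a win. Games against
    `exclude` are skipped (used for the opponents' winning pct component)."""
    credits = played = 0.0
    for a, b, result in games:           # result: "A" (a won), "B", or "T"
        if team not in (a, b):
            continue
        opponent = b if team == a else a
        if opponent == exclude:
            continue
        played += 1
        if result == "T":
            credits += 0.5
        elif (result == "A") == (team == a):
            credits += 1
    return credits / played if played else 0.0

def rpi(games):
    """Basic unadjusted RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP."""
    opponents = defaultdict(list)
    for a, b, _ in games:
        opponents[a].append(b)
        opponents[b].append(a)
    wp = {t: win_pct(games, t) for t in opponents}
    owp = {t: sum(win_pct(games, o, exclude=t) for o in opponents[t]) / len(opponents[t])
           for t in opponents}
    oowp = {t: sum(owp[o] for o in opponents[t]) / len(opponents[t]) for t in opponents}
    return {t: 0.25 * wp[t] + 0.50 * owp[t] + 0.25 * oowp[t] for t in opponents}

def ranks_after_deleting(games_by_weekend, weeks_removed):
    """Drop the first `weeks_removed` weekends, recompute RPI, return ranks."""
    remaining = [g for week, gs in sorted(games_by_weekend.items())
                 if week > weeks_removed for g in gs]
    ratings = rpi(remaining)
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(ordered)}
```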

Thus, for Fall 2020’s games alone, the planned 9.4 games per team is not enough for the RPI to be usable.

In addition, historically 54.0% of games have been in-conference and 46.0% non-conference.  As shown in detail at the RPI: Non-Conference RPI page of the RPI for Division I Women’s Soccer website, even with this percentage of non-conference games, the RPI has trouble ranking teams from different conferences properly in relation to each other: It tends to rate teams from stronger conferences too poorly and teams from weaker conferences too well.  When teams significantly reduce their numbers of non-conference games, this problem gets worse.  You can see this in the table at the bottom of this post, where, as I deleted successive numbers of non-conference games, the rankings of teams from the strongest conferences tend to get poorer and the rankings of teams from other conferences tend to get better.  You can see this problem from a different perspective in the following table based on the 2019 season:

[Table: 2019 conference average ARPI ratings and ranks -- actual, with conference tournaments, conference regular season games only, and non-conference games only, plus rank differences]
In this table, the second and third columns from the left show the 2019 actual conference average ARPI ratings and the resulting conference ranks.  The next column to look at is the fifth from the left, which shows the conference average ARPI ratings derived only from the conference regular season competition.  As you can see, all conferences have an identical 0.5000 rating except for the ACC at 0.4999.  The only reason for the ACC difference is that Syracuse v Virginia got canceled, so that not all teams played the same number of games.  So long as all teams play the same number of games in conference regular season competition, conferences always will have 0.5000 ratings if the ratings are based only on their conference competition.  In other words, based only on conference regular season competition, the RPI cannot distinguish among the conferences.  If you look at the fourth column from the left, you will see that conference tournaments slightly disrupt this equality.  This, however, has nothing to do with how strong the conferences are; it only has to do with the structure of their conference tournaments.

Finally, if you look at the sixth and seventh columns from the left, you will see the RPI ratings and ranks of the conferences derived only from their non-conference games.  The eighth column shows the difference between these ranks and the actual RPI ranks.  In most cases, and particularly for the stronger conferences, these differences are small.  Thus in-conference games pull conference and conference team ratings to 0.5000, whereas non-conference games pull conference and conference team ratings toward the level of their true strength among the entire body of Division I teams.  Because of this, any reduction in the proportion of non-conference games will result in teams from stronger conferences being underrated and teams from other conferences being overrated.  The table at the bottom of this post shows this effect.
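A quick derivation (my own reasoning, not the NCAA’s) shows why the conference-only averages must land at 0.5000.  In a closed group of $N$ teams that each play $n$ games, all against other group members, there are $G = Nn/2$ games, and every game hands out exactly one win credit in total: a decisive game gives 1 to the winner, and a tie gives 0.5 to each side.  So the average winning percentage across the group is

$$\overline{WP} \;=\; \frac{1}{N}\sum_{t}\frac{w_t + 0.5\,d_t}{n} \;=\; \frac{G}{Nn} \;=\; \frac{G}{2G} \;=\; \frac{1}{2},$$

where $w_t$ and $d_t$ are team $t$’s wins and ties.  The opponents’ and opponents’ opponents’ winning percentage components are built from these same closed-group percentages, which is why the conference averages in the table land at 0.5000.  Unequal game counts, as with the canceled Syracuse v Virginia game, are the only thing that breaks the symmetry.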

In addition, historically 81.9% of games are played within geographic regional groups and 18.1% are non-regional.  As shown on the RPI: Regional Issues page of the RPI website, this is not a big enough proportion of non-regional games for the RPI to be able to properly rate and rank teams from one regional group in relation to teams from other regional groups.  Thus if teams move to more region-based non-conference schedules to reduce travel, it will cause the RPI to discriminate against some teams from some geographic regions -- most notably the West -- and in favor of others even more than it already does.

Usable Ratings and Ranks in the Spring.  As I look at the Spring season, I see some possibilities for scheduling that will make RPI ratings and ranks at least somewhat usable:

1.  The ACC, Big 12, SEC, and Sun Belt are playing their conference schedules in the Fall, so they can play non-conference schedules in the Spring.  This will be especially critical for the ACC, Big 12, and SEC to do since they are playing almost entirely conference-only Fall schedules.

2.  The Spring game playing season is 11 weeks.  This will give conferences and teams not playing in the Fall the opportunity to play close to full conference and non-conference schedules.

Thus it is possible, by the end of the Spring game playing season, that teams will have played close to their normal schedules.  If that happens, then the RPI will work more or less as it normally does.

I see two issues, however, that may cause problems.  One will be if teams significantly reduce the numbers of games they play, in particular reducing their proportions of non-conference games.  The second will be if teams shift to scheduling more regionally close non-conference opponents, significantly reducing their proportions of non-regional games.

NCAA Tournament Bracket Formation.  If either of these problems occurs, the Women’s Soccer Committee will need to be aware of it and do the best it can, with the available data, to compensate for the RPI’s weaknesses.  If either occurs to an extreme, it will create a big problem for the Committee.  Since I believe there will be the opportunity for significant numbers of non-conference games, my biggest concern is the possibility that we will see almost entirely region-based non-conference scheduling.  If that occurs, then the Committee will need to recognize that although the RPI may be good for rating and ranking teams within a geographic region in relation to each other, it will have little value when it comes to rating and ranking teams from different regions in relation to each other.  In particular, it will tend to underrate teams from the West.  I am not sure whether or how the Committee will be able to deal with this problem.