Sunday, December 27, 2020

NCAA TOURNAMENT: TEAM PROFILES AND AT LARGE SELECTIONS

Team Profiles.  The Women’s Soccer Committee has a number of factors the NCAA requires it to use to pick the at large participants in the NCAA Tournament.  I break these down into 13 factors:

    RPI Rating

    RPI Rank

    Non-Conference RPI Rating

    Non-Conference RPI Rank

    Results against teams already selected for the bracket, including automatic qualifiers ranked #75 or better.  Based on my observations of Committee decisions over time, I use a surrogate for this factor: Results Against Top 50 Opponents

    Results Against Top 50 Opponents Rank

    Head to Head Results, for which I use the surrogate of Head to Head Results Against Top 60 Opponents

    Results Against Common Opponents, for which I use the surrogate of Results Against Common Opponents with Top 60 Teams

    Results Against Common Opponents Rank

    Conference Regular Season Standing and Conference Tournament Results, for which I use the surrogate of Conference Standing (Combined)

    Conference Average RPI

    Conference Average RPI Rank

    Results Over the Last 8 Games, for which I use the surrogate of Poor Results

The NCAA has a scoring system for some of these factors, such as the RPI.  For other factors it does not: Results Against Top 50 Opponents, Head to Head Results Against Top 60 Opponents, Results Against Common Opponents with Top 60 Teams, Conference Standing (Combined), Poor Results.  For each of these, I have my own scoring system.

Together, teams’ scores for the above factors make up their profiles.

The Results Against Top 50 Opponents factor is important and relates to scheduling, so it is worth describing the scoring system.  In looking at the Committee’s decisions among bubble teams, it appears that a few good results against highly ranked teams count a whole lot more than a greater number of good results against moderately ranked teams.  Further, good results against teams ranked below #50 appear not helpful at all (apart from their influence on teams’ RPI ratings and ranks).  This suggests that the Committee asks (whether consciously or not), "At how high a level have you shown you can compete?" and tends to select teams based on the answer to that question.  With that in mind, I developed this scoring system for Results Against Top 50 Opponents:


As you can see, this scoring system is very highly skewed towards good results against very highly ranked opponents.
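For readers who want to experiment with this kind of scoring, here is a minimal sketch of a tiered scoring function with the same general shape.  Every tier cutoff, point value, and bonus in it is a hypothetical placeholder, not a value from my table above:

```python
# Illustrative sketch of a tiered "Results Against Top 50 Opponents" score.
# The tier cutoffs, point values, and bonuses below are hypothetical placeholders.

def top50_result_points(opponent_rank, result, location):
    """Score one game.  result: 'W', 'T', or 'L'; location: 'H', 'A', or 'N'."""
    if opponent_rank > 50 or result == 'L':
        return 0.0                      # losses and results against sub-Top-50 teams earn nothing here
    # hypothetical tiers: the better the opponent's rank, the more a good result is worth
    if opponent_rank <= 5:
        base = 16.0
    elif opponent_rank <= 10:
        base = 8.0
    elif opponent_rank <= 25:
        base = 4.0
    else:
        base = 2.0
    if result == 'T':
        base *= 0.5                     # a tie is worth less than a win
    if location == 'A':
        base *= 1.25                    # hypothetical road bonus
    return base

def top50_results_score(games):
    """games: list of (opponent_rank, result, location) tuples for one team."""
    return sum(top50_result_points(rank, res, loc) for rank, res, loc in games)

# A road win at #3 plus a home tie with #20 outweighs three wins over #40-#50 teams:
print(top50_results_score([(3, 'W', 'A'), (20, 'T', 'H')]))                   # 20.0 + 2.0 = 22.0
print(top50_results_score([(42, 'W', 'H'), (47, 'W', 'H'), (50, 'W', 'A')]))  # 2 + 2 + 2.5 = 6.5
```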

In addition to the 13 individual factors, I use paired factors.  A paired factor puts two of the individual factors together using a formula that weights the two factors equally.  I do this because of how I have imagined a Committee member might think:  Yes, Team A’s RPI Rank is poorer than Team B’s, but when you look at their Results Against Top 50 Opponents, Team A’s are better, and when you look at the two factors together, Team A looks better overall. 

I pair each individual factor with each other individual factor.  After doing this, I end up with 78 paired factors, which when added to the 13 individual factors gives me a total of 91 factors.
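To make the pairing concrete, here is a minimal sketch of how the 78 paired factors can be generated from the 13 individual factors.  The equal weighting and the 13-choose-2 = 78 count come from the description above; the shorthand factor names and the 0-to-1 scaling step are illustrative choices for the sketch only:

```python
from itertools import combinations

# Shorthand names for the 13 individual factors listed above.
FACTORS = [
    "rpi_rating", "rpi_rank", "nc_rpi_rating", "nc_rpi_rank",
    "top50_results", "top50_results_rank", "h2h_top60", "common_opp_top60",
    "common_opp_rank", "conf_standing", "conf_arpi", "conf_arpi_rank",
    "poor_results",
]

def normalize(values):
    """Scale a factor so every team's score falls between 0 and 1.
    (Rank-type factors would first be flipped so that better = higher.)"""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    return {team: (v - lo) / span for team, v in values.items()}

def paired_factors(team_scores):
    """team_scores: {factor_name: {team: score}}, with better = higher.
    Returns {(factor_a, factor_b): {team: 50/50 weighted score}}."""
    scaled = {f: normalize(team_scores[f]) for f in FACTORS}
    pairs = {}
    for fa, fb in combinations(FACTORS, 2):          # 13 choose 2 = 78 pairs
        pairs[(fa, fb)] = {
            team: 0.5 * scaled[fa][team] + 0.5 * scaled[fb][team]
            for team in scaled[fa]
        }
    return pairs

print(len(list(combinations(FACTORS, 2))))           # 78
```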

Team Profiles and the Committee’s At Large Selections.  Using data from the last 13 years -- teams’ factor scores and the Committee’s at large selections -- there are two key questions for each factor:

1.  Is there a factor score (a "yes" standard) where, for a team with that score or better, the team always has gotten an at large selection?

2.  Is there a factor score (a "no" standard) where, for a team with that score or poorer, the team never has gotten an at large selection?

Using RPI Rank as an example factor, teams with RPI Ranks of #30 or better always have gotten at large selections.  And, teams with RPI Ranks of #58 or poorer never have gotten at large selections.  Thus the RPI Rank "yes" standard is 30 and the "no" standard is 58.

Asking these two questions for each of the 91 factors produces a "yes" and a "no" at large selection standard for most of them.
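Here is a minimal sketch of how the two standards can be pulled out of the historical data for a rank-type factor (lower = better); for rating-type factors the comparisons simply flip.  The data at the bottom are made up so that the output reproduces the 30/58 RPI Rank example:

```python
def yes_no_standards(history):
    """history: list of (rank, selected) pairs pooled across the 13 years, where rank is
    a team's value for a rank-type factor (lower = better) and selected is True if the
    team got an at large selection.

    Returns (yes_standard, no_standard):
      yes: every team with this rank or better was always selected
      no:  every team with this rank or poorer was never selected
    Either can be None if no such cutoff exists for the factor."""
    best_unselected = min((r for r, sel in history if not sel), default=None)
    worst_selected = max((r for r, sel in history if sel), default=None)

    # "yes": poorest rank that is still strictly better than every unselected team's rank
    yes = None
    if best_unselected is not None:
        in_range = [r for r, sel in history if sel and r < best_unselected]
        yes = max(in_range) if in_range else None

    # "no": best rank that is still strictly poorer than every selected team's rank
    no = None
    if worst_selected is not None:
        in_range = [r for r, sel in history if not sel and r > worst_selected]
        no = min(in_range) if in_range else None

    return yes, no

# Toy, made-up observations chosen to reproduce the RPI Rank example:
history = [(12, True), (30, True), (33, False), (41, True), (57, True), (58, False), (70, False)]
print(yes_no_standards(history))   # (30, 58)
```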

At the end of a season, I then can apply the standards to the end-of-season data and, based on which teams meet "yes" and "no" at large selection standards, project what the Committee will decide if it follows its historic patterns.  (And, after the season is over and if the Committee has made decisions that do not match the standards, I can revise the standards to be sure they are consistent with all past Committee decisions.)

When I apply this process for a season that just has ended, I end up with some teams that meet only "yes" at large standards, some that meet only "no" standards, a few that meet some "yes" and some "no" standards (which means they have profiles the Committee has not seen in the past), and some that meet no "yes" or "no" standards.  So far, every year there have not been enough teams that meet only "yes" standards to fill all the at large slots, so there always have been some open slots -- ranging from 2 to 8 open slots since 2007.  In my system, the teams that meet only "no" standards are out of the picture.  Thus the remaining open slots are to be filled by the teams that meet no "yes" or "no" standards or that meet some of each.
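Once the standards exist, the classification step is mechanical.  Here is a sketch that sorts a team into the four groups just described, again assuming rank-type factors (lower = better); the single-factor example at the bottom is hypothetical:

```python
def classify(team_factors, standards):
    """team_factors: {factor: rank} for one team (lower = better).
    standards: {factor: (yes_standard, no_standard)}, either of which may be None.
    Returns 'in', 'out', 'mixed', or 'open'."""
    meets_yes = meets_no = False
    for factor, value in team_factors.items():
        yes, no = standards.get(factor, (None, None))
        if yes is not None and value <= yes:
            meets_yes = True
        if no is not None and value >= no:
            meets_no = True
    if meets_yes and meets_no:
        return "mixed"   # a profile the Committee has not seen in the past
    if meets_yes:
        return "in"      # projected at large selection
    if meets_no:
        return "out"     # projected no at large selection
    return "open"        # candidate for the remaining open slots

# Hypothetical illustration with a single factor (RPI Rank) and its 30/58 standards:
standards = {"rpi_rank": (30, 58)}
print(classify({"rpi_rank": 22}, standards))   # in
print(classify({"rpi_rank": 45}, standards))   # open
print(classify({"rpi_rank": 61}, standards))   # out
```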

How well does this process work?  I run tests to answer this question, applying the current standards retroactively to each of the 13 years in my database.  This tells me: If I had had the standards at the beginning of the 2007 season and had applied them to each year’s end-of-season data, how many of the teams actually getting at large selections would the standards have picked?  Here is the answer:

Since 2007, there have been 435 at large positions to fill.  My standards, if applied to the Top 60 teams in each of those seasons (that were not Automatic Qualifiers), would have filled 374 of the 435 positions.  This would have left 61 positions to fill, with 90 candidate teams to fill them (each of which met no "yes" and no "no" standards).  This amounts to a little under 5 positions per year that the standards by themselves cannot fill, with a pool of roughly 7 teams from which to fill them.

The next question is: Which of the factors was the most powerful at correctly identifying at large selections from among the Top 60 teams that were not Automatic Qualifiers?  As it turns out, these are the most powerful, in order:

RPI Rank and Top 50 Results Score (factor pair): correctly identified 313 at large selections

RPI Rank and Conference Rank: 301

RPI Rank and Top 50 Results Rank: 274

RPI Rank: 270

RPI Rating and Conference Rank: 255

RPI Rank and Common Opponents Rating: 253

RPI Rank and Conference ARPI: 251

RPI Rank and Poor Results: 249 

RPI Rating and Common Opponents Rating: 248

RPI Rating: 245

RPI Rank and Common Opponents Rank: 244

RPI Rating and Common Opponents Rank: 236

RPI Rating and Top 50 Results Rank: 212

RPI Rating and Top 50 Results Score: 207

After these, there is a big drop in the power of the factors.

As stated above, after applying all of the factor standards to the profiles of each year’s Top 60 teams that were not Automatic Qualifiers in order to fill at large positions via the standards, I was left with 61 at large positions to fill over the 13 years for which I have data and 90 candidates from which to fill them.  I then took each of the above factors, as well as some others that are powerful for seeds but not so much for at large selections, and asked: From the candidate teams each year, what if I gave the remaining at large selections to the ones scoring the best on this factor?  How many correct selections would this factor make?  Here are the results (a sketch of this test follows the list):

RPI Rank and Top 50 Results Rank: 47 correct selections (for 61 positions)

RPI Rating and Top 50 Results Rank: 46

RPI Rank and Conference Rank: 46

RPI Rating and Conference Rank: 46

RPI Rank and Conference RPI Rating: 45

RPI Rating and Common Opponents Rating: 45

RPI Rating and Common Opponents Rank: 45

RPI Rank and Common Opponents Rating: 45

RPI Rank and Common Opponents Rank: 45

Head to Head Results: 45
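Mechanically, the test just described looks like this sketch: each year, rank the unresolved candidate teams on the chosen "decider" factor, award that year’s remaining at large slots to the best-ranked candidates, and count how many of those awards match the Committee’s actual picks.  The data structures and the one-season example are hypothetical illustrations, not my actual data:

```python
def score_decider(years, decider):
    """years: list of dicts, one per season, each with
         'open_slots': number of at large positions the standards left unfilled,
         'candidates': {team: {factor: value}} for teams meeting no yes/no standards,
         'actual': set of teams the Committee actually selected.
       decider: factor name to sort candidates by (lower value = better).
       Returns the total number of decider picks that match the Committee."""
    correct = 0
    for season in years:
        ranked = sorted(season["candidates"],
                        key=lambda team: season["candidates"][team][decider])
        picks = ranked[:season["open_slots"]]
        correct += sum(1 for team in picks if team in season["actual"])
    return correct

# Hypothetical one-season illustration:
years = [{
    "open_slots": 2,
    "candidates": {"Team A": {"rpi_top50_rank": 3},
                   "Team B": {"rpi_top50_rank": 1},
                   "Team C": {"rpi_top50_rank": 2}},
    "actual": {"Team B", "Team A"},
}]
print(score_decider(years, "rpi_top50_rank"))   # picks B and C; 1 of the 2 matches
```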

Finally, I have asked one more question: What if I use only one factor to pick all of the at large teams (rather than using the factor standards system)?  How would that compare to the Committee’s actual selections?  When I run that test, here are the results:

RPI Rank and Top 50 Results Rank: correctly picks 408 of the Committee’s 435 at large selections

RPI Rating and Top 50 Results Score: 406

RPI Rank and Top 50 Results Score: 405

RPI Rank and Conference Rank: 405

RPI Rank and Conference RPI: 403

RPI Rating: 402

RPI Rank: 401

RPI Rating and Top 50 Results Rank: 399

RPI Rank and Poor Results: 397 

RPI Rank and Common Opponents Results: 394 

In other words, the RPI Rank and Top 50 Results Rank factor correctly matches 408 out of the 435 at large selections the Committee has made over the last 13 years, or all but roughly 2 per year.

A way to think about this is that if a team is in the Top 60 and scores well on the RPI Rank and Top 50 Results Rank factor, then the rest of its profile necessarily is going to be very good.  Thus whatever the Committee members think about and discuss, they are highly likely to make at large selections as if they consider teams’ RPIs and their Top 50 Results, paired together, as the most important -- indeed almost decisive -- aspect of their profiles.

Saturday, December 26, 2020

NCAA TOURNAMENT: WHAT THE SPRING 2021 BRACKET MIGHT LOOK LIKE

Background.  Ordinarily, the NCAA Tournament is a 64 team bracket of 31 conference champion Automatic Qualifiers plus 33 at large selections.  For the Spring 2021 Tournament, it will be a 48 team bracket of 29 conference champions (assuming only the Ivy League and Big West will not play during the 20-21 season) plus 19 at large selections.

For the Committee, making at large selections will be a challenge.  While we will not know until we have the full Spring schedule, there is a good possibility the RPI will be either not useful or significantly impaired.  In order for it to work, there must be enough games per team, a big enough proportion of non-conference games, and a big enough proportion of out-of-region games.  Although ordinarily there really is not a big enough proportion of non-conference games, there are enough that the Committee can work around it by considering factors other than the RPI.  This year, however, there is a good chance there will be fewer, and possibly significantly fewer, non-conference games, making any work-around difficult if not impossible.  In addition, even ordinarily there is not a big enough proportion of out-of-region games, with no obvious way for the Committee to work around it.  This year, it is possible there will be far fewer out-of-region games; and if so, the RPI will not work at all for comparing teams from one geographic region to teams from other geographic regions.  Thus overall, if the RPI has any usefulness, it may be greatly diminished.

If that happens, how is the Committee going to decide on which 19 at large teams should fill out the bracket?  There are many ways to answer this question.  Here are a few of them.

Possible Overall Approach.  First, past bracket history may have to be a resource for what the overall bracket should look like (notwithstanding that NCAA policy opposes using past history).  Second, in filling out the bracket, it may be necessary to look more than usual at where teams have finished within their conference regular season and conference tournament competitions.  Third, when deciding among bubble teams it may be necessary to rely heavily on results against other bubble teams and teams that clearly will be in the bracket.

Using Past Bracket History: Possible Approaches.

Proportionality Approach:  The number of at large selections this year will be 19/33 or 57.6% of the usual number.

The following NCAA Tournament at large selection table provides data related to all 33 of the at large selections in each of the last five years:

This table shows, for each conference:

For each of the last five years, the number of at large teams the conference had in the NCAA Tournament 

Over the five years, the minimum number of teams the conference had in the Tournament, the maximum number, and the difference between the minimum and maximum 

Over the five years, the same minimum and maximum numbers, but reduced to 57.6% of those numbers to reflect the reduction of at large positions from 33 to 19

In the last three columns, using the reduced minimum and maximum numbers: the minimum rounded down,  the maximum rounded up, and the difference between the minimum and maximum 

A way to look at the columns on the right is that for each conference, one would expect to see it have at least the minimum rounded down number of at large selections this Spring.  The sum of all these minimum numbers is 9, as the total at the bottom of the column shows.  Thus this fills 9 at large positions, leaving 10 yet to be filled.  The potential number of additional teams, on the right, is what one would expect to be the maximum number of teams that a conference also might get as at large selections.  Thus, one would expect the ACC to have at least 3 at large selections, plus the possibility of 3 more for an outside limit of 6.  And, one would expect the SEC to have at least 2 at large selections, plus the possibility of 3 more for an outside limit of 5.  The sum of all the potential additional selections is 25, as the total at the bottom of the column shows.  One of those 25 teams, however, would be from the Big West which will not be playing so the number comes down to 24.  Thus one would expect there to be up to 24 teams competing for the last 10 at large positions.
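The arithmetic behind this table is simple: scale each conference’s five-year minimum and maximum numbers of at large selections by 19/33, round the minimum down and the maximum up, and total the columns.  Here is a minimal sketch; the raw five-year minimums and maximums in it are hypothetical placeholders (chosen so the ACC and SEC lines reproduce the numbers discussed above), not the actual table values:

```python
import math

SCALE = 19 / 33          # this Spring's at large slots as a share of the usual 33 (about 57.6%)

def scaled_range(five_year_min, five_year_max):
    """Scale a conference's historical min/max at large counts to the 48-team bracket,
    rounding the minimum down and the maximum up."""
    expected = math.floor(five_year_min * SCALE)
    ceiling = math.ceil(five_year_max * SCALE)
    return expected, ceiling, ceiling - expected

# Hypothetical raw five-year minimums and maximums (not the actual table values):
for conf, lo, hi in [("ACC", 6, 10), ("SEC", 4, 8), ("Big 10", 3, 6)]:
    expected, ceiling, extra = scaled_range(lo, hi)
    print(f"{conf}: expect at least {expected}, up to {extra} more (outside limit {ceiling})")
```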

A problem with this approach, however, is that it assumes that each conference’s teams getting at large selections are distributed evenly across the Committee’s rankings.  It does not take into account the possibility that one conference may have had its at large teams near the top of the Committee’s rankings and another conference may have had them near the bottom.

Committee’s Top 19 Approaches:  A better approach would be to look at the Committee’s rankings of the at large selections for each year and simply identify the conferences of the Committee’s top 19 selections.  The difficulty with this, however, is that we do not know the rankings the Committee gave its selections.  Nevertheless, a possible similar approach is to use educated guesses about the Committee’s top 19.  Here are three different educated guess approaches:

Factor Standards Approach:  This approach uses my factor standards system, which is based on the Committee’s decisions over the last 13 years.  The approach, for each of the last five years, looks at each team that got an at large selection and at how many factor standards that team met saying "yes" it gets an at large selection.  It then ranks those teams based on the number of standards they met (the more standards met, the better) and selects the top 19 teams based on their ranks.

This approach produces the following conference results for the top 19 selected each year:


 Based on this table, one would expect a conference to receive at least the number of at large selections in the Minimum column and to have the potential to receive up to the additional number of selections in the Difference column.  Overall, 11 teams would be in the expected at large selection group, leaving 8 open slots with 19 teams in the potential at large selection group to fill those slots.

First Round Home Approach Combined with Factor Standards.  A problem with the Factor Standards approach is that it is not completely consistent with the Committee’s home field assignments in the first round of the NCAA Tournament.

Each year, we know the teams in the top 16 of the Committee’s rankings, since the Committee seeds the top 16.  The Committee, however, does not explicitly rank any of the other teams in the bracket.  But, we do know the other 16 teams to which the Committee gives home field advantage in the first round games.  And, although there is no published NCAA policy on how the Committee assigns home field other than for the seeded teams, there are indications that the Committee ordinarily gives home field to the teams it considers to fill the next 16 positions in the rankings after the seeded teams.

If we assume that the Committee assigns home field based on its own rankings of teams, this means that they rank the teams they seed in the #1 through #16 group and they rank the other teams with home field in the #17 through #32 group.  With this assumption, another approach to the upcoming Tournament bracket is to start with the teams that have had home field in each of the last five years and then rank those teams based on my factor standards approach.

This approach produces the following conference results for the top 19 selected each year:

Overall using this approach, 12 teams would be in the expected at large selection group, leaving 7 open slots with 17 teams in the potential at large selection group to fill those slots.

 First Round Home Approach Combined with the RPI Rank and Top 50 Results Rank Factor.  This approach uses the Committee home field assignments but then ranks the teams getting home field based on their scores for my combined RPI Rank and Top 50 Results Rank factor.  This is the most powerful of the factors and, by itself, is able on average to correctly pick all but 2 of the Committee at large selections per year over the last 13 years.

This approach produces the following conference results for the top 19 selected each year:


 Overall using this approach, 12 teams would be in the expected at large selection group, leaving 7 open slots with 15 teams in the potential at large selection group to fill those slots.

Comparing all four of the above tables, there are differences, but they are not extreme, being mainly in the area of expected numbers of at large positions rather than in the area of maximum potential positions.

Picking Teams to Fill Conference Expected At Large Slots and Identifying Conference Candidates for the Remaining at Large Slots.  How then would the Committee pick the teams that should get the positions expected for particular conferences?  And how would it identify the candidates from which the remaining open positions should be filled?  For these tasks, I think the Committee may be forced to rely heavily on team finishing places within their conference regular season standings and tournaments, since those may be the best indicators the Committee will have of how teams in each conference rank in relation to other teams in that conference.

Picking Teams to Fill the Remaining At Large Slots.  If the Committee were to allocate the expected at large positions per conference based on conference regular season standings and tournaments, how would the Committee fill the remaining at large slots from among the candidates?  For this, it seems like the Committee would ask the question:  Which teams have shown they are the most deserving of at large positions as compared to others in the candidate group?

How would candidates in the candidate group show they are the most deserving of at large positions?  The obvious way for them to do this would be through good results against the other candidates and, even better, good results against the expected at large selections and against the Automatic Qualifiers from the potential multi-team conferences.

From a scheduling perspective, this means that a candidate team would have to have played a significant number of games against the group comprised of the other candidates, the expected at large selections, and the Automatic Qualifiers from the potential multi-team conferences.  The number of games would have to be significant because the candidate cannot expect to get a good result in every game against teams from this group and so, needs to play enough games that it might get good results in some of them.

Forming the Bracket.  Here is an example of how the above at large selection process would work, up to the point of choosing from the candidate group the teams to fill the last open slots.  I will use the last table above as the basis for numbers of expected and potential at large selections from each conference.  For conferences that played their conference regular season schedules and tournaments (if they had one) this Fall, to assign conference rankings to the teams I have used their combined regular season and tournament standings in the Fall (weighting the regular season and tournament finishing positions equally).  For conferences that have deferred their conference competitions to the Spring, to assign conference rankings I looked at their 2019 conference regular season and tournament results to assign them positions within their conferences.  Based on their rankings within their conferences, I then have assigned teams as expected at large selections or potential additional at large selections according to the expected and potential slots for each conference in the above table.

(NOTE:  In ranking teams within a conference, I use the classic 3 points for a win and 1 for a tie method.  Where teams finish tied on points in the conference regular season competition, I assign them the average of the finishing positions they occupy.  For example, if two teams are tied for 1st, I treat them as occupying the 1 and 2 positions and assign each the average rank of 1.5.  For the conference tournament, I assign the winner the rank of 1, the runner up 2, the losing semi-finalists 3.5, the losing quarterfinalists 6.5, and so on.)
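For anyone who wants to reproduce the ranking method in the note above, here is a minimal sketch: 3 points for a win and 1 for a tie, tied teams sharing the average of the positions they occupy, tournament finishers ranked 1, 2, 3.5, 6.5, and so on, and the two rankings weighted equally.  The three-team example is hypothetical:

```python
from collections import defaultdict

def regular_season_ranks(results):
    """results: list of (team_a, team_b, outcome) with outcome 'A' (team_a won),
    'B' (team_b won), or 'T' (tie).  Returns {team: finishing position}, with
    teams tied on points sharing the average of the positions they occupy."""
    points = defaultdict(int)
    for a, b, outcome in results:
        points[a] += 0          # make sure every team appears, even with zero points
        points[b] += 0
        if outcome == 'A':
            points[a] += 3
        elif outcome == 'B':
            points[b] += 3
        else:
            points[a] += 1
            points[b] += 1
    ordered = sorted(points, key=points.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and points[ordered[j]] == points[ordered[i]]:
            j += 1
        average_position = (i + 1 + j) / 2      # average of positions i+1 .. j
        for team in ordered[i:j]:
            ranks[team] = average_position
        i = j
    return ranks

def combined_rank(team, regular_season, tournament):
    """Equal weighting of regular season and tournament finishing positions."""
    return 0.5 * regular_season[team] + 0.5 * tournament[team]

# Hypothetical three-team example: X and Y tie on points and share the 1.5 position.
rs = regular_season_ranks([("X", "Z", "A"), ("Y", "Z", "A"), ("X", "Y", "T")])
print(rs)                                        # {'X': 1.5, 'Y': 1.5, 'Z': 3.0}
tournament = {"X": 2, "Y": 1, "Z": 3.5}          # champion 1, runner-up 2, losing semi-finalist 3.5
print(combined_rank("Y", rs, tournament))        # 0.5 * 1.5 + 0.5 * 1 = 1.25
```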

Here are the results of that process (with the first number for each team being its conference regular season finishing position and the second number its conference tournament finishing position).  For the teams that did not have their conference competitions in the Fall, the teams I list are speculative -- after completion of their Spring conference competitions, one would have to replace the 2019 teams identified below with the actual Spring 2021 teams:

ACC:  5 expected at large selections plus 1 potential selection 

Automatic Qualifier:  Florida State (1.5, 1)

At large expected:  North Carolina (1.5, 2); Virginia (3, 3.5); Duke (5, 3.5); Clemson (4, 6.5); plus one of Louisville, Virginia Tech, or Notre Dame

Potential at large:  Louisville (7, 6.5); Virginia Tech (7, 6.5); or Notre Dame (7, 6.5)

SEC:  3 expected at large selections plus 2 potential selections

AQ:  Vanderbilt (5.5, 1)

At large expected: Arkansas (1.5, 2); Texas A&M (1.5, 3.5); South Carolina (3, 3.5)

Potential at large: Tennessee (4, 6.5); Auburn (7.5, 6.5); or Missouri (7.5, 6.5)

Pac 12:  2 expected at large selections plus 2 potential selections

AQ:  Stanford (1)

At large expected:  UCLA (2); Southern California (3.5)

Potential at large:  Washington (3.5); California (5)

Big 10:  1 expected at large selection plus 3 potential selections

AQ:  Penn State (4, 1)

At large expected:  Michigan (2.5, 2)

Potential at large:  Rutgers (2.5, 3.5); Wisconsin (1, 6.5); Iowa (5, 6.5)

Big 12:  1 expected at large selection plus 2 potential selections

AQ:  TCU (1)

At large expected:  West Virginia (2)

Potential at large:  Oklahoma State (3); Kansas (4)

American:  0 expected at large selections plus 2 potential selections

AQ:  South Florida (2, 1)

At large expected:  none

Potential at large:  Memphis (1, 2); UCF (3, 3.5)

West Coast:  0 expected at large selections plus 2 potential selections

AQ:  BYU (1)

At large expected:  none

Potential at large:  Santa Clara (2); Pepperdine (3)

Big East:  0 expected at large selections plus 1 potential selection

AQ:  Xavier (1, 1)

At large expected:  none

Potential at large:  Georgetown (2, 2)

Colonial:  0 expected at large selections plus 0 potential selections

Conference USA:  0 expected at large selections plus 0 potential selections

Friday, November 27, 2020

RPI RANKINGS BASED ON FALL GAMES: A LESSON ON HOW THE RPI WORKS

 Although the NCAA has not published RPI rankings for the Fall, I have been generating them.  They are meaningless as ranks of teams, but they do help illustrate how the RPI works.

During the Fall, four conferences played conference schedules:  ACC, Big Twelve, SEC, and Sun Belt.  In addition, some teams from four other conferences played at least a few games: Conference USA, Southern, Southland, and Patriot.  All told, 58 teams played at least some games.

Here are the RPI rankings for the teams that played at least some games in the Fall:

Looking at Florida State and North Carolina in the #1 and #2 positions, you might think the rankings look pretty good.  As soon as you get to Arkansas State and Central Arkansas at #5 and #6, you can see the rankings are not good.

If you look only at the conferences that played conference schedules, you can see the distribution of teams, in order, is as follows:

ACC, ACC, SEC, Big 12, Sun Belt, SEC, ACC, Sun Belt, Big 12, ACC, SEC, Sun Belt, Sun Belt, ACC, Big 12, SEC, Big 12, ....

 What this shows is that the distribution of teams based on their conferences is as if the conferences are almost equal.

Here is how the conferences stack up based on average RPI ratings and ranks:


The four conferences to look at on the table are the top 4 that played conference schedules.  The key one is the Big 12.  It played a full round robin, did not have a conference tournament, and ended with an average RPI rating of 0.5000.  What is important about that is, it shows what the average RPI rating will be for any conference that plays a full round robin with no conference tournament and no non-conference games.

So, why do the ACC, SEC, and Sun Belt have different average ratings?  The ACC did not play a full round robin, some of its teams played extra games against each other, it had a conference tournament, and Pittsburgh played a number of games against teams from other conferences, winning all of them.  Because of the RPI structure, the Pittsburgh non-conference wins benefitted all of the conference teams’ ratings, thus accounting for the conference having an average rating slightly above 0.5000.  The SEC played less than a full round robin and an extended conference tournament, and two teams missed a game, resulting in its average rating being slightly below 0.5000.  The Sun Belt played less than a full round robin and a conference tournament, plus some non-conference games, accounting for its average rating being below 0.5000.

This provides a great illustration of the fact that if conferences play only conference games, the RPI cannot rate or rank the conferences’ teams properly in relation to teams from other conferences.  Rather, the RPI will distribute the teams from the different conferences relatively equally across all the ratings and rankings.  In order to correct for this, the RPI is completely dependent on non-conference games.
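It is easy to verify this property with a small simulation.  The sketch below uses the standard unadjusted RPI structure -- 0.25 times a team’s winning percentage, 0.50 times its opponents’ winning percentage (excluding games against the team), and 0.25 times its opponents’ opponents’ winning percentage -- and omits the bonus and penalty adjustments in the NCAA’s adjusted RPI.  For a full single round robin with no other games, the conference average comes out to 0.5000 no matter what the individual results are:

```python
import itertools
import random

def win_pct(games, team, exclude=None):
    """Winning percentage of `team` (ties count half), optionally
    excluding games played against `exclude`."""
    credits = played = 0
    for a, b, res in games:
        if team not in (a, b):
            continue
        opp = b if team == a else a
        if opp == exclude:
            continue
        played += 1
        if res == 'T':
            credits += 0.5
        elif (res == 'A') == (team == a):
            credits += 1
    return credits / played if played else 0.0

def opponents(games, team):
    return [b if team == a else a for a, b, _ in games if team in (a, b)]

def owp(games, team):
    """Opponents' winning percentage, excluding their games against `team`."""
    opps = opponents(games, team)
    return sum(win_pct(games, o, exclude=team) for o in opps) / len(opps)

def rpi(games, team):
    """Simplified unadjusted RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP."""
    opps = opponents(games, team)
    oowp = sum(owp(games, o) for o in opps) / len(opps)
    return 0.25 * win_pct(games, team) + 0.50 * owp(games, team) + 0.25 * oowp

# Full single round robin among 10 teams with random results and no other games:
teams = [f"T{i}" for i in range(10)]
games = [(a, b, random.choice(['A', 'B', 'T'])) for a, b in itertools.combinations(teams, 2)]
ratings = {t: rpi(games, t) for t in teams}
print(round(sum(ratings.values()) / len(teams), 4))   # 0.5 -- regardless of the results
```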

This is a point to bear in mind as teams begin to fill out their schedules for play in the Spring:  In order for the RPI to be usable, teams will need to play a significant number of non-conference games.  In fact, based on studies I have done, they will need about half of their games to be non-conference or the RPI will be impaired.

Some of the conferences, however, already have indicated they will be playing no non-conference games for the entire season -- so far, Mountain West and Ohio Valley.  Others are playing expanded conference schedules that will not allow them to come close to half of their games being non-conference -- so far, WAC will have 14 conference games and Summit will have 16.  And the Big South will play a full round robin of 9 games each with a limitation to 2 non-conference games.  The RPI will not work for ranking these conferences’ teams in relation to teams from other conferences.

Even if teams from other conferences end up coming close to half of their games being non-conference, another concern will have to do with travel limitations.  Just as the RPI cannot rank teams from different conferences properly in relation to each other without enough non-conference games, it cannot rank teams from different geographical regions properly in relation to each other without enough outside-of-region games.  Historically, 20% of games have been out-of-region, and that really is not enough.  For the Fall, the number is 1.5%.  So this will be something else to watch for in the Spring.

The bottom line of this is that a big question for the Spring and the NCAA Tournament will be whether the RPI will be useful in relation to Tournament at large selections and seeds.  And, if it is not, an equally big question will be how the Women’s Soccer Committee will make its decisions.

Thursday, October 15, 2020

NCAA TOURNAMENT AT LARGE SELECTIONS AND SEEDS: IT’S "AS IF" THIS IS HOW THE COMMITTEE THINKS

 When making NCAA tournament at large selections and seeds, the Women’s Soccer Committee considers a number of data-based factors.  For the at large selections, the Committee must base its decisions on these factors.  For the seeds, the Committee considers these factors but is not bound by them.

What factors have the greatest effect on the Committee decisions?  What is the thinking process of the Committee in getting to its decisions?  For coaches whose teams are contenders for at large selections and seeds, these are important questions.

The details of what the individual Committee members think, what the Committee discussions are, and the reasoning by which the Committee makes its decisions are confidential.  In today’s world, however, it is possible, with the aid of a computer and programming, to analyze the relationship between the data the Committee uses and the decisions it makes.  This can let us say that the Committee makes decisions "as if" the Committee thinks in a particular way.

At the RPI for Division I Women’s Soccer website, starting with the NCAA Tournament: Predicting the Bracket, At Large Selections page and the page that follows it, I have described a computer-driven system I have used for the last few years to analyze the Committee’s decisions.  The system is set up based on matching data for the factors the Committee considers to the Committee’s actual decisions. It lets me say that although I do not know how the Committee members think they make their decisions, it is "as if" they make them this way.

The Factors.  Here are the primary factors the Committee is supposed to consider.  For factors where there is not an obvious "score" associated with the factor, I have set up a system that assigns a score to each team based on its game results.  For a more detailed description of the factors and the scoring systems I have set up for them, go to the page linked above at the RPI website.

RPI rating

RPI rank

Non-conference RPI rating

Non-conference RPI rank

Results against Top 50 opponents

Top 50 results rank

Combined conference tournament and conference regular season standing

Conference average ARPI

Conference average ARPI rank

Top 60 head to head results

Top 60 common opponent results

 Top 60 common opponent results rank

Poor results

Number of Top 60 opponents:  This is not a factor set by the NCAA, although it could fall under the heading Strength of Schedule.  Nevertheless, it is something my system calculates as an aid to coaches with NCAA tournament aspirations.  It gives them a picture of how many Top 60 opponents they should think about scheduling.

Paired factors:  In addition to those single factors, my system also uses paired factors.  For each single factor, I pair it with each other factor weighted 50-50 for each factor in a pair.  (I do not use the Number of Top 60 Opponents in setting up the paired factors.) 

The Standards.  My system compares all of the Committee’s at large and seeding decisions over the last 13 years with the Top 60 teams’ factor scores.  This is for teams that did and did not get at large selections, #1 seeds, #2 seeds, and so on.  The system tells me that teams scoring x or better on a particular factor always got a 👍 decision from the Committee, whereas teams scoring y or poorer always got a 👎.  As examples, teams ranked #1 by the RPI always have gotten a #1 seed.  Teams ranked #8 or poorer never have gotten a #1 seed.

This process, in most cases, gives me a "yes" and a "no" standard for each factor (whether single or paired) for each of the at large, #1 seed, #2 seed, and so on, decisions.

To see the standards, go to the RPI website page linked above for at large selections (bottom of the page) and the page that follows it for seeds.

The Bracket.  At the end of the season, my system applies the standards to the factor scores of the Top 60 teams.  Teams that only meet "yes" standards for a decision get a 👍 for that decision, for example an at large selection.  Teams that only meet "no" standards get a 👎 decision.  Ordinarily, this will leave a number of teams in the middle that meet no "yes" and no "no" standards.  If all the positions -- at large selections or particular seeds -- are not filled yet, these teams are candidates to fill the vacant positions.  Also, for a new year, occasionally there are teams that meet both some "yes" and some "no" standards.  These are teams with profiles the Committee has not seen over the last 13 years (and that will require me to update the standards for future years to incorporate what the Committee decided as to these teams).  They also are candidates to fill the vacant positions.

New Addition to the System.  The above describes the system I have used for several years.  To improve the system, I have worked this year to see if there is an additional step I can add to the system so it will come closer to exactly matching the Committee’s decisions.  The additional step would select teams from among the candidates for as yet unfilled at large selection and seed positions based on the question, "All other things being equal, what should be the ‘decider’ for selection purposes?"

To do this, I looked at each of the single factors and the most "powerful" of the paired factors to see, if I used one of them as the "decider," how many of that "decider’s" decisions would match the Committee decisions.

My work produced quite clear results: Decisions based on the paired RPI rank and Top 50 results rank factor produce the best match with the Committee decisions.  In other words, if my system fills vacant slots by choosing from the candidate teams those with the best scores for that paired factor, my system comes closer to the teams the Committee actually selected than it would using any other factor.  The only exception is for the #1 seeds. There, Poor Results is the only factor that picks the correct teams to fill the #1 seed vacancies.

Results.  Here is how well the updated system works when applied retrospectively to the last 13 years:

At large selections:

 435 positions to be filled

369 positions filled by standards

66 positions left to be filled

 51 of the still open positions correctly filled by decider

15 positions incorrectly filled

In five of the years, the system matches all of the Committee at large selections; in four of the years, it misses 1; in one year, it misses 2; and in three years, it misses 3. 

#1 seeds:

52 positions to be filled

50 positions filled by standards

2 positions left to be filled

2 positions correctly filled by decider

0 positions incorrectly filled

#2 seeds:

52 positions to be filled

40 positions filled by standards

12 positions left to be filled

 8 positions correctly filled by decider

 4 positions incorrectly filled

In ten of the years, the system matches all of the Committee #2 seeds; in two of the years, it misses 1; in one year, it misses 2.  

  #3 seeds:

52 positions to be filled

13 positions filled by standards

39 positions left to be filled

23 positions correctly filled by decider

16  positions incorrectly filled

In three of the years, the system matches all of the Committee #3 seeds; in five of the years, it misses 1; in four of the years, it misses 2; in one of the years, it misses 3.   

#4 seeds:

52 positions to be filled

26 positions filled by standards

26 positions left to be filled

18 positions correctly filled by decider

8 positions incorrectly filled

In seven of the years, the system matches all of the Committee #4 seeds; in four of the years, it misses 1; in two of the years, it misses 2.

Finally regarding seeds, if we disregard the seed positions and look only at teams that got seeded, in six of the years the system seeds and the Committee seeds were identical; in four of the years, it misses 1; and in 3 of the years it misses 2.  In other words, of 208 teams getting a seed over the last 13 years, the system matches 198 of them and misses 10.

Thus for seeds, although the system has difficulty especially with #3 seeds, its issues with seeds ordinarily relate not to whether a team should be seeded but whether it should be seeded #3 or #4.

For illustration, I will use the 2019 Committee decisions as compared to the system decisions:

At large selections: The Committee decisions and system decisions match except that the Committee gave Utah an at large selection and the system would have given it to Alabama.

#1 seeds:  The Committee and system decisions are the same.

#2 seeds:  The Committee and system decisions are the same.

#3 seeds:  The Committee and system decisions are the same except that the system gives Duke a #3 seed, whereas the Committee gave Wisconsin that #3 seed position.

#4 seeds:  The Committee and system decisions are the same except that the system gave Wisconsin a #4 seed (which the Committee gave a #3).  The Committee gave Penn State the remaining #4 seed, leaving Duke with no seed.

Final Thoughts.  Here are some final thoughts about what this shows:

First, it is remarkable how close the system comes to the Committee’s actual decisions.  Although we do not know exactly how the Committee members and the Committee as a whole think, it is "as if" their thinking and the system’s are almost the same.

Second, the results of this work show that the Committee’s decision-making has been very consistent over time. 

Monday, October 12, 2020

THE 2020-21 SEASON: PRE-SEASON CONFERENCE RANKINGS

 Each conference’s coaches do pre-season rankings of teams within their conference.  Women’s soccer expert Chris Henderson (https://twitter.com/chris_awk) likewise does pre-season conference rankings.  And I do pre-season conference rankings.

Chris Henderson incorporates a number of factors into his rankings.  There is a point system for each factor:

Returning starters -- Starters are defined as last year’s top 10 minutes getters for field players and the goalkeeper with the most minutes.

Returning award winners from last year -- This rewards a team for top returning talent.  There is a sliding scale with higher value awards getting more points.

Returning award winners from the two years before last year -- Again, this rewards a team for top returning talent and has a sliding scale, allowing the system to better factor in superstar level players.

CoachRank -- This is based on Henderson’s system for ranking coaches.  A team whose coach has a high CoachRank score receives bonus points and a team with a low CoachRank score receives penalty points.

 Recruiting -- This is based on the TopDrawer Soccer player ratings, since there is not a better system available right now.  For transfers and international players, Henderson assigns his own player ratings.

Experience -- If a team has fewer than six returning starters, it gets penalized with the penalty increasing for each decrease in returning starters.

"Bust" potential -- If the value of last year’s talent is much higher or lower than the two preceding years, the team gets penalized.  This is to factor in teams that had a "fluke" recruiting year in 2019, for better or for worse.

Talent gap -- This penalizes teams that were far worse than much of the rest of the conference last year.  It assigns penalty points for teams with a league goal differential of worse than -1.0 per conference game, with the penalty escalating as the negative goal differential increases to greater whole numbers.

The scores for all of these factors are combined using a weighted formula and teams are ranked based on their overall scores.

My rankings likewise use a mathematical system, but it is completely different.  It looks at the rankings of teams historically and determines their ranking trends.  Based on trend formulas, it projects rankings for the coming year and converts those rankings to RPI ratings.  Using the conference schedule for the coming year, including game locations, it then compares each game’s opponents’ ratings as adjusted for home field advantage and based on the comparison assigns a game result (either win-loss or tie).  With the results of all the conference games, it assigns 3 points for a win and 1 for a tie, computes the conference points scored by each team, and ranks the teams accordingly.
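As a rough sketch of the simulation step (leaving out the trend projections that produce the ratings), each conference game is decided by comparing the two teams’ projected ratings after a home field adjustment, and the teams are then ranked on the resulting conference points.  The home field bonus and the tie margin below are placeholder values, not the ones my system actually uses:

```python
HOME_BONUS = 0.015   # placeholder home field adjustment added to the home team's rating
TIE_MARGIN = 0.010   # placeholder: adjusted ratings within this margin produce a tie

def simulate_conference(ratings, schedule):
    """ratings: {team: projected RPI rating}.
    schedule: list of (home_team, away_team) conference games.
    Returns the teams ranked by simulated conference points (3 win / 1 tie)."""
    points = {t: 0 for t in ratings}
    for home, away in schedule:
        gap = (ratings[home] + HOME_BONUS) - ratings[away]
        if abs(gap) <= TIE_MARGIN:
            points[home] += 1
            points[away] += 1
        elif gap > 0:
            points[home] += 3
        else:
            points[away] += 3
    return sorted(ratings, key=lambda t: points[t], reverse=True)

# Hypothetical three-team example:
ratings = {"A": 0.620, "B": 0.612, "C": 0.560}
schedule = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]
print(simulate_conference(ratings, schedule))   # ['A', 'B', 'C']
```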

I do not know how each coach ranks his or her conference’s teams, but I assume the coaches do it based on all of the knowledge they have, including from direct in-game observations, about each team in the conference.

The following table shows how our three sets of rankings compare for the four conferences having their conference regular season competition this Fall.  For each conference, the teams are in the order the coaches ranked them:



The three right-hand columns show the ranking difference, for each team, between the Henderson ranks and the coach ranks, between my ranks and the coach ranks, and between the Henderson ranks and my ranks.

At the end of the conference regular seasons, when we have the actual conference regular season rankings, I check how well each set of pre-season rankings matches the actual rankings.  2020-21 will be my third year doing these rankings, so for 2018 and 2019 I have been able to compare pre-season rankings to actual end-of-regular-season rankings.  (Both Henderson and I have tweaked our ranking systems over this period.)  Here is how the ranking systems compared in 2018 and 2019 to the actual conference regular season rankings for all conferences:

2018:  Coaches, on average, were within 2.03 positions of the actual ranks; Henderson was within 2.20; I was within 2.24.

2019:  Coaches were within 2.16; Henderson was within 2.22; I was within 2.30.
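The "within x positions" figures are averages of the absolute differences between a system’s predicted conference positions and the actual positions.  A minimal sketch of that calculation:

```python
def avg_rank_error(predicted, actual):
    """predicted, actual: {team: conference position}.
    Returns the mean absolute difference in positions across the teams."""
    return sum(abs(predicted[t] - actual[t]) for t in actual) / len(actual)

# Hypothetical four-team conference:
predicted = {"W": 1, "X": 2, "Y": 3, "Z": 4}
actual = {"W": 2, "X": 1, "Y": 3, "Z": 4}
print(avg_rank_error(predicted, actual))   # 0.5
```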

As you can see, it is very hard to do pre-season rankings of teams within a conference with a high degree of accuracy.  Although the coaches, with their direct experience and detailed knowledge of the other conference teams sometimes over a significant number of years, do the best with the pre-season rankings, they really do little better than the two mathematical systems.

This illustrates that there is not just one way to do reasonable advance rankings of teams.  Each of the above three systems brings its own perspective and contributes something to the picture of how teams are likely to do.  And none is able to give a highly accurate picture of where teams will end up.  At least when it comes to Division I women’s soccer, even very good predictive models leave a pretty good degree of uncertainty.  Given that the three models here -- each of which is quite sophisticated in its own way -- are quite close in their degree of accuracy, there is a good possibility that they are approaching the limits of how close predictive models can come for the rankings of teams within their conferences.

Friday, October 9, 2020

THE 2020-2021 SEASON: RPI RATING AND RANKING ISSUES

 This will be the first of three posts.  In this one, I will summarize publicly available information about the 2020-21 season structure.  I also will describe issues that will make it impossible to have reliable RPI ratings and ranks based on the planned Fall schedules. And, I will give some thoughts on how we might end up having at least somewhat usable ratings and ranks come the Spring.

In my second post, I will compare three sets of pre-season rankings for the conferences that are playing conference schedules in the Fall.  The rankings are those of the coaches of each conference, Chris Henderson’s, and mine.

In my third post, I will describe some new work I have done on the patterns the Women’s Soccer Committee has followed over the last 13 years in its at large selections and seeding for the NCAA tournament.

Current Information About the 20-21 Season.  Of the 31 conferences, 4 are playing Fall conference schedules:  the ACC, Big 12, SEC, and Sun Belt.  Of the other conferences, none are playing in the Fall except that a few will have some non-conference games:  CUSA, Missouri Valley, Patriot, Southern, and Southland.  Sixty teams are playing at least 1 Fall game, with 281 not playing in the Fall.

Two conferences, the Southland and the Southwestern, have announced their conference formats for Spring soccer (and SWAC has announced the dates of all its conference games).  The Southland definitely is allowing non-conference games.  The Southwestern has not made a statement about non-conference games, but its teams will have some dates on which they will be able to play non-conference games if allowed.

There presently are 243 Fall games scheduled, not counting conference tournaments and also not counting 8 postponed games likely to be re-scheduled.  This compares to about 3,000 games during a normal season.

Teams that are playing at least some games in the Fall have scheduled an average of 9.4 games each (including conference tournaments): 7.3 conference and 1.1 non-conference.  That comes out to 88.3% conference and 11.7% non-conference.  With one exception (Pittsburgh), the ACC, Big 12, and SEC are playing conference only, so the bulk of the non-conference games involve the Sun Belt or other conferences that are playing a few Fall non-conference games.

Over the last 7 years (since the last major conference realignment), teams played an average of 18.2 games per year.  Of these, 56% were conference games and 44% non-conference.

For the Spring, according to the schedule adopted by the Division I Board of Directors, the regular season competition season will be February 3 to April 17.  The Committee will make its NCAA Tournament at large selections and seeds on Sunday, April 18.  (Ending the season on Saturday with the Committee making its decisions on Sunday, which appears to be what has been decided, is a little different than normal, where the season ends on a Sunday and the NCAA announces the bracket on Monday.) The bracket will consist of 31 conference automatic qualifiers plus 17 at large selections for a total of 48 teams.  It seems reasonable to assume that if some conferences do not play in the Spring, then the tournament bracket still will be 48 teams and the number of at large teams will increase.  Games teams play in the Fall will count as part of their season schedules for purposes of NCAA tournament bracket formation.  The NCAA tournament will run from Saturday, April 24 to Sunday, May 16.  The exact overall scheduling and game locations for the tournament are yet to be announced, but the NCAA Board of Governors has directed that all tournament game sites will be pre-determined and the number of sites will be reduced, for health and safety and operational management purposes.

Ratings and Ranks Reliability.  So far as RPI ratings and ranks are concerned, they will not be meaningful during the Fall.

As a general rule, the more games teams play the more reliable are mathematical rating systems like the RPI.  For most rating systems to function well, teams must play between 25 and 30 games.  As the numbers of games decrease, the ratings’ reliability decreases.

The NCAA has not published information on the minimum number of games it believes teams need to play in order for the RPI to have an acceptable level of reliability.  Regarding football, however, it has said that "a [Football Championship Subdivision] RPI would be very difficult to use" because of the small number of games FCS teams play.  This is why the NCAA does not use the RPI in selecting at large teams for the FCS NCAA Tournament.  FCS teams play up to 12 pre-NCAA Tournament games per year.  Given this, we know the NCAA believes 12 games per year is not enough for the RPI to be reliable.

To determine how many games teams need to play for the RPI to be reliable, I used the 2019 season as a case study.  In 2019, teams played an average of 18.74 games.  I started with teams’ actual end of season RPI ranks.  I then deleted all games from the first weekend of the season, which reduced the number of games teams played to 17.04 games.  All of the deleted games were non-conference.  I then recalculated the RPI and determined teams’ revised ranks.  After that, I repeated the process for successive weeks, for each week determining the new number of games per team and teams’ revised RPI ranks.

At the bottom of this post is a table that shows the results of this project.  The table shows the Top 250 teams, in order, with their actual end-of-season RPI ranks and their ranks after each successive deletion of games.  It also shows the teams’ NCAA tournament seeds and which unseeded teams got at large selections.  Using this table, I looked to see the number of games at which the RPI no longer was able to produce rankings that were reasonably close to teams’ actual end-of-season ranks.  I highlighted in green team RPI ranks that seemed unreasonably poor compared to their actual final ranks and in orange team ranks that seemed unreasonably good.

My conclusion from the table is that even slightly over 15 games per team produces too many unreasonable rankings for the RPI to be usable.  (In considering the table, it is important to know that historically, teams ranked 58 and poorer have not gotten at large selections; and the poorest rank getting a #1 seed has been 8, #2 has been 14, #3 has been 23, and #4 has been 26.)
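Mechanically, the deletion study looks like the sketch below.  It assumes a list of dated games and a rating function such as the simplified rpi() sketch earlier on this page, and it cuts by seven-day weeks rather than by playing weekends, so it is an approximation of the actual procedure:

```python
from datetime import date, timedelta

def weekly_deletion_ranks(games, season_start, weeks_to_remove, rate):
    """games: list of (game_date, team_a, team_b, result) for a full season.
    rate: a rating function rate(remaining_games, team), for example the
    simplified rpi() sketch shown earlier on this page.
    For each number of opening weeks removed, drop those games, re-rate the
    remaining teams, and return {weeks_removed: {team: rank}}."""
    results = {}
    for removed in range(weeks_to_remove + 1):
        cutoff = season_start + timedelta(weeks=removed)
        remaining = [(a, b, res) for d, a, b, res in games if d >= cutoff]
        teams = {t for a, b, _ in remaining for t in (a, b)}
        ratings = {t: rate(remaining, t) for t in teams}
        ranked = sorted(teams, key=ratings.get, reverse=True)
        results[removed] = {team: i + 1 for i, team in enumerate(ranked)}
    return results

# Usage sketch (all_games, the season start date, and the rpi helper are assumed):
# ranks_by_pass = weekly_deletion_ranks(all_games, date(2019, 8, 22), 6, rpi)
# ranks_by_pass[0] is the full-season ranking; ranks_by_pass[3] drops the first three weeks.
```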

Thus, for Fall 2020’s games alone, the planned 9.4 games per team is not enough for the RPI to be usable.

In addition, as stated above, historically about 56% of games have been in-conference and 44% non-conference.  As shown in detail at the RPI: Non-Conference RPI page of the RPI for Division I Women’s Soccer website, even with this percentage of non-conference games, the RPI has trouble ranking teams from different conferences properly in relation to each other: It tends to rate teams from stronger conferences too poorly and teams from weaker conferences too well.  When teams significantly reduce their numbers of non-conference games, this problem gets worse.  You can see this in the table at the bottom of this post, where, as I deleted successive numbers of non-conference games, the rankings of teams from the strongest conferences tend to get poorer and the rankings of teams from other conferences tend to get better.  You can see this problem from a different perspective in the following table based on the 2019 season:


In this table, the second and third columns from the left show the 2019 actual conference average ARPI ratings and the resulting conference ranks.  The next column to look at is the fifth from the left, which shows the conference average ARPI ratings derived only from the conference regular season competition.  As you can see, all conferences have an identical 0.5000 rating except for the ACC at 0.4999.  The only reason for the ACC difference is that Syracuse v Virginia got canceled, so that all teams did not play the same number of games.  So long as all teams play the same number of games in conference regular season competition, conferences always will have 0.5000 ratings if the ratings are based on only their conference competition.  In other words, based only on conference regular season competition, the RPI cannot distinguish among the conferences.  If you look at the fourth column from the left, you will see that conference tournaments slightly disrupt this equality.  This, however, has nothing to do with how strong the conferences are; it only has to do with the structure of their conference tournaments.

Finally, if you look at the sixth and seventh columns from the left, you will see the RPI ratings and ranks of the conferences derived only from their non-conference games.  The eighth column shows the difference between these ranks and the actual RPI ranks.  In most cases, and particularly for the stronger conferences, these differences are small.  Thus in-conference games pull conference and conference team ratings to 0.5000, whereas non-conference games pull conference and conference team ratings to the level of their true strength among the entire body of Division I teams.  Because of this, any reduction in the proportion of non-conference games will result in teams from stronger conferences being underrated and teams from other conferences being overrated.  The table at the bottom of this post shows this effect.

In addition, historically 81.9% of games are played within geographic regional groups and 18.1% are non-regional.  As shown on the RPI: Regional Issues page of the RPI website, this is not a big enough proportion of non-regional games for the RPI to be able to properly rate and rank teams from one regional group in relation to teams from other regional groups.  Thus if teams move to more region-based non-conference schedules to reduce travel, it will cause the RPI to discriminate against some teams from some geographic regions -- most notably the West -- and in favor of others even more than it already does.

Usable Ratings and Ranks in the Spring.  As I look at the Spring season, I see some possibilities for scheduling that will make RPI ratings and ranks at least somewhat usable:

1.  The ACC, Big 12, SEC, and Sun Belt are playing their conference schedules in the Fall, so they can play non-conference schedules in the Spring.  This will be especially critical for the ACC, Big 12, and SEC to do since they are playing almost entirely conference-only Fall schedules.

2.  The Spring game playing season is 11 weeks.  This will give conferences and teams not playing in the Fall the opportunity to play close to full conference and non-conference schedules.

Thus it is possible, by the end of the Spring game playing season, that teams will have played close to their normal schedules.  If that happens, then the RPI will work more or less as it normally does.

I see two issues, however, that may cause problems.  One will be if teams significantly reduce the numbers of games they play, in particular reducing their proportions of non-conference games.  The second will be if teams shift to scheduling more regionally close non-conference opponents, significantly reducing their proportions of non-regional games.

NCAA Tournament Bracket Formation.  If either of these problems occurs, the Women’s Soccer Committee will need to be aware of it and do the best it can, with the available data, to compensate for the RPI’s weaknesses.  If either occurs to the extreme, it will create a big problem for the Committee.  Since I believe there will be the opportunity for significant numbers of non-conference games, my biggest concern is about the possibility that we will see almost entirely region-based non-conference scheduling.  If that occurs, then the Committee will need to recognize that although the RPI may be good for rating and ranking teams within a geographic region in relation to each other, it will have little value when it comes to rating and ranking teams from different regions in relation to each other.  In particular, it will tend to underrate teams from the West.  I am not sure whether or how the Committee will be able to deal with this problem.



Wednesday, April 15, 2020

THE COMMITTEE IS CONSIDERING CHANGES TO THE RPI FORMULA

According to the minutes of its January 2020 meeting, the Women’s Soccer Committee is working with the NCAA staff on possible changes to the RPI formula.  Because of that, and notwithstanding the timing due to world events, I have sent the Committee a memo with some background information.  Here is a link to the memo: Background Information for Possible RPI Changes.

Update:  If you tried to access the memo and were not able, my apologies.  I now have updated the sharing settings for the memo and it should be fully accessible.