Monday, December 26, 2022

ANOTHER WAY TO LOOK AT HOW RPI DEFECTS RELATE TO SCHEDULING

In my work on non-conference scheduling in relation to the RPI and the team profile factors important during the NCAA Tournament bracket formation process, I occasionally have made this observation:

From an RPI perspective, and also from the perspective of wanting to play opponents the Committee will consider strong, you should consider the following:

The top teams from strong conferences are good opponents, both from an RPI perspective and to impress the Committee;

Once you get beyond the top teams from strong conferences, the teams lower in those conferences’ standings can be poor opponents, both from an RPI perspective and because the Committee will think they are weaker than they really are;

On the other hand, the top teams from second and third tier conferences can be good opponents, both from an RPI perspective and because the Committee will think those teams are better than they really are; and

If you have two teams with RPI ranks about the same, one from a strong conference and the other from a second or third tier conference, you virtually always are better off playing the team from the second or third tier conference, as the sketch below illustrates.
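
To make that last rule of thumb concrete, here is a minimal sketch in Python.  The conference tier labels and the 5-rank-position tolerance for "about the same" are my illustrative assumptions:

    # A sketch of the scheduling rule of thumb above.  Tier 1 means a
    # strong conference; tiers 2 and 3 are second and third tier
    # conferences.  The 5-position tolerance is an assumption.
    def preferred_opponent(team_a, team_b):
        """Each team is a (name, rpi_rank, conference_tier) tuple."""
        name_a, rank_a, tier_a = team_a
        name_b, rank_b, tier_b = team_b
        if abs(rank_a - rank_b) <= 5:  # RPI ranks about the same
            # Prefer the team from the lower-tier conference.
            return name_a if tier_a > tier_b else name_b
        # Otherwise simply prefer the better-ranked team.
        return name_a if rank_a < rank_b else name_b

    # Two teams ranked about the same, one from a strong conference:
    print(preferred_opponent(("Power team", 40, 1), ("Mid-major team", 42, 2)))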

To test this observation, I did a study as follows:

For my database, I used data from 2013 to the present (but excluding the 2020 Covid year).  I started with 2013 because that was the first year following completion of the last major realignment of conference teams.

For each conference, for each final standing position within the conference, I determined:

1.  Average RPI Rank for that position using the current NCAA formula; 

2.  Average rank for that position, under the current NCAA RPI formula, as a contributor to opponents’ strengths of schedule;

3.  Average Massey Rank for that position; and

4.  Average Improved RPI Rank for that position.

The Improved RPI is my revised version of the RPI.  It produces ratings and ranks that are superior to the RPI by all measures I can think of (but with a more complicated formula).  Massey likewise produces ratings superior to the RPI.  The Massey and Improved RPI Ranks for teams generally are quite similar, but not identical.  For Division I women’s soccer, the Massey and Improved RPI ranks are the best measures available of true team strength as demonstrated by team performance.
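
For readers who want to see the mechanics, here is a minimal sketch of the averaging step in Python, assuming the underlying data sit in a table with one row per team per season.  The file name and column names are illustrative assumptions, not my actual workbook layout:

    import pandas as pd

    # One row per team per season, 2013 to present.
    df = pd.read_csv("team_seasons.csv")  # hypothetical file
    df = df[df["season"] != 2020]         # exclude the Covid year

    # Items 1-4: average ranks by conference and final standing position.
    averages = (
        df.groupby(["conference", "standing_position"])[
            ["rpi_rank", "sos_contributor_rank", "massey_rank", "improved_rpi_rank"]
        ]
        .mean()
        .reset_index()
    )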

With those numbers, I then determined for each standing position in each conference (a code sketch of these calculations follows the list):

5.  The difference between average RPI Rank and average rank as a strength of schedule contributor;

6.  The difference between average Massey Rank (actual strength) and RPI Rank (perceived strength per the NCAA rating system); and

7.  The difference between average Improved RPI Rank (actual strength) and RPI Rank (perceived strength per the NCAA rating system).

Finally, I determined for each standing position within each conference:

8.  The sum of 5 and 6; and

9.  The sum of 5 and 7.
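
Continuing the sketch above, items 5 through 9 are simple column arithmetic.  The signs follow the definitions just given, so with lower rank numbers being better, a negative number means the RPI-related figure is the poorer one:

    averages["rpi_less_sos"] = (
        averages["rpi_rank"] - averages["sos_contributor_rank"]   # item 5
    )
    averages["massey_less_rpi"] = (
        averages["massey_rank"] - averages["rpi_rank"]            # item 6
    )
    averages["improved_less_rpi"] = (
        averages["improved_rpi_rank"] - averages["rpi_rank"]      # item 7
    )
    averages["combined_massey"] = (
        averages["rpi_less_sos"] + averages["massey_less_rpi"]    # item 8
    )
    averages["combined_improved"] = (
        averages["rpi_less_sos"] + averages["improved_less_rpi"]  # item 9
    )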

Putting all of these numbers together produces the following table for the ACC:


Looking at the #1 standing position in the ACC, the Average ARPI Less SoS Contribution Ranks Difference column shows that you can expect its rank as a strength of schedule contributor to be 3.1 positions poorer than its RPI Rank says it should be.  Its Average Improved RPI less ARPI Ranks Difference column shows that you can expect its Improved RPI Rank (true strength) to be 0.3 positions better than the RPI says.  Its Average Massey less ARPI Ranks Difference column shows that you can expect its Massey Rank (true strength) to be 0.4 positions better than the RPI says.

If you put the strength of schedule contributor difference together with the Improved RPI difference, you get a total of -2.8.  Put together with the Massey difference, you get -2.7.  These numbers are pretty small.  What this means is that, on balance, if you play the ACC #1 team in any year, their RPI Rank will be pretty close both to their true strength and to what they will contribute to the strength of schedule portion of your RPI.

On the other hand, take a look at the #9 ACC team.  For this team, the Average ARPI Less SoS Contribution Ranks Difference column shows that you can expect its rank as a strength of schedule contributor to be 45.9 positions poorer than its RPI Rank says it should be!  Its Average Improved RPI less ARPI Ranks Difference column shows that you can expect its Improved RPI Rank (true strength) to be 14.3 positions better than the RPI says!  Its Average Massey less ARPI Ranks Difference column shows that you can expect its Massey Rank (true strength) to be 16.3 positions better than the RPI says!

If you put the strength of schedule contributor difference together with the Improved RPI difference, you get a total of -60.2.  Put together with the Massey difference, you get -62.2.  In other words, you can expect the #9 ACC team to be significantly stronger than the RPI says it is, and you can expect its rank as a strength of schedule contributor within the RPI formula to be significantly poorer than even its RPI Rank says it should be.
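
To make the arithmetic explicit, here are the ACC #9 differences from the table, written with the signs the item 5 through 7 definitions imply:

    rpi_less_sos = -45.9       # item 5: SoS contribution 45.9 positions poorer
    massey_less_rpi = -16.3    # item 6: Massey Rank 16.3 positions better
    improved_less_rpi = -14.3  # item 7: Improved RPI Rank 14.3 positions better

    print(round(rpi_less_sos + massey_less_rpi, 1))    # item 8: -62.2
    print(round(rpi_less_sos + improved_less_rpi, 1))  # item 9: -60.2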

In the table, I have used color highlighting to show, for each conference, which standing positions within the conference are "desirable" opponents from the combined perspective of these numbers and which are "undesirable."  Green highlighting indicates desirable and orange undesirable.  I have designated a combined number of +10 or more as desirable and of -10 or less as undesirable.  Standing positions between these two benchmarks have no color coding, which means that there is no significant advantage or disadvantage, from the perspective of these numbers, in playing those teams -- they are neutral from this perspective.  In the Conference Teams by Rank in Conference column, the green or orange color coding appears only when both right-hand columns are that color.  If only one is that color, the rank position has no color and thus is a neutral team in terms of opponent desirability.
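
The color coding reduces to a simple threshold rule.  Here is a minimal sketch; the +10/-10 benchmarks and the both-columns-must-agree rule are as just described, while the function names are mine:

    def classify(combined):
        """Classify one combined number (item 8 or item 9)."""
        if combined >= 10:
            return "desirable"    # green in the table
        if combined <= -10:
            return "undesirable"  # orange in the table
        return "neutral"          # no highlighting

    def position_color(combined_massey, combined_improved):
        """The rank-in-conference column is colored only when both
        combined columns carry the same (non-neutral) color."""
        a = classify(combined_massey)
        b = classify(combined_improved)
        return a if a == b and a != "neutral" else "neutral"

    print(position_color(-62.2, -60.2))  # ACC #9 -> "undesirable"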

[NOTE:  Although the table may show teams as desirable or undesirable, this is only from the perspective discussed in this article.  For other reasons, including from an NCAA Tournament perspective, it may be good to play a team identified here as undesirable -- such as to play a team the RPI ranks in the Top 50.]

Thus for the ACC, if you play the #1 or #2 team in the conference, the lack of highlighting means it will be pretty much a neutral "what you see is what you get" in terms of their RPI Rank in relation to their true strength and their rank as a strength of schedule contributor.  For #3 or poorer in the conference, however, the orange highlighting means you can expect their RPI Rank and their rank as a strength of schedule contributor, combined, to understate their true strength.

With that explanation, the table below covers all of the conferences.  It shows, in stark clarity, how the RPI discriminates in relation to conferences and their teams.  You will have to scroll to the right to see the entire table.  Below this table will be another that applies to geographic regions.


The next table is for teams in the four geographic areas I have identified based on the states where schools are located.  The table shows averages by rank among the teams in the region.  Most stark in this table is the West region.  It has only a handful of rank positions that are neutral.  All other positions are undesirable as opponents under the considerations I am addressing here (although they nevertheless might be desirable opponents based on other NCAA Tournament-related considerations).  Again, you will have to scroll to the right to see the entire table.




Monday, December 19, 2022

REVIEW OF DECISIONS THE NCAA D1 WOMEN’S SOCCER COMMITTEE MADE FOR THE 2022 NCAA TOURNAMENT

Here is a review of some of the Committee’s #1 through #4 seeds and at large selections for this year’s NCAA Tournament.  I have not yet analyzed the new #5 through #8 seeds, which the Committee assigned for the first time this year, so I am not yet able to discuss those decisions.

#1 Seeds

Based on the factor analysis I do, there was one question the Committee needed to answer: As between North Carolina and Alabama, who should get a #1 seed?

Based on my factor standards, North Carolina met 2 "Yes" standards -- a team that meets such a standard always has gotten a #1 seed.  It met no "No" standards -- a team that meets such a standard never has gotten a #1 seed.  Thus ordinarily it would have gotten a #1 seed.

But Alabama met 4 Yes and 2 No standards.  What this means is that Alabama had a profile the Committee had not seen before (with "before" meaning over the years going back to 2007).  Thus the circumstances forced the Committee to make a decision it had not had to make before.

The Committee gave Alabama the #1 seed.  This leads me to look at the No standards Alabama met -- which in the future no longer will be No standards due to the Alabama #1 seed decision.  The two No standards were for paired factors.  A paired factor involves two of the factors the Committee considers.  Each factor has a scoring system -- either the NCAA-specified system or, if there is no NCAA-specified system, a system I have created.  To determine a paired factor score, I apply a formula that combines the two individual factor scores so that each individual factor has a 50% weight.  The two No standard paired factors were Alabama’s (1) Non-Conference RPI and Poor Results Score and (2) NCRPI Rank and Poor Results Score.  Thus both involved poor results.  My poor results scoring system treats ties and losses against teams ranked #56 or poorer as poor results and assigns negative values for those results depending on whether the game was a tie or a loss and on the rank tier in which the opponent falls.  Alabama’s poor results, in my scoring system, were a loss at #75 Miami FL and a tie at #61 Utah.
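
As a sketch of how a paired factor score and a poor results score can be computed: the 50/50 weighting and the #56-or-poorer threshold are as just described, but the rescaling of each factor to a 0-to-1 range and the specific tier boundaries and penalty values below are my illustrative assumptions, since the actual numbers are not spelled out here:

    def paired_factor_score(score_1, lo_1, hi_1, score_2, lo_2, hi_2):
        """Combine two factor scores with a 50% weight each, after
        rescaling each to 0-1 (the rescaling step is an assumption)."""
        norm_1 = (score_1 - lo_1) / (hi_1 - lo_1)
        norm_2 = (score_2 - lo_2) / (hi_2 - lo_2)
        return 0.5 * norm_1 + 0.5 * norm_2

    # Hypothetical rank tiers and penalties for poor results.
    POOR_RESULT_VALUES = {
        "tie":  {(56, 100): -1, (101, 150): -2, (151, 400): -3},
        "loss": {(56, 100): -2, (101, 150): -4, (151, 400): -6},
    }

    def poor_results_score(results):
        """results: list of (outcome, opponent_rank) with outcome 'tie'
        or 'loss'.  Only opponents ranked #56 or poorer count."""
        total = 0
        for outcome, rank in results:
            if rank < 56:
                continue  # not a poor result
            for (lo, hi), value in POOR_RESULT_VALUES[outcome].items():
                if lo <= rank <= hi:
                    total += value
        return total

    # Alabama's poor results as listed above:
    print(poor_results_score([("loss", 75), ("tie", 61)]))  # -3 under these values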

Looking at the Yes standards North Carolina and Alabama met also may add some insight.  The two North Carolina Yes standards were for paired factors: (1) RPI and Conference Rank and (2) Conference Standing and Conference RPI.  Thus both standards involved conference strength.  The four Alabama Yes standards also were for paired factors.  Two of these involved conference strength.  Two of them, however, did not: (1) RPI Rank and Top 60 Head to Head Results Score and (2) RPI Rank and Top 60 Common Opponents Results Score.

Since the Committee gave the #1 seed to Alabama, this suggests to me that the Committee may not have weighted the Alabama tie with Utah as heavily as it might have in the past.  This would make some sense, as the NCAA rule change this year to no overtimes during the regular season means there will be more ties; in fact, the number of ties this year was double the number in prior years.  This was exactly as expected based on the number of games historically decided by golden goals in overtime but now ending in ties.  With more tie games, one would expect teams to receive greater numbers of tie game poor results.  This, in turn, might cause the Committee to be more tolerant of tie game poor results in its decision-making process.

If this is what happened in relation to the Alabama poor results, then it simply was a matter of which team had better Yes characteristics and the Committee could have concluded Alabama had the better characteristics.  

#2 Seeds

Penn State likewise had a profile the Committee had not seen before.  It met 6 Yes and 25 No standards.  That looks like a lot of No standards, but 24 of them involved poor results: a loss to #58 Nebraska and ties with #169 Indiana and #119 Iowa.  The Committee gave Penn State a #2 seed.  Coupled with the Alabama decision, this further suggests the Committee was more tolerant of tie game poor results.

St. Louis had another profile the Committee had not seen before, meeting 3 Yes and 23 No standards.  All of the No standards involved its conference RPI or its conference rank.  For context: historically, no team from a conference ranked #9 or poorer had gotten a #1 or #2 seed.  This year, the Atlantic 10 was ranked #10.  The Committee clearly decided conference strength did not carry enough weight to bar a #2 seed.

On the other side, Stanford met 6 Yes and 0 No standards.  Interestingly, 2 of the Yes standards were for paired factors that included Conference Rank.  Considered together with North Carolina’s 2 Yes #1 seed standards having involved conference strength, this may indicate that conference strength does not carry as much weight as my conference strength standards previously have suggested.  Stanford’s other Yes standards involved poor results -- in this case, its having had no poor results.

As for Penn State’s 6 Yes standards, they all involved its Non-Conference RPI, which was the best of any team.  This apparently was enough to get it a #2 seed ahead of Stanford.

In terms of St. Louis’s 3 Yes standards, they all involved its having had no poor results, which matched Stanford’s 4 Yes standards involving no poor results.  This left Stanford with 2 additional standards involving conference strength.  Given that Stanford did not get a #2 seed, and coupled with St. Louis getting a #2 seed and North Carolina not getting a #1 seed, this means that I will have to change my standards related to conference strength so that conference strength is less helpful (or hurtful) than my standards previously indicated.  This does not necessarily mean a change in the Committee’s treatment of conference strength.  It may simply be a matter of the Committee having to make decisions on how much it values different factors when given team profiles it has not seen before.  Whatever the explanation, at least as to #1 and #2 seeds, it appears conference strength is not as important a consideration as my standards previously indicated.

#3 Seeds

It looks like the last #3 seed spot came down to a choice among Arkansas, Pittsburgh, and Michigan State.  The Committee gave the spot to Arkansas, which met no Yes and no No standards.

Pittsburgh, on the other hand, met 6 Yes and no No standards, with only two of the Yes standards involving conference strength.  The 4 other Yes standards were for paired factors involving its RPI, its RPI Rank, its Non-Conference RPI, and its NCRPI Rank, each paired with its results against Top 50 opponents.  The Top 50 Results factor assigns teams scores for positive results (wins and ties) against Top 50 opponents on a sliding scale based on opponent rank, with the scoring heavily skewed towards positive results against highly ranked opponents.
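
Here is a minimal sketch of a sliding-scale factor of this kind.  The 1/rank weighting and the 50% discount for ties are illustrative assumptions, chosen only to produce the heavy skew toward highly ranked opponents just described:

    def top50_results_score(results):
        """results: list of (outcome, opponent_rank) with outcome
        'win' or 'tie'.  Only Top 50 opponents count."""
        total = 0.0
        for outcome, rank in results:
            if rank > 50:
                continue
            weight = 100.0 / rank  # skews heavily toward top-ranked opponents
            total += weight if outcome == "win" else 0.5 * weight
        return total

    # Under this assumed scale, Pittsburgh's win and tie against #4
    # Notre Dame alone outweigh a longer list of mid-Top 50 results:
    pittsburgh = [("win", 48), ("win", 46), ("win", 4), ("tie", 19), ("tie", 4)]
    arkansas = [("win", 15), ("tie", 20), ("win", 50), ("win", 12),
                ("win", 40), ("win", 49), ("tie", 33), ("tie", 49)]
    print(top50_results_score(pittsburgh) > top50_results_score(arkansas))  # True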

As between Arkansas and Pittsburgh, Arkansas’s good Top 50 results were a win over Michigan State #15, a tie at BYU #20, a win over Auburn #50, a win at South Carolina #12, a win over Texas A&M #40, a win over Vanderbilt #49, a tie with LSU #33, and a neutral site tie with Vanderbilt #49.  Pittsburgh’s good Top 50 results were a win over Liberty #48, a win at Virginia Tech #46, a win at Notre Dame #4, a tie with Clemson #19, and a tie at Notre Dame #4.  And, the Committee gave Notre Dame a #1 seed.  In my scoring system, the Pittsburgh good results against #4 Notre Dame are very valuable and put it well ahead of Arkansas in Top 50 Results Score and Rank.  Nevertheless, the Committee gave the #3 seed to Arkansas, which may be a departure from its historic patterns.

As between Arkansas and Michigan State, Michigan State met 1 Yes and 0 No standards.  The Yes standard was for the paired factor of Top 50 Results Rank and Top 60 Head to Head Score.  Michigan State scored better than Arkansas on each factor in that factor pair.  In this case, however, they played each other at Arkansas, and Arkansas won, which could account for the Committee preferring it over Michigan State.

#4 Seeds

Pittsburgh and Michigan State were clear #4 seeds.  For the remaining slots, the candidates looked to me like Georgetown with 0 Yes and 0 No standards, Southern California with 1 Yes and 1 No standard (new profile for the Committee), and TCU with 0 Yes and 0 No standards.  The Committee gave one slot to Southern California, but gave the other slot to Northwestern with 0 Yes and 2 No standards.

Each Northwestern No standard involved poor results.  Its poor results were a tie at Oakland #106, a loss at Kansas #108, and a loss against Iowa #119.

The Southern California No standard likewise involved poor results.  Its poor results were ties at Nebraska #58, at Utah #61, and at Colorado #65 and a loss at Purdue #182.  Its Yes standard was for the paired factor of Conference Rank and Top 60 Head to Head Results Rank.  It is worth noting that Southern California also had home wins over #3 UCLA (a #1 seed), #11 Stanford (a #3 seed), and #21 TCU (one of the #4 seed candidates).  It appears that the Committee valued the Southern California good results and discounted its poor results.

Altogether, it appears the Committee again discounted poor tie results and, having given Southern California a #4 seed based on its good results, was left with a choice of Northwestern, Georgetown, or TCU for the last #4 seed slot, with all relatively equal once it had discounted the Northwestern poor results.  It gave Northwestern the #4 seed and gave Georgetown and TCU #5 seeds.

Summary Comments on Seeds

It is reasonably clear that, for seeds, the Committee discounted poor tie results relative to how they would have counted in the past.  This seems appropriate given the doubling of the number of ties due to the elimination of overtimes during the regular season.

The Committee also had to address how much weight to assign to conference strength, making decisions suggesting that for seeds it does not assign as much weight to conference strength as might previously have appeared to be the case.

At Large

Almost all of the Committee’s at large selections were consistent with its historic patterns; only a few decisions merit discussion.  The teams the Committee selected that need some discussion are NC State with 6 Yes and 3 No standards, Xavier with 15 Yes and 4 No standards, Utah Valley with 0 Yes and 1 No standard, and Virginia Tech with 8 Yes and 2 No standards.  The teams the Committee did not select that need some discussion are Arizona with 1 Yes and 0 No standards and Wisconsin with 0 Yes and 0 No standards.

NC State, with its 6 Yes and 3 No standards, had a profile new to the Committee and got an at large position.  Its standing in the ACC was #11.  All three of the No standards it met involved its standing within the conference.  Forced to make a decision on how much weight to assign its #11 conference position, the Committee assigned less weight than past precedent might have suggested and gave NC State the position.  In fact, apparently based on the strength of its Yes standards, the Committee gave it a #8 seed.

Xavier, with 15 Yes and 4 No standards, had an RPI rank of #25.  Historically, all teams with RPI ranks of #30 or better have gotten at large selections.  A way to look at this is: if the Committee were not to give an at large position to a team ranked #25, and instead were to give it to a team ranked in the 50s, it essentially would be a disavowal of the RPI as a useful tool.  Although in theory the Committee could make such a decision, it would be extremely difficult to do.  So, for practical purposes, Xavier was going to get an at large selection, whatever its negative characteristics.  In this particular case, Xavier had plenty of positive characteristics to outweigh the negatives, but it is likely its RPI rank was determinative.

Virginia Tech, with its 8 Yes and 2 No standards, played a weak non-conference schedule but had some good conference results -- particularly a win over #5 North Carolina and a tie with #10 Virginia.  It finished at #8 in the ACC.  Both of its No standards involved its non-conference RPI and its standing in the ACC.  The Committee valued its positive characteristics enough to discount its negatives.

For Utah Valley, with 0 Yes and 1 No standard, the No standard was the paired factor Conference Rank (#12) and Poor Results.  Its poor results were a tie at Boise State #155, a tie at New Mexico State #67, a loss at Seattle #107, and a loss at New Mexico State #67.  Once again, it seems likely the Committee discounted ties in relation to poor results or discounted poor conference rank (as with St. Louis in relation to its #2 seed), or discounted both.  Plus, Utah Valley had a win over #20 BYU and a tie with #22 UCF.

For Arizona, on the other hand, with 1 Yes and 0 No standards, its Yes standard was the factor pair Conference Rank (#2) and Poor Results Rank.  Its only outstanding result was a win at #18 Southern California.  My best guess is that the Committee, in preferring Utah Valley over Arizona, did not assign a lot of weight to their conference strength difference, did not give much weight to poor results, and valued Utah Valley’s two good results higher than Arizona’s one good result.

As for Wisconsin, its only really good result was a win at #21 TCU.  As with Arizona, it appears the Committee valued Utah Valley’s two good results higher than that one good result.

Overall for at larges, the Committee decisions all appear reasonable.  They also appear consistent with what I saw with seeds in relation to the discounting of poor results as a factor and less weight being assigned to conference strength than one previously might have expected.