Wednesday, November 9, 2016

NCAA Tournament Bracket: Did an NCAA Data Error Affect the Committee's Decisions?

One of my big concerns, in the last week of the season before the NCAA Tournament, is that the NCAA will end up with a data error that significantly affects the "Selection" reports the NCAA staff provides to the Women's Soccer Committee.  These reports are the primary resource the Committee has when it makes its seeding and at large selection decisions.  It's a concern to me because the NCAA does not publish these reports as soon as it has entered the final games data into its system.  Rather, it publishes them a few days after the Committee has made its Tournament decisions.  Thus there's no way for anyone to identify data errors and give the staff a chance to correct the reports before the Committee has made its decisions, or at least to tip the Committee off as to what the error is and what its effects are.

On reviewing the NCAA's just-released reports -- two days after the Committee made its decisions -- it turns out there was such an error.

The error itself may seem inconsequential at first glance.  A SWAC tournament semi-final game, between Howard and Arkansas Pine Bluff, got reported into the NCAA's system as a 3-1 win by Arkansas Pine Bluff.  In fact, the game was a 0-0 tie, with Arkansas Pine Bluff advancing on Kicks From the Mark.  As a result of the error, the NCAA's data show Arkansas Pine Bluff with a record of 9-7-1 (0.5588), rather than the correct 8-7-2 (0.5294).  They show Howard with a record of 13-5-2 (0.7000), rather than the correct 13-4-3 (0.7250).  What this means, in RPI terms, is that Arkansas Pine Bluff's opponents got more strength of schedule credit than they should have, and Howard's opponents got less.  And this effect rippled on through to the ratings of those opponents' own opponents.
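For anyone who wants to check the winning percentages quoted above, here is a minimal sketch of the arithmetic.  It assumes each tie counts as half a win and half a loss, which is the convention the quoted figures are consistent with.  The function name `winning_pct` and the labels are mine, purely for illustration; this is not the NCAA's code.

```python
def winning_pct(wins, losses, ties):
    """Winning percentage with each tie counted as half a win (assumed convention)."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


# Records as reported by the NCAA (with the error) and as corrected.
records = {
    "Arkansas Pine Bluff (reported)": (9, 7, 1),
    "Arkansas Pine Bluff (correct)": (8, 7, 2),
    "Howard (reported)": (13, 5, 2),
    "Howard (correct)": (13, 4, 3),
}

for label, (w, l, t) in records.items():
    print(f"{label}: {w}-{l}-{t} -> {winning_pct(w, l, t):.4f}")
```

The one mis-entered game moves Arkansas Pine Bluff's winning percentage by about 0.029 and Howard's by 0.025, and those are the numbers that feed into every opponent's strength of schedule.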

So, whom did these teams play, outside their SWAC games?

  • Arkansas Pine Bluff played Oral Roberts, Central Arkansas, UALR, and Oklahoma State.
  • Howard played Cleveland State, Radford, Mt. St. Mary's, Longwood, George Mason, Princeton, Robert Morris, VMI, and Navy.

And what effect did the data error have on the ARPI ratings and rankings of key teams?  In each group below, the number in the middle is an ARPI rank.  On the left is the team the NCAA, with its data error, gave that rank.  On the right is the team the correct data would have given that rank.

Notre Dame  8  Connecticut
Connecticut  9  Notre Dame

The Committee gave Notre Dame a #2 seed and Connecticut no seed.  Question:  With the correct data, would the Committee have done the seeding differently?  Specifically, would it have seeded Connecticut and/or given Notre Dame a lower seed?

Oklahoma  12  Virginia
Virginia  13  Oklahoma

The Committee gave Virginia a #3 seed and Oklahoma no seed.  The correct rankings would not have changed this.

Florida State  15  Duke
Duke  16  Auburn
Auburn  17  Florida State

The Committee gave Florida State and Duke #3 seeds and Auburn a #4 seed.  Question:  With the correct data, would the Committee have done the seeding differently?  Specifically, would it have seeded Auburn ahead of Florida State?

Marquette  24  Ohio State
Ohio State  25  Marquette

This ranking error is inconsequential.  Both were at large selections.

Wisconsin  35  Kent State
Kent State  36  Wisconsin

This ranking error is inconsequential.  Wisconsin got an at large selection and Kent State was an Automatic Qualifier.

Missouri  43  South Alabama
South Alabama  44  Missouri

This ranking error is inconsequential.  Missouri got an at large selection and South Alabama was an Automatic Qualifier.

Loyola Marymount  51  Ball State
Ball State  52  Loyola Marymount

This ranking error is inconsequential.  Neither team got an at large selection.

DePaul  53  UCF
Northeastern  54  DePaul
UCF  55  Oklahoma State
Oklahoma State  56  Northeastern

Northeastern was an Automatic Qualifier.  DePaul and UCF did not get at large selections.  Oklahoma State got an at large selection.  In my opinion, it's very unlikely that UCF's moving up two positions would have made a difference.  Thus it's most likely that this ranking error was inconsequential.

Thus overall, the data error may have had an effect on seeds.  We'll never know.  It seems very unlikely it had an effect on at large selections.  Nevertheless, sooner or later, an error like this will have an effect.  The NCAA has had data errors in the past, at least one of which might have been significant, affecting #1 seeds.  It would be wise for the NCAA to develop a system for vetting its final data.  Otherwise, sooner or later the NCAA is going to be in a very embarrassing position.
