Wednesday, August 24, 2016

2016 Season Simulation

As an experiment, I've created a simulation of the entire 2016 season, including conference tournaments, as well as of the seeds and at large selections for the NCAA Tournament.  I've done it mostly for fun, but it's turned out to provide good information on some issues.  So, I'll spend time, early in the season, writing about it here.

As a resource for the simulation, I've used Chris Henderson's pre-season "in conference" ratings of teams.  They're published at HERO Sports.  Just click on the link, and you'll get to a list of articles that include Chris's conference-by-conference ratings of teams.  Chris uses a number of factors for evaluating teams, and the ratings he produces are all data driven.  The "in conference" ratings I've used don't exactly match what's published at HERO Sports, as he did those before having final word on whether some players would be red-shirting due to the U20 World Cup and on one or two pre-season, season-ending injuries.  Rather, the ratings I've used are adjusted ones he did to take those additional bits of information into account.

I then combined Chris's "in conference" ratings with the average NCAA Adjusted RPIs (ARPIs) of teams within the conferences, to come up with a "projected" ARPI for each team.  So, suppose Chris has assigned ratings to 8 teams within a conference, with the teams thus ranked 1 through 8.  I then determine, for each rank position 1 through 8, the average ARPI of the teams that held that position in the conference over the 2014 and 2015 seasons.  And, I assign those ARPI ratings -- 1 through 8 -- to the teams as Chris has ranked them 1 through 8.  I do this for each conference, as well as for the few independent teams, and this gives me a "projected" ARPI rating for each team.  Sometimes, Chris's "in conference" ratings have two teams with the same rating.  For example, the #1 and #2 teams might have the same rating.  For these teams, I determine the average of the ratings of the conference teams ranked #1 and #2 during 2014 and 2015, and the two teams receive that ARPI rating -- they thus have the same "projected" rating.
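Here is a rough sketch, in Python, of how that assignment might look for one conference.  It isn't the actual spreadsheet work I did, just an illustration.  The function name, the team names, and the numbers are all made up for the example, and I'm assuming that a higher "in conference" rating means a better team.

```python
def project_arpi(in_conference_ratings, historical_arpi_by_rank):
    """Assign a projected ARPI to each team in one conference.

    in_conference_ratings: dict of team -> pre-season rating
        (assumption: higher rating = better team).
    historical_arpi_by_rank: list where position 0 holds the average
        2014-2015 ARPI of the conference's #1 team, position 1 the
        #2 team, and so on.
    """
    # Order teams from best to worst by their in-conference rating.
    ordered = sorted(in_conference_ratings,
                     key=in_conference_ratings.get, reverse=True)
    projected = {}
    i = 0
    while i < len(ordered):
        # Find the run of teams sharing the same rating as team i.
        j = i
        while (j + 1 < len(ordered) and
               in_conference_ratings[ordered[j + 1]]
               == in_conference_ratings[ordered[i]]):
            j += 1
        # Teams with equal ratings share the average of the historical
        # ARPIs for the rank positions they jointly occupy.
        shared = sum(historical_arpi_by_rank[i:j + 1]) / (j - i + 1)
        for team in ordered[i:j + 1]:
            projected[team] = shared
        i = j + 1
    return projected


# Illustrative numbers only, not real ratings.
ratings = {"Team A": 90, "Team B": 90, "Team C": 80}
history = [0.60, 0.58, 0.55]
print(project_arpi(ratings, history))
```

In this made-up case, Team A and Team B have the same rating, so they both get the average of the #1 and #2 historical ARPIs, and Team C gets the #3 historical ARPI.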

With an ARPI rating for each team, and with the entire 2016 regular season schedule in my database, I then produce game results for all games.  In determining game results, I consider game locations and I allow for ties.  In relation to game locations, I know from other work I've done that home field advantage is worth a 0.0120 ARPI advantage.  In relation to ties, I know that when two opponents' ratings (adjusted for home field advantage) are within 0.0134 of each other, the chance of the higher rated team winning is less than 50% -- it is more likely to tie or lose than it is to win.  So, for games within that rating difference, I treat the games as ties.  Then, with those considerations in mind, for each game I compute the rating difference (adjusted for game location) between the two teams and determine whether the result for a team is a win, a loss, or a tie.
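To make the game-result rule concrete, here is a minimal Python sketch.  The 0.0120 and 0.0134 values are the ones described above; the function and parameter names are just ones I've picked for illustration.  I'm assuming the home field value gets added to the home team's rating, that neutral-site games get no adjustment, and that a difference of exactly 0.0134 counts as a tie.

```python
HOME_FIELD_ADVANTAGE = 0.0120  # ARPI value of playing at home
TIE_BAND = 0.0134              # games within this difference are ties


def simulate_game(home_arpi, away_arpi, neutral_site=False):
    """Return 'home win', 'away win', or 'tie' for a single game."""
    # Adjust the home team's rating for location (assumption: no
    # adjustment at all when the game is at a neutral site).
    adjustment = 0.0 if neutral_site else HOME_FIELD_ADVANTAGE
    difference = (home_arpi + adjustment) - away_arpi

    # Within the tie band, the higher rated team is less than 50%
    # likely to win, so the game is treated as a tie.
    if abs(difference) <= TIE_BAND:
        return "tie"
    return "home win" if difference > 0 else "away win"


# Illustrative ratings only.
print(simulate_game(0.5600, 0.5500))                     # home edge pushes it past the band
print(simulate_game(0.5600, 0.5500, neutral_site=True))  # inside the band
```

Note that there is nothing random here: the same two ratings and location always give the same result.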

Once I've produced all the regular season results, I turn to the conference tournaments.  Based on the "in conference" regular season simulated results, I assign teams to bracket positions in the tournaments.  Then, I go through the same process as for the regular season to get the conference tournament results.  For the conference tournaments, however, I have two new issues to take care of: (1) what bracket positions to assign to two teams that are tied in their conference regular season standings, and (2) who is the winner of tie games that go to shootouts.  One of the things I do, once I have all the game results, is determine teams' Adjusted RPIs for the simulated season, just as I would do for a real season.  These "simulated" ARPIs can be different from the "projected" ARPIs that I ordinarily use to determine game results.  For the two "tie" situations I have to deal with for conference tournaments, I use these "simulated" ARPIs as the tiebreaker.
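The two tiebreakers could be sketched in Python roughly as follows.  This is only an illustration: the function names and numbers are invented, and I'm assuming conference standings are expressed as a points total per team (how the points are computed is left out).

```python
def seed_teams(standings_points, simulated_arpi):
    """Order teams for a conference tournament bracket.

    Teams are sorted by standings points; teams level on points are
    separated by their simulated ARPI (higher is better).
    """
    return sorted(
        standings_points,
        key=lambda team: (standings_points[team], simulated_arpi[team]),
        reverse=True,
    )


def shootout_winner(team_a, team_b, simulated_arpi):
    """Pick who advances from a tournament game that ended as a tie.

    Assumption: if the simulated ARPIs are exactly equal, team_a advances.
    """
    if simulated_arpi[team_a] >= simulated_arpi[team_b]:
        return team_a
    return team_b


# Illustrative numbers only.
points = {"Team A": 20, "Team B": 20, "Team C": 15}
arpi = {"Team A": 0.58, "Team B": 0.60, "Team C": 0.62}
print(seed_teams(points, arpi))
print(shootout_winner("Team A", "Team C", arpi))
```

Here Team A and Team B are level in the standings, so Team B gets the higher bracket position because of its higher simulated ARPI.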

Once I have entered the results for all of the conference tournaments, I calculate final "simulated" ARPIs, and other related numbers, for all the teams.  I then enter these data into my bracket formation system.  The bracket formation system tells me what decisions the Committee will make on seeds and at large selections, if its decisions are to be consistent with all of its decisions over the last 9 years.  When this is done, I have simulated ratings and rankings for the entire season plus simulated seeds, automatic qualifiers, and at large selections for the NCAA Tournament.

There are lots of limitations to what I'm doing.  (1) Chris Henderson's "in conference" ratings are intended as just that.  By blending them into what I'm doing, I'm going beyond what his ratings are for.  (2) The way I identify ties is going to result in my system treating games as ties that won't really end up as ties and treating games as wins/losses that will end up as ties.  (3) My system for setting game results assumes that there will be no upsets.  But history says that, in Division 1 women's soccer, upsets (in relation to the ratings) occur in about 15 to 25% of all games.  So, the simulation really is for fun and to get a very general idea of how the season might play out.

In my next post, I'll set out the simulated final ARPI ratings and rankings my system produced.
