As those who follow this blog know, this year I did a pre-season simulation of the regular season's game results and of what teams' Adjusted RPI (ARPI) ratings would be following completion of the regular season -- including the conference tournaments. As we progressed through the season, I replaced simulated results with actual results and updated the final simulated ARPIs accordingly. And, once we had gotten far enough through the season that teams' actual ARPIs were relatively realistic, I changed the simulation from using the pre-season simulated ratings to determine the outcomes of future games to using the then-current actual ARPI ratings.
A question I had, when I set up the 2016 simulation, was what to use as teams' pre-season simulated ratings. Since Chris Henderson does pre-season rankings of the teams in each conference, I decided to use his within-conference rankings: for each rank position in a conference, I took the average of the ratings of the teams that held that position over the last two years and assigned that average to the team Chris ranked there. Chris didn't design his system to be used that way, but I thought it would be interesting to try it, so I did.
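To make that assignment concrete, here is a minimal sketch of the rank-to-rating step. The file and column names (arpi_history.csv, henderson_2016.csv, henderson_rank, and so on) are hypothetical stand-ins for my actual data, not anything Chris publishes:

```python
import pandas as pd

# Hypothetical long-format history: one row per team per season.
hist = pd.read_csv("arpi_history.csv")      # columns: team, conference, season, arpi
ranks = pd.read_csv("henderson_2016.csv")   # columns: team, conference, henderson_rank

# Rank teams within each conference-season by ARPI (1 = highest rating).
recent = hist[hist["season"].isin([2014, 2015])].copy()
recent["slot"] = (recent.groupby(["conference", "season"])["arpi"]
                        .rank(ascending=False, method="first").astype(int))

# Average rating historically held by each (conference, rank) slot.
slot_rating = (recent.groupby(["conference", "slot"])["arpi"]
                     .mean().rename("sim_arpi").reset_index())

# Assign each slot's average rating to the team Chris ranks in that slot.
seed = ranks.merge(slot_rating,
                   left_on=["conference", "henderson_rank"],
                   right_on=["conference", "slot"],
                   how="left")
```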
Now that the season is over, I've done a study to determine what really is the best way to come up with pre-season simulated ratings. The balance of this post will be a detailed description of the results of the study.
Basic Method
I set the 2016 actual end-of-regular-season ARPI ratings as the measuring stick for evaluating the accuracy of a simulated set of ratings. For each team, I compared its actual 2016 ARPI rating to its simulated rating and determined the difference between the two. To make the work easier, and for reliability reasons, I considered only schools that have fielded women's soccer teams throughout the 2007 through 2016 seasons -- in other words, over the last 10 years -- a total of 312 teams. Once I had the actual-versus-simulated differences for all 312 teams, I determined the average and the median of those differences.
I determined the average and the median for each of a number of simulation methods. I then compared the methods' averages and medians to see which simulation method seems best: the lower the average difference from teams' actual 2016 ratings, the better, and likewise for the median difference.
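In code form, the scoring step looks something like the sketch below. I'm assuming here that "difference" means the absolute difference between a team's simulated and actual ratings; the function name is mine:

```python
import numpy as np

def score_method(actual, simulated):
    """Score one simulation method against the actual 2016 ARPIs:
    return the average and the median of the per-team absolute
    differences between actual and simulated ratings."""
    diffs = np.abs(np.asarray(actual, dtype=float) -
                   np.asarray(simulated, dtype=float))
    return diffs.mean(), np.median(diffs)

# Example use, with two arrays of 312 ratings each:
# avg_diff, med_diff = score_method(actual_2016, simulated_2016)
```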
Results: Treating All 312 Teams as One Group
I started out with the relatively simple approach of treating all 312 teams as one group. This produced the following results:
Method Using the Henderson "In Conference" Rankings (described above):
Average Difference from Actual: .0513
Median Difference from Actual: .0445
Method Using Teams' End-of-Prior Season (2015) ARPIs:
Average Difference from Actual: .0391
Median Difference from Actual: .0342
(Using teams' end-of-prior-season ratings is what a typical Elo-type system would do.)
Comparing these two, using teams' end-of-prior-season actual ARPIs is the better way of doing the simulation: both the average and the median difference from teams' actual 2016 ARPI ratings are smaller.
I also tested two other methods. One was to use the average of a team's ARPIs over a number of prior seasons -- the Average ARPI method. The other was to chart each team's ARPIs from year to year over the last 9 years (2007 through 2015), have my computer fit a straight-line trend to the ARPIs and generate the trend line formula, and then use that formula to compute what the team's ARPI would be in 2016 if the trend continued -- the Trended ARPI method. For each of these methods, I looked at the last two years' ARPIs -- in other words, the average of the most recent two years' ARPIs, and the trend line for the most recent two years' ARPIs; then the same for the most recent three years; the same for the most recent four years; and so on, all the way to the last nine years (the entire period for which I have data). This produced the following results:
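As a sketch of both calculations, assuming `history` is a list of a team's ARPIs in chronological order (2007 through 2015) and `n_years` is the window size:

```python
import numpy as np

def average_arpi(history, n_years):
    """Average ARPI method: the mean of the most recent n_years ratings."""
    return float(np.mean(history[-n_years:]))

def trended_arpi(history, n_years):
    """Trended ARPI method: fit a straight line to the most recent
    n_years ratings and read off the value one season further out."""
    y = np.asarray(history[-n_years:], dtype=float)
    x = np.arange(n_years)                   # seasons 0 .. n_years-1
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    return slope * n_years + intercept       # extrapolate to season n_years
```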
Method Using Teams' Average ARPIs:
2-Year, Average Difference from Actual: .0368
2-Year, Median Difference from Actual: .0317
3-Year, Average Difference from Actual: .0370
3-Year, Median Difference from Actual: .0329
4-Year, Average Difference from Actual: .0367
4-Year, Median Difference from Actual: .0342
5-Year, Average Difference from Actual: .0368
5-Year, Median Difference from Actual: .0336
6-Year, Average Difference from Actual: .0365
6-Year, Median Difference from Actual: .0307
7-Year, Average Difference from Actual: .0366
7-Year, Median Difference from Actual: .0317
8-Year, Average Difference from Actual: .0367
8-Year, Median Difference from Actual: .0315
9-Year, Average Difference from Actual: .0373
9-Year, Median Difference from Actual: .0314
In this table, all of the average differences are pretty similar. The median differences for the 2-, 6-, 7-, 8-, and 9-year versions also are pretty similar, and better than the rest. Of those candidates, the 6-year Average ARPI method looks the best.
These results surprised me in showing so little difference in relation to how many years I used for averaging. I thought two years would be too few and that the larger numbers of years might include seasons too remote in time to be relevant today. I was wrong.
Method Using Teams' Trended ARPIs:
2-Year, Average Difference from Actual: .0652
2-Year, Median Difference from Actual: .0529
3-Year, Average Difference from Actual: .0529
3-Year, Median Difference from Actual: .0462
4-Year, Average Difference from Actual: .0490
4-Year, Median Difference from Actual: .0435
5-Year, Average Difference from Actual: .0461
5-Year, Median Difference from Actual: .0416
6-Year, Average Difference from Actual: .0444
6-Year, Median Difference from Actual: .0409
7-Year, Average Difference from Actual: .0428
7-Year, Median Difference from Actual: .0393
8-Year, Average Difference from Actual: .0415
8-Year, Median Difference from Actual: .0385
9-Year, Average Difference from Actual: .0406
9-Year, Median Difference from Actual: .0366
Here, for the Trended ARPI method, there is a distinct pattern, unlike for the Average ARPI method: the more years in the trend line, the closer the trended ratings come to teams' actual ratings. It is quite clear that for the Trended method, using the 9-year period is the best. Indeed, if I had more years in my database, adding more years to the trend line might make the method even better, although the improvement appears to diminish as I add years, so the 9-year version may turn out to be about as good as the Trended ARPI method gets.
And, among all four of these methods, the Average ARPI method using a 6-year period appears to be the best. This is another result that surprised me, as I expected the Trended ARPI method to be the best.
Results: Taking Coaching Changes Into Account
It occurred to me that coaching changes might affect teams' ARPI trajectories. If so, then I felt it might be possible to use the dates of coaching changes to refine either the Average or the Trended ARPI method and get better results.
To test the hypothesis that refinements based on the dates of coaching changes would get better results, I assembled a list of the current coaches and the seasons they were hired -- except that for teams with the same coach since 2007, I listed them simply as coaches hired in 2007 or earlier. I then separated the teams into groups: teams whose current coach was hired in 2007 or earlier, teams whose current coach was hired in 2008, and so on for all the other years of my 2007-2016 data set. Then, within each group, I determined the average and trended ARPI differences from actual, just as I did for the entire group of 312 teams as described above.
One of the things I found immediately was that, except for the "2007 or earlier" hire group, the single-year groups were too small as data sets to produce reliable results. So, after looking at results for 9 separate groups, one for each year, I instead looked at results for 3 groups: 2007 and earlier hires; 2008 through 2012 hires; and 2013 through 2015 hires. This gave me 3 groups of relatively equal size and the following "best" results:
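The grouped evaluation is the same scoring as before, just run within each hire-date group. A sketch, where the file name, hire_year column, and per-window method columns (avg_2yr through avg_9yr, trend_2yr through trend_9yr) are hypothetical names for my data:

```python
import pandas as pd

# Hypothetical frame: one row per team, with the current coach's hire
# year, the actual 2016 ARPI, and one simulated rating per method/window.
df = pd.read_csv("study_data.csv")

def hire_group(year):
    """Assign each team to one of the three hire-date groups."""
    if year <= 2007:
        return "2007 or earlier"
    if year <= 2012:
        return "2008-2012"
    return "2013-2015"

df["group"] = df["hire_year"].map(hire_group)

# Score every method within every hire-date group, as for the full 312.
method_cols = [c for c in df.columns if c.startswith(("avg_", "trend_"))]
for name, grp in df.groupby("group"):
    for col in method_cols:
        diffs = (grp["actual_2016"] - grp[col]).abs()
        print(f"{name:16s} {col:10s} "
              f"avg={diffs.mean():.4f} med={diffs.median():.4f}")
```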
Method Using Teams' Average ARPIs
2007 or earlier hire group:
8-Year, Average Difference from Actual: .0363
8-Year, Median Difference from Actual: .0288
2008 to 2012 hire group:
6-Year, Average Difference from Actual: .0363
6-Year, Median Difference from Actual: .0305
2013 to 2015 hire group:
2-Year, Average Difference from Actual: .0357
2-Year, Median Difference from Actual: .0310
Method Using Teams' Trended ARPIs
2007 or earlier hire group:
9-Year, Average Difference from Actual: .0380
9-Year, Median Difference from Actual: .0341
2008 to 2012 hire group:
9-Year, Average Difference from Actual: .0422
9-Year, Median Difference from Actual: .0401
2013 to 2015 hire group:
9-Year, Average Difference from Actual: .0425
9-Year, Median Difference from Actual: .0465
Here too, the Average ARPI method produces better results than the Trended ARPI method. Thus using teams' average ARPIs over time is a better method for simulating teams' ARPIs for next year. And, when using teams' average ARPIs over time, it is better to break teams down into groups based on coach hiring dates:
- Teams with current coaches hired 9 or more years ago -- use average ARPI over the last 8 years
- Teams with current coaches hired 4 to 8 years ago -- use average ARPI over the last 6 years
- Teams with current coaches hired 1 to 3 years ago -- use average ARPI over the last 2 years
Since these Average ARPI ratings come closest to what the actual ARPI ratings for the next year (year 10) will be, these are the simulated ratings I'll use when I do my 2017 pre-season simulation next August.
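Put as a rule, the window selection looks something like the sketch below. The function names and the idea of keying off tenure are mine; the cutoffs and window sizes come from the results above:

```python
def averaging_window(hire_year, rating_year=2016):
    """Averaging window implied by the study: 8 years of history for
    coaches hired 9+ years before the rating year, 6 for 4-8 years
    before, and 2 for 1-3 years before."""
    tenure = rating_year - hire_year
    if tenure >= 9:
        return 8
    if tenure >= 4:
        return 6
    return 2

def preseason_rating(history, hire_year, rating_year=2016):
    """Simulated pre-season ARPI: average over the chosen window,
    with `history` in chronological order ending at rating_year - 1."""
    n = averaging_window(hire_year, rating_year)
    return sum(history[-n:]) / n
```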
Coaching Changes and ARPI Trend Changes
A side benefit of my study was the information it provided on the relationship between coaching changes and teams' ARPI trends. The study results suggest two things:
- It takes a long time for a team's ARPI to become free of the effect of past seasons. If you look above, under the Treating All 312 Teams as One Group caption, at the Method Using Teams' Trended ARPIs table, you can see this: a trend line based on the last 9 years' ARPIs is better at predicting next year's ARPI than any of the trend lines using shorter time spans. This also is true when breaking teams into groups by coach hire date. Thus a team's fortunes next year are affected by the team's history over at least the last 9 years. This is an important factor to keep in mind in evaluating a coach who has taken over a program: it will not be his or her program completely until at least 11 years out from his or her hire, and until that period has expired, predecessor coaches' results will have a lingering effect, for good or for ill.
- The longer a coach has been in place, the more reliable the 9-year trend line is at predicting how the team will do next year. If you look under the Taking Coaching Changes Into Account caption, at the Method Using Teams' Trended ARPIs table, you can see this: the trend lines for the coaches with the most longevity are the best at predicting teams' ARPIs next year, and the lines for the coaches with the least longevity are the poorest. Nevertheless, as the prior bullet point says, regardless of coach hire date, the team's history well back into the past will affect the team's results well into the future.