Originally posted by: b0mbrman
Forgive me if it seems obvious to everyone else, but what are your algorithms?
The algorithm is below. We want to know a team's rating, R. It is calculated by solving the following equation:
- R = A*W + (B-1)*L + sum[C(SD)*SD] + sum[D(R)*R]
where
- R is the rating for a team.
- A is a weighting factor for winning.
- W is the number of wins that the team had.
- B is a weighting factor for losing.
- L is the number of losses that the team had.
- C(SD) is a weighting factor for the score difference. I like to make this a function of the score difference. Use any formula you like for this function. This is where you can be quite creative. Or just use a constant. If score is meaningless to you, then C(SD) = 0.
- SD is the score difference in each game. If Ohio St. beat Iowa 31 to 6, then for Ohio St. use an SD of 31-6 = 25, and for Iowa use 6-31 = -25.
- D(R) is a weighting factor for the schedule difficulty. For simplicity, start with D(R) = 1.
- R, inside the last sum, is the rating of each team you played. Both sums run over all of the team's games.
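To make the "use any formula you like" point about C(SD) concrete, here is a minimal Python sketch of one possible choice. The function names, the weight of 0.02, and the 21-point cap are all made up for illustration and are not the values used for these ratings. The idea is only that C(SD)*SD stops growing once a game becomes a blowout.

```python
def c_of_sd(sd, weight=0.02, cap=21):
    """Hypothetical C(SD): chosen so that C(SD)*SD equals weight * SD,
    with SD clamped to the range [-cap, cap]. Not the author's formula."""
    if sd == 0:
        return 0.0
    capped = max(-cap, min(cap, sd))
    return weight * capped / sd


def score_term(score_diffs, c=c_of_sd):
    """The sum[C(SD)*SD] term for one team, given its per-game SDs."""
    return sum(c(sd) * sd for sd in score_diffs)


# Ohio St. over Iowa, 31 to 6: SD is +25 for Ohio St. and -25 for Iowa.
ohio_st_term = score_term([25])
iowa_term = score_term([-25])
```

With a constant C(SD), the two teams' terms would simply be +25 and -25 times that constant; the cap only changes how much a lopsided score counts.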
The B-1 is a fudge factor to please Anandtech. They wanted me to weight losses more heavily than I was, so I put in the -1 part. I get better predictions by just using B*L, but then it ranks a lot of one-loss teams above undefeated teams, for example. People get pissed. I fudged for Anandtech and people were happier with the results. I guess all that really matters is whether you have 0 or 1 losses in the season. Nothing else is important, since we have a one-game playoff for the championship.
There are multiple ways to solve that equation. The easiest one to describe is to assume a starting rating, R, for every team. Then plug those ratings into the equation for each team to get an updated rating, R. Repeat until the ratings converge and no longer change.
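The "assume, plug in, repeat" procedure can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual implementation. The function name, the default weights, and the choice of constant C and D are all hypothetical. One caveat about this toy version: with a constant D, the plain iteration only settles when D times the number of games per team is small enough, so the sketch uses D = 0.1 rather than 1.

```python
from collections import defaultdict


def solve_ratings(games, A=1.0, B=0.0, C=0.01, D=0.1, tol=1e-9, max_iter=10_000):
    """Iterate R = A*W + (B-1)*L + sum[C*SD] + sum[D*R_opponent] to a fixed point.

    games: list of (winner, loser, winner_score, loser_score).
    A, B, C, D are placeholder constants, not the tuned weights.
    """
    teams = sorted({team for game in games for team in game[:2]})
    base = defaultdict(float)       # the part that does not depend on ratings
    opponents = defaultdict(list)   # one entry per game played
    for winner, loser, w_score, l_score in games:
        sd = w_score - l_score
        base[winner] += A + C * sd            # one win, positive SD
        base[loser] += (B - 1.0) - C * sd     # one loss, negative SD
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    ratings = {team: 0.0 for team in teams}   # assumed starting ratings
    for _ in range(max_iter):
        updated = {
            team: base[team] + sum(D * ratings[opp] for opp in opponents[team])
            for team in teams
        }
        change = max(abs(updated[team] - ratings[team]) for team in teams)
        ratings = updated
        if change < tol:
            return ratings
    raise RuntimeError("ratings did not converge; try a smaller D")


# Only the Ohio St.-Iowa score comes from the post; the other games are invented.
example_games = [
    ("Ohio St.", "Iowa", 31, 6),
    ("Iowa", "Purdue", 20, 17),
    ("Ohio St.", "Purdue", 24, 10),
]
for team, rating in sorted(solve_ratings(example_games).items(), key=lambda kv: -kv[1]):
    print(f"{team:10s} {rating:+.3f}")
```

Because the equation is linear when C and D are constants, it could also be solved directly as a linear system; the loop is shown because it matches the description above.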
To get the weighting factors - A, B, C, and D - I do a little optimization. In fact, I do two completely different optimizations.
1) Win Rating. I vary the weighting factors until the resulting poll has the largest percent of teams ranked ahead of teams that they beat in the last game. Since many combinations of the weighting factors give the same percent, I choose one specific combination: the one with the smallest error in the scores of last week's games among those that still meet my main goal of having the largest percent of teams ranked ahead of teams they beat.
2) Score Rating. I just toss out the whole "winners must be ahead of teams they beat" idea, and I choose the combination of A, B, C, and D that simply gives scores that most closely match last week's scores. This latter rating is easier in that there is generally just one combination that is optimal. In the case of a tie, I go with the option that also has the highest percent of teams ranked ahead of teams they beat.
So the tie breaker for #1 is the main goal of #2. The tie breaker for #2 is the main goal for #1.
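The two selection rules, each using the other's goal as its tie breaker, could be sketched in Python like this. Everything here is illustrative: the function names are hypothetical, and so is the assumption that the predicted margin of a game is simply the gap between the two teams' ratings. The post does not say how ratings are turned into predicted scores. Each candidate is a (label, ratings) pair produced by one combination of A, B, C, and D.

```python
def pct_winners_ahead(ratings, games):
    """Fraction of games whose winner is rated above the loser."""
    ahead = sum(1 for winner, loser, _, _ in games if ratings[winner] > ratings[loser])
    return ahead / len(games)


def score_error(ratings, games):
    """Mean absolute gap between predicted and actual margin.

    ASSUMPTION: predicted margin = rating difference (not stated in the post).
    """
    total = sum(
        abs((ratings[winner] - ratings[loser]) - (w_score - l_score))
        for winner, loser, w_score, l_score in games
    )
    return total / len(games)


def pick_win_rating(candidates, games):
    """Most winners ranked ahead; ties broken by smallest score error."""
    return max(
        candidates,
        key=lambda c: (pct_winners_ahead(c[1], games), -score_error(c[1], games)),
    )


def pick_score_rating(candidates, games):
    """Smallest score error; ties broken by most winners ranked ahead."""
    return min(
        candidates,
        key=lambda c: (score_error(c[1], games), -pct_winners_ahead(c[1], games)),
    )


# Invented games and candidate ratings, purely to exercise the two rules.
games = [("X", "Y", 20, 10), ("Y", "Z", 14, 13)]
candidates = [
    ("a", {"X": 10, "Y": 0, "Z": 1}),
    ("b", {"X": 3, "Y": 2, "Z": 1}),
    ("c", {"X": 5, "Y": 1, "Z": 0}),
]
win_choice = pick_win_rating(candidates, games)
score_choice = pick_score_rating(candidates, games)
```

In this made-up data the two rules disagree: candidate "c" ranks every winner ahead with the smaller score error of the two that do, while candidate "a" matches the margins best but rates Z above a team that beat it.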