14 Comments

While I don't contest that home-field advantage is real, it sounds like you're looking at results of all games assuming those won't have an inherent bias. This misses the fact that Alabama will never play a road game against a Western Kentucky; those will always be paid home games, and that game was won by 63 points by the home team. Margin of victory in conference games would likely be a much less biased measure, as those should alternate between home and road for any given matchup.


You're right, those paid home games run up the average. Per Neil Paine (https://substack.com/home/post/p-152800438), the average margin for both intraconference and power-conference games was about 3.0 points.

Still, I think small margins tend to get ignored, and if lambda is small the margins don't matter as much anyway.


1. Could you incorporate both home advantage and a win being a win by weighting away wins more somehow (a win is still always positive for you, and a loss negative)? Perhaps just have two lambdas: one for home-team wins, one for away-team wins.

2. I’m curious what lambda minimises the error function of the linear regression? I wonder how the rankings would “feel” with this lambda.
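A minimal sketch of idea 1, assuming the model maps margins through a saturating transform like tanh(margin / lambda) (the article's exact transform may differ); `margin_score` and both lambda defaults are hypothetical:

```python
import numpy as np

def margin_score(margin, home_win=True, lam_home=1.0, lam_away=0.5):
    """Map a point margin to a score in (0, 1), with a smaller lambda for
    away wins so the same margin earns a value closer to 1 on the road.
    The tanh transform and both default lambdas are assumptions, not
    the article's exact model; a win stays positive, a loss negative."""
    lam = lam_home if home_win else lam_away
    return np.tanh(margin / lam)
```

For example, a 3-point home win scores about 0.995 while the same 3-point away win scores about 0.99999, so away wins are worth slightly more without ever flipping sign.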


Both great questions. I think addressing these might be worthy of a sequel article...


Is your method equivalent to Bradley-Terry with the team winning by x assigned a fractional win y and a fractional loss 1-y (where y is given by the logistic function)?

https://en.wikipedia.org/wiki/Bradley–Terry_model


Oops, not quite, y goes from -1 to 1, so it should be a win fraction of (1+y)/2, which goes from 0 to 1.
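With that correction, the mapping could be sketched as follows; the tanh(margin / lambda) form and the `lam` default are assumptions, not necessarily the article's exact function:

```python
import numpy as np

def fractional_wins(margins, lam=1.0):
    """Map signed point margins to Bradley-Terry-style fractional wins.
    y = tanh(margin / lam) lies in (-1, 1); (1 + y) / 2 rescales it into
    (0, 1), per the correction above. The team gets fractional win w and
    fractional loss 1 - w. The tanh/lam form is an assumption."""
    y = np.tanh(np.asarray(margins, dtype=float) / lam)
    return (1.0 + y) / 2.0
```

A tie maps to exactly 0.5, and large blowouts in either direction approach 1 or 0.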


As is so often the case, there is an xkcd about this: https://xkcd.com/904/

That comic is how I view all sports matches: the output of a weighted random number generator. Presumably each team has some probability distribution of quality of play (perhaps representing its ability to score and/or prevent the other team from scoring), and the team that rolls the better number that day wins. This is grossly over-simplistic, but all the compounding probabilities should roll up into a single distribution.

The real challenge becomes, for each team, either estimating the expected value of that distribution, or somehow estimating the entire distribution. Unfortunately, apart from baseball, it is unlikely that any leagues play enough games to even get a rough estimate of that expected value.
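That "better roll wins" view can be simulated directly. In this sketch each team's quality of play is drawn from a normal distribution (an assumed form; `sim_win_rate` and the `sigma` default are made up for illustration):

```python
import numpy as np

def sim_win_rate(mu_a, mu_b, sigma=7.0, n=100_000, seed=0):
    """Estimate P(team A beats team B) when each game draws one sample
    from each team's quality distribution (assumed normal with a shared,
    assumed game-to-game spread sigma) and the higher draw wins."""
    rng = np.random.default_rng(seed)
    a = rng.normal(mu_a, sigma, n)
    b = rng.normal(mu_b, sigma, n)
    return (a > b).mean()
```

Since the difference of two normals is itself normal, this rolls up to the closed form P(A wins) = Phi((mu_a - mu_b) / (sigma * sqrt(2))), illustrating the point that compound probabilities collapse into a single distribution.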

I like your approach. It seems to be getting at what I describe, albeit without such an explicit appeal to probability and estimation theory.

One of my pet peeves in sports rankings is when the top two teams are undefeated and play each other, the loser is inevitably knocked down several positions. To me, this is nonsense. If they were the top two teams going into the game, then they should remain the top two teams coming out, regardless of the outcome. I wonder how your model would perform in that scenario. I guess if the match is close then the rankings would keep them in the top two spots, but if it's a blowout then maybe the losing team drops.

A final thought: regardless of what is true for the Moore-Penrose pseudoinverse, pseudoinverses in general are not unique. Weighted least squares (https://en.wikipedia.org/wiki/Weighted_least_squares) produces a different result than unweighted least squares does. In the estimation problems I have seen, the weighting matrix is typically the inverse of the covariance matrix of the measurements, which is what makes the estimate a maximum likelihood estimate. I am not sure how one would apply that type of analysis here.
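The weighted least-squares estimate described above can be sketched in a few lines of numpy; `weighted_ls` and the toy data are hypothetical, and in the maximum-likelihood setting W would be the inverse of the measurement covariance:

```python
import numpy as np

def weighted_ls(X, y, W):
    """Solve the weighted least-squares problem min (y - Xb)' W (y - Xb)
    via the normal equations (X' W X) b = X' W y. With W equal to the
    inverse measurement covariance this is the maximum likelihood
    estimate under Gaussian noise; with W = I it reduces to OLS."""
    XtW = X.T @ W
    return np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
```

With W set to the identity the result matches an ordinary least-squares fit; any other positive-definite W shifts the estimate, which is the non-uniqueness point above.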


I don’t really agree with your point about undefeated teams. If team 2 loses to team 1, it should update your idea of team 2’s “index” to be lower than you thought it was before the game. If there were other teams close to but behind team 2, say other undefeated teams, it’s totally understandable that this loss could impact the relative “index” of team 2 and other teams that weren’t involved in the game.


If the index is correct, then the two top teams playing each other will tell us nothing about how those teams rank relative to each other. But that is a big assumption, plus you are right that something could happen elsewhere in the league that would revise another team up into the top two, as opposed to a top-two team's index decreasing.

It's possible that any ranking will have situations where this happens. That would mean that something like Arrow's theorem for voting applies to team rankings as well.


How sensitive is this model to running up the score or garbage time comebacks? E.g. if you gave Boise State an extra 14 points in every game where Ashton Jeanty was removed at halftime, how much would their ranking change? In theory, this seems like something that human voters should handle better than computers (even with drive-by-drive data, it seems hard to separate these things out), although I’m not sure how true this is in practice.


It's pretty sensitive to changes in lambda (when lambda is small).

But since I set lambda to 1, running up a score from 14 to 28 to 56 won't make much of a difference.
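That saturation is easy to check numerically, assuming a transform along the lines of tanh(margin / lambda) (the article's exact function may differ); with lambda = 1, margins of 14, 28, and 56 are effectively indistinguishable:

```python
import numpy as np

lam = 1.0  # the lambda chosen in the article
for margin in (14, 28, 56):
    # all three values land within ~1e-11 of 1.0, so doubling or
    # quadrupling a blowout margin barely moves the score at all
    print(margin, np.tanh(margin / lam))
```

By contrast, at a small lambda-scaled margin like 1 or 2 the transform is still in its near-linear range, which is where sensitivity to lambda shows up.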


Oh, interesting. Thank you. I know you mentioned it briefly, but what went wrong when you chose larger values of lambda? I would have thought that a low lambda might penalize teams for playing weaker opponents (it seems like Oregon is guaranteed to be penalized if they play any team outside the top 48 in the rankings, although most of their opponents are in the top 48 anyway).


I was interested in how many teams we'd need to be 90% sure we had the best FBS team, similar to your linked work at FiveThirtyEight. So I divided all your probabilities by 0.936 (to account for removing Montana State and Idaho). It looks like you just need Oregon, Ohio State, and Notre Dame. Have I done this correctly? If so, do you have any thoughts on what might cause the difference between the answer of 3 found here and 12 in the FiveThirtyEight work, beyond the key points that "this is a different season" and "this is a different method"? What is it about this season or this method that makes the result so different?


It's a great question. I'll apply this regression to the 2021 season to see how it compares to Elo. Will circle back once I have that result!

I'm also making a lot of assumptions here about the uncertainty (like assuming normality) that aren't totally kosher. I'm sure that part of the analysis can be improved.
