Reasonable Rankings for College Football
I've had enough of the subjective rankings in college football. So I'm proposing something better, using math (of course).
I don’t really watch American football anymore. Something about endorsing the injuries through my viewership really bugs me. But every year, I’m still fascinated by the mathematics behind how college football programs are (or could be) ranked. It’s an annual tradition to ask variations of the same old question: How do you compare a superior record (e.g., 11–1) against weaker competition with an inferior record (e.g., 9–3) against stronger competition?
Various methods have been proposed to answer this and related questions over the years, resulting in both numerical values and ordinal rankings. These gained notoriety with the Bowl Championship Series (BCS) rankings. From 1998 to 2003, teams were ranked by adding ordinal rankings (ugh) based on human expert polling, computer averages (with additional ordinal adjustments), and strength of schedule (which resembled college basketball’s ratings percentage index). Needless to say, this formula was a complicated mess that resulted in various controversies and angry fanbases. For example, in 2003, the top two teams by BCS were Oklahoma and LSU, neither of which was the #1 team in the country per the expert AP poll (USC).
In a sport where there are more than a hundred programs, each of which only plays a dozen games, it’s virtually impossible to have certainty over which two teams are the “best.” The expansion of the playoff from two teams to four, and now to twelve, won’t completely eliminate the debates over which teams are most deserving. But arguing over which team is 12th best certainly feels a lot less tense than arguing over which team is first or second.
These days, the rankings are entirely determined by a human panel, with no direct input from numerical ranking systems or strength of schedule formulas. While humans are more likely to produce rankings that other humans will be satisfied with, human bias remains a lingering issue. In any given season, big-name programs like Alabama and Clemson are likely to have inflated rankings compared to their body of work that year, given their decorated history.
Frustrated with the abandoned attempts at objectivity, I’m seeking to make sense of college football by proposing a simple, understandable ranking system that produces results that appear reasonable to humans.
This isn’t my first rodeo. Three years ago I proposed ranking teams by their probability of being the best team in the country rather than their expected ranking. Later that year, I found that about a dozen teams would need to be included in the playoff for this total probability to exceed 90 percent. Both of these analyses started with metrics developed by others: ESPN’s Football Power Index and FiveThirtyEight’s Elo ratings.
This time around, I’m proposing my own rankings. I’m not aiming for originality here—most of the ideas presented here have been proposed before. But my first priority is simplicity, followed closely by attaining a plausible correlation with human expert rankings.
Methods
To align with human rankings, it’s worth considering what human rankings actually care about: the importance of home field advantage, wins, scoring margins (also known as point differentials), and things like that.
First, let’s look at home field advantage. In college football, it’s definitely a real thing. Among the 854 games not played at neutral sites this season, the average point differential was approximately 8.9 points in favor of the home team. The average margin of victory for the home team was 21.3 points, while the average margin of victory for the visiting team was 13.5 points. The average margin at a neutral site was in the middle at 17.8 points.
But does any of this matter? I don’t think so. Case in point: Week 7 this season featured a showdown in which undefeated #2 Ohio State (5–0) visited undefeated #3 Oregon (5–0). Home team Oregon squeaked by, 32–31. On a neutral site, that would have been equivalent to an Ohio State victory by about a field goal. But when the rankings came out several days later, Oregon moved up to #2, while Ohio State was relegated to #4. There are countless other examples demonstrating that while home field advantage is real, human rankings tend to ignore it. Later on, I’ll show how home field advantage affects my own analysis, but at the end of the day I’m going to ignore it.
The importance of scoring margins is another seemingly eternal debate. Scoring margins do matter when experts say they do (for “style points”). But scoring margins don’t matter when the experts say they don’t (“wins are wins”). Early BCS computer ranking systems incorporated scoring margins, but later on these systems were forbidden from using them.
I sidestep this issue by seeking a simple, tunable metric for the outcome of a game. It should be somewhere between a pure scoring margin and a binary outcome (either 1 vs. 0 or 1 vs. -1). Previous models have employed piecewise functions that use scoring margin up to some saturation value, beyond which all margins are treated the same. But I think the function should be smooth, so that information isn’t completely discarded (meaning a 30-point margin always counts a little more than a 20-point margin).
Therefore, I use a logistic function:
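One logistic form with the desired behavior maps a game’s scoring margin $m$ (positive for a win, negative for a loss) to

$$f(m) \;=\; \frac{1 - e^{-m/\lambda}}{1 + e^{-m/\lambda}} \;=\; \tanh\!\left(\frac{m}{2\lambda}\right).$$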
A tie (which never happens in college football) yields an output of 0, whereas a lopsided win gives a value very close to 1. The value of 𝜆 determines the importance of scoring margin. When 𝜆 is large, the margin is more important; when 𝜆 is small, just getting the win is more important.
This straightforward methodology resolves another matter of frequent debate. Suppose A and B are believed to be stronger teams and C is believed to be a weaker team. Further suppose that A absolutely crushes C, but narrowly loses to B. Meanwhile, B loses to C in an unlikely upset. Given this information, which team should be ranked higher, A or B? Pundits would argue that B won the head-to-head, while other pundits would argue that A had a “good loss” to B, whereas B had a “bad loss” to C. While the exact scores and the value of 𝜆 would affect the details here, our system will generally rank team A ahead of B. Lose to a weak team at your own peril.
So here’s the plan: We’ll represent each game as an equation with a true “index” for each team—a number that is meant to indicate that team’s relative strength. For example, suppose Oregon and Ohio State have respective indices of $i_{\text{Oregon}}$ and $i_{\text{OhioState}}$. The scoring margin in their game was 1 in favor of Oregon, so we should have:
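$$i_{\text{Oregon}} - i_{\text{OhioState}} \;=\; f(1),$$

where $f$ is the logistic transform above: the difference in indices on one side, the transformed margin on the other.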
We can set up 872 such equations, one for each of the 872 games played so far this season. We can pick a value of 𝜆 and use linear regression (with pseudoinverse matrices) to compute the indices that best explain the results of the season. And boom—college football rankings!
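To make the pipeline concrete, here’s a minimal sketch of the computation (the function name and input format are my own illustration, not the actual data layout):

```python
import numpy as np

def reasonable_indices(games, teams, lam=1.0):
    """Fit one strength index per team from a season of game results.

    games: list of (team_a, team_b, margin) tuples, margin = points_a - points_b.
    teams: list of team names.
    lam:   the lambda parameter; larger values make margins matter more.
    """
    col = {t: j for j, t in enumerate(teams)}
    X = np.zeros((len(games), len(teams)))
    y = np.zeros(len(games))
    for row, (a, b, margin) in enumerate(games):
        X[row, col[a]] = 1.0    # team A's index enters with +1
        X[row, col[b]] = -1.0   # team B's index enters with -1
        y[row] = np.tanh(margin / (2.0 * lam))  # logistic transform of the margin
    # X is rank-deficient (shifting every index by a constant changes nothing),
    # so the pseudoinverse returns the minimum-norm least-squares solution.
    indices = 100.0 * (np.linalg.pinv(X) @ y)   # scaled by 100 for readability
    return {t: round(float(indices[col[t]]), 1) for t in teams}

# Replaying the earlier thought experiment: A crushes C by 28 but narrowly
# loses to B, while B loses to C. A still edges out B.
print(reasonable_indices([("A", "C", 28), ("B", "A", 3), ("C", "B", 3)],
                         ["A", "B", "C"]))
# {'A': 3.2, 'B': 0.0, 'C': -3.2}
```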
Rankings
Without further ado, here are the top 50 teams according to my “Reasonable Rankings” when 𝜆 was set to 1:
There’s a lot to unpack here. But first, let’s talk a little more about 𝜆. As I said before, this could be anything from very small (“I only care about wins, baby!”) to very large (“Points matter!”). I tried out a few different values, expecting 3 and 7 to be solid candidates. I landed on 1 based on “feel,” which, yes, was very subjective. But it’s the only tunable parameter in this whole process. If you care more about points, you can run the exact same analysis with a different value of 𝜆.
Next up are the values that appear in the “index” column. I took the outputs from the regression and multiplied them by 100 for readability. I suppose you could compute an expected scoring margin by plugging these back into the logistic formula, but the result would actually be a combination of the scoring margin and the probability of winning, so I wouldn’t recommend it.
You’ll see decent agreement between my “Reasonable Rankings” and the Week 16 CFP rankings. There are some notable differences, which have various explanations. But the bottom line is this: A pundit says Team X is good because they beat the formidable Team Y. But how do they know Y was good in the first place? There’s a circular dependency here that demands a regression (or a similarly deep analysis) to untangle it all.
Now, to some specific divergences between my rankings and the CFP rankings:
Ohio State: I have Ohio State as #2. They crushed some decent teams and only narrowly lost to Oregon (my #1) and Michigan (my #22).
Michigan: Okay, let’s talk about last year’s champion. They fell from grace and were only 7–5 this year, unranked by the CFP committee. That meant their defeat of Ohio State pulled Ohio State down in the rankings to a fair degree. But Michigan’s losses were to Oregon (my #1), Texas (my #4), Indiana (my #6), Illinois (my #18), and Washington (my #42). Meanwhile, they beat Minnesota (my #35), USC (my #40), and, yes, Ohio State (my #2). When you untangle the matrix of wins and losses, both Ohio State and Michigan look better than the pundits think.
Georgia: They were once again the champion of the vaunted SEC, earning #2 in the CFP rankings. But they had losses to an overrated Alabama team (my #14) and Ole Miss (my #21). Texas technically had “better losses,” which were both to Georgia, one of which went into overtime. I understand the temptation to rank Georgia ahead of Texas due to two head-to-head wins. But unless you want to weight some games more than others—something I’m not doing due to complexity and further subjectivity (i.e., what should that weighting be?)—Texas should still be ranked ahead.
South Carolina, Alabama, and Ole Miss: All three teams went 9–3 in the SEC. The pundits over at the CFP committee have consistently had Alabama ahead of the other two. They provided reasons, but to me the undercurrent seemed to be a bias in favor of Alabama and its six national titles under Nick Saban. Instead, I found South Carolina (my #10) should have been ahead of both Alabama (my #14) and Ole Miss (my #21).
Tennessee: The CFP committee really liked two-loss Tennessee, ranking them #7, whereas I have them as #23. A two-loss SEC team is nothing to scoff at, but let’s take a closer look at Tennessee’s resume. They beat Alabama (my #14), Florida (my #36), and Oklahoma (my #47). Their losses came to Georgia (my #7) and Arkansas (my #69, yikes). Yes, Tennessee is a good team, but having just two losses seemed more a product of their schedule than their ability.
Memphis: The CFP rankings rewarded their impressive-sounding 10–2 record with #25 in their poll. But their best win came against 9–3 Tulane, who was similarly overrated. If you want a high ranking, it’s not enough to beat other “ranked” teams—you have to beat teams that have proven themselves in a self-consistent ranking scheme.
Louisville: The CFP rankings had zero four-loss teams in its top 25, which I find a little too cute. Louisville’s four losses, all by seven points or less, came to Notre Dame (my #3), SMU (my #8), Miami (my #17), and Stanford (my #100—okay, not great). But Louisville did defeat Clemson (my #15), Georgia Tech (my #39), Boston College (my #48), and Pittsburgh (my #49). That’s a stronger resume than that of two-loss Memphis.
Uncertainty
In my previous football-related writings at FiveThirtyEight, I proposed advancing teams to the playoff that had the highest probabilities of being the best in the country amid all the uncertainty in the rankings—a distinct notion from advancing teams that had the greatest expected strength.
This is perhaps better understood with a concrete example. Suppose team A has an index of 90 ± 10, while team B has an index of 80 ± 30. Because team A has the greater mean index, it’s probably the superior team of the two. Now, suppose you’re comparing them both to team C, which we know with absolute certainty has an index of 100. There’s a 16 percent chance that A’s true index exceeds 100. Meanwhile, there’s a 25 percent chance that B’s true index exceeds 100. So with the inclusion of C, B is actually more likely to be the best team than A is. In fact, for the numbers given, A has a 13 percent chance of being the best, B has a 24 percent chance of being the best, and C has a 63 percent chance of being the best. (Yes, it’s still more likely than not that A is better than B. While counterintuitive, this is not a contradiction.)
If we pull standard errors for the indices out of the regression, we can get a sense of their respective uncertainties. These standard errors, scaled up by a factor of 100, ranged from 4 to 15. From there, I ran a million simulations, pulling each index from its corresponding distribution and recording how many times each college program came out ranked on top. That’s the final “p(best)” column in the table above—the probability that a program is the best in the country.
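Here’s a minimal sketch of that simulation, run on the toy A/B/C example from before (normal distributions for the indices are my assumption, with C’s standard error set to zero to stand in for absolute certainty):

```python
import numpy as np

rng = np.random.default_rng(0)

means = np.array([90.0, 80.0, 100.0])  # mean index for teams A, B, C
errs  = np.array([10.0, 30.0,   0.0])  # standard errors; C is known exactly

n_sims = 1_000_000
draws = rng.normal(means, errs, size=(n_sims, 3))  # one sampled index per team per sim
winners = draws.argmax(axis=1)                     # which column held the top index
p_best = np.bincount(winners, minlength=3) / n_sims
print({t: float(p) for t, p in zip("ABC", p_best.round(2))})
# {'A': 0.13, 'B': 0.24, 'C': 0.63}
```

On the real indices, the same draw-and-tally loop over a million simulations produces the p(best) column.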
To no one’s surprise, Oregon came out on top according to this metric, with a 76.8 percent chance of actually having the highest index. To be clear, this is not the probability of Oregon beating some other team, but rather the probability that its index—that objective, mathematical measure of quality—is indeed the highest in the nation. Ohio State, Notre Dame, Texas, Penn State, Indiana, and Georgia all had probabilities of at least 0.4 percent.
But the big surprise came lower down in the rankings: Montana State, ranked #32 on my list. They went undefeated (13–0) in the Big Sky Conference of the FCS (a tier down from the FBS). My matrix of game results only included FBS matchups, and the only reason Montana State appeared in my analysis at all was a single victory over an FBS program: a 35–31 defeat of New Mexico, which went 5–7 in the Mountain West Conference.
Montana State’s victory wasn’t so impressive in itself, but the fact that Montana State played only one FBS opponent (rather than 11) greatly enhanced the uncertainty of their index, resulting in an inordinate 5.9 percent chance that they were the best team in the nation.
So if you were tasked with picking the 12 best teams for the playoff, you definitely wouldn’t include Montana State. But if you wanted to be absolutely sure you weren’t overlooking the best team—and a 12-team field certainly gave you the opportunity to be inclusive here—you should have given them a shot. Yes, I realize that allowing FCS teams would require yet another change in how the College Football Playoff works.
Home Field Advantage
As I already mentioned, I didn’t include home field advantage in my analysis because, at the end of the day, no one really seems to care about it. If a team wins by a single point at home, you wouldn’t discredit that win. You might say it’s not such a bad loss for the visiting team, but football is supposed to be a zero-sum game—you can’t have two teams simultaneously benefitting from a single match against each other.
Anyway, when I did include home field advantage (using historical values of +3 points for the visitor and -3 for the home team, still with 𝜆 = 1), the rankings of course shifted a bit. FCS team Abilene Christian, who lost by 1 point to their lone FBS opponent, Texas Tech, looked great as a result. Among the top FBS teams, Boise State and Ole Miss moved up. Georgia, Alabama, and Arizona State moved down. Texas A&M, Texas Tech, USC, and Marshall all moved into the top 25.
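In terms of the earlier regression sketch, that correction would be applied to the scores before the logistic transform (how exactly the ±3 enters is my reading of the adjustment):

```python
def neutralized_margin(points_a, points_b, a_is_home, b_is_home, hfa=3.0):
    # Hypothetical home-field correction: credit the visitor hfa points and
    # debit the home team hfa points; neutral-site games are left untouched.
    if a_is_home:
        points_a, points_b = points_a - hfa, points_b + hfa
    elif b_is_home:
        points_a, points_b = points_a + hfa, points_b - hfa
    return points_a - points_b  # feed this margin into np.tanh(margin / (2 * lam))
```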
But again, when it comes to rankings, no one really seems to care about home field advantage. So that’s that.
Conclusion
The College Football Playoff currently allows for the five highest-ranked conference champions and the next seven highest-ranked teams. These rankings are subjective, and previous objective metrics were poorly calibrated with human sensibilities, leading to distrust of numerical ranking systems.
Here, I propose a ranking system that is reasonable (in my humble opinion) and that transforms scoring margins using a logistic function with a single parameter, 𝜆. I set this parameter to 1 and used a linear regression to determine each program’s “index.” Here’s who would have made the playoff this year under my proposed ranking scheme:
Oregon (#1, Big Ten champion)
Georgia (#7, SEC champion)
Arizona State (#9, Big 12 champion)
Boise State (#12, Mountain West champion)
Ohio State (#2)
Notre Dame (#3)
Texas (#4)
Penn State (#5)
Indiana (#6)
SMU (#8)
South Carolina (#10)
Clemson (#15, ACC champion)
This list almost perfectly matches the group of teams selected for this year’s inaugural 12-team playoff. The one exception was my inclusion of South Carolina over Tennessee.
College football should pick a value of 𝜆 and switch over to an objective system like this one. Enough with the subjectivity. Let’s solve this problem, which is highly mathematical in nature, with some actual math. As designed, the end result wouldn’t be totally out of whack with the opinions of experts and pundits. But it’s a result that could be trusted more and argued less.
Comments

While I don't contest that home field advantage is real, it sounds like you're looking at the results of all games assuming that won't carry an inherent bias. This misses the fact that Alabama will never play a road game against a Western Kentucky; those will always be paid home games, and that game was won by 63 points by the home team. Margin of victory in conference games would likely be a much less biased measure, as those should alternate between home and road for any given matchup.
1. Could you incorporate both home advantage and a win being a win by weighting away wins more somehow (but a win is always still positive for you, and a loss negative)? Perhaps just have two lambdas: one for home team wins, one for away team wins.
2. I’m curious what lambda minimises the error function of the linear regression? I wonder how the rankings would “feel” with this lambda.