Expanded Methodology and Accuracy of DebateDrills's LD Rankings

Inko Bovenzi | Dec 25, 2021

Note—a lot of the information here is similar to the PF rankings. If you already read that article, the differences between LD and PF are marked with italics. The accuracy section has LD data.

Methodology

DebateDrills’s LD rankings rank teams based on an elo system. In elo systems, after every round, the winning team takes a certain number of points from the losing team based on the difference between their ratings, such that beating a strong team yields more points and beating a weaker team fewer. The number of points is determined by the following formula:

$$S = K \cdot mv \cdot (1 - wp)$$

where S represents the shift in Elo, K is a semi-arbitrary factor (more on this later), mv is the margin of victory, and wp is the probability that the team that won the round would have won based on their elo. The variable wp is calculated as follows:

$$wp = \frac{1}{1 + 10^{-ed/400}}$$

where ed represents the elo difference between the two teams. For example, if a team with an elo of 1900 debates a team with an elo of 1500, the probability per this system that the higher-rated team wins is 10/11, or about 91%. If two teams have the same elo, then wp is simply 0.5, which makes sense. Let’s take a look at how K and mv are calculated.
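To make these formulas concrete, here is a minimal Python sketch of the update; the function names are ours for illustration, not DebateDrills’s actual code.

```python
# Minimal sketch of the elo update described above; names are illustrative.

def win_probability(ed: float) -> float:
    """wp: the chance the higher-rated team wins, given elo difference ed."""
    return 1 / (1 + 10 ** (-ed / 400))

def elo_shift(winner_elo: float, loser_elo: float, k: float = 80.0, mv: float = 1.0) -> float:
    """S: the points the winner takes from the loser after a round."""
    wp = win_probability(winner_elo - loser_elo)
    return k * mv * (1 - wp)

# The example from the text: a 1900 team beats a 1500 team.
print(round(win_probability(400), 3))   # 0.909, i.e. 10/11
print(round(elo_shift(1900, 1500), 1))  # 7.3  -- small shift, the favorite won
print(round(elo_shift(1500, 1900), 1))  # 72.7 -- large shift, an upset
```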

The default value for K in our rankings is 80. This means that, ignoring mv, the most points you can gain from one round is 80 (by defeating an infinitely stronger team). We chose to cap the number of points you can gain or lose in one round because a) everyone has off rounds occasionally and b) judging can be very inconsistent.

However, we weight tournaments by their bid level. Octos bids are weighted fully, and K is reduced by a factor of the square root of 2 (roughly 1.4) for each successive bid level. That means quarters bids have K ≈ 56.6, and so on. This is to avoid idiosyncratic results from small finals-bid tournaments influencing the rankings too much. This is a methodology change: the rankings used to run with K=25, and we found that the higher value of K made the rankings more dynamic and reduced the bias towards frequent competition. This methodology differs from PF because small LD tournaments tend to have comparatively better judging than small PF ones; the higher K value reflects the fact that LD judging on the whole tends to be better.
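As a quick sketch of that weighting schedule (assuming the discount applies in order octos → quarters → semis → finals):

```python
import math

# Sketch of the bid-level discounting described above.
BASE_K = 80
for level, name in enumerate(["octos", "quarters", "semis", "finals"]):
    k = BASE_K / math.sqrt(2) ** level
    print(f"{name}: K = {k:.1f}")
# octos: K = 80.0, quarters: K = 56.6, semis: K = 40.0, finals: K = 28.3
```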

Now let’s talk about mv. In a prelim round, mv is always 1, because you can’t win a prelim by more than one ballot. In an outround, however, we use the formula

$$mv = \frac{bw^2 - bl^2}{bw}$$

where bw is the number of ballots won and bl the number of ballots lost. For example, winning an elim on a 3-2 decision yields mv = 5/3 ≈ 1.67, while a 3-0 yields mv = 3. This formula is designed to weight elim wins by the agreement of the judges, as a 2-1 decision is less decisive than a 3-0. Finally, the winner of every outround receives a bonus of e elo points, where e = b/2 and b is the number of bids available at the tournament.
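A short sketch of these outround pieces (the 16-bid tournament below is a hypothetical example of ours):

```python
# Sketch of the outround adjustments described above.

def margin_of_victory(bw: int, bl: int) -> float:
    """mv = (bw^2 - bl^2) / bw; equals 1 for a single prelim ballot (1-0)."""
    return (bw ** 2 - bl ** 2) / bw

def outround_bonus(bids: int) -> float:
    """e = b / 2, where b is the number of bids at the tournament."""
    return bids / 2

print(round(margin_of_victory(3, 2), 2))  # 1.67 -- the 3-2 example from the text
print(margin_of_victory(2, 1))            # 1.5  -- less decisive...
print(margin_of_victory(3, 0))            # 3.0  -- ...than a 3-0
print(outround_bonus(16))                 # 8.0  -- bonus at a hypothetical 16-bid tournament
```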

It is worth noting that this outround bonus is the only place where the rankings reward frequency of competition, and it is nowhere near as important as the normal elo shifts, which are far larger. In general, elo systems favor consistency over frequency of competition.

Accuracy

To assess the accuracy of our rankings, we checked their Brier Scores and Brier Skill Scores for the last three tournaments, namely Ridge, Strake, and Blake. Brier Scores are a widely accepted statistical tool for assessing the accuracy of probabilistic forecasts like ours. The formula for the Brier Score is as follows:

$$BS = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2$$

where f_t is the probability given by our elo function for round t and o_t is the actual outcome (1 for a win, 0 for a loss). For example, if our model says that team 1 has a 60% chance of winning a round and they go on to win, that round contributes (0.6 − 1)² = 0.4² = 0.16 to the Brier Score. The lower the Brier Score, the more accurate the model. A model that assumes every round is a coin toss would achieve a Brier Score of 0.25 on average. Here are the Brier Scores for these tournaments:

Table of Brier Scores for the last 3 LD tournaments
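For anyone who wants to run this kind of check themselves, here is a minimal sketch with made-up forecasts (not the actual tournament data above):

```python
# Brier Score: mean squared error between forecast probabilities and outcomes.

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """forecasts are win probabilities; outcomes are 1 (win) or 0 (loss)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

print(round(brier_score([0.6], [1]), 2))          # 0.16 -- the worked example above
print(round(brier_score([0.5, 0.5], [1, 0]), 2))  # 0.25 -- the coin-toss baseline
```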

To assess the quality of the Brier Scores, we can convert them to Brier Skill Scores using the following formula:

$$BSS = 1 - \frac{BS}{0.25}$$

We used the value 0.25 because that is the Brier Score we’d expect a random forecast to achieve. Unlike with Brier Scores, higher Brier Skill Scores are better. Here are the scores:

Table of Brier Skill Scores for the last 3 LD tournaments
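Continuing the sketch above, the conversion is one line:

```python
# Brier Skill Score relative to the 0.25 coin-toss baseline.

def brier_skill_score(bs: float, bs_ref: float = 0.25) -> float:
    return 1 - bs / bs_ref

print(round(brier_skill_score(0.16), 2))  # 0.36 -- better than chance
print(round(brier_skill_score(0.25), 2))  # 0.0  -- no better than a coin toss
```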

Since we based our elo forecast on FiveThirtyEight’s, we can compare our Brier Skill Scores with theirs. In general, our rankings land roughly in the middle of the pack among FiveThirtyEight’s sports forecasts, which is exactly where we’d expect to be, considering we based our model on theirs. Their politics forecasts do perform better than ours, which makes sense because the vast majority of elections are not close and are thus easy to forecast (e.g., the presidential race in Alabama).

Especially considering that the season is still young and our rankings do not have much data on most teams, this is a very good result. From our data, we see that the rankings perform quite well in general, and particularly well in elimination rounds. This makes sense because those rounds typically feature more experienced teams, on which our rankings have more data points, as well as experienced panels of judges.

The opinions expressed in this blog post are solely those of the author and not necessarily those of DebateDrills.
