What purpose do speaker points serve? To some judges, they're a reflection of a debater's presentation and speaking skills. To others, they're an endorsement of a debater's worthiness for elimination rounds. A 30 can mean anything from recognition of an all-time great performance to a reward for bringing a judge their favorite snack. While all ways to evaluate debates are inevitably subjective, the nature and significance of speaker points should put them under particular scrutiny.
Online debate makes speaker points more important than ever. Tournaments are facing their highest attendance in history while compressing schedules to accommodate differing time zones and to build in buffer time for technical delays. Many tournaments that have reliably broken all debaters with down-2 records in years past can no longer make the same guarantee. Debaters must consistently earn high speaker points to ensure they advance to elimination rounds.
Despite their significance, one would be hard-pressed to find anyone who thinks speaker points provide more than the roughest approximation of a debater's talent. Different judges assign speaker points according to different scales: a 29.2 can be unattainable in front of some judges and merely average in front of others. Different regions have experienced different norms and different amounts of speaker point "inflation." Judges experienced in traditional debate often approach points differently from those who regularly judge on the circuit. Additionally, judges who specialize in other events, like policy and parliamentary debate, have expressed difficulty aligning their speaker point scales with Lincoln-Douglas norms. While rubrics produced by the community are useful, adherence is rare even when they are provided, and the rubrics themselves still rely on unreliable evaluations of a debater's performance.
Even points assigned by the same judge are an unreliable metric. Perceptions of a debater's performance can be affected by a judge's mood, by how the debater performed relative to their opponent, and by many other factors entirely outside a debater's control. The decimal precision of speaker points can give the illusion of accuracy. Yet in my experience judging, I remain unconvinced that I, or most other judges, could articulate or reliably distinguish between a performance worthy of a 28.4 and one deserving a 28.6. When the difference between who breaks and who doesn't can come down to a tenth of a speaker point, the gut-check decision of what points to give matters more than we acknowledge.
The high-precision, low-accuracy nature of speaker points creates what is perhaps their most concerning impact: implicit and explicit biases have an outsized effect on success. Robust analyses have repeatedly found statistically significant gaps in speaker-point assignment between male- and female-identifying debaters1. Ishan Bhatt recently published an analysis specific to Lincoln-Douglas debate which finds (with statistical significance) that female-identifying competitors receive lower speaker points on average. While I was unable to find statistical analysis regarding other axes of bias, debaters report consistent anecdotal accounts of discrimination, and similar results would be unsurprising. The community should seriously consider the implications of using a method of evaluation we know puts marginalized groups at a disadvantage with statistically significant effects. While anecdotal, it is noticeable how many recent tournaments have had non-males disproportionately make up the pool of 4-2 debaters who do not advance. It is not uncommon to see top speaker awards go almost exclusively to male-identifying debaters. Breaking at tournaments provides prestige, more experience, and an opportunity to bid. When speaker points disparately impact the competitive outcomes of disadvantaged groups, alternatives deserve serious consideration.
The work of finding an alternative is certainly difficult. In an ideal world, I think tournaments should do their best to ensure all debaters with a certain record or better can break. It can be crushing to battle your way to a winning record only to find out you will not break. Many alternative "tiebreak" metrics, like Z-score and opponent wins, pose issues similar to those I've highlighted in this post (I plan to explore them further in future posts). However, there is one idea that deserves consideration as a baseline: random seeding. If a tournament can't break all 4-2s, we should ask whether using a low-accuracy, potentially biased metric is better than giving all debaters an equal shot.
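To make the random-seeding idea concrete, here is a minimal sketch of how a tab program might fill break spots: debaters are grouped by record, and ties within a record are broken by a random shuffle rather than by speaker points. The names, records, and spot count below are illustrative, not real tournament data.

```python
import random

# Hypothetical pool: (name, wins) after six prelim rounds.
debaters = [
    ("A", 5), ("B", 5), ("C", 4), ("D", 4),
    ("E", 4), ("F", 4), ("G", 3), ("H", 3),
]

BREAK_SPOTS = 6  # illustrative; fewer spots than 4-win debaters


def random_seed_break(pool, spots, seed=None):
    """Fill the break strictly by record; break ties within a record
    randomly instead of by speaker points."""
    rng = random.Random(seed)
    breaking = []
    # Walk records from best to worst.
    for wins in sorted({w for _, w in pool}, reverse=True):
        tied = [name for name, w in pool if w == wins]
        rng.shuffle(tied)  # every debater on this record gets an equal shot
        for name in tied:
            if len(breaking) < spots:
                breaking.append(name)
    return breaking


print(random_seed_break(debaters, BREAK_SPOTS))
```

Note that the randomness only ever decides among debaters with identical records; everyone with a better record still clears everyone with a worse one, which is the sense in which random seeding is a "baseline" rather than a lottery.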
1 While this analysis is unfortunately limited by existing research to a male/female binary, this post does not intend to endorse the gender binary or to erase the impact of these barriers on gender minorities.