Title

Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation

Authors

Nitesh Goyal, Ian Kivlichan, Rachel Rosen, Lucy Vasserman

Abstract

Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We formed three such rater pools for this study--specialized rater pools of raters from the U.S. who identify as African American, LGBTQ, and those who identify as neither. Each of these rater pools annotated the same set of comments, which contains many references to these identity groups. We found that rater identity is a statistically significant factor in how raters will annotate toxicity for identity-related annotations. Using preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in the toxicity annotations. Next, we trained models on the annotations from each of the different rater pools, and compared the scores of these models on comments from several test sets. Finally, we discuss how using raters that self-identify with the subjects of comments can create more inclusive machine learning models, and provide more nuanced ratings than those by random raters.
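To make the specialized-rater-pool comparison concrete, below is a minimal sketch in the spirit of the model comparison the abstract describes: train one toxicity model per rater pool on the same comments (each pool contributes its own labels) and score a held-out comment with every pool's model. This is not the authors' code; the pool names, toy comments, labels, and model choice are all hypothetical placeholders.

```python
# Illustrative sketch (not the authors' implementation): one model per
# specialized rater pool, all trained on the same comments but with each
# pool's own toxicity annotations, then compared on the same test input.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical shared comment set (1 = toxic, 0 = non-toxic per pool).
comments = [
    "you people are all the same",
    "thanks for sharing your perspective",
    "that slur is completely unacceptable",
    "great point, I agree with you",
]
pool_labels = {
    "african_american": [1, 0, 1, 0],
    "lgbtq":            [1, 0, 1, 0],
    "control":          [0, 0, 1, 0],  # pool identifying as neither
}

# Identical model architecture for every pool; only the labels differ.
models = {
    pool: make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(comments, y)
    for pool, y in pool_labels.items()
}

# Diverging scores on the same comment mirror the annotation
# disagreements between rater pools that the paper analyzes.
test_comment = ["you people never learn"]
for pool, model in models.items():
    print(f"{pool}: {model.predict_proba(test_comment)[0][1]:.2f}")
```

In this toy setup, any per-pool score gap comes entirely from the differing annotations, which is the quantity of interest when comparing models trained on specialized versus random rater pools.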
