Paper Title

Large-Scale Hate Speech Detection with Cross-Domain Transfer

Paper Authors

Cagri Toraman, Furkan Şahinuç, Eyup Halit Yilmaz

Paper Abstract

The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and a low-resource language, Turkish, each consisting of 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets distributed over five domains. The experimental results, supported by statistical tests, show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance is also scalable to different training sizes: 98% of the performance in English, and 97% in Turkish, is recovered when 20% of the training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains. We show that, on average, 96% of a target domain's performance is recovered by other domains for English, and 92% for Turkish. Gender and religion generalize more successfully to other domains, while sports fails the most.
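For readers who want to reproduce the general setup, below is a minimal sketch of fine-tuning a Transformer-based classifier for tweet-level hate speech detection, the model family the abstract reports as strongest. The model name (bert-base-uncased), the three-way label scheme, and all hyperparameters here are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (assumptions, not the authors' exact setup):
# fine-tune a Transformer classifier on labeled tweets.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"          # assumed backbone; the paper compares several LMs
LABELS = ["normal", "offensive", "hate"]  # illustrative label scheme

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# Toy examples standing in for the 100k human-labeled tweets per language.
texts = ["an example tweet", "another example tweet"]
labels = torch.tensor([0, 2])

enc = tokenizer(texts, padding=True, truncation=True,
                max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few epochs on the toy batch
    optimizer.zero_grad()
    out = model(**enc, labels=labels)  # cross-entropy loss over 3 classes
    out.loss.backward()
    optimizer.step()

# Inference: predict a label index for each tweet.
model.eval()
with torch.no_grad():
    preds = model(**enc).logits.argmax(dim=-1)
```

Under this setup, the cross-domain experiments the abstract describes amount to fitting the classifier on tweets from one hate domain (e.g., gender) and evaluating on a held-out set from another (e.g., sports).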
