Paper Title

Exploring Hate Speech Detection with HateXplain and BERT

Paper Authors

Arvind Subramaniam, Aryan Mehra, Sayani Kundu

Paper Abstract

Hate speech takes many forms, targeting communities with derogatory comments and setting societal progress back. HateXplain is a recently published dataset, the first to use annotated spans in the form of rationales, along with speech classification categories and targeted communities, to make classification more human-like, explainable, accurate, and less biased. We fine-tune BERT to perform this task in the form of rationale and class prediction, and compare our performance on metrics spanning accuracy, explainability, and bias. Our novelty is threefold. First, we experiment with an amalgamated rationale-class loss with different importance values. Second, we experiment extensively with the ground-truth attention values for the rationales: introducing conservative and lenient attention, we compare the model's performance on HateXplain and test our hypotheses. Third, to reduce the unintended bias in our models, we mask target-community words and note the improvement in bias and explainability metrics. Overall, we are successful in achieving model explainability, bias removal, and several incremental improvements on the original BERT implementation.
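
The abstract names three techniques: an amalgamated rationale-class loss, conservative vs. lenient ground-truth attention, and masking of target-community words. The minimal PyTorch sketch below is purely illustrative, not the authors' code: the intersection/union reading of "conservative"/"lenient", the `lam` weighting, and all function names are assumptions about how such pieces might look.

```python
import torch
import torch.nn.functional as F

def rationale_ground_truth(annotator_masks: torch.Tensor, mode: str = "lenient") -> torch.Tensor:
    """Build a ground-truth attention distribution from annotator rationale masks.

    annotator_masks: (num_annotators, seq_len) binary tensor; 1 where an
        annotator marked a token as part of the rationale.
    mode: "conservative" keeps tokens marked by every annotator (intersection);
        "lenient" keeps tokens marked by any annotator (union). This is one
        plausible reading, not necessarily the paper's exact definition.
    """
    if mode == "conservative":
        mask = annotator_masks.prod(dim=0).float()        # intersection
    else:
        mask = annotator_masks.max(dim=0).values.float()  # union
    total = mask.sum().clamp(min=1.0)  # avoid 0/0 when no tokens are marked
    return mask / total

def amalgamated_loss(class_logits, class_labels, attn_weights, rationale_attn, lam=0.5):
    """Weighted sum of class-prediction loss and rationale attention supervision.

    `lam` plays the role of the "importance value" varied in the abstract;
    the paper's exact formulation may differ.
    """
    class_loss = F.cross_entropy(class_logits, class_labels)
    # Cross-entropy between gold and predicted token-attention distributions.
    attn_loss = -(rationale_attn * torch.log(attn_weights + 1e-12)).sum(-1).mean()
    return lam * class_loss + (1.0 - lam) * attn_loss

def mask_target_words(tokens, target_lexicon, mask_token="[MASK]"):
    """Replace target-community words with the tokenizer's mask token."""
    return [mask_token if t.lower() in target_lexicon else t for t in tokens]
```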
