To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Paper title

To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Authors

Kristian Miok, Blaz Skrlj, Daniela Zaharie, Marko Robnik-Sikonja

Abstract

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, however, these methods have not been able to quantify the reliability of their outputs. We propose a Bayesian method that uses Monte Carlo dropout within the attention layers of transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions. We also observe that affective dimensions extracted using sentic computing methods can provide insights into the emotions involved in hate speech. Our approach not only improves the classification performance of the state-of-the-art multilingual BERT model, but its computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns. The provided visualization helps to understand borderline outcomes.
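The core idea of Monte Carlo dropout can be illustrated with a minimal sketch. It assumes a toy one-layer attention pooler over precomputed per-token scores rather than a real BERT model. Dropout is left active at prediction time and applied to the attention probabilities, the model is run many times, and the spread of the outputs serves as a reliability score. The function names, toy inputs, dropout rate, sample count, and the 0.05 review threshold are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(attn_logits, token_values, bias, p_drop, rng):
    """One forward pass with dropout kept ON over the attention probabilities."""
    keep = 1.0 - p_drop
    attn = softmax(attn_logits)
    pooled = 0.0
    for a, v in zip(attn, token_values):
        if rng.random() < keep:          # attention weight survives with prob. 1 - p_drop
            pooled += (a / keep) * v     # inverted-dropout rescaling keeps the expectation
    return sigmoid(pooled + bias)        # toy P(hate) for this pass


def mc_dropout_predict(attn_logits, token_values, bias=0.0, p_drop=0.1,
                       n_samples=100, seed=0):
    """Mean P(hate) and its standard deviation over n_samples stochastic passes."""
    rng = random.Random(seed)
    probs = [stochastic_forward(attn_logits, token_values, bias, p_drop, rng)
             for _ in range(n_samples)]
    return statistics.fmean(probs), statistics.stdev(probs)


if __name__ == "__main__":
    # Hypothetical toy input: 4 tokens with attention logits and per-token evidence.
    mean_p, std_p = mc_dropout_predict([2.0, 0.5, 0.1, 1.0], [3.0, -1.0, 0.2, 2.5])
    label = "hate" if mean_p >= 0.5 else "not hate"
    # A large spread across passes marks the prediction as less trusted.
    flag = "review" if std_p > 0.05 else "ok"  # arbitrary illustrative threshold
    print(f"{label}: mean={mean_p:.3f}, std={std_p:.3f} -> {flag}")
```

In a moderation workflow of the kind the abstract describes, only the predictions flagged as uncertain would be routed to human inspection or reannotation. This is how per-prediction reliability scores can reduce manual workload.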
