Paper title

Offensive Language Detection: A Comparative Analysis

Authors

Vyshnav M T, Sachin Kumar S, Soman K P

Abstract

Offensive behaviour has become pervasive in the Internet community. Individuals take advantage of anonymity in the cyber world and indulge in offensive communication that they might not consider in real life. Governments, online communities, companies, and others are investing in preventing offensive content on social media. One of the most effective solutions for tackling this enigmatic problem is to use computational techniques to identify offensive content and take action. The current work focuses on detecting offensive language in English tweets. The dataset used for the experiments is obtained from SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The dataset contains 14,460 annotated English tweets. The present paper provides a comparative analysis and a Random Kitchen Sink (RKS) based approach for offensive language detection. We explore the effectiveness of the Google sentence encoder, fastText, Dynamic Mode Decomposition (DMD) based features, and the Random Kitchen Sink (RKS) method for offensive language detection. From the experiments and evaluation, we observed that RKS with fastText achieved competitive results. The evaluation measures used are accuracy, precision, recall, and F1-score.
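To give a concrete sense of the Random Kitchen Sink idea mentioned in the abstract, the following is a minimal sketch of a random Fourier feature map that approximates an RBF kernel. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_rks_map` and `rbf_kernel`, the choice of RBF kernel, and the parameters `n_components`, `gamma`, and `seed` are all hypothetical. In a setting like the one described, the input vector `x` would be a sentence embedding (for example, from fastText), and the mapped features would be passed to a linear classifier.

```python
import math
import random


def make_rks_map(n_features, n_components=500, gamma=1.0, seed=0):
    """Build a random Fourier feature map approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2).

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2 * gamma), the Fourier transform
    # of the RBF kernel above.
    std = math.sqrt(2.0 * gamma)
    weights = [
        [rng.gauss(0.0, std) for _ in range(n_features)]
        for _ in range(n_components)
    ]
    # Random phase offsets, uniform on [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    scale = math.sqrt(2.0 / n_components)

    def transform(x):
        # Project x onto each random frequency, shift by the phase,
        # and take the cosine.
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return transform


def rbf_kernel(x, y, gamma=1.0):
    """Exact RBF kernel, used only to check the approximation."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


if __name__ == "__main__":
    transform = make_rks_map(n_features=3, n_components=5000, gamma=0.5, seed=1)
    x = [0.1, 0.2, 0.3]
    y = [0.0, -0.1, 0.4]
    approx = sum(a * c for a, c in zip(transform(x), transform(y)))
    print("approximate kernel:", approx)
    print("exact kernel:      ", rbf_kernel(x, y, gamma=0.5))
```

The inner product of two mapped vectors approximates the kernel value, so a linear model trained on the mapped features behaves roughly like a kernel method at lower cost. The approximation tightens as `n_components` grows.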
