Paper title
Learning Word Representations for Tunisian Sentiment Analysis
Paper authors
Abstract
Tunisians on social media tend to express themselves in their local dialect using Latin script (TUNIZI). This poses an additional challenge for exploring and recognizing online opinions. To date, very little work has addressed TUNIZI sentiment analysis because resources for training an automated system are scarce. In this paper, we focus on sentiment analysis of the Tunisian dialect as it is used on social media. Most previous work used machine learning techniques combined with handcrafted features. More recently, deep neural networks have been widely used for this task, especially for English. We explore the importance of various unsupervised word representations (word2vec, BERT) and investigate the use of Convolutional Neural Networks and Bidirectional Long Short-Term Memory networks. Without using any handcrafted features, our experimental results on two publicly available datasets show performance comparable to that reported for other languages.