Paper Title
Discriminating Between Similar Nordic Languages
Authors
Abstract
Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often miscategorise. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.