Paper title
LowResourceEval-2019: a shared task on morphological analysis for low-resource languages
Paper authors
Paper abstract
The paper describes the results of the first shared task on morphological analysis for languages of Russia, namely Evenki, Karelian, Selkup, and Veps. Only small corpora are available for these languages. The tasks include morphological analysis, word form generation, and morpheme segmentation. Four teams participated in the shared task. Most of them used machine-learning approaches, which outperformed the existing rule-based ones. The article describes the datasets prepared for the shared task and analyzes the participants' solutions. Language corpora in different formats were converted to the CoNLL-U format. This universal format makes the datasets comparable to corpora for other languages and facilitates their use in other NLP tasks.