Paper title
RuSemShift: a dataset of historical lexical semantic change in Russian
Paper authors
Paper abstract
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times. Target words were annotated by multiple crowdsourced workers. The annotation process was organized following the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift, achieving promising results that still leave room for other researchers to improve.