论文标题

Rudsi:基于图的单词感官诱导数据集

RuDSI: graph-based word sense induction dataset for Russian

论文作者

Aksenova, Anna, Gavrishina, Ekaterina, Rykov, Elisey, Kutuzov, Andrey

论文摘要

我们介绍了Rudsi,这是俄罗斯语言感官诱导(WSI)的新基准。该数据集是使用单词用法图(WUGS)的手动注释和半自动聚类创建的。与俄罗斯的先前WSI数据集不同,Rudsi完全由数据驱动(基于俄罗斯国家语料库的文本),而没有对注释者强加的外部单词感官。根据图群集的参数,可以从原始注释中产生不同的导数数据集。我们报告了几种基线WSI方法在Rudsi上获得的绩效,并讨论了改善这些分数的可能性。

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-driven (based on texts from Russian National Corpus), with no external word senses imposed on annotators. Depending on the parameters of graph clustering, different derivative datasets can be produced from raw annotation. We report the performance that several baseline WSI methods obtain on RuDSI and discuss possibilities for improving these scores.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源