LemMED: Fast and Effective Neural Morphological Analysis with Short Context Windows

Paper Title

LemMED: Fast and Effective Neural Morphological Analysis with Short Context Windows

Authors

Aibek Makazhanov, Sharon Goldwater, Adam Lopez

Abstract

We present LemMED, a character-level encoder-decoder for contextual morphological analysis (combined lemmatization and tagging). LemMED extends and is named after two other attention-based models, namely Lematus, a contextual lemmatizer, and MED, a morphological (re)inflection model. Our approach does not require training separate lemmatization and tagging models, nor does it need additional resources and tools, such as morphological dictionaries or transducers. Moreover, LemMED relies solely on character-level representations and on local context. Although the model can, in principle, account for global context at the sentence level, our experiments show that using just a single word of context around each target word is not only more computationally feasible, but yields better results as well. We evaluate LemMED in the framework of the SIGMORPHON 2019 shared task on combined lemmatization and tagging. In terms of average performance, LemMED ranks 5th among 13 systems and is bested only by the submissions that use contextualized embeddings.
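
The abstract describes feeding a character-level encoder-decoder a target word plus a single word of context on each side, and decoding the lemma and tags jointly. The sketch below illustrates one way such source and target sequences might be built. It is not the authors' implementation: the marker symbols `<lc>`, `<rc>`, and `<s>`, the function names `encode_source` and `encode_target`, and the choice to emit lemma characters followed by tag symbols are all assumptions made for illustration.

```python
from typing import List

# Hypothetical marker symbols (assumptions, not taken from the paper).
LEFT, RIGHT, SPACE = "<lc>", "<rc>", "<s>"


def encode_source(tokens: List[str], i: int, n_context: int = 1) -> List[str]:
    """Build a character sequence for tokens[i] with n_context words per side."""
    left = tokens[max(0, i - n_context):i]
    right = tokens[i + 1:i + 1 + n_context]
    seq: List[str] = []
    for word in left:
        seq.extend(word)      # one symbol per character
        seq.append(SPACE)     # word boundary inside the context
    seq.append(LEFT)          # start of the target word
    seq.extend(tokens[i])
    seq.append(RIGHT)         # end of the target word
    for word in right:
        seq.append(SPACE)
        seq.extend(word)
    return seq


def encode_target(lemma: str, tags: List[str]) -> List[str]:
    """Joint output (assumed layout): lemma characters, then one symbol per tag."""
    return list(lemma) + tags


sentence = ["the", "cats", "sleep"]
source = encode_source(sentence, 1)
target = encode_target("cat", ["N", "PL"])
```

With `n_context=1` the source sequence length depends only on the target word and its two neighbours rather than on the whole sentence, which is consistent with the abstract's point that a short window is cheaper than sentence-level context.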
