Paper Title

Improving Constituency Parsing with Span Attention

Paper Authors

Yuanhe Tian, Yan Song, Fei Xia, Tong Zhang

Paper Abstract

Constituency parsing is a fundamental and important task for natural language understanding, where a good representation of contextual information can help this task. N-grams, which are a conventional type of feature for contextual information, have been demonstrated to be useful in many tasks, and thus could also be beneficial for constituency parsing if they are appropriately modeled. In this paper, we propose span attention for neural chart-based constituency parsing to leverage n-gram information. Considering that current chart-based parsers with Transformer-based encoders represent spans by subtraction of the hidden states at the span boundaries, which may cause information loss, especially for long spans, we incorporate n-grams into span representations by weighting them according to their contributions to the parsing process. Moreover, we propose categorical span attention to further enhance the model by weighting n-grams within different length categories, thus benefiting long-sentence parsing. Experimental results on three widely used benchmark datasets demonstrate the effectiveness of our approach in parsing Arabic, Chinese, and English, where our approach obtains state-of-the-art performance on all of them.
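As a rough illustration only (not the authors' released code), the sketch below shows in plain NumPy how a boundary-difference span representation can be augmented with attention-pooled n-gram vectors, with one attention pool per n-gram length category in the spirit of the categorical variant. The helper names (`boundary_span_repr`, `span_attention`, `ngrams_by_length`) and the simple dot-product attention score are assumptions made for illustration.

```python
# Minimal sketch of span attention over n-grams (illustrative only; the
# function names, dot-product scoring, and concatenation scheme are assumptions).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def boundary_span_repr(hidden, i, j):
    """Standard chart-parser span representation: the difference of the
    encoder hidden states at the span boundaries (lossy for long spans)."""
    return hidden[j] - hidden[i]

def span_attention(query, ngram_vecs):
    """Attention-pool the embeddings of the n-grams inside the span,
    weighting each n-gram by its dot-product score against the query."""
    scores = ngram_vecs @ query          # (num_ngrams,)
    weights = softmax(scores)            # attention weights over n-grams
    return weights @ ngram_vecs          # pooled n-gram vector, shape (dim,)

def enhanced_span_repr(hidden, i, j, ngrams_by_length):
    """Categorical span attention: attend separately over each n-gram
    length category, then concatenate the pooled vectors with the
    boundary-difference representation."""
    base = boundary_span_repr(hidden, i, j)
    pooled = [span_attention(base, vecs) if len(vecs) else np.zeros_like(base)
              for vecs in ngrams_by_length]
    return np.concatenate([base] + pooled)

# Toy usage: 6 encoder positions, hidden size 4, two n-gram length categories.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))
ngrams_by_length = [rng.normal(size=(3, 4)), rng.normal(size=(2, 4))]
vec = enhanced_span_repr(hidden, 1, 4, ngrams_by_length)
print(vec.shape)  # (12,): base (4) plus one pooled vector per category (2 x 4)
```

In this sketch the output dimension is kept fixed by emitting a zero vector for an empty length category; a real implementation could instead learn scoring and projection parameters rather than using raw dot products.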
