Paper Title

Multiple Structural Priors Guided Self Attention Network for Language Understanding

Authors

Le Qi, Yu Zhang, Qingyu Yin, Ting Liu

Abstract

Self attention networks (SANs) have been widely utilized in recent NLP studies. Unlike CNNs or RNNs, standard SANs are usually position-independent and thus incapable of capturing the structural priors between sequences of words. Existing studies commonly apply a single mask strategy to SANs to incorporate structural priors, and therefore fail to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into SAN models, proposing the Multiple Structural Priors Guided Self Attention Network (MS-SAN), which transforms different structural priors into different attention heads through a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative position of words. To capture the latent hierarchical structure of texts, we extract this information not only from the word contexts but also from dependency syntax trees. Experimental results on two tasks show that MS-SAN achieves significant improvements over other strong baselines.
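
To make the multi-mask idea described above concrete, the following is a minimal sketch, not the authors' released implementation. The helper names (multi_mask_attention, example_masks) and the two toy priors (a forward sequential mask and a local relative-position window) are assumptions for illustration; the paper additionally derives masks from dependency syntax trees, which this sketch omits.

```python
# Minimal sketch of multi-mask multi-head attention: each head receives its own
# additive mask encoding one structural prior (hypothetical helpers, not MS-SAN code).
import torch
import torch.nn.functional as F

def multi_mask_attention(q, k, v, head_masks):
    """q, k, v: [batch, heads, seq_len, d_head]; head_masks: [heads, seq_len, seq_len]
    with 0.0 for allowed positions and -inf for positions a head must ignore."""
    d_head = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_head ** 0.5  # [B, H, L, L]
    scores = scores + head_masks.unsqueeze(0)                      # per-head structural prior
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

def example_masks(num_heads, seq_len, window=2):
    """Toy priors: half the heads attend only to preceding words (sequential order),
    the other half only to a +/-window neighbourhood (relative position)."""
    neg_inf = float("-inf")
    idx = torch.arange(seq_len)
    forward = torch.zeros(seq_len, seq_len).masked_fill(idx[None, :] > idx[:, None], neg_inf)
    local = torch.zeros(seq_len, seq_len).masked_fill(
        (idx[None, :] - idx[:, None]).abs() > window, neg_inf)
    masks = torch.stack([forward if h < num_heads // 2 else local
                         for h in range(num_heads)])
    return masks

# Usage: batch 2, 4 heads, sequence length 6, head dimension 16.
q = k = v = torch.randn(2, 4, 6, 16)
out = multi_mask_attention(q, k, v, example_masks(num_heads=4, seq_len=6))
print(out.shape)  # torch.Size([2, 4, 6, 16])
```

The key design choice sketched here is that structural priors are injected purely through per-head additive masks, so the underlying scaled dot-product attention is unchanged and different heads can specialize in different structures.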
