Paper Title

EIT: Enhanced Interactive Transformer

Authors

Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu

Abstract

Two principles, the complementary principle and the consensus principle, are widely acknowledged in the literature of multi-view learning. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes complementarity while ignoring consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the complementary principle, EMHA removes the one-to-one mapping constraint among queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely inner-subspace interaction and cross-subspace interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization, grammar correction, and language modeling) show its superiority, with a very modest increase in model size. Our code will be available at: https://github.com/zhengkid/EIT-Enhanced-Interactive-Transformer.
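To make the abstract's core idea concrete, below is a minimal PyTorch sketch of a multi-head attention variant in which the usual one-to-one pairing of query and key subspaces is removed, so every query head attends to the keys of every head (a many-to-many mapping), and the resulting head pairs are then mixed back together. This is only an illustration under our own assumptions, not the authors' EMHA: the class name `ManyToManyAttention` and the `pair_mix` layer (a crude stand-in for the inner-/cross-subspace interactions) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): each query head
# attends to the keys/values of all heads, then the h*h head pairs are mixed.
import torch
import torch.nn as nn


class ManyToManyAttention(nn.Module):
    """Self-attention where every query head attends to keys of every head."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.h = num_heads
        self.d_k = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical mixing over the h*h (query-head, key-head) pairs; a
        # rough stand-in for the interaction models described in the abstract.
        self.pair_mix = nn.Linear(num_heads * num_heads, num_heads)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Split into heads: (batch, heads, time, d_k)
        q = self.q_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)

        # Scores for every (query-head i, key-head j) pair: (b, i, j, tq, tk)
        scores = torch.einsum('biqd,bjkd->bijqk', q, k) / self.d_k ** 0.5
        attn = scores.softmax(dim=-1)

        # Each pair aggregates the values of key head j: (b, h, h, t, d_k)
        ctx = torch.einsum('bijqk,bjkd->bijqd', attn, v)

        # Collapse the h*h pairs back to h heads, then project out.
        ctx = ctx.permute(0, 3, 4, 1, 2).reshape(b, t, self.d_k, self.h * self.h)
        ctx = self.pair_mix(ctx)                      # (b, t, d_k, h)
        ctx = ctx.reshape(b, t, self.h * self.d_k)
        return self.out_proj(ctx)


if __name__ == "__main__":
    layer = ManyToManyAttention(d_model=64, num_heads=4)
    x = torch.randn(2, 10, 64)
    print(layer(x).shape)  # torch.Size([2, 10, 64])
```

Compared with standard multi-head attention, the extra cost here is the h-by-h pair computation and the small `pair_mix` projection, which is consistent with the abstract's claim of only a modest increase in model size; the exact interaction design in the paper may differ.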
