Paper Title

EIT: Enhanced Interactive Transformer

Authors

Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu

Abstract

Two principles, the complementary principle and the consensus principle, are widely acknowledged in the literature of multi-view learning. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes complementarity while ignoring consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the complementary principle, EMHA removes the one-to-one mapping constraint among queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely inner-subspace interaction and cross-subspace interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization, grammar correction, and language modeling) show its superiority, with a very modest increase in model size. Our code will be available at: https://github.com/zhengkid/EIT-Enhanced-Interactive-Transformer.
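To make the abstract's core idea concrete, below is a minimal PyTorch sketch of a multi-head attention variant in which the usual one-to-one pairing of query and key subspaces is removed, so every query head attends to the keys of every head (a many-to-many mapping), and the resulting head pairs are then mixed back together. This is only an illustration under our own assumptions, not the authors' EMHA: the class name `ManyToManyAttention` and the `pair_mix` layer (a crude stand-in for the inner-/cross-subspace interactions) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): each query head
# attends to the keys/values of all heads, then the h*h head pairs are mixed.
import torch
import torch.nn as nn


class ManyToManyAttention(nn.Module):
    """Self-attention where every query head attends to keys of every head."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.h = num_heads
        self.d_k = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical mixing over the h*h (query-head, key-head) pairs; a
        # rough stand-in for the interaction models described in the abstract.
        self.pair_mix = nn.Linear(num_heads * num_heads, num_heads)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Split into heads: (batch, heads, time, d_k)
        q = self.q_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)

        # Scores for every (query-head i, key-head j) pair: (b, i, j, tq, tk)
        scores = torch.einsum('biqd,bjkd->bijqk', q, k) / self.d_k ** 0.5
        attn = scores.softmax(dim=-1)

        # Each pair aggregates the values of key head j: (b, h, h, t, d_k)
        ctx = torch.einsum('bijqk,bjkd->bijqd', attn, v)

        # Collapse the h*h pairs back to h heads, then project out.
        ctx = ctx.permute(0, 3, 4, 1, 2).reshape(b, t, self.d_k, self.h * self.h)
        ctx = self.pair_mix(ctx)                      # (b, t, d_k, h)
        ctx = ctx.reshape(b, t, self.h * self.d_k)
        return self.out_proj(ctx)


if __name__ == "__main__":
    layer = ManyToManyAttention(d_model=64, num_heads=4)
    x = torch.randn(2, 10, 64)
    print(layer(x).shape)  # torch.Size([2, 10, 64])
```

Compared with standard multi-head attention, the extra cost here is the h-by-h pair computation and the small `pair_mix` projection, which is consistent with the abstract's claim of only a modest increase in model size; the exact interaction design in the paper may differ.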
