Paper Title
LaMemo: Language Modeling with Look-Ahead Memory
Paper Authors
Paper Abstract
Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode the context in a uni-directional way. As a result, the memory is prevented from dynamically interacting with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information from the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead that is only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory.
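The abstract describes the core mechanism only at a high level: old memory states attend to the tokens on their right (the current segment) and are then interpolated with their previous values. The following is a minimal sketch of that idea under stated assumptions; it is not the paper's official implementation, and the function name `look_ahead_memory_update`, the projection matrices, and the fixed scalar `interp_alpha` are all illustrative choices (the paper's actual interpolation weighting may differ).

```python
# A minimal sketch of a look-ahead memory update, assuming a PyTorch setting.
# All names here are illustrative and not taken from the paper's code release.
import torch
import torch.nn.functional as F

def look_ahead_memory_update(memory, segment, w_q, w_k, w_v, interp_alpha=0.5):
    """Refresh old memory states by attending to the current segment's tokens
    (the "right-side" context) and interpolating with the previous states.

    memory:  (mem_len, d_model)  hidden states cached from earlier segments
    segment: (seg_len, d_model)  hidden states of the current segment
    interp_alpha: scalar in [0, 1]; a fixed constant here for brevity
                  (an assumption, not the paper's weighting scheme).
    """
    q = memory @ w_q                      # queries come from the old memory
    k = segment @ w_k                     # keys/values come from the new right-side tokens
    v = segment @ w_v
    attn = F.softmax(q @ k.T / q.size(-1) ** 0.5, dim=-1)
    looked_ahead = attn @ v               # memory refreshed with up-to-date context
    # Interpolate so long-range information in the old states is not overwritten.
    return interp_alpha * memory + (1.0 - interp_alpha) * looked_ahead

# Usage: update a 4-token memory with a 3-token current segment (d_model = 8).
d = 8
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
new_mem = look_ahead_memory_update(torch.randn(4, d), torch.randn(3, d), w_q, w_k, w_v)
print(new_mem.shape)  # torch.Size([4, 8])
```

Because each memory slot attends only to the current segment rather than to the full history, the extra cost of this refresh grows linearly with the memory length, which is the overhead the abstract refers to.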