Paper Title

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Authors

Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

Abstract

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
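The core idea behind the consistent sampling variants can be illustrated with a minimal sketch: standard top-k sampling may repeatedly exclude the end-of-sequence token from the candidate set, so decoding never terminates; the consistent variant forces EOS back into the candidate set at every step. The function name, the toy distribution, and the NumPy-based setup below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def consistent_top_k_sample(probs, k, eos_id, rng):
    """Sample one token from the top-k candidates, always keeping EOS
    available so every prefix retains nonzero termination probability
    (a toy sketch of the consistent top-k variant, not the paper's code)."""
    # Indices of the k most probable tokens.
    top_k = np.argsort(probs)[-k:]
    # Force the EOS token into the candidate set if top-k excluded it.
    if eos_id not in top_k:
        top_k = np.append(top_k, eos_id)
    # Zero out non-candidates, renormalize, and sample.
    masked = np.zeros_like(probs)
    masked[top_k] = probs[top_k]
    masked /= masked.sum()
    return rng.choice(len(probs), p=masked)

# Example: with k=2, plain top-k would drop EOS (index 3, prob 0.05)
# at this step; the consistent variant keeps it reachable.
rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.15, 0.05])
token = consistent_top_k_sample(probs, k=2, eos_id=3, rng=rng)
```

The same fix applies to nucleus sampling: after selecting the smallest prefix of tokens whose cumulative probability exceeds the threshold, add EOS to the kept set before renormalizing.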
