Title
Why GANs are overkill for NLP
Authors
Abstract
This work offers a novel theoretical perspective on why, despite numerous attempts, adversarial approaches to generative modeling (e.g., GANs) have not been as popular for certain generation tasks, particularly sequential tasks such as Natural Language Generation, as they have been in others, such as Computer Vision. In particular, on sequential data such as text, maximum-likelihood approaches are used far more widely than GANs. We show that, while maximizing likelihood may seem inherently different from minimizing distinguishability, this distinction is largely artificial and holds only for limited models. We argue that minimizing KL-divergence (i.e., maximizing likelihood) is a more efficient approach to effectively minimizing the same distinguishability criterion that adversarial models seek to optimize. Reductions show that minimizing distinguishability can be seen as simply boosting likelihood for certain families of models, including n-gram models and neural networks with a softmax output layer. To achieve a full polynomial-time reduction, a novel next-token distinguishability model is considered.
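The equivalence the abstract invokes between minimizing KL-divergence and maximizing likelihood can be sketched as follows (a standard identity, not taken from the paper itself; here $p$ denotes the data distribution and $q_\theta$ the model):

```latex
\mathrm{KL}(p \,\|\, q_\theta)
  = \mathbb{E}_{x \sim p}\!\left[\log p(x)\right]
    - \mathbb{E}_{x \sim p}\!\left[\log q_\theta(x)\right]
```

Since the first term does not depend on $\theta$,

```latex
\arg\min_\theta \mathrm{KL}(p \,\|\, q_\theta)
  = \arg\max_\theta \mathbb{E}_{x \sim p}\!\left[\log q_\theta(x)\right],
```

and the right-hand side is the population form of the maximum-likelihood objective, which the empirical average over training data approximates.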