Dynamic Dialogue Policy for Continual Reinforcement Learning

Paper Title

Dynamic Dialogue Policy for Continual Reinforcement Learning

Authors

Christian Geishauser, Carel van Niekerk, Nurul Lubis, Michael Heck, Hsien-Chin Lin, Shutong Feng, Milica Gašić

Abstract

Continual learning is one of the key components of human learning and a necessary requirement for artificial intelligence. As dialogue can potentially span infinitely many topics and tasks, a task-oriented dialogue system must be able to learn continually, dynamically adapting to new challenges while preserving the knowledge it has already acquired. Despite its importance, continual reinforcement learning of the dialogue policy has remained largely unaddressed. The lack of a framework with training protocols, baseline models and suitable metrics has so far hindered research in this direction. In this work we fill precisely this gap, enabling research in dialogue policy optimisation to go from static to dynamic learning. We provide a continual learning algorithm, baseline architectures and metrics for assessing continual learning models. Moreover, we propose the dynamic dialogue policy transformer (DDPT), a novel dynamic architecture that can integrate new knowledge seamlessly, is capable of handling large state spaces and obtains significant zero-shot performance when exposed to unseen domains, without any growth in network parameter size.
