Paper Title

Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

Paper Authors

Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

Paper Abstract

Recently, Transformer based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architectures across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using the whole history increases model complexity and may hurt the training efficiency, especially when facing small amounts of labeled training data (the low-resource setting). In this paper, motivated by the observation that dialog states could be viewed as Markov states, we propose to build Markovian Generative Architectures (MGA) over PLM backbones for efficient TOD systems. Experiments on MultiWOZ2.1 show that in the rich-resource setting, the proposed Markov models reduce memory and time costs without performance degradation; in the low-resource setting, the training efficiency of the Markov models is more significant.
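To make the architectural difference concrete, the following is a minimal Python sketch (not the authors' code) of how the conditioning input at each turn could be assembled in a GPT2-style generative TOD system: the non-Markov variant replays the whole dialog history, while the Markov (MGA-style) variant conditions only on the previous belief state, the previous response, and the current user utterance. The turn fields (user, belief, db, resp) are illustrative assumptions, not the paper's exact input format.

# Illustrative sketch only; field names and turn structure are assumptions.

def non_markov_context(history, user_t):
    # Non-Markov: the whole dialog history is concatenated as the
    # conditioning input, so context length grows with the number of turns.
    tokens = []
    for turn in history:
        tokens += turn["user"] + turn["belief"] + turn["db"] + turn["resp"]
    return tokens + user_t

def markov_context(prev_belief, prev_resp, user_t):
    # Markov (MGA-style): the previous belief (dialog) state is treated as a
    # Markov state, so only it, the previous response, and the current user
    # utterance form the conditioning input; its length stays roughly
    # constant across turns.
    return prev_belief + prev_resp + user_t

# With each field given as a list of token ids, non_markov_context grows
# linearly in the number of past turns, while markov_context does not,
# which is the source of the memory and time savings reported above.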
