Paper Title

A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

Authors

Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao

Abstract

Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. Joint learning of the two tasks can improve inference accuracy and is popular in recent work. However, most joint models ignore inference latency and cannot meet the requirements for deploying dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling that guarantees both accuracy and latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers a more accurate model at every speed level. Our experiments on the Jetson Nano platform show that FAN processes fifteen utterances per second with only a small accuracy drop, demonstrating its effectiveness and efficiency on edge devices.
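The abstract's core idea, an attention-style module that exchanges information between the intent and slot branches on top of an arbitrary encoder, can be illustrated with a minimal NumPy sketch. This is not the paper's actual FAN architecture: the dimensions, the random weights (`W_int`, `E_int`, `W_slot`), and the simplified one-way intent-to-slot exchange are all hypothetical stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes, not taken from the paper
T, d, n_intents, n_slots = 6, 32, 4, 9

# Stand-in token encodings H from any encoder (BERT, BiLSTM, ...)
H = rng.normal(size=(T, d))

# Intent branch: mean-pool the utterance, then classify
W_int = 0.1 * rng.normal(size=(n_intents, d))
c = H.mean(axis=0)
intent_probs = softmax(W_int @ c)            # (n_intents,)

# Simplified intent-to-slot exchange: mix learned intent embeddings
# by the predicted intent distribution, then gate each token with the
# resulting context vector before slot classification
E_int = 0.1 * rng.normal(size=(n_intents, d))  # hypothetical intent embeddings
ctx = intent_probs @ E_int                     # (d,) intent context
gate = sigmoid(H @ ctx)                        # (T,) per-token relevance
H_fused = H + gate[:, None] * ctx              # inject intent info into tokens

# Slot branch: per-token classification over the fused representations
W_slot = 0.1 * rng.normal(size=(n_slots, d))
slot_probs = softmax(H_fused @ W_slot.T, axis=-1)  # (T, n_slots)
```

Because the exchange module only adds small projection and embedding matrices on top of the encoder, the encoder itself (and hence the speed level) can be swapped freely, which is the property the abstract highlights.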
