Paper Title
Extreme Model Compression for On-device Natural Language Understanding
Paper Authors
Paper Abstract
In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.