Paper Title

Extreme Model Compression for On-device Natural Language Understanding

Paper Authors

Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin

Paper Abstract

In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.
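The abstract does not detail the compression mechanism, but a common way to realize word-embedding compression that can be trained jointly with a downstream task is low-rank factorization of the embedding matrix. The sketch below is a hypothetical illustration (not the paper's exact method) of how such a factorization yields compression rates in the ~97% range reported above; the vocabulary size, embedding dimension, and rank are assumed values chosen for illustration.

```python
# Hypothetical sketch: compression achievable by factorizing a full embedding
# matrix E (V x d) into two smaller factors A (V x r) @ B (r x d). In a
# task-aware setup, A and B would be trained jointly with the NLU task loss.

def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """Parameter count of a full embedding matrix (V x d)."""
    return vocab_size * embed_dim

def factorized_params(vocab_size: int, embed_dim: int, rank: int) -> int:
    """Parameter count after factorization: A (V x r) plus B (r x d)."""
    return vocab_size * rank + rank * embed_dim

def compression_rate(vocab_size: int, embed_dim: int, rank: int) -> float:
    """Fraction of embedding parameters removed by the factorization."""
    full = embedding_params(vocab_size, embed_dim)
    small = factorized_params(vocab_size, embed_dim, rank)
    return 1.0 - small / full

# Example (assumed numbers): a 100k-word vocabulary with 256-dim embeddings,
# factorized at rank 6, removes roughly 97-98% of the embedding parameters.
rate = compression_rate(100_000, 256, 6)
print(f"compression rate: {rate:.1%}")
```

Because the rank is far smaller than both the vocabulary size and the embedding dimension, the factor matrices are a small fraction of the original table; training them end-to-end lets the task signal decide which directions in embedding space to preserve, which is the intuition behind the paper's finding that downstream-task signal matters for compression quality.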
