Paper Title
Extreme Model Compression for On-device Natural Language Understanding
Paper Authors
Paper Abstract
In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.