Paper Title

Learning Low-Rank Representations for Model Compression

Paper Authors

Zezhou Zhu, Yucong Zhou, Zhao Zhong

Paper Abstract

Vector Quantization (VQ) is an appealing model compression method to obtain a tiny model with little accuracy loss. While methods to obtain better codebooks and codes under fixed clustering dimensionality have been extensively studied, optimization of the vectors in favour of clustering performance has not been carefully considered, especially via the reduction of vector dimensionality. This paper reports our recent progress on the combination of dimensionality compression and vector quantization, proposing a Low-Rank Representation Vector Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ algorithms in various tasks and architectures. $\text{LR}^2\text{VQ}$ joins low-rank representation with subvector clustering to construct a new kind of building block that is directly optimized through end-to-end training over the task loss. Our proposed design pattern introduces three hyper-parameters: the number of clusters $k$, the size of the subvectors $m$, and the clustering dimensionality $\tilde{d}$. In our method, the compression ratio can be directly controlled by $m$, and the final accuracy is solely determined by $\tilde{d}$. We recognize $\tilde{d}$ as a trade-off between low-rank approximation error and clustering error, and carry out both theoretical analysis and experimental observations that empower the estimation of the proper $\tilde{d}$ before fine-tuning. With a proper $\tilde{d}$, we evaluate $\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on the ImageNet classification dataset, achieving 2.8\%/1.0\% top-1 accuracy improvements over the current state-of-the-art VQ-based compression algorithms at 43$\times$/31$\times$ compression factors.
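The abstract describes two building blocks: a low-rank projection of weight vectors to dimensionality $\tilde{d}$, followed by clustering of size-$m$ subvectors into $k$ centroids. The sketch below illustrates these blocks in a post-hoc fashion (truncated SVD plus k-means) rather than the paper's end-to-end training over the task loss; the function names `lr2vq_compress`/`lr2vq_decompress` and the use of scikit-learn's `KMeans` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two LR^2VQ building blocks:
# (1) a low-rank projection of a weight matrix to dimensionality d_tilde,
# (2) vector quantization of size-m subvectors into k clusters.
import numpy as np
from sklearn.cluster import KMeans

def lr2vq_compress(W, d_tilde, m, k):
    """Compress W (d x n) via a rank-d_tilde approximation plus subvector VQ."""
    # Low-rank step: truncated SVD gives W ~ U @ V with V of shape (d_tilde, n).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U[:, :d_tilde] * s[:d_tilde]   # (d, d_tilde), singular values folded in
    V = Vt[:d_tilde, :]                # (d_tilde, n), the low-rank representation

    # VQ step: split each column of V into subvectors of size m and cluster them.
    assert d_tilde % m == 0, "d_tilde must be divisible by the subvector size m"
    subvectors = V.T.reshape(-1, m)    # (n * d_tilde / m, m)
    km = KMeans(n_clusters=k, n_init=10).fit(subvectors)
    return U, km.cluster_centers_, km.labels_, V.shape

def lr2vq_decompress(U, codebook, codes, v_shape):
    """Rebuild an approximation of W from the codebook and codes."""
    V = codebook[codes].reshape(v_shape[1], v_shape[0]).T  # back to (d_tilde, n)
    return U @ V

# Toy usage: a 256x512 "weight matrix" compressed with d_tilde=64, m=8, k=256.
W = np.random.randn(256, 512).astype(np.float32)
U, codebook, codes, v_shape = lr2vq_compress(W, d_tilde=64, m=8, k=256)
W_hat = lr2vq_decompress(U, codebook, codes, v_shape)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

The sketch also makes the abstract's hyper-parameter roles concrete: the stored payload is one $\log_2 k$-bit code per size-$m$ subvector plus the small codebook, so increasing $m$ reduces the number of codes and raises the compression ratio, while $\tilde{d}$ trades low-rank approximation error against clustering error.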
