Title
DyRep: Bootstrapping Training with Dynamic Re-parameterization
Authors
Abstract
Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As a result, they pay an expensive computational overhead to manipulate these unnecessary operations. To eliminate these caveats, we aim to bootstrap training at minimal cost by devising a dynamic re-parameterization (DyRep) method, which encodes the Rep technique into a training process that dynamically evolves the network structure. Concretely, our proposal adaptively finds the operations that contribute most to the network's loss and applies Rep to enhance their representational capacity. In addition, to suppress the noisy and redundant operations introduced by Rep, we devise a de-parameterization technique for a more compact re-parameterization. In this regard, DyRep is more efficient than Rep, since it smoothly evolves the given network instead of constructing an over-parameterized one. Experimental results demonstrate its effectiveness: e.g., DyRep improves the accuracy of ResNet-18 by $2.04\%$ on ImageNet and reduces runtime by $22\%$ over the baseline. Code is available at: https://github.com/hunto/DyRep.
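The core identity behind structural re-parameterization is the linearity of convolution: parallel convolutional branches whose outputs are summed can be fused at inference time into a single convolution whose kernel is the aligned sum of the branch kernels. The minimal 1-D sketch below illustrates this equivalence with NumPy; the function and variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def conv1d(x, k):
    """'Same'-padded 1-D cross-correlation with zero padding."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

k3 = rng.standard_normal(3)  # 3-tap branch (e.g., a 3x3 conv in 2-D)
k1 = rng.standard_normal(1)  # 1-tap branch (e.g., a 1x1 conv in 2-D)

# Training-time structure: two parallel branches, outputs summed.
y_branches = conv1d(x, k3) + conv1d(x, k1)

# Inference-time structure: zero-pad the 1-tap kernel to 3 taps and add,
# yielding one equivalent fused kernel.
k_fused = k3 + np.pad(k1, 1)
y_fused = conv1d(x, k_fused)

print(np.allclose(y_branches, y_fused))  # True: the fusion is exact
```

Because the fusion is exact, the augmented multi-branch structure only exists during training; inference cost is that of the single fused operator, which is what makes selectively growing branches (as DyRep does) attractive.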