Paper Title
Double Forward Propagation for Memorized Batch Normalization
Paper Authors
Paper Abstract
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although standard BN can significantly accelerate the training of DNNs and improve generalization performance, it has several underlying limitations that may hamper performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of the data from a single minibatch. Consequently, BN can be unstable when the batch size is very small or the data are poorly sampled. In the inference stage, BN uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are inconsistent. To address these issues, we propose memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each batch, the model parameters change and the features change accordingly, leading to a distribution shift of the features before and after the update for the considered batch. To alleviate this issue, we present a simple Double-Forward scheme for MBN that further improves performance. Compared with related methods, the proposed MBN exhibits consistent behavior in both training and inference. Empirical results show that MBN-based models trained with the Double-Forward scheme greatly reduce the sensitivity to the sampled data and significantly improve generalization performance.
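
To make the abstract's description concrete, below is a minimal NumPy sketch of how an MBN layer might memorize statistics from several recent batches and how the Double-Forward scheme could fit into one SGD step. The class name, the hyper-parameters (memory_size, eps), and the uniform averaging of memorized statistics are illustrative assumptions for this sketch, not the exact formulation of the paper.

import numpy as np

class MemorizedBatchNorm:
    """Sketch of memorized batch normalization (MBN) over recent batches.

    Statistics are aggregated from the `memory_size` most recent batches
    rather than the current batch alone, and the same rule is applied at
    training and inference time, keeping the two stages consistent.
    """

    def __init__(self, num_features, memory_size=4, eps=1e-5):
        self.memory_size = memory_size      # how many recent batches to remember (illustrative)
        self.eps = eps
        self.gamma = np.ones(num_features)  # learnable scale
        self.beta = np.zeros(num_features)  # learnable shift
        self.mean_memory = []               # per-batch means of the memorized batches
        self.var_memory = []                # per-batch variances of the memorized batches

    def update_statistics(self, x):
        """Push the current batch's statistics into the memory, dropping the oldest."""
        self.mean_memory.append(x.mean(axis=0))
        self.var_memory.append(x.var(axis=0))
        if len(self.mean_memory) > self.memory_size:
            self.mean_memory.pop(0)
            self.var_memory.pop(0)

    def forward(self, x):
        """Normalize x with statistics aggregated over the memorized batches.

        The memorized statistics are averaged uniformly here; the paper
        weights the contribution of each remembered batch, which this
        sketch omits for brevity.
        """
        assert self.mean_memory, "memorize at least one batch before normalizing"
        mean = np.mean(self.mean_memory, axis=0)
        var = np.mean(self.var_memory, axis=0)
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Double-Forward scheme within one SGD step (illustrative):
#   1) forward the batch once only to refresh the memorized statistics with
#      features produced by the current parameters;
#   2) forward it again with the refreshed statistics, compute the loss,
#      back-propagate, and apply the SGD update.
x = np.random.randn(32, 16)                       # a batch of 32 examples, 16 features
mbn = MemorizedBatchNorm(num_features=16, memory_size=4)
mbn.update_statistics(x)                          # first forward: refresh statistics only
out = mbn.forward(x)                              # second forward: normalize for the actual update

Because the same aggregated statistics are used at both training and inference time, the sketch also reflects the consistency between the two stages emphasized in the abstract.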