Paper Title
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
Paper Authors
Paper Abstract
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and the difficulty of optimizing over discrete weights. Many successful experimental results have been achieved with empirical straight-through (ST) approaches, which propose a variety of ad-hoc rules for propagating gradients through non-differentiable activations and for updating discrete weights. At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights. We advance these derivations to a more complete and systematic study. We analyze properties and estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, and explain how latent weights arise from the mirror descent method when optimizing over probabilities. This allows us to reintroduce ST methods, long known empirically, as sound approximations, to apply them with clarity, and to develop further improvements.
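To make the ST idea concrete, the following is a minimal sketch (not the paper's implementation) of a straight-through gradient for a stochastic binary activation. It assumes a sigmoid-parameterized Bernoulli unit x ∈ {−1, +1} with P(x = +1) = σ(a); the backward pass differentiates the expected activation E[x] = 2σ(a) − 1 instead of the non-differentiable sample. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def st_binary_forward(a):
    """Sample a stochastic binary activation x in {-1, +1} with
    P(x = +1) = sigmoid(a), as in a stochastic binary network (SBN)."""
    p = 1.0 / (1.0 + np.exp(-a))                    # Bernoulli probability
    x = np.where(rng.random(a.shape) < p, 1.0, -1.0)
    return x, p

def st_binary_backward(grad_x, p):
    """Straight-through gradient: pass grad_x through the sampling step,
    using the derivative of the expected activation E[x] = 2*sigmoid(a) - 1,
    which is 2*p*(1-p) with respect to the pre-activation a."""
    return grad_x * 2.0 * p * (1.0 - p)

# Tiny usage example: three pre-activations, unit upstream gradient.
a = np.array([-1.5, 0.0, 2.0])
x, p = st_binary_forward(a)
g = st_binary_backward(np.ones_like(a), p)
```

Here the forward pass is genuinely stochastic while the backward pass is deterministic in `a`; at a = 0 the gradient factor is 2 · 0.5 · 0.5 = 0.5, and it decays as the unit saturates, which is one of the "correct forms" of ST scaling discussed in the abstract.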