Paper Title

Semi-Implicit Back Propagation

Paper Authors

Ren Liu, Xiaoqun Zhang

Paper Abstract

Neural networks have attracted great attention for a long time, and many researchers are devoted to improving the effectiveness of neural network training algorithms. Although stochastic gradient descent (SGD) and other explicit gradient-based methods are widely adopted, many challenges remain, such as gradient vanishing and small step sizes, which lead to slow convergence and instability of SGD algorithms. Motivated by error back propagation (BP) and proximal methods, we propose a semi-implicit back propagation method for neural network training. Similar to BP, the differences on the neurons are propagated in a backward fashion, and the parameters are updated with proximal mappings. The implicit update for both hidden neurons and parameters allows a larger step size in the training algorithm. Finally, we also show that any fixed point of a convergent sequence produced by this algorithm is a stationary point of the objective loss function. Experiments on both MNIST and CIFAR-10 demonstrate that the proposed semi-implicit BP algorithm achieves better performance, in terms of both loss decrease and training/validation accuracy, compared with SGD and the similar algorithm ProxBP.
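The key mechanism the abstract points to is the proximal (implicit) update, which remains stable at step sizes where an explicit gradient step diverges. The following is a minimal NumPy sketch of that contrast on a toy quadratic least-squares problem; it is not the paper's semi-implicit BP algorithm, and the matrix A, vector b, step size tau, and single quadratic loss are purely illustrative assumptions.

# Toy comparison of an explicit gradient step versus an implicit (proximal) step
# on f(w) = 0.5 * ||A w - b||^2. The proximal step solves
#   w_{k+1} = argmin_w f(w) + (1 / (2 * tau)) * ||w - w_k||^2,
# which for this quadratic loss has the closed form
#   w_{k+1} = (A^T A + I / tau)^{-1} (A^T b + w_k / tau).
# A, b, and tau below are illustrative only, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
tau = 10.0  # deliberately large: the explicit step diverges, the proximal step does not

def explicit_step(w, tau):
    """Plain gradient descent step: w - tau * grad f(w)."""
    grad = A.T @ (A @ w - b)
    return w - tau * grad

def proximal_step(w, tau):
    """Implicit (proximal) step: closed-form minimizer of f + (1/(2*tau))*||. - w||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + np.eye(n) / tau, A.T @ b + w / tau)

w_exp = np.zeros(10)
w_prox = np.zeros(10)
for _ in range(20):
    w_exp = explicit_step(w_exp, tau)
    w_prox = proximal_step(w_prox, tau)

# The explicit-step loss blows up, while the proximal-step loss approaches
# the least-squares minimum despite the same large step size.
print("explicit-step loss :", 0.5 * np.linalg.norm(A @ w_exp - b) ** 2)
print("proximal-step loss :", 0.5 * np.linalg.norm(A @ w_prox - b) ** 2)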
