Paper title
Hindsight Network Credit Assignment
Paper authors
Abstract
We present Hindsight Network Credit Assignment (HNCA), a novel learning method for stochastic neural networks, which works by assigning credit to each neuron's stochastic output based on how it influences the output of its immediate children in the network. We prove that HNCA provides unbiased gradient estimates while reducing variance compared to the REINFORCE estimator. We also experimentally demonstrate the advantage of HNCA over REINFORCE in a contextual bandit version of MNIST. The computational complexity of HNCA is similar to that of backpropagation. We believe that HNCA can help stimulate new ways of thinking about credit assignment in stochastic compute graphs.
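The unbiasedness and variance claims can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a hypothetical network with one hidden Bernoulli unit h (logit theta) feeding one Bernoulli output child y (logit w*h + b), with a reward that depends only on y. The "hindsight" estimator shown is one reading of "assigning credit based on how the output influences the immediate child": it weights the gradient of each possible hidden output by the probability that the child would have produced the observed y. All names (compare_estimators, child, reward) and parameter values are illustrative assumptions. Both estimators are evaluated exactly by enumerating the four (h, y) outcomes, so no sampling is involved.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli(value, p_one):
    """Probability that a Bernoulli variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def compare_estimators(theta, w, b, reward):
    """Exact mean and variance of two gradient estimators for d E[reward] / d theta."""
    p = sigmoid(theta)
    # Derivative of P(h = 1) and P(h = 0) with respect to theta.
    dp = {1: p * (1.0 - p), 0: -p * (1.0 - p)}

    def child(y, h):
        # Probability that the child emits y given the hidden output h.
        return bernoulli(y, sigmoid(w * h + b))

    # Ground-truth gradient of the expected reward, by direct enumeration.
    true_grad = sum(
        dp[h] * child(y, h) * reward(y) for h in (0, 1) for y in (0, 1)
    )

    # First and second moments of each estimator under the joint P(h, y).
    moments = {"reinforce": [0.0, 0.0], "hindsight": [0.0, 0.0]}
    for h in (0, 1):
        for y in (0, 1):
            joint = bernoulli(h, p) * child(y, h)
            # REINFORCE: reward times the score function of the sampled h.
            g_reinforce = reward(y) * dp[h] / bernoulli(h, p)
            # Hindsight-style: sum over what h could have been, weighted by
            # how likely the child's observed output y is under each option.
            numer = sum(dp[k] * child(y, k) for k in (0, 1))
            denom = sum(bernoulli(k, p) * child(y, k) for k in (0, 1))
            g_hindsight = reward(y) * numer / denom
            for name, g in (("reinforce", g_reinforce), ("hindsight", g_hindsight)):
                moments[name][0] += joint * g
                moments[name][1] += joint * g * g

    stats = {name: (m1, m2 - m1 * m1) for name, (m1, m2) in moments.items()}
    return true_grad, stats


true_grad, stats = compare_estimators(
    theta=0.3, w=2.0, b=-1.0, reward=lambda y: float(y == 1)
)
for name, (mean, var) in stats.items():
    print(f"{name}: mean={mean:.4f} variance={var:.4f} (true gradient {true_grad:.4f})")
```

In this toy setting the hindsight-style estimate equals the conditional expectation of the REINFORCE estimate given the child's output, which is why both share the same mean while the hindsight-style variance cannot be larger. Whether this matches the paper's estimator in every detail is an assumption of the sketch.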