Paper Title
Training spiking neural networks using reinforcement learning
Paper Authors
Paper Abstract
Neurons in the brain communicate with each other through discrete action spikes, as opposed to the continuous signal transmission in artificial neural networks. Therefore, the traditional techniques for optimizing the parameters of neural networks, which rely on the assumption that activation functions are differentiable, are no longer applicable to modeling the learning processes in the brain. In this project, we propose biologically plausible alternatives to backpropagation to facilitate the training of spiking neural networks. We primarily focus on investigating the candidacy of reinforcement learning (RL) rules for solving the spatial and temporal credit assignment problems to enable decision-making in complex tasks. In one approach, we consider each neuron in a multi-layer neural network as an independent RL agent forming a distinct representation of the feature space, while the network as a whole forms the representation of the complex policy that solves the task at hand. In the other approach, we apply the reparameterization trick to enable differentiation through stochastic transformations in spiking neural networks. We compare and contrast the two approaches by applying them to traditional RL domains such as Gridworld, CartPole, and Mountain Car. Furthermore, we suggest variations and enhancements to enable future research in this area.
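To make the first approach concrete, the sketch below is a minimal NumPy illustration, not the implementation evaluated in the paper; the class name BernoulliNeuron, the learning rate, and the initialization scheme are all assumptions. Each stochastic binary neuron fires with probability sigmoid(w . x), stores an eligibility trace equal to the gradient of its own log-probability, and updates its weights from a globally broadcast scalar reward in the style of Williams' REINFORCE; no error signals are backpropagated between neurons.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BernoulliNeuron:
    """A stochastic binary neuron treated as an independent RL agent."""

    def __init__(self, n_inputs, lr=0.1):
        self.w = rng.normal(scale=0.1, size=n_inputs)
        self.lr = lr
        self.eligibility = np.zeros(n_inputs)

    def act(self, x):
        # Fire with probability sigmoid(w . x).
        p = sigmoid(self.w @ x)
        spike = float(rng.random() < p)
        # Eligibility: gradient of log Bernoulli(spike | p) w.r.t. w.
        self.eligibility = (spike - p) * x
        return spike

    def learn(self, reward):
        # REINFORCE update: a single broadcast reward scales each
        # neuron's locally computed eligibility trace.
        self.w += self.lr * reward * self.eligibility

A multi-layer network of such units receives the same scalar reward at every neuron, which is how the per-neuron agents collectively represent the policy described above.

For the second approach, one standard way to realize the reparameterization trick for a Bernoulli spike is the binary Concrete (Gumbel-softmax) relaxation; the helper below, which reuses the imports and sigmoid from the sketch above, is again an assumed formulation rather than the paper's code. For temperature > 0 the surrogate spike is a smooth function of the logits, so gradients can flow through the sampling step; as the temperature approaches 0, the output approaches a hard 0/1 spike.

def relaxed_spike(logits, temperature=0.5):
    # Reparameterize: draw Logistic(0, 1) noise from a uniform sample,
    # shift by the logits, and squash with a tempered sigmoid.
    u = np.clip(rng.uniform(size=np.shape(logits)), 1e-6, 1 - 1e-6)
    logistic_noise = np.log(u) - np.log1p(-u)
    s = sigmoid((logits + logistic_noise) / temperature)
    # d s / d logits = s * (1 - s) / temperature, so the sample is
    # differentiable in the logits, unlike a hard Bernoulli draw.
    return s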