Paper Title
Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications. However, standard RL algorithms can only be applied to a single reward function and cannot quickly adapt to unseen reward functions. In this paper, we advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps a reward function to its value function. The benefit of learning this operator is that we can take any new reward function as input and obtain its corresponding value function in a zero-shot manner. To approximate this special type of operator, we design a number of novel operator neural network architectures based on its theoretical properties. Our operator network designs outperform existing methods and the standard design of general-purpose operator networks, and we demonstrate the benefits of our operator deep Q-learning framework on a range of tasks, including reward transferring for offline policy evaluation (OPE) and reward transferring for offline policy optimization.
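A minimal sketch of how an operator-style Q-network could accept a reward function as input, assuming the reward function is represented by its evaluations at a set of probe state-action pairs; the class name, architecture, and mean-pooling choice are illustrative assumptions for this sketch, not the paper's exact design:

```python
# Hypothetical sketch of an "operator" Q-network: it conditions on a reward
# function (represented by its values at sampled probe state-action pairs)
# and outputs Q-values, so a new reward function can be plugged in at test
# time without retraining. Architecture details are illustrative assumptions.
import torch
import torch.nn as nn


class OperatorQNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        # Encodes each (state, action, reward-value) triple from the probe set.
        self.reward_encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Maps the query (state, action) plus the pooled reward embedding to a Q-value.
        self.q_head = nn.Sequential(
            nn.Linear(state_dim + action_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action, probe_sa, probe_r):
        # probe_sa: (batch, n_probe, state_dim + action_dim) probe state-action pairs
        # probe_r:  (batch, n_probe, 1) the reward function evaluated at those pairs
        reward_tokens = torch.cat([probe_sa, probe_r], dim=-1)
        # Mean-pool over probe points so the reward representation is
        # permutation-invariant with respect to the probe set.
        reward_embedding = self.reward_encoder(reward_tokens).mean(dim=1)
        query = torch.cat([state, action, reward_embedding], dim=-1)
        return self.q_head(query)


# Zero-shot use with a new reward function: re-evaluate the probe rewards and
# query the same trained network, with no further gradient updates.
if __name__ == "__main__":
    state_dim, action_dim, n_probe, batch = 4, 2, 32, 8
    net = OperatorQNetwork(state_dim, action_dim)
    state = torch.randn(batch, state_dim)
    action = torch.randn(batch, action_dim)
    probe_sa = torch.randn(batch, n_probe, state_dim + action_dim)
    new_reward = probe_sa.pow(2).sum(dim=-1, keepdim=True)  # stand-in for a new reward function
    q_values = net(state, action, probe_sa, new_reward)
    print(q_values.shape)  # torch.Size([8, 1])
```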