Paper Title
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies
Paper Authors
Paper Abstract
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph that describes a set of subtasks and their dependencies, which are unknown to the agent. The agent needs to quickly adapt to the task over a few episodes during the adaptation phase to maximize the return in the test phase. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference (MSGI), which infers the latent parameter of the task by interacting with the environment and maximizes the return given the latent parameter. To facilitate learning, we adopt an intrinsic reward inspired by the upper confidence bound (UCB) that encourages efficient exploration. Our experimental results on two grid-world domains and the StarCraft II environment show that the proposed method is able to accurately infer the latent task parameter and to adapt more efficiently than existing meta-RL and hierarchical RL methods.
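To make the UCB-inspired exploration bonus concrete, here is a minimal, hypothetical sketch (not the paper's implementation): a count-based intrinsic reward that grows for subtasks the agent has tried less often, mirroring the classic UCB term sqrt(log T / N). The class and parameter names (`UCBIntrinsicReward`, `scale`) are illustrative assumptions.

```python
import math
from collections import defaultdict

class UCBIntrinsicReward:
    """Illustrative UCB-style exploration bonus over subtasks (a sketch,
    not the paper's exact formulation)."""

    def __init__(self, scale=1.0):
        self.scale = scale              # exploration coefficient (hypothetical)
        self.counts = defaultdict(int)  # N(subtask): times each subtask was executed
        self.total = 0                  # T: total number of subtask executions

    def bonus(self, subtask_id):
        """Return a UCB-style bonus ~ sqrt(log(T) / N(subtask_id)),
        so rarely-tried subtasks receive a larger intrinsic reward."""
        self.total += 1
        self.counts[subtask_id] += 1
        n = self.counts[subtask_id]
        return self.scale * math.sqrt(math.log(self.total + 1) / n)

# Usage during the adaptation phase: add the bonus to the environment reward.
intrinsic = UCBIntrinsicReward(scale=0.5)
env_reward = 0.0                                   # placeholder environment reward
r_total = env_reward + intrinsic.bonus(subtask_id=3)
print(r_total)
```

Under this kind of bonus, the agent is pushed to execute subtasks whose outcomes it has observed least, which is what makes the inference of subtask dependencies sample-efficient during the short adaptation phase.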