Paper Title

Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

Authors

Quanyi Li, Zhenghao Peng, Bolei Zhou

Abstract

Human intervention is an effective way to inject human knowledge into the training loop of reinforcement learning, which can bring fast learning and ensure training safety. Given the very limited budget of human intervention, it remains challenging to design when and how the human expert interacts with the learning agent during training. In this work, we develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO). To allow the agent sufficient exploration in risky environments while ensuring training safety, the human expert can take over control and demonstrate how to avoid probably dangerous situations or trivial behaviors. The proposed HACO then effectively utilizes data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent. HACO extracts proxy state-action values from the partial human demonstrations and optimizes the agent to improve the proxy values while reducing human interventions. The experiments show that HACO achieves substantially high sample efficiency on the safe driving benchmark. HACO can train agents to drive in unseen traffic scenarios with a small human intervention budget and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines by a large margin. Code and demo videos are available at: https://decisionforce.github.io/HACO/.
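
The takeover mechanism described in the abstract can be illustrated with a toy data-collection loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D "lane offset" environment, the names `Transition`, `collect_episode`, `toy_step`, `toy_human_override`, and `intervention_rate`, and the takeover threshold are all hypothetical. It shows only how agent exploration and partial human demonstrations might be logged together with an intervention flag; the proxy value learning that HACO performs on such data is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Transition:
    """One logged step; `intervened` marks steps where the human took over."""
    state: float
    agent_action: float
    applied_action: float
    next_state: float
    intervened: bool


def toy_step(state: float, action: float) -> float:
    """Hypothetical dynamics: the action shifts the lateral lane offset."""
    return state + action


def toy_human_override(state: float, agent_action: float) -> Optional[float]:
    """Hypothetical expert: takes over only if the agent would leave the lane.

    Returns a corrective action, or None to let the agent keep control.
    """
    if abs(toy_step(state, agent_action)) > 1.0:
        return -state  # steer back to the lane centre
    return None


def collect_episode(
    agent_policy: Callable[[float], float],
    human_override: Callable[[float, float], Optional[float]],
    step_fn: Callable[[float, float], float],
    initial_state: float,
    max_steps: int,
) -> List[Transition]:
    """Roll out the agent, letting the human replace risky actions."""
    state = initial_state
    episode: List[Transition] = []
    for _ in range(max_steps):
        agent_action = agent_policy(state)
        human_action = human_override(state, agent_action)
        intervened = human_action is not None
        applied = human_action if intervened else agent_action
        next_state = step_fn(state, applied)
        # Both the agent's proposal and the applied action are stored, so a
        # learner can later distinguish exploration from demonstration data.
        episode.append(
            Transition(state, agent_action, applied, next_state, intervened)
        )
        state = next_state
    return episode


def intervention_rate(episode: List[Transition]) -> float:
    """Fraction of steps with a human takeover (the quantity to drive down)."""
    return sum(t.intervened for t in episode) / len(episode)
```

As a usage example, `collect_episode(lambda s: 0.6, toy_human_override, toy_step, 0.0, 4)` rolls out an agent that always drifts in one direction; the hypothetical expert takes over on every second step, so half of the logged transitions are human demonstrations and the lane offset never exceeds the threshold.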
