Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

Paper Title

Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

Authors

Eli Bronstein, Sirish Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson

Abstract

ML-based motion planning is a promising approach for producing agents that exhibit complex behaviors and automatically adapt to novel environments. In the context of autonomous driving, it is common to treat all available training data equally. However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set: we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset. We present a method to predict the inherent difficulty of a driving situation given data collected from a fleet of autonomous vehicles deployed on public roads. We then demonstrate that this difficulty score can be used in a zero-shot transfer to generate curricula for an imitation-learning-based planning agent. Compared to training on the entire unbiased training dataset, we show that prioritizing difficult driving scenarios both reduces collisions by 15% and increases route adherence by 14% in closed-loop evaluation, all while using only 10% of the training data.
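
The abstract does not say how the difficulty score is turned into a curriculum, so the following is only an illustrative sketch of one plausible reading: drawing a 10% training subset in which scenarios with higher predicted difficulty are more likely to be selected. The function name `difficulty_weighted_subset`, its parameters, and the choice of Efraimidis-Spirakis weighted sampling are assumptions made for this example, not the authors' procedure.

```python
import random


def difficulty_weighted_subset(scenarios, difficulty, fraction=0.1, seed=0):
    """Sample a fraction of scenarios without replacement, favoring difficult ones.

    Hypothetical helper: `difficulty` holds one positive score per scenario,
    e.g. the output of a learned difficulty predictor (assumed, not given here).
    """
    if len(scenarios) != len(difficulty):
        raise ValueError("scenarios and difficulty must have the same length")

    k = max(1, round(fraction * len(scenarios)))
    rng = random.Random(seed)

    # Efraimidis-Spirakis sampling: give each item the key u ** (1 / w) with
    # u ~ Uniform(0, 1), then keep the k largest keys. Larger weights push
    # keys toward 1, so difficult scenarios are selected more often.
    keyed = []
    for scenario, weight in zip(scenarios, difficulty):
        if weight <= 0:
            continue  # zero or negative scores are never selected
        keyed.append((rng.random() ** (1.0 / weight), scenario))

    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [scenario for _, scenario in keyed[:k]]


if __name__ == "__main__":
    # Toy data: 900 "easy" scenarios (score 1.0) and 100 "hard" ones (score 50.0).
    scenario_ids = list(range(1000))
    scores = [1.0 if i < 900 else 50.0 for i in scenario_ids]
    subset = difficulty_weighted_subset(scenario_ids, scores, fraction=0.1)
    print(len(subset), sum(1 for i in subset if i >= 900))
```

In this toy setup the hard scenarios make up 10% of the pool but should account for a much larger share of the sampled subset. A real pipeline would replace the toy scores with model predictions and would likely tune how sharply the sampling favors difficult cases.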
