论文标题

基于概念调制的模型化离线强化学习以实现快速泛化

Concept-modulated model-based offline reinforcement learning for rapid generalization

论文作者

Ketz, Nicholas A., Pilly, Praveen K.

论文摘要

任何机器学习解决方案的鲁棒性从根本上受限于其训练数据。超越原始训练数据进行泛化的一种方式是由人工提供信息对原始数据集进行增强;然而,不可能事先指定部署过程中可能发生的所有故障情况。为了解决这一局限,我们将基于模型的强化学习与模型可解释性方法相结合,提出了一种能够自行生成模拟场景的解决方案,这些场景受到以无监督方式学习到的环境概念和动力学的约束。具体而言,智能体环境的内部模型以输入空间的低维概念表示为条件,而这些概念表示对智能体的动作敏感。我们在一个标准的逼真驾驶模拟器中的简单点对点导航任务上演示了该方法,结果表明,与基于模型和无模型的方法相比,该方法在对指定故障情况的不同实例的单样本(one-shot)泛化以及对类似变化的零样本(zero-shot)泛化方面均有显著提升。

The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases as well as zero-shot generalization to similar variations compared to model-based and model-free approaches.
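The core architectural idea in the abstract — an internal dynamics model conditioned on a low-dimensional concept representation of the input, which can then roll out self-generated simulated trajectories — can be illustrated with a minimal sketch. Everything below is a hypothetical illustration: the class names (`ConceptEncoder`, `ConceptConditionedDynamics`), dimensions, and random-weight networks are assumptions for exposition, not the authors' implementation (the paper learns the concepts unsupervised and trains the model on driving data).

```python
import numpy as np

rng = np.random.default_rng(0)

class ConceptEncoder:
    """Hypothetical encoder: maps a high-dimensional observation to a
    low-dimensional concept vector (learned unsupervised in the paper)."""
    def __init__(self, obs_dim, concept_dim):
        self.W = rng.normal(scale=0.1, size=(concept_dim, obs_dim))

    def __call__(self, obs):
        return np.tanh(self.W @ obs)

class ConceptConditionedDynamics:
    """Internal model of the environment, conditioned on the current
    concept representation and the agent's action."""
    def __init__(self, concept_dim, action_dim):
        self.W = rng.normal(scale=0.1,
                            size=(concept_dim, concept_dim + action_dim))

    def __call__(self, concept, action):
        # Next concept state depends jointly on concept and action,
        # making the representation sensitive to the agent's actions.
        return np.tanh(self.W @ np.concatenate([concept, action]))

def rollout(encoder, dynamics, obs, actions):
    """Self-generate a simulated trajectory entirely in concept space."""
    c = encoder(obs)
    traj = [c]
    for a in actions:
        c = dynamics(c, a)
        traj.append(c)
    return traj

obs = rng.normal(size=64)                          # e.g. flattened image features
actions = [rng.normal(size=2) for _ in range(5)]   # e.g. steering, throttle
enc = ConceptEncoder(obs_dim=64, concept_dim=8)
dyn = ConceptConditionedDynamics(concept_dim=8, action_dim=2)
traj = rollout(enc, dyn, obs, actions)
print(len(traj), traj[0].shape)
```

Such imagined rollouts in concept space are what would let an agent rehearse variations of a failure scenario without collecting new real data, which is the mechanism the one-shot and zero-shot generalization claims rest on.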
