Paper title
Active Preference Learning using Maximum Regret
Authors
Abstract
We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots. In active preference learning, a user chooses the preferred behaviour from a set of alternatives, and from these choices the robot learns the user's preferences, modeled as a parameterized cost function. Previous approaches present users with alternatives that minimize the uncertainty over the parameters of the cost function. However, different parameters might lead to the same optimal behaviour; as a consequence, the solution space is more structured than the parameter space. We exploit this by proposing a query selection method that greedily reduces the maximum error ratio over the solution space. In simulations, we demonstrate that the proposed approach outperforms other state-of-the-art techniques in both learning efficiency and ease of queries for the user. Finally, we show that evaluating the learning based on the similarity of solutions, rather than the similarity of weights, allows for better predictions in different scenarios.
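The idea of selecting queries by regret over solutions rather than by uncertainty over weights can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: a finite set of candidate solutions, each summarised by a feature vector whose cost is linear in the weights and strictly positive, and a finite set of weight samples standing in for the parameters still consistent with the user's answers. The function names (`regret_ratio`, `max_regret_query`, `update`), the simulated user and all numbers are hypothetical and not taken from the paper. The regret ratio measures how much worse one sample's optimal solution is when judged by another sample's weights, so two weight vectors that produce the same behaviour never trigger a query.

```python
from itertools import permutations

# Hypothetical setup: each candidate solution is a feature vector and its cost
# is linear in those features (cost = w . f). All costs are assumed positive.

def cost(weights, features):
    """Linear cost w . f of a single solution."""
    return sum(w * f for w, f in zip(weights, features))

def optimal_solution(weights, solutions):
    """Index of the cheapest candidate solution under the given weights."""
    return min(range(len(solutions)), key=lambda i: cost(weights, solutions[i]))

def regret_ratio(w_true, w_other, solutions):
    """Cost ratio incurred by executing w_other's optimum when w_true is correct.

    1.0 means both weight vectors lead to equally good behaviour, even if the
    weights themselves differ.
    """
    chosen = solutions[optimal_solution(w_other, solutions)]
    best = solutions[optimal_solution(w_true, solutions)]
    return cost(w_true, chosen) / cost(w_true, best)

def max_regret_query(weight_samples, solutions):
    """Greedily pick the pair of samples with the largest regret ratio and
    return their optimal solutions as the alternatives shown to the user."""
    best_pair, best_ratio = None, 1.0
    for w_a, w_b in permutations(weight_samples, 2):
        ratio = regret_ratio(w_a, w_b, solutions)
        if ratio > best_ratio:
            best_pair = (optimal_solution(w_a, solutions),
                         optimal_solution(w_b, solutions))
            best_ratio = ratio
    return best_pair, best_ratio

def update(weight_samples, solutions, preferred, rejected):
    """Keep only the weight samples under which the user's choice is no worse."""
    return [w for w in weight_samples
            if cost(w, solutions[preferred]) <= cost(w, solutions[rejected])]

# Four candidate paths described by two features (e.g. length and risk).
solutions = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]]
samples = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
true_weights = [0.8, 0.2]  # hidden preference, used only to simulate answers

asked = []
for _ in range(10):  # cap the number of queries
    query, ratio = max_regret_query(samples, solutions)
    if query is None:  # no remaining pair of samples disagrees on behaviour
        break
    asked.append(query)
    preferred = min(query, key=lambda i: cost(true_weights, solutions[i]))
    rejected = query[1] if preferred == query[0] else query[0]
    samples = update(samples, solutions, preferred, rejected)

print(asked, samples, optimal_solution(samples[0], solutions))
```

In this toy run the first query contrasts solutions 0 and 2, the behaviours favoured by the two most dissimilar weight samples, and two answers leave a single sample whose optimal solution matches the simulated user's. Filtering a fixed list of samples is a simplification; a fuller treatment would reason over the entire set of weights consistent with all answers so far.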