Anticipating the Long-Term Effect of Online Learning in Control

Title

Anticipating the Long-Term Effect of Online Learning in Control

Authors

Alexandre Capone, Sandra Hirche

Abstract

Control schemes that learn using measurement data collected online are increasingly promising for the control of complex and uncertain systems. However, in most approaches of this kind, learning is viewed as a side effect that passively improves control performance, e.g., by updating a model of the system dynamics. Determining how improvements in control performance due to learning can be actively exploited in the control synthesis is still an open research question. In this paper, we present AntLer, a design algorithm for learning-based control laws that anticipates learning, i.e., that takes the impact of future learning in uncertain dynamic settings explicitly into account. AntLer expresses system uncertainty using a non-parametric probabilistic model. Given a cost function that measures control performance, AntLer chooses the control parameters such that the expected cost of the closed-loop system is minimized approximately. We show that AntLer approximates an optimal solution arbitrarily accurately with probability one. Furthermore, we apply AntLer to a nonlinear system, which yields better results compared to the case where learning is not anticipated.
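
The idea of choosing control parameters by approximately minimizing an expected closed-loop cost, while accounting for learning that happens during operation, can be illustrated with a toy sketch. This is not the AntLer algorithm: it assumes a scalar linear system with one unknown coefficient and a Gaussian belief over it, in place of the non-parametric probabilistic model the abstract mentions, and it searches over a small grid of candidates. All names (`rollout_cost`, `expected_cost`, `choose_parameter`, `theta`) and all numeric values are hypothetical.

```python
import random

PRIOR_MEAN, PRIOR_VAR, NOISE_STD = 1.0, 0.25, 0.1  # assumed toy values


def rollout_cost(a_true, theta, rng, horizon=20):
    """Simulate x[t+1] = a*x[t] + u[t] + noise while the controller learns a online."""
    x = 1.0
    mean, var = PRIOR_MEAN, PRIOR_VAR  # Gaussian belief over the unknown a
    cost = 0.0
    for _ in range(horizon):
        u = -theta * mean * x  # control law uses the current estimate of a
        x_next = a_true * x + u + rng.gauss(0.0, NOISE_STD)
        cost += x * x + 0.1 * u * u  # quadratic stage cost
        # Bayesian update from the observation y = x_next - u = a*x + noise.
        y = x_next - u
        precision = 1.0 / var + x * x / NOISE_STD**2
        mean = (mean / var + x * y / NOISE_STD**2) / precision
        var = 1.0 / precision
        x = x_next
    return cost


def expected_cost(theta, n_samples=200, seed=0):
    """Monte Carlo estimate of the expected cost over the prior on a.

    Every rollout includes the online updates above, so the estimate
    reflects the effect of future learning on closed-loop performance.
    """
    rng = random.Random(seed)  # same seed for every theta: common random numbers
    total = 0.0
    for _ in range(n_samples):
        a_sample = rng.gauss(PRIOR_MEAN, PRIOR_VAR**0.5)
        total += rollout_cost(a_sample, theta, rng)
    return total / n_samples


def choose_parameter(candidates):
    """Pick the candidate with the lowest estimated expected cost."""
    return min(candidates, key=expected_cost)


if __name__ == "__main__":
    print(choose_parameter([0.0, 0.5, 1.0, 1.5]))
```

Increasing `n_samples` makes the sample average a better estimate of the expected cost, which is the general kind of approximation argument the abstract alludes to; the specific guarantees stated there apply to AntLer, not to this sketch.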
