Paper title
Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning
Authors
Abstract
A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that yield a different cost functional but the same feedback matrix, and of convergence to such solutions. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
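To make the notion of "equivalent solutions" concrete, the sketch below assumes a linear-quadratic setting (the abstract does not state the system class): dynamics dx/dt = Ax + Bu, cost functional ∫ (xᵀQx + uᵀRu) dt, and feedback u = -Kx. Scaling both weights by any α > 0 scales the Riccati solution P by α but leaves K = R⁻¹BᵀP unchanged, so (Q, R) and (αQ, αR) are distinct cost functionals with the same feedback matrix. This is only the simplest family of equivalent solutions, not the paper's observer. The helper `lqr_gain` is hypothetical and solves the Riccati equation via the stable invariant subspace of the Hamiltonian matrix using NumPy only.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Hypothetical helper: continuous-time LQR gain K (u = -K x) and the
    stabilizing Riccati solution P, computed from the stable invariant
    subspace of the Hamiltonian matrix (assumes stabilizability/detectability)."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    H = np.block([[A, -B @ R_inv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]      # n eigenvectors spanning range([I; P])
    U1, U2 = stable[:n, :], stable[n:, :]
    P = np.real(U2 @ np.linalg.inv(U1))        # stabilizing solution of the ARE
    K = R_inv @ B.T @ P
    return K, P

# Double integrator (illustrative choice, not from the paper)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

alpha = 7.3                                    # any positive scaling
K1, P1 = lqr_gain(A, B, Q, R)
K2, P2 = lqr_gain(A, B, alpha * Q, alpha * R)

print("K for (Q, R):          ", K1)
print("K for (aQ, aR):        ", K2)
print("same feedback matrix:  ", np.allclose(K1, K2))
print("value matrix scaled:   ", np.allclose(P2, alpha * P1))
```

For this double integrator the gain is K = [1, √3] for both weight pairs, while the value matrices differ by the factor α. An IRL method that only observes state and input trajectories generated by u = -Kx therefore cannot distinguish between the two cost functionals, which is why convergence is naturally stated in terms of equivalent rather than unique solutions.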
