Uniqueness and Complexity of Inverse MDP Models

Paper Title

Uniqueness and Complexity of Inverse MDP Models

Authors

Marcus Hutter, Steven Hansen

Abstract

What is the action sequence aa'a" that was likely responsible for reaching state s"' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a"|ss"') can be used to answer them. In the traditional "forward" view, the transition "matrix" p(s'|sa) and the policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's"a"...|s), and with it, the action-conditional state process p(s's"...|saa'a"), the multi-step inverse models p(aa'a"...|ss^i), etc. If the latter is our primary concern, a natural question, analogous to the forward case, is to what extent the 1-step inverse model p(a|ss') plus the policy π(a|s) determines the multi-step inverse models or even the whole dynamics. In other words, can forward models be inferred from inverse models, or even be side-stepped? This work addresses this question and variations thereof, and also whether there are efficient decision/inference algorithms for this.
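To make the "forward" direction mentioned in the abstract concrete, here is a minimal sketch of how a forward model and a policy determine the 1-step and 2-step inverse models via Bayes' rule. It is an illustration only, not the paper's algorithm: the function names, the nested-list layout `T[s][a][s_next]` and `pi[s][a]`, and the toy numbers are all assumptions made for this example.

```python
# Hypothetical toy MDP with 2 states and 2 actions (numbers are made up).
# T[s][a][s_next] = p(s_next | s, a), pi[s][a] = pi(a | s).
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],  # transitions from state 1
]
pi = [[0.5, 0.5], [0.5, 0.5]]


def inverse_one_step(T, pi, s, s_next):
    """p(a | s, s_next), proportional to pi(a|s) * p(s_next|s,a)."""
    weights = [pi[s][a] * T[s][a][s_next] for a in range(len(pi[s]))]
    z = sum(weights)
    if z == 0:
        return None  # s_next is unreachable from s in one step
    return [w / z for w in weights]


def inverse_two_step(T, pi, s, s_end):
    """p(a, a2 | s, s_end), marginalising over the intermediate state."""
    n_states, n_actions = len(T), len(pi[s])
    weights = {}
    for a in range(n_actions):
        for a2 in range(n_actions):
            weights[(a, a2)] = sum(
                pi[s][a] * T[s][a][s1] * pi[s1][a2] * T[s1][a2][s_end]
                for s1 in range(n_states)
            )
    z = sum(weights.values())
    if z == 0:
        return None  # s_end is unreachable from s in two steps
    return {k: w / z for k, w in weights.items()}


one_step = inverse_one_step(T, pi, 0, 1)
two_step = inverse_two_step(T, pi, 0, 1)
print(one_step)  # weights 0.05 and 0.40 normalise to 1/9 and 8/9
print(two_step)
```

The converse direction, recovering multi-step inverse models or the forward model from p(a|ss') and π(a|s) alone, is the question the abstract raises; this sketch does not attempt it.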
