Paper title
Uniqueness and Complexity of Inverse MDP Models
Authors
Abstract
Which action sequence aa'a" was likely responsible for reaching state s"' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a"|ss"') can be used to answer them. In the traditional "forward" view, the transition "matrix" p(s'|sa) and the policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's"a"...|s), and with it, the action-conditional state process p(s's"...|saa'a"), the multi-step inverse models p(aa'a"...|ss^i), etc. If the latter are our primary concern, a natural question, analogous to the forward case, is to what extent the 1-step inverse model p(a|ss') plus the policy π(a|s) determines the multi-step inverse models or even the whole dynamics. In other words, can forward models be inferred from inverse models, or even be side-stepped? This work addresses this question and variations thereof, and also asks whether efficient decision/inference algorithms exist for it.
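To make the "forward" view concrete, the following is a minimal sketch of how a multi-step inverse model p(aa'a"...|ss^i) follows from p(s'|sa) and π(a|s) by Bayes' rule. It is an illustration only, not an algorithm from the paper: the function name `inverse_model`, the nested-list layouts `P[s][a][s2]` and `pi[s][a]`, and the two-state toy MDP are all assumptions made for this example, and the brute-force enumeration grows exponentially with the number of steps.

```python
def inverse_model(P, pi, s, s_end, n_steps):
    """Return {action_tuple: p(action_tuple | s, s_end)} for n_steps steps.

    P[s][a][s2] is the forward transition probability p(s2 | s, a);
    pi[s][a] is the policy probability pi(a | s). Both layouts are
    assumptions of this sketch.
    """
    n_states, n_actions = len(P), len(P[0])
    # joint[(actions, state)] = p(actions so far, current state | start s)
    joint = {((), s): 1.0}
    for _ in range(n_steps):
        nxt = {}
        for (acts, cur), prob in joint.items():
            for a in range(n_actions):
                for s2 in range(n_states):
                    w = prob * pi[cur][a] * P[cur][a][s2]
                    if w > 0.0:
                        key = (acts + (a,), s2)
                        nxt[key] = nxt.get(key, 0.0) + w
        joint = nxt
    # Condition on the observed final state (Bayes' rule).
    hits = {acts: p for (acts, cur), p in joint.items() if cur == s_end}
    total = sum(hits.values())
    if total == 0.0:
        raise ValueError("s_end cannot be reached from s in n_steps steps")
    return {acts: p / total for acts, p in hits.items()}


# Hypothetical toy MDP: action 0 keeps the state, action 1 flips it.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: action 0 stays, action 1 moves to 0
]
pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform policy in both states

one_step = inverse_model(P, pi, 0, 1, 1)    # p(a | s=0, s'=1)
three_step = inverse_model(P, pi, 0, 1, 3)  # p(a a' a" | s=0, s"'=1)
```

In this toy case the 1-step inverse model puts all mass on the flipping action, while the 3-step inverse model spreads its mass over the action sequences containing an odd number of flips. The abstract's question runs in the opposite direction: whether quantities like `three_step` can be recovered from `one_step`-style models and the policy alone, without access to `P`.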