线性动力学系统的黑盒控制

论文标题

线性动力学系统的黑盒控制

Black-Box Control for Linear Dynamical Systems

论文作者

Chen, Xinyi, Hazan, Elad

论文摘要

我们考虑了从单个黑盒交互链中控制未知线性时间不变的动力系统的问题，而无需访问重置或离线模拟。假设系统是可控的，我们给出了第一个有效的算法，该算法能够在在线非策略控制的情况下在单个轨迹中获得sublrinear后悔。这解决了随机LQR问题的开放问题，并且在更具挑战性的环境中，可以进行对抗扰动和对抗性选择并改变凸损失功能。我们的算法给出了有限的遗憾界限，按$ 2^{\ tilde {o}（\ Mathcal {l}）} + \ tilde {o}（\ text {poly}（\ natercal {l}）t^{l}） $ 2^{\ tilde {o}（\ Mathcal {l}）} + \ tilde {o}（\ \ text {poly}（\ Mathcal {l}）\ sqrt {t} {t}）$ for black-box lqr，where $ \ mathcal {where $ \ m mathcal {lqal {lqal {wher至关重要的步骤是一种新的系统识别方法，可对对抗性噪声稳健，但会产生指数成本。要完成图片，我们对在线黑框控制问题的复杂性进行了研究，并给出了$ 2^{ω（\ Mathcal {l}）} $的匹配下限，这表明额外的指数成本是不可避免的。即使在无噪声设置中，该下限也可以保持，并适用于任何随机或确定性的黑框控制方法。

We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions, with no access to resets or offline simulation. Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret in a single trajectory under the setting of online nonstochastic control. This resolves an open problem on the stochastic LQR problem, and in a more challenging setting that allows for adversarial perturbations and adversarially chosen and changing convex loss functions. We give finite-time regret bounds for our algorithm on the order of $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) T^{2/3})$ for general nonstochastic control, and $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) \sqrt{T})$ for black-box LQR, where $\mathcal{L}$ is the system size which is an upper bound on the dimension. The crucial step is a new system identification method that is robust to adversarial noise, but incurs exponential cost. To complete the picture, we investigate the complexity of the online black-box control problem, and give a matching lower bound of $2^{Ω(\mathcal{L})}$ on the regret, showing that the additional exponential cost is inevitable. This lower bound holds even in the noiseless setting, and applies to any, randomized or deterministic, black-box control method.

下载PDF全文

下载文献需遵守相关版权规定

论文标题