论文标题

在线合同设计的样本复杂性

The Sample Complexity of Online Contract Design

论文作者

Zhu, Banghua, Bates, Stephen, Yang, Zhuoran, Wang, Yixin, Jiao, Jiantao, Jordan, Michael I.

论文摘要

我们在在线环境中研究隐藏行动主体代理问题。在每回合中,校长发布了根据每个结果指定向代理商付款的合同。然后,代理商做出了一种战略性的行动选择,可以最大限度地提高自己的效用,但是该行动并不能直接被委托人观察到。校长观察结果,并从代理商选择的行动中获得实用性。根据过去的观察,校长以最大化她的效用的目的动态调整了合同。 我们介绍了一种在线学习算法,并在其阶梯遗憾中提供了上限。我们表明,当合同空间为$ [0,1]^m $时,Stackelberg的遗憾是由$ \ widetilde o(\ sqrt {m} \ cdot t^{1-1/(2m+1)})$限制的,并且由$ω(t^{1-1/(m+2)$ o o.该结果表明,指数级样本足以学习近乎最佳的合同,从而解决了关于在线合同设计的硬度的开放问题。此外,当合同仅限于某些子集$ \ MATHCAL {F} \ subset [0,1]^m $时,我们定义了$ \ Mathcal {f} $的内在维度,取决于空间中的球体代码的覆盖码,并限制了这种内在的变化。当$ \ Mathcal {f} $是线性合同的家族时,我们表明stackelberg遗憾的是$θ(t^{2/3})$。 合同设计问题是具有挑战性的,因为公用事业功能是不连续的。在此设置中界定离散误差是一个开放的问题。在本文中,我们确定了一组有限的方向,其中效用函数是连续的,使我们能够设计一种新的离散方法并限制其错误。这种方法可以使第一个上限无限制,对合同和行动空间没有限制。

We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent based on each outcome. The agent then makes a strategic choice of action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's choice of action. Based on past observations, the principal dynamically adjusts the contracts with the goal of maximizing her utility. We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$, and lower bounded by $Ω(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space and bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows exactly as $Θ(T^{2/3})$. The contract design problem is challenging because the utility function is discontinuous. Bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, allowing us to design a new discretization method and bound its error. This approach enables the first upper bound with no restrictions on the contract and action space.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源