论文标题

通过智能预测到优越的方法在线上下文决策

Online Contextual Decision-Making with a Smart Predict-then-Optimize Method

论文作者

Liu, Heyuan, Grigas, Paul

论文摘要

我们研究了在线上下文决策问题,并具有资源约束。在每个时间段,决策者首先根据给定上下文向量预测奖励向量和资源消耗矩阵,然后解决下游优化问题以做出决定。决策者的最终目标是最大程度地利用资源消耗的奖励和效用总结,同时满足资源限制。我们提出了一种算法,该算法将基于“智能预测 - 优化(SPO)”方法的预测步骤与基于镜像下降的双重更新步骤。我们证明了遗憾的界限,并证明了我们方法的总体收敛率取决于$ \ MATHCAL {O}(t^{ - 1/2})$在线镜像下降的收敛以及用于学习预测模型的替代损失功能的风险范围。我们的算法和遗憾界限适用于一般凸的可行区域,可用于资源限制,包括硬和软资源约束案例,它们适用于广泛的预测模型,与线性上下文模型或有限策略空间的传统设置相比。我们还进行数值实验,以与传统的仅限预测方法相比,在多维背包和最长的路径实例上,与传统的仅预测方法相比,我们提出的SPO型方法的强度。

We study an online contextual decision-making problem with resource constraints. At each time period, the decision-maker first predicts a reward vector and resource consumption matrix based on a given context vector and then solves a downstream optimization problem to make a decision. The final goal of the decision-maker is to maximize the summation of the reward and the utility from resource consumption, while satisfying the resource constraints. We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent. We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent as well as risk bounds of the surrogate loss function used to learn the prediction model. Our algorithm and regret bounds apply to a general convex feasible region for the resource constraints, including both hard and soft resource constraint cases, and they apply to a wide class of prediction models in contrast to the traditional settings of linear contextual models or finite policy spaces. We also conduct numerical experiments to empirically demonstrate the strength of our proposed SPO-type methods, as compared to traditional prediction-error-only methods, on multi-dimensional knapsack and longest path instances.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源