Sequential Decision Problems with Weak Feedback

Paper title

Sequential Decision Problems with Weak Feedback

Author

Verma, Arun

Abstract

This thesis considers sequential decision problems in which the loss or reward incurred by selecting an action may not be inferred from the observed feedback. A major part of the thesis focuses on the unsupervised sequential selection problem, where the loss incurred for selecting an action cannot be inferred from the observed feedback. We also introduce a new setup named Censored Semi Bandits, where the loss incurred for selecting an action can be observed under certain conditions. Finally, we study the channel selection problem in communication networks, where the reward for an action is observed only when no other player selects that action in the same round. These problems find applications in many fields, such as healthcare, crowd-sourcing, security, and adaptive resource allocation. The thesis addresses these sequential decision problems by exploiting the specific structure each problem exhibits. We develop provably optimal algorithms for each of these weak-feedback setups and validate their empirical performance on problem instances derived from synthetic and real datasets.
