Paper Title
BLOB : A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals
Paper Authors
Paper Abstract
A common task for recommender systems is to build a profile of the interests of a user from items in their browsing history and later to recommend items to the user from the same catalog. The users' behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequence of items recommended to them together with their outcomes (the bandit part). In this paper, we propose the Bayesian Latent Organic Bandit model (BLOB), a probabilistic approach that combines the 'organic' and 'bandit' signals in order to improve the estimation of recommendation quality. The bandit signal is valuable because it gives direct feedback on recommendation performance, but its quality is very uneven, as it is highly concentrated on the recommendations deemed optimal by past versions of the recommender system. In contrast, the organic signal is typically strong and covers most items, but it is not always relevant to the recommendation task. In order to leverage the organic signal to efficiently learn the bandit signal in a Bayesian model, we identify three fundamental types of distances, namely action-history, action-action and history-history distances. We implement a scalable approximation of the full model using variational auto-encoders and the local re-parameterization trick. We show using extensive simulation studies that our method outperforms or matches the value of both state-of-the-art organic-based recommendation algorithms and bandit-based methods (both value-based and policy-based), in both organic-rich and bandit-rich environments.
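The abstract names the local re-parameterization trick as the device that makes the variational approximation scalable. As a point of reference only (not the authors' code), the following is a minimal NumPy sketch of that trick for a single Bayesian linear layer with a factorized Gaussian posterior over its weights: rather than sampling a weight matrix and then multiplying, one samples the pre-activations directly, which injects independent noise per example and typically yields lower-variance gradient estimates. All variable names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bayesian linear layer with a factorized Gaussian posterior over weights,
# q(W) = N(mu, sigma^2) elementwise. Shapes are illustrative.
x = rng.normal(size=(32, 16))           # mini-batch of 32 "history" embeddings, dim 16
mu = 0.1 * rng.normal(size=(16, 8))     # posterior means of the weights
log_sigma = np.full((16, 8), -2.0)      # posterior log standard deviations
sigma = np.exp(log_sigma)

# Naive reparameterization: sample one weight matrix for the whole batch,
# so every row shares the same noise realization.
W = mu + sigma * rng.normal(size=mu.shape)
y_naive = x @ W

# Local reparameterization: given x, the pre-activation y = x W is Gaussian,
# so sample it directly with per-example noise.
act_mean = x @ mu                        # E[y | x]
act_var = (x ** 2) @ (sigma ** 2)        # Var[y | x]
y_local = act_mean + np.sqrt(act_var) * rng.normal(size=act_mean.shape)

print(y_naive.shape, y_local.shape)      # (32, 8) (32, 8)
```

In a variational auto-encoder setting, sampling the activations instead of the weights keeps one forward pass per mini-batch while still giving each example its own noise, which is what makes a stochastic-gradient approximation of the full Bayesian model practical at catalog scale.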