Should I send this notification? Optimizing push notifications decision making by modeling the future

Paper title

Should I send this notification? Optimizing push notifications decision making by modeling the future

Authors

O'Brien, Conor; Wu, Huasen; Zhai, Shaodan; Guo, Dalin; Shi, Wenzhe; Hunt, Jonathan J.

Abstract

Most recommender systems are myopic, that is, they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long-term user satisfaction. In this work we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose to send a notification, since the negative effects occur in the future. This is typically mitigated using heuristics. However, heuristics can be hard to reason about or improve, require retuning each time the system is changed, and may be suboptimal. To counter these drawbacks, there is significant interest in recommender systems that optimize directly for long-term value (LTV). Here, we describe a method for maximizing LTV by using model-based reinforcement learning (RL) to make decisions about whether to send push notifications. We model the effects of sending a notification on the user's future behavior. Much of the prior work applying RL to maximize LTV in recommender systems has focused on session-based optimization, while the time horizon for notification decision making in this work extends over several days. We test this approach in an A/B test on a major social network. We show that by optimizing decisions about push notifications we are able to send fewer notifications and obtain a higher open rate than the baseline system, while generating the same level of user engagement on the platform as the existing heuristic-based system.
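
The contrast between a myopic sender and one that plans over a horizon can be illustrated with a toy example. The sketch below is not the authors' model: the abstract describes a learned model of user behavior, whereas this sketch uses a hand-written "fatigue" state, and every name and number in it (`MAX_FATIGUE`, `HORIZON`, `open_prob`, `disable_prob`) is a hypothetical assumption chosen only to make the idea concrete.

```python
from functools import lru_cache

# All constants and functions below are hypothetical, for illustration only.
MAX_FATIGUE = 3   # assumed cap on the toy "fatigue" state
HORIZON = 7       # assumed planning horizon, in decision steps


def open_prob(fatigue: int) -> float:
    """Toy probability that a sent notification is opened."""
    return 0.4 / (1 + fatigue)


def disable_prob(fatigue: int) -> float:
    """Toy probability that sending makes the user disable notifications."""
    return 0.1 * fatigue


@lru_cache(maxsize=None)
def value(fatigue: int, steps_left: int) -> float:
    """Best expected number of opens obtainable in the remaining steps."""
    if steps_left == 0:
        return 0.0
    return max(q_value(fatigue, send, steps_left) for send in (False, True))


def q_value(fatigue: int, send: bool, steps_left: int) -> float:
    """Expected opens from taking one action now, then acting optimally."""
    if not send:
        # Resting lets fatigue recover by one level.
        return value(max(fatigue - 1, 0), steps_left - 1)
    # Sending earns an expected open, raises fatigue, and risks an opt-out
    # (after which no further value can be collected).
    stay = 1.0 - disable_prob(fatigue)
    next_fatigue = min(fatigue + 1, MAX_FATIGUE)
    return open_prob(fatigue) + stay * value(next_fatigue, steps_left - 1)


def myopic_send(fatigue: int) -> bool:
    """Myopic rule: send whenever the immediate expected reward is positive."""
    return open_prob(fatigue) > 0.0


def ltv_send(fatigue: int, steps_left: int = HORIZON) -> bool:
    """Long-horizon rule: send only if it beats not sending over the horizon."""
    return q_value(fatigue, True, steps_left) > q_value(fatigue, False, steps_left)


if __name__ == "__main__":
    for f in range(MAX_FATIGUE + 1):
        print(f"fatigue={f} myopic={myopic_send(f)} ltv={ltv_send(f)}")
```

In this toy setting the myopic rule sends in every state, because the immediate reward is always positive, while the long-horizon rule holds back once the user is fatigued. With only one step left the two rules coincide, since no future remains to protect.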
