Paper Title
Sequential Recommendation via Stochastic Self-Attention
Paper Authors
Paper Abstract
Sequential recommendation models the dynamics of a user's previous behaviors in order to forecast the next item, and has attracted considerable attention. Transformer-based approaches, which embed items as vectors and use dot-product self-attention to measure the relationships between items, demonstrate superior capabilities among existing sequential methods. However, users' real-world sequential behaviors are \textit{\textbf{uncertain}} rather than deterministic, which poses a significant challenge to current techniques. We further suggest that dot-product-based approaches cannot fully capture \textit{\textbf{collaborative transitivity}}, which can be derived from item-item transitions within sequences and is beneficial for cold-start items. We also argue that the BPR loss places no constraint on positive and sampled negative items, which misleads the optimization. We propose a novel \textbf{STO}chastic \textbf{S}elf-\textbf{A}ttention~(STOSA) model to overcome these issues. In particular, STOSA embeds each item as a stochastic Gaussian distribution whose covariance encodes uncertainty. We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences, which effectively incorporates uncertainty into model training. The Wasserstein attention also facilitates collaborative transitivity learning, as it satisfies the triangle inequality. Moreover, we introduce a novel regularization term into the ranking loss, which ensures dissimilarity between positive and sampled negative items. Extensive experiments on five real-world benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold-start items. The code is available at \url{https://github.com/zfan20/STOSA}.
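As a rough illustration of the mechanism the abstract describes (a minimal sketch, not the authors' implementation; the function and variable names here are assumptions for illustration), the following PyTorch snippet computes the closed-form squared 2-Wasserstein distance between diagonal Gaussian item embeddings and converts negative distances into attention weights. The actual model also involves position embeddings, causal masking, and the regularized ranking loss, which are omitted here.

```python
import torch

def wasserstein2_diag(mu_q, var_q, mu_k, var_k):
    # Closed-form squared 2-Wasserstein distance between diagonal Gaussians:
    #   W2^2 = ||mu_q - mu_k||^2 + ||sqrt(var_q) - sqrt(var_k)||^2
    # Inputs are (batch, seq, dim); output is (batch, seq, seq).
    mean_term = ((mu_q.unsqueeze(2) - mu_k.unsqueeze(1)) ** 2).sum(-1)
    cov_term = ((var_q.sqrt().unsqueeze(2) - var_k.sqrt().unsqueeze(1)) ** 2).sum(-1)
    return mean_term + cov_term

def wasserstein_self_attention(mu, var):
    # Smaller distance means more similar distributions, so attention
    # weights come from the *negative* distance. Because W2 satisfies the
    # triangle inequality, similarity can propagate transitively across
    # item-item transitions (the "collaborative transitivity" above).
    dist = wasserstein2_diag(mu, var, mu, var)
    return torch.softmax(-dist, dim=-1)

# Example: 2 sequences of 5 items with 8-dimensional stochastic embeddings.
mu = torch.randn(2, 5, 8)
var = torch.nn.functional.softplus(torch.randn(2, 5, 8))  # keep variances positive
weights = wasserstein_self_attention(mu, var)
print(weights.shape)  # torch.Size([2, 5, 5])
```

Unlike a dot product, this distance-based score is a proper metric, which is what allows relationships learned for popular items to transfer to cold-start items through shared neighbors.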