Paper Title
Deep Deconfounded Content-based Tag Recommendation for UGC with Causal Intervention
Paper Authors
Paper Abstract
Traditional content-based tag recommender systems directly learn the association between user-generated content (UGC) and tags from collected UGC-tag pairs. However, since a UGC uploader both creates the UGC and selects the corresponding tags, the uploader's personal preferences inevitably bias the tag selection, which prevents these recommenders from learning the causal influence of a UGC's content features on its tags. In this paper, we propose a deep deconfounded content-based tag recommender system, namely DecTag, to address this issue. We first establish a causal graph to represent the relations among uploader, UGC, and tag, in which uploaders are identified as confounders that spuriously correlate UGC and tag selections. Specifically, to eliminate the confounding bias, causal intervention is conducted on the UGC node in the graph via backdoor adjustment, so that the uploaders' influence on tags leaked through backdoor paths is removed for causal effect estimation. Observing that adjusting the causal graph with do-calculus requires integrating over the entire uploader space, which is infeasible, we design a novel Monte Carlo (MC)-based estimator with bootstrap, which achieves asymptotic unbiasedness provided that the uploaders of the collected UGCs are i.i.d. samples from the population. In addition, the MC estimator can be interpreted as substituting the biased uploader with a hypothetical random uploader from the population during training, so that deconfounding is achieved in an interpretable manner. Finally, we establish the YT-8M-Causal dataset, built from the widely used YouTube-8M dataset with causal intervention, and propose a corresponding evaluation strategy for unbiased evaluation of causal tag recommenders. Extensive experiments show that DecTag is more robust to confounding bias than state-of-the-art causal recommenders.
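The backdoor adjustment and its Monte Carlo approximation described in the abstract can be illustrated with a small toy sketch. This is not DecTag itself, which operates on a deep model. It only shows the general idea that P(tag | do(UGC)) = sum over uploaders u of P(tag | UGC, u) P(u), and that the sum can be approximated by averaging over uploaders resampled with replacement from the observed ones. All names and numbers below (`P_UPLOADER`, `P_TAG`, `backdoor_adjust`, `mc_backdoor_estimate`, the two uploaders and two UGC items) are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical toy world: two uploader types and two UGC items.
# P_UPLOADER is the population distribution over uploaders.
P_UPLOADER = {"a": 0.7, "b": 0.3}

# P_TAG[(ugc, uploader)] = P(tag = 1 | ugc, uploader); made-up values.
P_TAG = {
    ("x0", "a"): 0.2, ("x0", "b"): 0.6,
    ("x1", "a"): 0.5, ("x1", "b"): 0.9,
}


def backdoor_adjust(p_tag, p_uploader, ugc):
    """Exact backdoor adjustment: sum_u P(tag=1 | ugc, u) * P(u)."""
    return sum(p_tag[(ugc, u)] * p_u for u, p_u in p_uploader.items())


def mc_backdoor_estimate(p_tag, observed_uploaders, ugc, n_samples, rng):
    """Monte Carlo estimate of the same quantity.

    Uploaders are resampled with replacement from the observed uploaders
    (a bootstrap-style draw), so the full uploader space is never enumerated.
    """
    draws = rng.choices(observed_uploaders, k=n_samples)
    return sum(p_tag[(ugc, u)] for u in draws) / n_samples


rng = random.Random(0)

# Assumption: observed uploaders are i.i.d. samples from the population.
observed = rng.choices(
    list(P_UPLOADER), weights=list(P_UPLOADER.values()), k=20000
)

exact = backdoor_adjust(P_TAG, P_UPLOADER, "x1")
approx = mc_backdoor_estimate(P_TAG, observed, "x1", 20000, rng)
print(f"exact={exact:.4f}  mc={approx:.4f}")
```

With enough samples the Monte Carlo value approaches the exact adjusted probability, which mirrors the asymptotic unbiasedness claim under the i.i.d. assumption. In a real system the lookup table `P_TAG` would be replaced by a learned model's prediction for a given UGC and a sampled uploader.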