Paper Title

SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors

Authors

Ardavan Afshar, Kejing Yin, Sherry Yan, Cheng Qian, Joyce C. Ho, Haesun Park, Jimeng Sun

Abstract


Existing tensor factorization methods assume that the input tensor follows some specific distribution (e.g., Poisson, Bernoulli, or Gaussian) and solve the factorization by minimizing an empirical loss function defined based on the corresponding distribution. However, this approach suffers from several drawbacks: 1) In reality, the underlying distributions are complicated and unknown, making them infeasible to approximate with a simple distribution. 2) The correlation across dimensions of the input tensor is not well utilized, leading to sub-optimal performance. Although heuristics have been proposed to incorporate such correlation as side information under a Gaussian distribution, they cannot easily be generalized to other distributions. Thus, a more principled way of utilizing the correlation in tensor factorization models remains an open challenge. Without assuming any explicit distribution, we formulate tensor factorization as an optimal transport problem with Wasserstein distance, which can handle non-negative inputs. We introduce SWIFT, which minimizes the Wasserstein distance between the input tensor and its reconstruction. In particular, we define the N-th order tensor Wasserstein loss for the widely used tensor CP factorization and derive an optimization algorithm that minimizes it. By leveraging the sparsity structure and different equivalent formulations for computational efficiency, SWIFT is as scalable as other well-known CP algorithms. Using the factor matrices as features, SWIFT achieves up to 9.65% and 11.31% relative improvement over baselines on downstream prediction tasks. Under noisy conditions, SWIFT achieves up to 15% and 17% relative improvement over the best competitors on the prediction tasks.
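To make the optimal-transport objective concrete: the Wasserstein distance between two nonnegative histograms is commonly approximated with entropy-regularized Sinkhorn iterations. The sketch below is an illustrative, minimal implementation of that generic building block, not SWIFT's actual N-th order tensor loss or its optimization algorithm; the function name, the regularization strength `eps`, and the toy cost matrix are all assumptions for demonstration.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, C, eps=0.1, n_iter=200):
    """Entropy-regularized Wasserstein distance between nonnegative
    histograms p and q under ground-cost matrix C, via Sinkhorn
    iterations (a standard approximation, not SWIFT's exact loss)."""
    p = p / p.sum()                      # normalize to probability mass
    q = q / q.sum()
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):              # alternating scaling updates
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)      # approximate transport plan
    return float(np.sum(P * C))          # transport cost <P, C>

# Toy example: moving all mass from bin 0 to bin 2 under |i - j| cost
C = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
d = sinkhorn_wasserstein(np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 0.0, 1.0]), C)
# d is close to 2.0, the cost of moving one unit of mass two bins
```

In SWIFT this kind of transport cost replaces distribution-specific losses (e.g., the Gaussian least-squares or Poisson KL objectives), and the ground cost C is where correlation across tensor dimensions can be encoded.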
