论文标题
四种算法来解决对称多类型非负矩阵三因素化问题
Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem
论文作者
论文摘要
在本文中,我们考虑了对称的多类非负矩阵三因素化问题(SNMTF),该问题试图同时分解几个对称的非负矩阵。这可以被视为经典非负矩阵三因素问题的概括,并包括一个非凸观目标函数,该目标函数是多元六度多项式,并且A具有凸的可行性集。它在数据科学中具有特殊的重要性,因为它是数据群集中不同数据源融合的数学模型。 我们开发了解决SNMTF的四种方法。它们基于从文献中知道的四种理论方法:固定点法(FPM),具有投影梯度(BCD)的块坐标下降,具有精确线搜索(GM-ELS)的梯度方法和自适应力矩估计方法(ADAM)。对于每种方法,我们提供了一个软件实现:对于以前的两种方法,我们使用MATLAB,对于TensorFlow库的后一个Python。 我们在三个数据集上测试这些方法:我们生成的综合数据集,而其他方法表示不同对象之间的现实生活相似性。 广泛的数值结果表明,使用足够的计算时间,所有四种方法都令人满意,而亚当通常会产生最佳的平方误差($ \ mathrm {MSE} $)。但是,如果计算时间有限,则FPM给出了最佳的$ \ mathrm {MSE} $,因为它在开始时显示了最快的收敛性。 所有数据集和代码均在我们的GitLab配置文件上公开可用。
In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered as a generalization of the classical non-negative matrix tri-factorization problem and includes a non-convex objective function which is a multivariate sixth degree polynomial and a has convex feasibility set. It has a special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF. They are based on four theoretical approaches known from the literature: the fixed point method (FPM), the block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS) and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the former two methods we use Matlab and for the latter Python with the TensorFlow library. We test these methods on three data-sets: the synthetic data-set we generated, while the others represent real-life similarities between different objects. Extensive numerical results show that with sufficient computing time all four methods perform satisfactorily and ADAM most often yields the best mean square error ($\mathrm{MSE}$). However, if the computation time is limited, FPM gives the best $\mathrm{MSE}$ because it shows the fastest convergence at the beginning. All data-sets and codes are publicly available on our GitLab profile.