Paper title
DiSC: Differential Spectral Clustering of Features
Paper authors
Paper abstract
Selecting subsets of features that differentiate between two conditions is a key task in a broad range of scientific domains. In many applications, the features of interest form clusters with similar effects on the data at hand. To recover such clusters, we develop DiSC, a data-driven approach for detecting groups of features that differentiate between conditions. For each condition, we construct a graph whose nodes correspond to the features and whose edge weights are functions of the similarity between features under that condition. We then apply a spectral approach to compute subsets of nodes whose connectivity differs significantly between the condition-specific feature graphs. On the theoretical front, we analyze our approach with a toy example based on the stochastic block model. We evaluate DiSC on a variety of datasets, including MNIST, hyperspectral imaging, simulated scRNA-seq, and task fMRI, and demonstrate that DiSC uncovers features that differentiate between conditions better than competing methods do.
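The pipeline outlined in the abstract (one feature graph per condition, followed by a spectral step that picks out nodes whose connectivity changes) can be illustrated with a small sketch. This is not the DiSC algorithm itself, whose operator and kernel the abstract does not specify. As a loudly stated assumption, it uses absolute Pearson correlation as the feature similarity and the leading eigenvectors of the plain difference between the two affinity matrices as the spectral step. The function names `feature_graph`, `differential_feature_scores`, and `make_toy_data` are hypothetical, and the toy data is synthetic.

```python
import numpy as np


def feature_graph(X):
    """Affinity matrix between features (columns of X).

    Assumption: similarity is the absolute Pearson correlation.
    The abstract only says weights are functions of feature similarity.
    """
    W = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def differential_feature_scores(X_a, X_b, n_components=1):
    """Score each feature by how much its connectivity differs between conditions.

    Assumption: we eigendecompose the difference of the two affinity
    matrices and keep the eigenvectors with the largest |eigenvalue|.
    """
    diff = feature_graph(X_a) - feature_graph(X_b)
    eigvals, eigvecs = np.linalg.eigh(diff)  # diff is symmetric
    top = np.argsort(-np.abs(eigvals))[:n_components]
    # Energy of each feature in the selected eigen-directions.
    return np.sum((eigvecs[:, top] * eigvals[top]) ** 2, axis=1)


def make_toy_data(n_samples=300, n_features=20, group_size=5, seed=0):
    """Synthetic data: the first `group_size` features share a latent
    factor in condition A only; every other feature is independent noise."""
    rng = np.random.default_rng(seed)
    X_a = rng.standard_normal((n_samples, n_features))
    X_b = rng.standard_normal((n_samples, n_features))
    latent = rng.standard_normal((n_samples, 1))
    noise = rng.standard_normal((n_samples, group_size))
    X_a[:, :group_size] = latent + 0.3 * noise
    return X_a, X_b


X_a, X_b = make_toy_data()
scores = differential_feature_scores(X_a, X_b)
top_features = np.argsort(-scores)[:5]
print(sorted(top_features.tolist()))
```

In this toy setting the highest-scoring features are the ones whose mutual connectivity exists in only one condition, which is the kind of differentiating feature group the abstract describes. A normalized graph operator or a different kernel would be equally valid choices for such a sketch.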