Paper title

Improving cluster recovery with feature rescaling factors

Authors

Renato Cordeiro de Amorim, Vladimir Makarenkov

Abstract

The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To correct for this, one usually applies feature normalisation, rescaling the features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods that use the proposed data normalisation strategy clearly outperform those that use traditional data normalisation.
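The contrast drawn in the abstract, between treating all features identically and favouring the more meaningful ones, can be illustrated with a minimal sketch. It is not the method proposed in the paper, whose details are not given here. It assumes that a feature's relevance can be approximated by the inverse of its within-cluster dispersion, and that cluster labels are already available (in practice they would come from an initial clustering). The function names `zscore` and `relevance_factors` and the toy data are hypothetical.

```python
import numpy as np

def zscore(X):
    """Traditional normalisation: centre each feature and divide by its standard deviation."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / std

def relevance_factors(X, labels):
    """Hypothetical factors (an assumption, not the paper's formula):
    inverse within-cluster dispersion per feature, normalised to sum to 1."""
    dispersion = np.zeros(X.shape[1])
    for k in np.unique(labels):
        members = X[labels == k]
        dispersion += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    inverse = 1.0 / (dispersion + 1e-12)  # small constant guards against zero dispersion
    return inverse / inverse.sum()

# Toy data: one feature that separates two clusters, one wide-range noise feature.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
informative = np.where(labels == 0, -2.0, 2.0) + rng.normal(0, 0.5, 100)
noise = rng.uniform(-100, 100, 100)
X = np.column_stack([informative, noise])

Z = zscore(X)                           # every feature now has unit variance
factors = relevance_factors(Z, labels)  # the informative feature gets the larger factor
X_rescaled = Z * factors                # features no longer contribute equally
```

After z-scoring alone, the noise feature contributes as much to a squared Euclidean distance as the informative one. Multiplying by the factors shrinks the noise feature's contribution, which is the kind of effect the abstract argues a rescaling procedure should have.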
