Title
Implicit Concept Drift Detection for Multi-label Data Streams
Authors
Abstract
Many real-world applications adopt multi-label data streams, as the need for algorithms that can handle rapidly changing data grows. Changes in the data distribution, known as concept drift, cause existing classification models to quickly lose their effectiveness. To assist classifiers, we propose a novel algorithm called Label Dependency Drift Detector (LD3), an implicit (unsupervised) concept drift detector for multi-label data streams that uses the label dependencies within the data. LD3 exploits the dynamic temporal dependencies between labels through a label influence ranking method that leverages a data fusion algorithm, and it uses the resulting ranking to detect concept drift. LD3 is the first unsupervised concept drift detection algorithm in the multi-label classification problem area. We evaluate LD3 extensively by comparing it with 14 prevalent supervised concept drift detection algorithms, which we adapt to the problem area, on 12 datasets using a baseline classifier. The results show that LD3 achieves between 19.8% and 68.6% better predictive performance than comparable detectors on both real-world and synthetic data streams.
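The abstract describes LD3's pipeline only at a high level: rank labels by their influence on one another, fuse those rankings with a data fusion algorithm, and flag drift when the resulting ranking changes over time. The Python sketch below illustrates that idea under several assumptions the abstract does not state. Label influence is measured by pairwise co-occurrence counts within a window. The per-label rankings are fused with a Borda count, used here as a stand-in data fusion method. Drift is flagged when the Spearman rank correlation between the fused rankings of two consecutive windows falls below a hypothetical threshold of 0.6. The function names (`cooccurrence`, `label_ranking`, `spearman_rho`, `detect_drift`) are illustrative and do not come from the authors' implementation.

```python
from itertools import combinations


def cooccurrence(window, n_labels):
    """Count how often each pair of labels is active together in a window.

    window: list of binary label vectors (lists of 0/1 of length n_labels).
    """
    counts = [[0] * n_labels for _ in range(n_labels)]
    for y in window:
        active = [j for j in range(n_labels) if y[j]]
        for a, b in combinations(active, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def label_ranking(window, n_labels):
    """Fuse per-label influence rankings into one ranking (assumed Borda count).

    Each label j ranks the other labels by co-occurrence with j (ties broken
    by label index); a label at position p in such a ranking earns
    n_labels - 1 - p points. Labels are then ordered by total points.
    """
    counts = cooccurrence(window, n_labels)
    scores = [0] * n_labels
    for j in range(n_labels):
        others = sorted((k for k in range(n_labels) if k != j),
                        key=lambda k: (-counts[j][k], k))
        for pos, k in enumerate(others):
            scores[k] += n_labels - 1 - pos
    return sorted(range(n_labels), key=lambda k: (-scores[k], k))


def spearman_rho(r1, r2):
    """Spearman correlation between two tie-free rankings of the same labels."""
    n = len(r1)
    pos1 = {lab: i for i, lab in enumerate(r1)}
    pos2 = {lab: i for i, lab in enumerate(r2)}
    d2 = sum((pos1[lab] - pos2[lab]) ** 2 for lab in r1)
    return 1 - 6 * d2 / (n * (n * n - 1))


def detect_drift(old_window, new_window, n_labels, threshold=0.6):
    """Flag drift when the fused label rankings of two windows disagree.

    threshold is a hypothetical value chosen for illustration.
    """
    rho = spearman_rho(label_ranking(old_window, n_labels),
                       label_ranking(new_window, n_labels))
    return rho < threshold


if __name__ == "__main__":
    # Old window: label 0 co-occurs with every other label.
    old = [[1, 1, 0, 0]] * 3 + [[1, 0, 1, 0]] * 2 + [[1, 0, 0, 1]]
    # New window: label 3 takes over that role.
    new = [[0, 0, 1, 1]] * 3 + [[0, 1, 0, 1]] * 2 + [[1, 0, 0, 1]]
    print(label_ranking(old, 4))      # [0, 1, 2, 3]
    print(label_ranking(new, 4))      # [3, 0, 1, 2]
    print(detect_drift(old, new, 4))  # True
    print(detect_drift(old, old, 4))  # False
```

In the example, the fused ranking changes from [0, 1, 2, 3] to [3, 0, 1, 2] when the "hub" label moves from 0 to 3, the correlation drops to -0.2, and the sketch reports drift. Comparing a window with itself gives a correlation of 1.0 and no drift. Because the sketch only inspects label vectors and never a classifier's error rate, it mirrors the implicit (unsupervised) character claimed for LD3. The specific influence measure, fusion rule, and threshold remain assumptions.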