Paper Title

Learning Parameter Distributions to Detect Concept Drift in Data Streams

Authors

Johannes Haug, Gjergji Kasneci

Abstract

Data distributions in streaming environments are usually not stationary. In order to maintain a high predictive quality at all times, online learning models need to adapt to distributional changes, which are known as concept drift. The timely and robust identification of concept drift can be difficult, as we never have access to the true distribution of streaming data. In this work, we propose a novel framework for the detection of real concept drift, called ERICS. By treating the parameters of a predictive model as random variables, we show that concept drift corresponds to a change in the distribution of optimal parameters. To this end, we adopt common measures from information theory. The proposed framework is completely model-agnostic. By choosing an appropriate base model, ERICS is also capable of detecting concept drift at the input level, which is a significant advantage over existing approaches. An evaluation on several synthetic and real-world data sets suggests that the proposed framework identifies concept drift more effectively and precisely than various existing works.
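
The idea of comparing parameter distributions over time can be illustrated with a small sketch. This is not the ERICS algorithm itself: the abstract does not state which information-theoretic measure or which distribution family is used, so the sketch assumes diagonal Gaussian parameter distributions, the Kullback-Leibler divergence as the measure, and a fixed threshold. The names `gaussian_kl`, `drift_detected`, and `threshold` are hypothetical.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two diagonal Gaussians given as lists of means and variances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Closed-form KL divergence between two univariate Gaussians,
        # summed over independent parameter dimensions.
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def drift_detected(prev, curr, threshold=0.5):
    """Flag drift when the parameter distribution moved more than `threshold`.

    `prev` and `curr` are (means, variances) tuples describing the assumed
    Gaussian distribution of the model parameters at two time steps.
    The fixed threshold is an illustrative assumption.
    """
    divergence = gaussian_kl(curr[0], curr[1], prev[0], prev[1])
    return divergence > threshold, divergence


# One model parameter whose mean shifts from 0 to 2 at unit variance.
flag, divergence = drift_detected(([0.0], [1.0]), ([2.0], [1.0]))
print(flag, divergence)
```

Because the divergence is summed per parameter, the individual terms could also be inspected separately, which is one plausible reading of how a suitable base model would allow drift to be localized at the input level.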
