Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Paper title

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Authors

Chen Qiu, Aodong Li, Marius Kloft, Maja Rudolph, Stephan Mandt

Abstract

Anomaly detection aims to identify data points that deviate systematically from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, an assumption that is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer a binary label (normal vs. anomalous) for each datum while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018), which considers synthetically created, labeled anomalies, we use a combination of two losses that share parameters: one for the normal data and one for the anomalous data. We then iteratively perform block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular datasets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.
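
The block coordinate scheme described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the losses, the backbone, or how labels are assigned. The sketch assumes a single learnable center `c` as the "model", a squared-distance loss for normal points, an inverse squared-distance loss for anomalous points, and a known contamination ratio `alpha` used to assign hard labels. The names `make_toy_data` and `fit_toy_loe` are hypothetical.

```python
import numpy as np


def make_toy_data(seed=0, n_normal=180, n_anomalous=20):
    """Hypothetical contaminated training set: one normal cluster plus unlabeled anomalies."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, 2))
    anomalous = rng.normal(6.0, 0.5, size=(n_anomalous, 2))
    X = np.vstack([normal, anomalous])
    y_true = np.concatenate([np.zeros(n_normal, dtype=int),
                             np.ones(n_anomalous, dtype=int)])
    return X, y_true


def fit_toy_loe(X, alpha=0.1, lr=0.1, n_iters=50, eps=1e-8):
    """Alternate between inferring latent labels and updating the shared parameter c.

    Assumed losses (illustrative only):
      normal loss    L_n(x) = ||x - c||^2
      anomalous loss L_a(x) = 1 / ||x - c||^2
    """
    n = len(X)
    k = int(round(alpha * n))       # assumed number of anomalies in the training set
    c = X.mean(axis=0)              # initialize the center at the (contaminated) mean
    y = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        diff = X - c
        d2 = (diff ** 2).sum(axis=1) + eps

        # Block 1: with c fixed, label the k points with the largest
        # L_n - L_a as anomalous (y = 1) and the rest as normal (y = 0).
        score = d2 - 1.0 / d2
        y = np.zeros(n, dtype=int)
        if k > 0:
            y[np.argsort(score)[-k:]] = 1

        # Block 2: with labels fixed, take one gradient step on c for the
        # combined loss sum_i (1 - y_i) * L_n(x_i) + y_i * L_a(x_i).
        is_normal = y == 0
        grad = -2.0 * diff[is_normal].sum(axis=0)
        grad += (2.0 * diff[~is_normal] / d2[~is_normal, None] ** 2).sum(axis=0)
        c = c - lr * grad / n
    return c, y


if __name__ == "__main__":
    X, y_true = make_toy_data()
    c, y = fit_toy_loe(X, alpha=0.1)
    print("learned center:", c)
    print("label agreement with ground truth:", (y == y_true).mean())
```

In this toy setting both losses share the single parameter `c`, so points labeled anomalous push the center away while points labeled normal pull it in. A real detector would replace `c` with a neural backbone and the two distance terms with that model's own normal and anomalous losses.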

