Paper Title

Data-Efficient and Interpretable Tabular Anomaly Detection

Authors

Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister

Abstract

Anomaly detection (AD) plays an important role in numerous applications. We focus on two understudied aspects of AD that are critical for integration into real-world applications. First, most AD methods cannot incorporate labeled data, which are often available in practice in small quantities and can be crucial for achieving high AD accuracy. Second, most AD methods are not interpretable, a bottleneck that prevents stakeholders from understanding the reasons behind the anomalies. In this paper, we propose a novel AD framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies using a partial identification objective that naturally handles noisy or heterogeneous features. In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performance in semi-supervised settings. We demonstrate the superiority of our framework over previous work in both unsupervised and semi-supervised settings using diverse tabular datasets. For example, with 5 labeled anomalies, DIAD improves from 86.2% to 89.4% AUC by learning AD from unlabeled data. We also present insightful interpretations that explain why DIAD deems certain samples anomalous.
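
The AUC figures quoted in the abstract measure how reliably a detector ranks anomalies above normal samples. As a rough illustration only (this is not DIAD's evaluation code, and the function name `roc_auc` and the toy scores are assumptions made for this sketch), AUC can be computed as the fraction of anomaly/normal pairs in which the anomaly receives the higher score:

```python
def roc_auc(scores, labels):
    """Return the probability that a random anomaly outscores a random normal sample.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for anomaly, 0 for normal.
    Ties between an anomaly and a normal sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two anomalies, three normal samples.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based formulation gives the same value for larger datasets.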
