Paper title
AIDA: Analytic Isolation and Distance-based Anomaly Detection Algorithm
Paper authors
Abstract
We combine the metrics of distance and isolation to develop the Analytic Isolation and Distance-based Anomaly (AIDA) detection algorithm. AIDA is the first distance-based method that does not rely on the concept of nearest neighbours, making it a parameter-free model. Unlike the prevailing literature, in which the isolation metric is always computed via simulations, we show that AIDA admits an analytical expression for the outlier score, providing new insights into the isolation metric. Additionally, we present an anomaly explanation method based on AIDA, the Tempered Isolation-based eXplanation (TIX) algorithm, which finds the most relevant outlier features even in data sets with hundreds of dimensions. We test both algorithms on synthetic and empirical data: we show that AIDA is competitive with other state-of-the-art methods and superior at finding outliers hidden in multidimensional feature subspaces. Finally, we illustrate how the TIX algorithm is able to find outliers in multidimensional feature subspaces, and use these explanations to analyze common benchmarks used in anomaly detection.