Am I Rare? An Intelligent Summarization Approach for Identifying Hidden Anomalies

Paper title

Am I Rare? An Intelligent Summarization Approach for Identifying Hidden Anomalies

Authors

Samira Ghodratnama, Mehrdad Zakershahrak, Fariborz Sobhanmanesh

Abstract

Monitoring network traffic data to detect hidden anomalous patterns is a challenging, time-consuming task that requires substantial computing resources. An appropriate summarization technique is therefore of great importance, because a good summary can substitute for the original data. Summarization, however, risks removing the anomalies themselves, so it is vital to create a summary that reflects the same patterns as the original data. In this paper, we propose an INtelligent Summarization approach for IDENTifying hidden anomalies, called INSIDENT. The proposed approach guarantees that the original data distribution is preserved in the summarized data. It is a clustering-based algorithm that dynamically maps the original feature space to a new feature space by locally weighting features in each cluster. In the new feature space, similar samples lie closer together, and outliers are consequently easier to detect. In addition, selecting representatives based on cluster size keeps the distribution of the summarized data the same as that of the original data. INSIDENT can be used either as a preprocessing step before running an anomaly detection algorithm or as an anomaly detection algorithm in its own right. Experimental results on benchmark datasets show that a summary of the data can substitute for the original data in the anomaly detection task.
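The abstract names three ingredients: clustering, locally weighted features per cluster, and representatives chosen according to cluster size. The sketch below illustrates that general idea only; it is not the authors' INSIDENT algorithm. Every specific choice in it is an assumption made for illustration: plain k-means with a farthest-point initialization, inverse-variance feature weights, a quota proportional to cluster size, and the hypothetical function names `kmeans`, `local_weights`, and `summarize`.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def local_weights(members, eps=1e-6):
    """Per-cluster mean and feature weights (assumption: inverse variance)."""
    n = len(members)
    mean = [sum(col) / n for col in zip(*members)]
    var = [sum((x - m) ** 2 for x in col) / n
           for col, m in zip(zip(*members), mean)]
    inv = [1.0 / (v + eps) for v in var]
    total = sum(inv)
    return mean, [w / total for w in inv]


def summarize(points, k, ratio):
    """Keep about `ratio` of each cluster, so cluster proportions carry over."""
    labels = kmeans(points, k)
    summary = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        mean, weights = local_weights(members)

        def weighted_dist(p):
            # Distance to the cluster mean in the locally weighted space.
            return math.sqrt(sum(w * (x - m) ** 2
                                 for w, x, m in zip(weights, p, mean)))

        # Quota grows with cluster size; small clusters keep at least one point.
        quota = max(1, round(ratio * len(members)))
        ranked = sorted(members, key=weighted_dist)
        # Take evenly spaced ranks so central and peripheral points both survive.
        step = len(ranked) / quota
        summary.extend(ranked[int(i * step)] for i in range(quota))
    return summary


if __name__ == "__main__":
    rng = random.Random(1)
    common = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
    rare = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(20)]
    summary = summarize(common + rare, k=2, ratio=0.1)
    print(len(summary), "representatives kept out of", len(common) + len(rare))
```

Because each cluster's quota scales with its size, the summary keeps roughly the same mix of common and rare groups as the input, and the `max(1, ...)` floor stops a very small cluster from disappearing entirely. The same `weighted_dist` value could also serve as a simple outlier score, which is one possible reading of the abstract's remark that the method can act as a detector on its own.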
