Paper Title

On the Nature and Types of Anomalies: A Review of Deviations in Data

Author

Foorthuis, Ralph

Abstract

Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is typically ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure, and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types, and 63 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
