Unsupervised Anomaly Detection From Semantic Similarity Scores

Paper Title

Unsupervised Anomaly Detection From Semantic Similarity Scores

Authors

Nima Rafiee, Rahil Gholamipoor, Markus Kollmann

Abstract

Classifying samples as in-distribution or out-of-distribution (OOD) is a challenging problem of anomaly detection and a strong test of the generalisation power for models of the in-distribution. In this paper, we present a simple and generic framework, SemSAD, that makes use of a semantic similarity score to carry out anomaly detection. The idea is to first find for any test example the semantically closest examples in the training set, where the semantic relation between examples is quantified by the cosine similarity between feature vectors that leave semantics unchanged under transformations, such as geometric transformations (images), time shifts (audio signals), and synonymous word substitutions (text). A trained discriminator is then used to classify a test example as OOD if the semantic similarity to its nearest neighbours is significantly lower than the corresponding similarity for test examples from the in-distribution. We are able to outperform previous approaches for anomaly, novelty, or out-of-distribution detection in the visual domain by a large margin. In particular, we obtain AUROC values close to one for the challenging task of detecting examples from CIFAR-10 as out-of-distribution given CIFAR-100 as in-distribution, without making use of label information.
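
The nearest-neighbour scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes feature vectors have already been produced by some encoder, and it swaps the trained discriminator for a plain similarity threshold fitted on held-out in-distribution examples. The function names (`knn_cosine_scores`, `fit_threshold`, `is_ood`) and the parameters `k` and `quantile` are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def knn_cosine_scores(train_feats, test_feats, k=1):
    """Mean cosine similarity of each test vector to its k nearest training vectors."""
    train = l2_normalize(np.asarray(train_feats, dtype=float))
    test = l2_normalize(np.asarray(test_feats, dtype=float))
    sims = test @ train.T                      # shape (n_test, n_train)
    k = min(k, train.shape[0])
    topk = np.partition(sims, -k, axis=1)[:, -k:]  # k largest similarities per row
    return topk.mean(axis=1)


def fit_threshold(train_feats, heldout_in_feats, k=1, quantile=0.05):
    """Assumed stand-in for the discriminator: a low quantile of in-distribution scores."""
    scores = knn_cosine_scores(train_feats, heldout_in_feats, k)
    return float(np.quantile(scores, quantile))


def is_ood(train_feats, test_feats, threshold, k=1):
    """Flag test examples whose nearest-neighbour similarity falls below the threshold."""
    return knn_cosine_scores(train_feats, test_feats, k) < threshold
```

With features in hand, one would call `fit_threshold` on held-out in-distribution features and then `is_ood` on new examples. A fixed quantile threshold is a much cruder decision rule than the trained discriminator the abstract describes, so this sketch should not be expected to reproduce the reported AUROC values.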
