Paper Title
Why Normalizing Flows Fail to Detect Out-of-Distribution Data
Paper Authors
Paper Abstract
Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often surprisingly fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection. We demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations which are not specific to the target image dataset. We show that by modifying the architecture of flow coupling layers we can bias the flow towards learning the semantic structure of the target data, improving OOD detection. Our investigation reveals that properties that enable flows to generate high-fidelity images can have a detrimental effect on OOD detection.
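The abstract's central objects are the coupling layers of a normalizing flow, which split the input and transform one half conditioned on the other, yielding an exactly invertible map with a tractable log-likelihood. The following is a minimal sketch of one RealNVP-style affine coupling layer; the tiny conditioner network and all names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def conditioner(x, w1, b1, w2, b2):
    # Small MLP producing scale and shift from the untransformed half.
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

def coupling_forward(x, params):
    # Split input: x1 passes through unchanged, x2 is affinely transformed
    # conditioned on x1. The Jacobian is triangular, so its log-determinant
    # is simply the sum of the log-scales.
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    st = conditioner(x1, *params)
    s, t = st[:, :d], st[:, d:]
    z2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=1)  # exact log |det J| for the likelihood
    return np.concatenate([x1, z2], axis=1), log_det

def coupling_inverse(z, params):
    # Exact inverse: recompute (s, t) from the unchanged half and undo
    # the affine map on the other half.
    d = z.shape[1] // 2
    z1, z2 = z[:, :d], z[:, d:]
    st = conditioner(z1, *params)
    s, t = st[:, :d], st[:, d:]
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2], axis=1)

rng = np.random.default_rng(0)
d = 2
params = (rng.normal(size=(d, 8)) * 0.1, np.zeros(8),
          rng.normal(size=(8, 2 * d)) * 0.1, np.zeros(2 * d))

x = rng.normal(size=(4, 2 * d))
z, log_det = coupling_forward(x, params)
x_rec = coupling_inverse(z, params)
print(np.allclose(x, x_rec))
```

Because the conditioner sees only raw pixel values of one half of the input, it can capture local pixel correlations without any dataset-specific semantics, which is the failure mode the abstract describes; the paper's proposed fix is to change this coupling-layer structure.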