Paper Title
Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation
Paper Authors
Paper Abstract
The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to generate object masks that serve as a pixel grouping prior for semantic segmentation. This approach omits handcrafted priors, which are often designed for specific scene compositions and limit the applicability of competing frameworks. Second, MaskDistill clusters the object masks to obtain pseudo-ground-truth for training an initial object segmentation model. Third, we leverage this model to filter out low-quality object masks. This strategy mitigates the noise in our pixel grouping prior and results in a clean collection of masks which we use to train a final segmentation model. By combining these components, we can considerably outperform previous works for unsupervised semantic segmentation on PASCAL (+11% mIoU) and COCO (+4% mask AP50). Interestingly, as opposed to existing approaches, our framework does not latch onto low-level image cues and is not limited to object-centric datasets. The code and models will be made available.
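The abstract describes a pipeline of clustering candidate object masks into pseudo-ground-truth and then filtering low-quality masks by agreement with an initial model. The toy sketch below illustrates those two stages only; the function names, the k-means clustering over per-mask feature vectors, and the IoU-based filtering rule are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def cluster_masks(mask_features, k, iters=20):
    """Toy k-means over per-mask feature vectors, yielding pseudo-labels.
    (Illustrative stand-in for the paper's clustering step.)"""
    rng = np.random.default_rng(0)
    centers = mask_features[rng.choice(len(mask_features), k, replace=False)]
    for _ in range(iters):
        # Assign each mask to its nearest cluster center.
        dists = np.linalg.norm(mask_features[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Update each center to the mean of its assigned masks.
        for c in range(k):
            members = mask_features[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

def filter_masks(candidate_masks, model_masks, iou_thresh=0.5):
    """Keep candidate masks whose IoU with the initial model's prediction
    exceeds a threshold — a simple proxy for the paper's noise filtering."""
    kept = []
    for cand, pred in zip(candidate_masks, model_masks):
        inter = np.logical_and(cand, pred).sum()
        union = np.logical_or(cand, pred).sum()
        if union and inter / union >= iou_thresh:
            kept.append(cand)
    return kept
```

Under this reading, the surviving masks after `filter_masks` form the cleaned training set for the final segmentation model.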