Paper title
Unsupervised Part Discovery via Feature Alignment
Paper authors
Paper abstract
Understanding objects in terms of their individual parts is important because it enables a precise understanding of the objects' geometrical structure and enhances object recognition when the object is seen in a novel pose or under partial occlusion. However, the manual annotation of parts in large-scale datasets is time-consuming and expensive. In this paper, we aim to discover object parts in an unsupervised manner, i.e., without ground-truth part or keypoint annotations. Our approach builds on the intuition that objects of the same class in a similar pose should have their parts aligned at similar spatial locations. We exploit the property that neural network features are largely invariant to nuisance variables, so that the main remaining source of variation between images of the same object category is the object pose. Specifically, given a training image, we find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps. The average of the aligned feature maps serves as a pseudo ground-truth annotation for supervised training of the deep network backbone. During inference, part detection is simple and fast, requiring no extra modules or overhead beyond a feed-forward neural network. Our experiments on several datasets from different domains verify the effectiveness of the proposed method. For example, we achieve 37.8 mAP on VehiclePart, which is at least 4.2 points better than previous methods.
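The pseudo ground-truth construction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes feature maps are NumPy arrays of shape (C, H, W), restricts the affine family to small integer translations, scores alignments with cosine similarity, and includes the query map in the average. The function names (`shift`, `best_alignment`, `pseudo_target`) and the parameters `k` and `max_shift` are hypothetical.

```python
import numpy as np


def shift(fmap, dy, dx):
    """Translate a (C, H, W) feature map by (dy, dx), zero-padding the border.

    Assumes |dy| < H and |dx| < W.
    """
    _, h, w = fmap.shape
    out = np.zeros_like(fmap)
    ys, ye = max(dy, 0), min(h + dy, h)
    xs, xe = max(dx, 0), min(w + dx, w)
    out[:, ys:ye, xs:xe] = fmap[:, ys - dy:ye - dy, xs - dx:xe - dx]
    return out


def cosine(a, b):
    """Cosine similarity between two feature maps, treated as flat vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return float((a * b).sum() / denom)


def best_alignment(query, other, max_shift=2):
    """Search integer translations (a restricted affine family, an assumption
    of this sketch) for the warp of `other` that best matches `query`."""
    best_score, best_map = -np.inf, other
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = shift(other, dy, dx)
            score = cosine(query, cand)
            if score > best_score:
                best_score, best_map = score, cand
    return best_score, best_map


def pseudo_target(query, pool, k=2, max_shift=2):
    """Average the query with its k best-aligned neighbours from `pool`."""
    aligned = [best_alignment(query, f, max_shift) for f in pool]
    aligned.sort(key=lambda t: t[0], reverse=True)
    top = [m for _, m in aligned[:k]]
    return np.mean([query] + top, axis=0)
```

In this sketch, a pool image that shows the same pattern at a slightly different position is warped back onto the query before averaging, whereas an unrelated image scores poorly and is left out of the top-k set. A full affine search (rotation, scale, shear) would replace the translation loop.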