Paper Title
Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge
Paper Authors
Paper Abstract
Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image. It is more challenging than its single-label counterpart. On one hand, the unconstrained number of labels assigned to each image makes the model more prone to overfitting to the seen classes. On the other hand, there is a large semantic gap between the seen and unseen classes in existing multi-label classification datasets. To address these issues, this paper introduces a novel multi-label zero-shot classification framework by learning to transfer from external knowledge. We observe that ImageNet, which is commonly used to pretrain the feature extractor, has a large and fine-grained label space. This motivates us to exploit it as external knowledge to bridge the seen and unseen classes and promote generalization. Specifically, we construct a knowledge graph that includes not only the classes of the target dataset but also those of ImageNet. Since ImageNet labels are not available in the target dataset, we propose a novel PosVAE module to infer their initial states in the extended knowledge graph. We then design a relational graph convolutional network (RGCN) to propagate information among classes and achieve knowledge transfer. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach.
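To make the propagation step described in the abstract concrete, below is a minimal sketch of relation-wise message passing in an RGCN layer over a class knowledge graph whose nodes are both target-dataset classes and external ImageNet classes. This is not the authors' implementation; the layer name `SimpleRGCNLayer`, the tensor shapes, the number of relation types, and the use of row-normalized adjacency matrices are assumptions made purely for illustration.

```python
# A minimal, illustrative sketch of RGCN-style propagation over a class
# knowledge graph -- NOT the authors' implementation. All names and shapes
# here are assumptions for illustration only.
import torch
import torch.nn as nn


class SimpleRGCNLayer(nn.Module):
    """One relational graph-convolution layer.

    Each relation type r has its own weight matrix; a node aggregates
    messages from its neighbors per relation and adds a self-loop transform,
    so information can flow between target-dataset classes and the external
    ImageNet classes that bridge seen and unseen labels.
    """

    def __init__(self, in_dim: int, out_dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )
        self.self_loop = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # node_states: (num_nodes, in_dim) initial node states, e.g. seen-class
        #   predictions plus states inferred for external classes (the role the
        #   paper assigns to its PosVAE module).
        # adjacency:   (num_relations, num_nodes, num_nodes), row-normalized
        #   per relation so messages are averaged over neighbors.
        out = self.self_loop(node_states)
        for r, linear in enumerate(self.rel_weights):
            out = out + adjacency[r] @ linear(node_states)
        return torch.relu(out)


if __name__ == "__main__":
    num_nodes, num_relations, dim = 6, 2, 16
    states = torch.randn(num_nodes, dim)       # placeholder initial states
    adj = torch.rand(num_relations, num_nodes, num_nodes)
    adj = adj / adj.sum(dim=-1, keepdim=True)  # row-normalize each relation
    layer = SimpleRGCNLayer(dim, dim, num_relations)
    print(layer(states, adj).shape)            # torch.Size([6, 16])
```

In such a setup, stacking a few of these layers and reading out the states of unseen-class nodes would yield the transferred scores; the actual readout and the PosVAE inference of initial states for ImageNet nodes are specific to the paper and not reproduced here.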