Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches

Paper title

Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches

Authors

Zhi Chen, Sen Wang, Jingjing Li, Zi Huang

Abstract

Zero-shot learning (ZSL) is commonly used to address the very pervasive problem of predicting unseen classes in fine-grained image classification and other tasks. One family of solutions is to learn synthesised unseen visual samples produced by generative models from auxiliary semantic information, such as natural language descriptions. However, for most of these models, performance suffers from noise in the form of irrelevant image backgrounds. Further, most methods do not allocate a calculated weight to each semantic patch. Yet, in the real world, the discriminative power of features can be quantified and directly leveraged to improve accuracy and reduce computational complexity. To address these issues, we propose a novel framework called multi-patch generative adversarial nets (MPGAN) that synthesises local patch features and labels unseen classes with a novel weighted voting strategy. The process begins by generating discriminative visual features from noisy text descriptions for a set of predefined local patches using multiple specialist generative models. The features synthesised from each patch for unseen classes are then used to construct an ensemble of diverse supervised classifiers, each corresponding to one local patch. A voting strategy averages the probability distributions output from the classifiers and, given that some patches are more discriminative than others, a discrimination-based attention mechanism helps to weight each patch accordingly. Extensive experiments show that MPGAN has significantly greater accuracy than state-of-the-art methods.
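
The weighted voting step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `weighted_vote`, the example probabilities, and the fixed patch weights are all hypothetical. In MPGAN the weights would come from the discrimination-based attention mechanism; here they are simply supplied by hand and normalised to sum to one.

```python
def weighted_vote(patch_probs, patch_weights):
    """Combine per-patch class probabilities into one prediction.

    patch_probs:   one probability distribution over classes per patch
                   classifier (hypothetical example input).
    patch_weights: one non-negative weight per patch; assumed here to be
                   given, standing in for attention-derived weights.
    Returns (combined distribution, index of the predicted class).
    """
    if len(patch_probs) != len(patch_weights):
        raise ValueError("need exactly one weight per patch classifier")
    total = sum(patch_weights)
    if total <= 0:
        raise ValueError("patch weights must sum to a positive value")

    # Normalise the weights so the combined scores remain a distribution.
    norm = [w / total for w in patch_weights]

    n_classes = len(patch_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(patch_probs, norm):
        for c, p in enumerate(probs):
            combined[c] += w * p

    predicted = max(range(n_classes), key=combined.__getitem__)
    return combined, predicted


# Three patch classifiers voting over three unseen classes.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
]
combined, predicted = weighted_vote(probs, [1.0, 3.0, 1.0])
print(combined, predicted)
```

Passing equal weights reduces this to plain averaging of the classifier outputs; unequal weights let a more discriminative patch contribute more to the final label.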
