Paper Title

CIGMO: Categorical invariant representations in a deep generative framework

Paper Authors

Hosoya, Haruo

Paper Abstract

Image data of general objects commonly exhibit two structures: (1) each object of a given shape can be rendered in multiple different views, and (2) object shapes can be categorized such that the diversity of shapes is much larger across categories than within a category. Existing deep generative models can typically capture either structure, but not both. In this work, we introduce a novel deep generative model, called CIGMO, that can learn to represent category, shape, and view factors from image data. The model comprises multiple modules of shape representations, each specialized to a particular category and disentangled from the view representation, and can be learned using a group-based weakly supervised learning method. Through empirical investigation, we show that our model can effectively discover categories of object shapes despite large view variation, quantitatively outperforming various previous methods including the state-of-the-art invariant clustering algorithm. Further, we show that our category-specialization approach enhances the learned shape representation to better perform downstream tasks such as one-shot object identification as well as shape-view disentanglement.
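To make the described architecture concrete, below is a minimal PyTorch sketch of a CIGMO-like model: per-category shape encoders and decoders, a single shared view encoder, and a categorizer that pools a posterior over categories from a group of views of the same object. Everything here, including the class name CigmoSketch, the layer sizes, and the plain reconstruction loss, is an illustrative assumption rather than the authors' implementation; the KL regularizers of the full variational objective are omitted for brevity.

```python
# A minimal CIGMO-like sketch (illustrative assumptions throughout;
# not the authors' reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CigmoSketch(nn.Module):
    def __init__(self, n_categories=3, shape_dim=20, view_dim=5, img_dim=784):
        super().__init__()
        self.n_categories = n_categories
        # One shape encoder per category: each module specializes to one
        # discovered shape category.
        self.shape_encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(),
                          nn.Linear(256, 2 * shape_dim))
            for _ in range(n_categories))
        # A single view encoder shared across categories, keeping view
        # factors disentangled from category and shape.
        self.view_encoder = nn.Sequential(
            nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, 2 * view_dim))
        # One decoder per category maps (shape, view) back to an image.
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(shape_dim + view_dim, 256), nn.ReLU(),
                          nn.Linear(256, img_dim))
            for _ in range(n_categories))
        # Categorizer producing a posterior over categories for a group.
        self.categorizer = nn.Sequential(
            nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, n_categories))

    @staticmethod
    def reparam(stats):
        # Standard VAE reparameterization from concatenated (mu, logvar).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, group):
        # group: (G, B, img_dim) -- the weak supervision is the grouping
        # itself: all G images in a group depict the same object.
        G, B, D = group.shape
        flat = group.reshape(G * B, D)
        # Category posterior pooled over the group (mean of per-image logits).
        logits = self.categorizer(flat).reshape(G, B, -1).mean(dim=0)
        q_cat = F.softmax(logits, dim=-1)             # (B, n_categories)
        # View codes are inferred per image.
        view = self.reparam(self.view_encoder(flat))  # (G*B, view_dim)
        errors = []
        for enc, dec in zip(self.shape_encoders, self.decoders):
            # Shape statistics are averaged over the group, so the shape
            # code is common to all views of an object (view-invariant).
            stats = enc(flat).reshape(G, B, -1).mean(dim=0)
            shape = self.reparam(stats)               # (B, shape_dim)
            tiled = shape.unsqueeze(0).expand(G, B, -1).reshape(G * B, -1)
            recon = dec(torch.cat([tiled, view], dim=-1))
            err = F.mse_loss(recon, flat, reduction="none").sum(-1)
            errors.append(err.reshape(G, B).sum(dim=0))  # (B,)
        # Expected reconstruction error under the category posterior.
        # (KL terms of the full variational objective are omitted.)
        loss = (q_cat * torch.stack(errors, dim=-1)).sum(-1).mean()
        return loss, q_cat

# Toy usage: groups of 3 views of 8 objects, flattened 28x28 images.
model = CigmoSketch()
loss, q_cat = model(torch.randn(3, 8, 784))
loss.backward()
print(q_cat.shape)  # torch.Size([8, 3])
```

The key design point this sketch tries to capture is that averaging shape statistics within a group forces the shape code to be view-invariant, while the soft category posterior routes each object to the module that reconstructs it best, which is what lets the modules specialize by category.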
