Paper Title

Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer

Authors

Takada, T., Shimaya, W., Ohmura, Y., Kuniyoshi, Y.

Abstract

An effective way to model the complex real world is to view it as a composition of basic components: objects and transformations. Although humans come to understand the compositionality of the real world through development, it is extremely difficult to equip robots with such a learning mechanism. In recent years, there has been significant research on autonomously learning representations of the world using deep learning; however, most studies have taken a statistical approach, which requires a large amount of training data. In contrast to such existing methods, we take a novel algebraic approach to representation learning based on a simpler and more intuitive formulation: the observed world is a combination of multiple independent patterns and transformations that are invariant to the shape of the patterns. Since the shape of a pattern can be viewed as a feature that is invariant under symmetric transformations such as translation or rotation, we can expect patterns to be extracted naturally by expressing transformations with symmetric Lie group transformers and attempting to reconstruct the scene with them. Based on this idea, we propose a model that disentangles scenes into the minimum number of basic components of patterns and Lie transformations from only one sequence of images, by introducing learnable shape-invariant Lie group transformers as transformation components. Experiments show that, given one sequence of images in which two objects move independently, the proposed model can discover the hidden distinct objects and the multiple shape-invariant transformations that constitute the scenes.
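As a rough illustration of what a shape-invariant Lie group transformation means, the sketch below rotates a toy pattern by exponentiating a rotation generator and checks that the pattern's pairwise distances are unchanged. It is not the model proposed in the paper: the point-set pattern, the fixed generator, and every function name here are assumptions made for illustration, whereas the paper learns its transformers from image sequences.

```python
import math

# Generator of 2D rotations, an element of the Lie algebra so(2).
# Assumption for illustration: the paper learns its transformers instead.
ROTATION_GENERATOR = [[0.0, -1.0], [1.0, 0.0]]


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lie_group_element(generator, theta, terms=30):
    """Approximate exp(theta * generator) with a truncated Taylor series."""
    scaled = [[theta * g for g in row] for row in generator]
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity is the zeroth-order term
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term_n = term_{n-1} @ scaled / n, i.e. (theta * G)^n / n!
        term = [[v / n for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def transform(points, matrix):
    """Apply a 2x2 matrix to a list of (x, y) points."""
    return [(matrix[0][0] * x + matrix[0][1] * y,
             matrix[1][0] * x + matrix[1][1] * y) for x, y in points]


def pairwise_distances(points):
    """Distances between all point pairs, a simple stand-in for 'shape'."""
    return [math.dist(p, q)
            for i, p in enumerate(points) for q in points[i + 1:]]


pattern = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]  # toy triangular pattern
rotated = transform(pattern,
                    lie_group_element(ROTATION_GENERATOR, math.pi / 3))

# The pattern moves, but its shape (all pairwise distances) is unchanged.
shape_preserved = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(pattern), pairwise_distances(rotated))
)
```

The transformation changes where the pattern sits while leaving `shape_preserved` true. This separation between what a pattern is and how it moves is the property the abstract relies on.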
