Paper Title

Hybrid Gromov-Wasserstein Embedding for Capsule Learning

Paper Authors

Pourya Shamsolmoali, Masoumeh Zareapoor, Swagatam Das, Eric Granger, Salvador Garcia

Paper Abstract

Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relations using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNets despite their potential advantages. Current CapsNet models primarily focus on comparing their performance with capsule baselines, falling short of the proficiency of deep CNN variants in intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared to high-performing convolutional models. Our contribution can be outlined in two aspects: first, we introduce a group of subcapsules onto which an input vector is projected. Subsequently, we present the Hybrid Gromov-Wasserstein framework, which initially quantifies the dissimilarity between the input and the components modeled by the subcapsules, followed by determining their alignment degree through optimal transport. This innovative mechanism capitalizes on new insights into defining alignment between the input and subcapsules, based on the similarity of their respective component distributions. This approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: (i) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; (ii) it outperforms baseline approaches in these demanding tasks.
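The abstract does not give implementation details, but the alignment step it describes (measuring structural dissimilarity between the input's components and the subcapsules, then computing an alignment through optimal transport) resembles an entropic fused Gromov-Wasserstein coupling. Below is a minimal NumPy sketch of that generic mechanism, not the paper's actual formulation: the function names (`sinkhorn`, `fused_gw_alignment`), the random projection matrices `W`, the trade-off weight `alpha`, and the regularization `eps` are illustrative assumptions.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=200):
    """Entropic optimal transport: coupling between marginals a, b under cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    return u[:, None] * K * v[None, :]

def fused_gw_alignment(X, Y, M, alpha=0.5, eps=0.1, n_outer=30):
    """Hedged sketch: fused Gromov-Wasserstein alignment between an input's part
    vectors X (n, d) and subcapsule component vectors Y (m, d'). M is a direct
    feature-cost matrix (n, m); alpha trades off feature cost vs. structural cost."""
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)            # uniform part weights
    Cx = np.linalg.norm(X[:, None] - X[None, :], axis=-1)      # intra-input pairwise distances
    Cy = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)      # intra-subcapsule pairwise distances
    Cx, Cy = Cx / (Cx.max() + 1e-12), Cy / (Cy.max() + 1e-12)  # normalize for numerical stability
    T = np.outer(a, b)                                         # start from the independent coupling
    cX = (Cx ** 2) @ a                                         # constant terms of the squared-loss
    cY = (Cy ** 2) @ b                                         #   GW cost tensor (Peyre et al., 2016)
    for _ in range(n_outer):
        gw_tensor = cX[:, None] + cY[None, :] - 2.0 * Cx @ T @ Cy  # L(Cx, Cy) tensor-product T
        cost = (1.0 - alpha) * M + alpha * gw_tensor               # hybrid feature + structure cost
        T = sinkhorn(cost, a, b, eps)                              # entropic OT step
    gw_tensor = cX[:, None] + cY[None, :] - 2.0 * Cx @ T @ Cy      # recompute at the final coupling
    discrepancy = np.sum(((1.0 - alpha) * M + alpha * gw_tensor) * T)
    return T, discrepancy

# Toy usage: project an input vector onto a few hypothetical "subcapsules" with random
# matrices, then align the input's parts against the subcapsule outputs.
rng = np.random.default_rng(0)
x_parts = rng.normal(size=(6, 16))                    # 6 part vectors of an input, dim 16
W = rng.normal(size=(4, 16, 16)) / 4.0                # 4 hypothetical subcapsule projections
caps = np.einsum('kij,pj->ki', W, x_parts.mean(0, keepdims=True))   # (4, 16) subcapsule outputs
M = np.linalg.norm(x_parts[:, None] - caps[None, :], axis=-1)       # direct feature cost
M = M / (M.max() + 1e-12)
T, d = fused_gw_alignment(x_parts, caps, M)
print(T.shape, round(float(d), 4))                    # (6, 4) coupling and its hybrid cost
```

In this sketch the coupling `T` plays the role of the alignment degree between input parts and subcapsules, and `discrepancy` is the hybrid transport cost; how the paper actually parameterizes the costs and folds the coupling into routing is not specified in the abstract.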
