Paper Title
Using Multiple Instance Learning to Build Multimodal Representations
Paper Authors
Paper Abstract
Image-text multimodal representation learning aligns data across modalities and enables important medical applications, e.g., image classification, visual grounding, and cross-modal retrieval. In this work, we establish a connection between multimodal representation learning and multiple instance learning. Based on this connection, we propose a generic framework for constructing permutation-invariant score functions with many existing multimodal representation learning approaches as special cases. Furthermore, we use the framework to derive a novel contrastive learning approach and demonstrate that our method achieves state-of-the-art results in several downstream tasks.
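To make the abstract's central notion concrete: a permutation-invariant score function treats an image as a bag of region embeddings and a report as a bag of token embeddings, and returns a similarity that does not depend on the order of instances within either bag. The sketch below is purely illustrative and is not the paper's formulation; the function name `mil_score`, the log-sum-exp pooling choice, and the temperature `tau` are assumptions for demonstration.

```python
import numpy as np

def mil_score(image_embs: np.ndarray, text_embs: np.ndarray, tau: float = 0.1) -> float:
    """Illustrative permutation-invariant score between two bags of instances.

    image_embs: (n_regions, d) bag of image-region embeddings.
    text_embs:  (n_tokens, d)  bag of text-token embeddings.

    Pairwise cosine similarities are aggregated with a smooth maximum
    (log-sum-exp) over regions and a mean over tokens; both pooling
    operations are symmetric in their inputs, so reordering instances
    in either bag leaves the score unchanged.
    """
    a = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    b = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sim = a @ b.T                      # (n_regions, n_tokens) cosine similarities
    pooled = tau * np.log(np.exp(sim / tau).sum(axis=0))  # smooth max over regions
    return float(pooled.mean())        # average over tokens
```

Swapping log-sum-exp for a hard max or a mean recovers other pooling choices; any such symmetric aggregator keeps the score permutation-invariant, which is the property the framework in the abstract is built around.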