Paper Title

Using Multiple Instance Learning to Build Multimodal Representations

Authors

Peiqi Wang, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland

Abstract

Image-text multimodal representation learning aligns data across modalities and enables important medical applications, e.g., image classification, visual grounding, and cross-modal retrieval. In this work, we establish a connection between multimodal representation learning and multiple instance learning. Based on this connection, we propose a generic framework for constructing permutation-invariant score functions with many existing multimodal representation learning approaches as special cases. Furthermore, we use the framework to derive a novel contrastive learning approach and demonstrate that our method achieves state-of-the-art results in several downstream tasks.
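The abstract's central object is a permutation-invariant score function over bags of instances (e.g., image regions and text tokens). A minimal illustrative sketch is below; the function name `mil_score`, the cosine-similarity pairing, and the logsumexp smooth-max pooling with temperature `tau` are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def mil_score(image_regions, text_tokens, tau=0.1):
    """MIL-style score between two bags of embeddings.

    Illustrative sketch only: pairwise cosine similarities are pooled
    with a temperature-scaled logsumexp (a smooth maximum). Because the
    pooling sums over all instance pairs, the score is invariant to the
    ordering of instances within each bag.
    """
    A = image_regions / np.linalg.norm(image_regions, axis=1, keepdims=True)
    B = text_tokens / np.linalg.norm(text_tokens, axis=1, keepdims=True)
    sim = A @ B.T  # pairwise cosine similarities between instances
    return tau * np.log(np.mean(np.exp(sim / tau)))

# Permuting the instances inside either bag leaves the score unchanged.
rng = np.random.default_rng(0)
img = rng.normal(size=(5, 8))   # hypothetical bag of 5 region embeddings
txt = rng.normal(size=(7, 8))   # hypothetical bag of 7 token embeddings
s1 = mil_score(img, txt)
s2 = mil_score(img[rng.permutation(5)], txt[rng.permutation(7)])
print(np.isclose(s1, s2))
```

Varying `tau` interpolates between mean pooling (large `tau`) and hard max pooling (small `tau`), which is one way such a framework can subsume different existing aggregation choices as special cases.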
