Paper Title

Self-Supervised Multimodal Domino: In Search of Biomarkers for Alzheimer's Disease

Paper Authors

Alex Fedorov, Tristan Sylvain, Eloy Geenjaar, Margaux Luck, Lei Wu, Thomas P. DeRamus, Alex Kirilin, Dmitry Bleklov, Vince D. Calhoun, Sergey M. Plis

Paper Abstract

Sensory input from multiple sources is crucial for robust and coherent human perception. Different sources contribute complementary explanatory factors. Similarly, research studies often collect multimodal imaging data, each of which can provide shared and unique information. This observation motivated the design of powerful multimodal self-supervised representation-learning algorithms. In this paper, we unify recent work on multimodal self-supervised learning under a single framework. Observing that most self-supervised methods optimize similarity metrics between a set of model components, we propose a taxonomy of all reasonable ways to organize this process. We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients. We find that (1) multimodal contrastive learning has significant benefits over its unimodal counterpart, (2) the specific composition of multiple contrastive objectives is critical to performance on a downstream task, and (3) maximization of the similarity between representations has a regularizing effect on a neural network, which can sometimes lead to reduced downstream performance but still reveal multimodal relations. Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods based on canonical correlation analysis (CCA) or the mixture-of-experts multimodal variational autoencoder (MMVAE) on various datasets with a linear evaluation protocol. Importantly, we find a promising solution to uncover connections between modalities through a jointly shared subspace that can help advance work in our search for neuroimaging biomarkers.
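
The abstract describes methods that optimize similarity metrics between model components across modalities. The following is a minimal sketch, not the authors' released code, of one such objective: a symmetric cross-modal InfoNCE contrastive loss between two modality-specific encoders. The encoder architecture, embedding size, temperature, and input shapes are illustrative assumptions.

```python
# Minimal sketch of a cross-modal contrastive (InfoNCE) objective between two
# modality encoders. All names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPEncoder(nn.Module):
    """Toy encoder mapping a flattened input to a shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x):
        return self.net(x)

def cross_modal_info_nce(z_a, z_b, temperature: float = 0.1):
    """Paired samples (same batch index) across modalities are positives;
    all other pairs in the batch serve as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetrize over both directions: modality A -> B and B -> A.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example usage with two toy modalities (e.g., a paired multimodal MNIST batch):
enc_a, enc_b = MLPEncoder(784), MLPEncoder(784)
x_a, x_b = torch.randn(32, 784), torch.randn(32, 784)  # one paired mini-batch
loss = cross_modal_info_nce(enc_a(x_a), enc_b(x_b))
loss.backward()
```

Under a linear evaluation protocol, the trained encoders would then be frozen and a single linear classifier fit on their representations for the downstream task.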
