SAM: Self-supervised Learning of Pixel-wise Anatomical Embeddings in Radiological Images

Paper title

SAM: Self-supervised Learning of Pixel-wise Anatomical Embeddings in Radiological Images

Authors

Ke Yan, Jinzheng Cai, Dakai Jin, Shun Miao, Dazhou Guo, Adam P. Harrison, Youbao Tang, Jing Xiao, Jingjing Lu, Le Lu

Abstract

Radiological images such as computed tomography (CT) and X-rays render anatomy with intrinsic structures. Being able to reliably locate the same anatomical structure across varying images is a fundamental task in medical image analysis. In principle, it is possible to use landmark detection or semantic segmentation for this task, but to work well, these require large amounts of labeled data for each anatomical structure and sub-structure of interest. A more universal approach would learn the intrinsic structure from unlabeled images. We introduce such an approach, called Self-supervised Anatomical eMbedding (SAM). SAM generates semantic embeddings for each image pixel that describe its anatomical location or body part. To produce such embeddings, we propose a pixel-level contrastive learning framework. A coarse-to-fine strategy ensures that both global and local anatomical information are encoded. Negative sample selection strategies are designed to enhance the embedding's discriminability. Using SAM, one can label any point of interest on a template image and then locate the same body part in other images by simple nearest-neighbor search. We demonstrate the effectiveness of SAM in multiple tasks with 2D and 3D image modalities. On a chest CT dataset with 19 landmarks, SAM outperforms widely used registration algorithms while taking only 0.23 seconds for inference. On two X-ray datasets, SAM, with only one labeled template image, surpasses supervised methods trained on 50 labeled images. We also apply SAM to whole-body follow-up lesion matching in CT and obtain an accuracy of 91%. SAM can also be applied to improving image registration and initializing CNN weights.
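
The nearest-neighbor localization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `locate_point`, the `(H, W, C)` array layout, and the use of cosine similarity are assumptions made for illustration, and the embeddings here are random stand-ins rather than outputs of a trained SAM network.

```python
import numpy as np


def locate_point(template_emb, query_emb, point):
    """Return the query pixel most similar to a labeled template pixel.

    Hypothetical helper for illustration only.
    template_emb, query_emb: arrays of shape (H, W, C), one embedding per pixel.
    point: (row, col) of the point of interest on the template image.
    """
    # L2-normalize each pixel embedding so a dot product is a cosine similarity.
    t = template_emb / np.linalg.norm(template_emb, axis=-1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)

    anchor = t[tuple(point)]   # embedding of the labeled template pixel, shape (C,)
    sim = q @ anchor           # similarity of every query pixel, shape (H, W)

    # Nearest neighbor: the query pixel with the highest similarity.
    idx = np.unravel_index(np.argmax(sim), sim.shape)
    return tuple(int(i) for i in idx), float(sim[idx])


# Toy check with random embeddings standing in for network outputs:
# the "query" is the template shifted by (2, 3) pixels, so the point
# labeled at (1, 1) should be found at (3, 4).
rng = np.random.default_rng(0)
template = rng.normal(size=(8, 8, 16))
query = np.roll(template, shift=(2, 3), axis=(0, 1))
loc, score = locate_point(template, query, (1, 1))
print(loc)  # (3, 4)
```

The same idea extends to 3D volumes by using `(D, H, W, C)` arrays; the abstract does not state which similarity measure is used, so cosine similarity is only one plausible choice here.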

Paper title