Paper Title
Giga-SSL: Self-Supervised Learning for Gigapixel Images
Paper Authors
Paper Abstract
Whole slide images (WSI) are microscopy images of stained tissue slides routinely prepared for diagnosis and treatment selection in medical practice. WSI are very large (gigapixel size) and complex (made of up to millions of cells). The current state-of-the-art (SoTA) approach to classify WSI subdivides them into tiles, encodes them with pre-trained networks, and applies Multiple Instance Learning (MIL) to train for specific downstream tasks. However, annotated datasets are often small, typically a few hundred to a few thousand WSI, which may cause overfitting and underperforming models. Conversely, the number of unannotated WSI is ever increasing, with datasets of tens of thousands (soon to be millions) of images available. While it has been previously proposed to use these unannotated data to identify suitable tile representations by self-supervised learning (SSL), downstream classification tasks still require full supervision because parts of the MIL architecture are not trained during tile-level SSL pre-training. Here, we propose a strategy of slide-level SSL to leverage the large number of WSI without annotations and infer powerful slide representations. Applying our method to The Cancer Genome Atlas, one of the most widely used data resources in cancer research (16 TB of image data), we are able to downsize the dataset to 23 MB without any loss in predictive power: we show that a linear classifier trained on top of these embeddings maintains or improves on previous SoTA performance on various benchmark WSI classification tasks. Finally, we observe that training a classifier on these representations with tiny datasets (e.g. 50 slides) improves performance over SoTA by an average of +6.3 AUC points across all downstream tasks.
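Since the abstract describes training a linear classifier on top of precomputed slide-level embeddings, the snippet below is a minimal linear-probe sketch of that downstream step, not the authors' actual code. The arrays `slide_embeddings` and `labels`, the dataset size, and the embedding dimension are hypothetical stand-ins (random data), and the probe and evaluation use scikit-learn's `LogisticRegression` and `roc_auc_score` under the assumption that slide representations are already frozen.

```python
# Minimal linear-probe sketch: train a linear classifier on precomputed
# slide-level SSL embeddings (one vector per WSI) and report AUC.
# The embeddings here are random stand-ins; in practice they would be
# the slide representations produced by slide-level SSL pre-training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_slides, embed_dim = 200, 256                 # hypothetical cohort size / embedding width
slide_embeddings = rng.normal(size=(n_slides, embed_dim)).astype(np.float32)
labels = rng.integers(0, 2, size=n_slides)     # binary downstream task (e.g. a tumor subtype)

# Small labeled training set and held-out test set, mimicking the
# low-annotation regime mentioned in the abstract (e.g. ~50 training slides).
X_train, X_test, y_train, y_test = train_test_split(
    slide_embeddings, labels, train_size=50, stratify=labels, random_state=0
)

# Linear classifier on frozen embeddings: no fine-tuning of the encoder.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```

With real slide embeddings, only this lightweight probe is trained per task, which is what allows a 16 TB image collection to be reduced to a few megabytes of embeddings without retraining the encoder.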