Paper Title

Hierarchical Optimal Transport for Comparing Histopathology Datasets

Authors

Anna Yeaton, Rahul G. Krishnan, Rebecca Mieloszyk, David Alvarez-Melis, Grace Huynh

Abstract

Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets similar to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E-stained slides from The Cancer Genome Atlas covering six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts the difficulty of transfer in a tumor vs. normal prediction setting.
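
To make the idea of a hierarchical optimal transport distance concrete, here is a minimal sketch and not the authors' implementation. It assumes each slide is a set of tile embedding vectors and each dataset is a set of slides, with uniform weights at both levels, a squared Euclidean cost between tiles, and an entropy-regularized (Sinkhorn) solver; the abstract specifies none of these choices. The function names `sinkhorn_cost`, `slide_distance`, and `dataset_distance` are hypothetical.

```python
import math


def sq_euclidean(x, y):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_cost(cost, reg=0.1, n_iter=200):
    """Approximate OT cost for a cost matrix, assuming uniform weights.

    Uses plain Sinkhorn iterations (an assumption made for this sketch).
    The cost matrix is rescaled by its maximum before exponentiation
    so that the kernel entries do not underflow.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (reg * scale)) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; sum plan * cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def slide_distance(slide_a, slide_b):
    """Inner level: OT between the tile embeddings of two slides."""
    cost = [[sq_euclidean(t1, t2) for t2 in slide_b] for t1 in slide_a]
    return sinkhorn_cost(cost)


def dataset_distance(dataset_a, dataset_b):
    """Outer level: OT between slides, using slide_distance as ground cost."""
    cost = [[slide_distance(s1, s2) for s2 in dataset_b] for s1 in dataset_a]
    return sinkhorn_cost(cost)


# Toy 2-D "tile embeddings" (made-up values for illustration only).
dataset_a = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 1.0)]]
dataset_near = [[(0.1, 0.0), (0.0, 1.1)], [(1.0, 0.1), (1.1, 1.0)]]
dataset_far = [[(5.0, 5.0), (5.0, 6.0)], [(6.0, 5.0), (6.0, 6.0), (6.0, 7.0)]]

d_near = dataset_distance(dataset_a, dataset_near)
d_far = dataset_distance(dataset_a, dataset_far)
print(d_near < d_far)
```

The two-level structure mirrors the tiling hierarchy mentioned in the abstract: tiles are compared within slides first, and the resulting slide-to-slide costs then drive the comparison between datasets. No model is trained at any point, so a sketch like this stays independent of the downstream model type.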
