Paper Title
Learning Underrepresented Classes from Decentralized Partially Labeled Medical Images
Paper Authors
Paper Abstract
Using decentralized data for federated training is a promising emerging research direction for alleviating data scarcity in the medical domain. However, in contrast to the large-scale fully labeled data commonly seen in general object recognition tasks, local medical datasets are more likely to have images annotated for only a subset of the classes of interest due to high annotation costs. In this paper, we consider a practical yet under-explored problem, where underrepresented classes have only a few labeled instances available and exist in only a few clients of the federated system. We show that standard federated learning approaches fail to learn robust multi-label classifiers under extreme class imbalance, and we address this by proposing a novel federated learning framework, FedFew. FedFew consists of three stages: the first stage leverages federated self-supervised learning to learn class-agnostic representations. In the second stage, the decentralized partially labeled data are exploited to learn an energy-based multi-label classifier for the common classes. Finally, the underrepresented classes are detected based on the energy score, and a prototype-based nearest-neighbor model is proposed for few-shot matching. We evaluate FedFew on multi-label thoracic disease classification tasks and demonstrate that it outperforms federated baselines by a large margin.
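The final stage combines two standard building blocks: an energy score over classifier logits to flag samples that do not belong to the common classes, and class prototypes (mean embeddings of the few labeled support examples) for nearest-neighbor matching. The sketch below illustrates these two pieces in isolation; all function names, the NumPy implementation, and the thresholding logic are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def energy_score(logits):
    # Free energy E(x) = -logsumexp(logits), computed stably.
    # Intuition (assumption): common-class samples yield confident logits and
    # low energy; underrepresented-class samples yield high energy and are
    # routed to the few-shot prototype matcher.
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))

def build_prototypes(embeddings, labels, num_classes):
    # One prototype per underrepresented class: the mean embedding of its
    # few labeled support instances.
    return np.stack(
        [embeddings[labels == c].mean(axis=0) for c in range(num_classes)]
    )

def nearest_prototype(query, prototypes):
    # Match a query embedding to the closest prototype by cosine similarity.
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int((p @ q).argmax())
```

As a usage sketch: a sample whose energy exceeds a validation-tuned threshold would be handed to `nearest_prototype` instead of the common-class classifier head.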