Paper title
Federated Learning in Multi-Center Critical Care Research: A Systematic Case Study using the eICU Database
Paper authors
Paper abstract
Federated learning (FL) has been proposed as a method to train a model across different units without exchanging data. This offers great opportunities in the healthcare sector, where large datasets are available but cannot be shared, in order to protect patient privacy. We systematically investigate the effectiveness of FL on the publicly available eICU dataset for predicting the survival of each ICU stay. We employ Federated Averaging as the main practical algorithm for FL and show how its performance changes when three key hyperparameters are altered, taking into account that clients can vary significantly in size. We find that in many settings, a large number of local training epochs improves performance while at the same time reducing communication costs. Furthermore, we outline the settings in which it is possible to have only a small number of hospitals participating in each federated update round. When many hospitals with low patient counts are involved, overfitting can be avoided by decreasing the batch size. This study thus contributes toward identifying suitable settings for running distributed algorithms such as FL on clinical datasets.
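To make the roles of the hyperparameters discussed in the abstract concrete, the following is a minimal sketch of Federated Averaging in plain Python. It is not the authors' implementation: the logistic-regression model, the synthetic one-feature data, the learning rate, and all names (`local_update`, `fed_avg`, `make_client`, `client_fraction`) are assumptions chosen for illustration. It exposes the three knobs the abstract refers to (local epochs, the share of clients participating per round, and the batch size) and weights each client's contribution by its number of samples, since clients can differ greatly in size.

```python
import math
import random


def local_update(weights, data, epochs, batch_size, lr, rng):
    """Mini-batch SGD for logistic regression on one client's local data."""
    w = list(weights)
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            grad = [0.0] * len(w)
            for x, y in batch:
                z = sum(wi * xi for wi, xi in zip(w, x))
                z = max(-30.0, min(30.0, z))  # clamp to keep exp() stable
                p = 1.0 / (1.0 + math.exp(-z))
                for j, xj in enumerate(x):
                    grad[j] += (p - y) * xj
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad)]
    return w


def fed_avg(clients, rounds, client_fraction, epochs, batch_size,
            lr=0.1, seed=0):
    """Federated Averaging: only model weights leave a client, never data."""
    rng = random.Random(seed)
    dim = len(clients[0][0][0])
    global_w = [0.0] * dim
    n_selected = max(1, round(client_fraction * len(clients)))
    for _ in range(rounds):
        selected = rng.sample(clients, n_selected)
        total = sum(len(data) for data in selected)
        new_w = [0.0] * dim
        for data in selected:
            local_w = local_update(global_w, data, epochs, batch_size, lr, rng)
            share = len(data) / total  # weight by client sample count
            for j in range(dim):
                new_w[j] += share * local_w[j]
        global_w = new_w
    return global_w


def make_client(n, rng):
    """Synthetic client (assumption): label is 1 when the feature is positive."""
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        data.append(([1.0, x1], 1 if x1 > 0 else 0))
    return data


data_rng = random.Random(1)
# Four hypothetical "hospitals" with very different sample counts.
clients = [make_client(n, data_rng) for n in (5, 20, 80, 200)]
w = fed_avg(clients, rounds=20, client_fraction=0.5, epochs=5, batch_size=8)
print(w)
```

In this sketch, raising `epochs` means more local computation per communication round, lowering `client_fraction` means fewer clients are contacted per round, and lowering `batch_size` yields more, noisier SGD steps per epoch. Whether any of these changes helps on real clinical data is an empirical question that the study itself addresses; the toy example makes no claim about it.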