Paper Title

Characterizing the Value of Information in Medical Notes

Authors

Chao-Chun Hsu, Shantanu Karnwal, Sendhil Mullainathan, Ziad Obermeyer, Chenhao Tan

Abstract

Machine learning models depend on the quality of input data. As electronic health records are widely adopted, the amount of data in health care is growing, along with complaints about the quality of medical notes. We use two prediction tasks, readmission prediction and in-hospital mortality prediction, to characterize the value of information in medical notes. We show that as a whole, medical notes only provide additional predictive power over structured information in readmission prediction. We further propose a probing framework to select parts of notes that enable more accurate predictions than using all notes, despite that the selected information leads to a distribution shift from the training data ("all notes"). Finally, we demonstrate that models trained on the selected valuable information achieve even better predictive performance, with only 6.8% of all the tokens for readmission prediction.
