Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes

Paper title

Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes

Authors

Claudio Fanconi, Marieke van Buchem, Tina Hernandez-Boussard

Abstract

Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients, once chemotherapy starts. Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex. This paper explores the use of free-text notes for the prediction of ACU instead of SHD. Deep Learning models were compared to manually engineered language features. Results show that SHD models minimally outperform NLP models; an l1-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95%-CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95%-CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95%-CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines how risk bias is different for diverse patient groups, even using only free-text data.
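The C-statistic quoted above is the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it. The following is a minimal sketch of that metric with a percentile-bootstrap confidence interval. The abstract does not say how its intervals were computed, so the bootstrap is an assumption; the function names `c_statistic` and `bootstrap_ci` are hypothetical, and the toy labels and scores are made up for illustration, not taken from the paper.

```python
import random


def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; the metric is undefined there.
        if 0 < sum(ys) < n:
            stats.append(c_statistic(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data (illustrative only): 1 = acute care use, 0 = no acute care use.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
print(bootstrap_ci(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a rank-based formulation would be preferable for cohorts of realistic size.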
