Paper Title
A Dataset for N-ary Relation Extraction of Drug Combinations
Paper Authors
Abstract
Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria, and HIV. However, the combinatorial set of available multi-drug treatments makes it challenging to identify effective combination therapies for a given situation. To assist medical professionals in identifying beneficial drug combinations, we construct an expert-annotated dataset for extracting information about the efficacy of drug combinations from the scientific literature. Beyond its practical utility, the dataset also presents a unique NLP challenge, as the first relation extraction dataset consisting of variable-length relations. Furthermore, the relations in this dataset predominantly require language understanding beyond the sentence level, adding to the challenge of this task. We provide a promising baseline model and identify clear areas for further improvement. We release our dataset, code, and baseline models publicly to encourage the NLP community to participate in this task.