A Dataset for N-ary Relation Extraction of Drug Combinations

Paper Title

A Dataset for N-ary Relation Extraction of Drug Combinations

Authors

Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Dana Meron Azagury, Yosi Shamay, Hillel Taub-Tabib, Tom Hope, Yoav Goldberg

Abstract

Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available in a situation. To assist medical professionals in identifying beneficial drug-combinations, we construct an expert-annotated dataset for extracting information about the efficacy of drug combinations from the scientific literature. Beyond its practical utility, the dataset also presents a unique NLP challenge, as the first relation extraction dataset consisting of variable-length relations. Furthermore, the relations in this dataset predominantly require language understanding beyond the sentence level, adding to the challenge of this task. We provide a promising baseline model and identify clear areas for further improvement. We release our dataset, code, and baseline models publicly to encourage the NLP community to participate in this task.
