Paper Title

Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset

Authors

Deng, Haolin; Zhang, Yanan; Zhang, Yangfan; Ying, Wangyang; Yu, Changlong; Gao, Jun; Wang, Wei; Bai, Xiaoling; Yang, Nan; Ma, Jin; Chen, Xiang; Zhou, Tianhua

Abstract

Event extraction (EE) is crucial to downstream tasks such as news aggregation and event knowledge graph construction. Most existing EE datasets manually define a fixed set of event types and design a specific schema for each, so they fail to cover the diverse events emerging from online text. Moreover, news titles, an important source of event mentions, have not received enough attention in current EE research. In this paper, we present Title2Event, a large-scale sentence-level dataset for benchmarking Open Event Extraction without restricting event types. Title2Event contains more than 42,000 news titles in 34 topics collected from Chinese web pages. To the best of our knowledge, it is currently the largest manually annotated Chinese dataset for open event extraction. We further conduct experiments on Title2Event with different models and show that the characteristics of titles make event extraction challenging, underscoring the need for further study of this problem. The dataset and baseline code are available at https://open-event-hub.github.io/title2event.
