Paper title
Automatic Section Recognition in Obituaries
Paper authors
Paper abstract
Obituaries contain information about people's values across times and cultures, which makes them a useful resource for exploring cultural history. They are typically structured similarly, with sections corresponding to Personal Information, Biographical Sketch, Characteristics, Family, Gratitude, Tribute, Funeral Information and Other aspects of the person. To make this information available for further studies, we propose a statistical model that recognizes these sections. To this end, we collect a corpus of 20,058 English obituaries from The Daily Item, Remembering.CA and The London Free Press. An evaluation of our annotation guidelines with three annotators on 1,008 obituaries shows substantial agreement, with Fleiss' κ = 0.87. With the task formulated as automatic segmentation, a convolutional neural network outperforms bag-of-words and embedding-based BiLSTMs and BiLSTM-CRFs, reaching a micro F1 of 0.81.
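The two figures reported in the abstract, Fleiss' κ and micro F1, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names `fleiss_kappa` and `micro_f1`, the toy labels, and the assumption that each unit (for example a sentence) receives exactly one section label from each annotator are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each carry one label per annotator.

    ratings: list of lists; ratings[i] holds the labels given to item i.
    categories: iterable of all possible labels.
    Assumes every item has the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number of annotators")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall label distribution.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_cat)

    # Undefined (division by zero) if all labels fall into a single category.
    return (p_bar - p_e) / (1 - p_e)


def micro_f1(gold, pred):
    """Micro-averaged F1 when each unit has exactly one gold and one predicted label.

    Under that assumption every error counts as one false positive (for the
    predicted class) and one false negative (for the gold class), so the
    result equals accuracy.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = fn = len(gold) - tp
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: four units, three hypothetical annotators, two of the section labels.
toy_ratings = [
    ["Family", "Family", "Family"],
    ["Family", "Family", "Tribute"],
    ["Tribute", "Tribute", "Tribute"],
    ["Tribute", "Tribute", "Family"],
]
kappa = fleiss_kappa(toy_ratings, ["Family", "Tribute"])

gold = ["Family", "Family", "Tribute", "Tribute"]
pred = ["Family", "Tribute", "Tribute", "Tribute"]
score = micro_f1(gold, pred)
```

On the toy data, two of the four units have full agreement and two have a 2-to-1 split, so κ lands well below the 0.87 reported for the real annotation study. The numbers here only show how the metrics behave and say nothing about the corpus itself.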