Paper Title

Improving Topic Segmentation by Injecting Discourse Dependencies

Authors

Linzi Xing, Patrick Huber, Giuseppe Carenini

Abstract

Recent neural supervised topic segmentation models are markedly more effective than unsupervised methods, thanks to the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability because they exploit simple linguistic cues for prediction while overlooking the more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model that injects above-sentence discourse dependency structures to encourage the model to base its topic boundary predictions more on the topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy can substantially improve its performance on intra-domain and out-of-domain data, with little increase in model complexity.
