Abstractive Summarization of Spoken and Written Instructions with BERT

Title

Abstractive Summarization of Spoken and Written Instructions with BERT

Authors

Alexandra Savelieva, Bryan Au-Yeung, Vasanth Ramani

Abstract

Summarization of speech is a difficult problem due to the spontaneity of the flow, disfluencies, and other issues that are not usually encountered in written text. Our work presents the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics, from gardening and cooking to software configuration and sports. To enrich the vocabulary, we use transfer learning and pretrain the model on a few large cross-domain datasets in both written and spoken English. We also preprocess transcripts to restore sentence segmentation and punctuation in the output of an automatic speech recognition (ASR) system. The results are evaluated with ROUGE and Content-F1 scores on the How2 and WikiHow datasets. We engage human judges to score a set of summaries randomly selected from a dataset curated from HowTo100M and YouTube. In a blind evaluation, our summaries achieve a level of textual fluency and utility close to that of summaries written by human content creators. When applied to WikiHow articles, which vary widely in style and topic, the model outperforms the current state of the art (SOTA), while showing no performance regression on the canonical CNN/DailyMail dataset. Because the model generalizes well across different styles and domains, it has great potential to improve the accessibility and discoverability of internet content. We envision integrating it as a feature of intelligent virtual assistants, enabling them to summarize both written and spoken instructional content on request.
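The abstract reports results with ROUGE. As a rough illustration of what a ROUGE-N score measures, here is a minimal Python sketch. It is not the paper's evaluation code, and the helper names `ngrams` and `rouge_n` are hypothetical. It assumes lowercase whitespace tokenization, a single reference summary, and no stemming, so its numbers will not exactly match standard ROUGE toolkits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Hypothetical helper: ROUGE-N precision, recall and F1 for one reference.

    Simplifying assumptions: lowercase whitespace tokenization, no stemming,
    no stopword removal, a single reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter intersection keeps the minimum count of each
    # n-gram, so repeating a word in the candidate is not rewarded.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = "learn how to prune tomato plants in the garden"
    candidate = "learn how to prune tomato plants"
    print("ROUGE-1 (P, R, F1):", rouge_n(candidate, reference, n=1))
    print("ROUGE-2 (P, R, F1):", rouge_n(candidate, reference, n=2))
```

Summarization results are commonly reported as ROUGE F1 for several variants (for example ROUGE-1, ROUGE-2 and ROUGE-L). ROUGE-L is based on the longest common subsequence rather than n-gram counts, so this sketch does not cover it.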
