Moving beyond word lists: towards abstractive topic labels for human-like topics of scientific documents

Paper title

Moving beyond word lists: towards abstractive topic labels for human-like topics of scientific documents

Author

Rosati, Domenic

Abstract

Topic models represent groups of documents as a list of words (the topic labels). This work asks whether an alternative approach to topic labeling can be developed that is closer to a natural language description of a topic than a word list. To this end, we present an approach to generating human-like topic labels using abstractive multi-document summarization (MDS). We investigate our approach with an exploratory case study. We model topics in citation sentences in order to understand what further research needs to be done to fully operationalize MDS for topic labeling. Our case study shows that in addition to more human-like topics there are additional advantages to evaluation by using clustering and summarization measures instead of topic model measures. However, we find that there are several developments needed before we can design a well-powered study to evaluate MDS for topic modeling fully. Namely, improving cluster cohesion, improving the factuality and faithfulness of MDS, and increasing the number of documents that might be supported by MDS. We present a number of ideas on how these can be tackled and conclude with some thoughts on how topic modeling can also be used to improve MDS in general.
