Paper Title
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Paper Authors
Bin Wang, C.-C. Jay Kuo
Paper Abstract
Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves state-of-the-art performance in quite a few NLP tasks. Yet, generating a high-quality sentence representation from BERT-based word models remains an open problem. Previous studies have shown that different layers of BERT capture different linguistic properties. This allows us to fuse information across layers to find a better sentence representation. In this work, we study the layer-wise patterns of word representations in deep contextualized models. We then propose a new sentence embedding method, called SBERT-WK, by dissecting BERT-based word models through geometric analysis of the space spanned by the word representations. SBERT-WK requires no further training. We evaluate SBERT-WK on semantic textual similarity and downstream supervised tasks. Furthermore, ten sentence-level probing tasks are used for detailed linguistic analysis. Experiments show that SBERT-WK achieves state-of-the-art performance. Our code is publicly available.
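To make the cross-layer fusion idea concrete, the sketch below shows how all hidden layers of a BERT model can be extracted and combined into a single sentence vector without any further training. This is a minimal illustration under stated assumptions, not the authors' exact SBERT-WK algorithm: the model name (`bert-base-uncased`) and the simple variance-based token weighting (a crude stand-in for the paper's geometric alignment/novelty analysis) are assumptions for illustration only.

```python
# Minimal sketch: layer-wise fusion of BERT hidden states into a sentence
# embedding. NOT the authors' exact SBERT-WK method; the token weighting
# below is an illustrative assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def sentence_embedding(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors, each [1, seq_len, dim]
    layers = torch.stack(outputs.hidden_states, dim=0).squeeze(1)  # [L+1, seq_len, dim]
    # Fuse layers per token by simple averaging across the layer axis.
    token_vecs = layers.mean(dim=0)  # [seq_len, dim]
    # Weight each token by how much its representation varies across layers;
    # tokens that change more across layers are treated as more informative
    # (a rough proxy for SBERT-WK's layer-wise novelty/alignment measures).
    variance = layers.var(dim=0).mean(dim=-1)  # [seq_len]
    weights = variance / variance.sum()
    return (weights.unsqueeze(-1) * token_vecs).sum(dim=0)  # [dim]

emb = sentence_embedding("Sentence embeddings transfer knowledge to downstream tasks.")
print(emb.shape)  # torch.Size([768])
```

Because the fusion uses only frozen hidden states, this pipeline, like the method described in the abstract, involves no additional training; sentence vectors produced this way can be compared with cosine similarity for semantic textual similarity tasks.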