Paper Title

Attention Word Embedding

Paper Authors

Shashank Sonkar, Andrew E. Waters, Richard G. Baraniuk

Paper Abstract

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as a context to predict it. A limitation of CBOW is that it equally weights the context words when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform the state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
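
The abstract contrasts CBOW, which averages the context embeddings with equal weights, with AWE, which weights them by an attention distribution before scoring the masked word. The NumPy sketch below illustrates only that difference; the matrix names (W_in, W_out, K, Q), the toy sizes, and the scoring step are assumptions for illustration, not the paper's exact parameterization or training objective (word2vec-style training would additionally use negative sampling).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup (assumed sizes): vocabulary V, embedding dimension d.
V, d = 1000, 50
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, d))   # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(V, d))  # output (target) embeddings
K = rng.normal(scale=0.1, size=(V, d))      # key vectors for context words (assumed)
Q = rng.normal(scale=0.1, size=(V, d))      # query vectors for the masked word (assumed)

context_ids = np.array([3, 17, 42, 7])  # ids of the words surrounding the masked word
target_id = 99                          # id of the masked word to predict

# CBOW: context vectors are combined with equal weights (a plain average).
h_cbow = W_in[context_ids].mean(axis=0)

# AWE-style idea (sketch): weight each context word by an attention score
# computed between the candidate masked word's query and the context words' keys,
# so more predictive context words contribute more to the hidden representation.
scores = K[context_ids] @ Q[target_id]   # one score per context word
alpha = softmax(scores)                  # attention weights, sum to 1
h_awe = alpha @ W_in[context_ids]        # attention-weighted context vector

# Both variants then score the candidate target word with a dot product.
logit_cbow = h_cbow @ W_out[target_id]
logit_awe = h_awe @ W_out[target_id]
print(alpha, logit_cbow, logit_awe)
```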
