Paper Title
Visual Storytelling via Predicting Anchor Word Embeddings in the Stories
Paper Authors
Paper Abstract
We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. We use the embeddings of randomly sampled nouns from the ground-truth stories as the target anchor word embeddings to learn the predictor. To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model. In contrast to state-of-the-art methods, the proposed model is simple in design, easy to optimize, and attains the best results on most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.
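The pipeline described in the abstract can be sketched in a few lines: predict an anchor word embedding from each image feature, then concatenate the two to form the per-image input to the seq2seq decoder. This is a minimal NumPy sketch, not the paper's implementation; the linear predictor, the dimensions, and all names here are illustrative assumptions (the actual predictor is trained against embeddings of nouns sampled from the ground-truth stories, and the decoder is a full seq2seq model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
D_IMG, D_EMB = 512, 300

# A stand-in anchor word predictor: a random linear map from image-feature
# space to word-embedding space. In the paper this predictor is learned,
# with embeddings of nouns sampled from ground-truth stories as targets.
W = rng.standard_normal((D_IMG, D_EMB)) * 0.01

def predict_anchor(image_feature):
    """Map one image feature vector to a predicted anchor word embedding."""
    return image_feature @ W

def seq2seq_inputs(image_features):
    """Build the joint decoder inputs: for each image in the sequence,
    concatenate its feature with its predicted anchor embedding."""
    return np.stack(
        [np.concatenate([f, predict_anchor(f)]) for f in image_features]
    )

# A photo sequence of 5 images, e.g. one VIST-style story.
feats = rng.standard_normal((5, D_IMG))
joint = seq2seq_inputs(feats)
print(joint.shape)  # (5, 812): each row is [image feature | anchor embedding]
```

Each row of `joint` would then condition the decoder when generating the narrative sentence for the corresponding image.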