Image Captioning with Context-Aware Auxiliary Guidance

Paper title

Image Captioning with Context-Aware Auxiliary Guidance

Authors

Zeliang Song, Xiaofei Zhou, Zhendong Mao, Jianlong Tan

Abstract

Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent work follows the encoder-decoder framework, which depends heavily on previously generated words for the current prediction. Such methods cannot effectively exploit future predicted information to learn complete semantics. In this paper, we propose the Context-Aware Auxiliary Guidance (CAAG) mechanism, which guides the captioning model to perceive global contexts. On top of the captioning model, CAAG performs semantic attention that selectively concentrates on useful information in the global predictions to reproduce the current generation. To validate the adaptability of the method, we apply CAAG to three popular captioners. Our proposal achieves competitive performance on the challenging Microsoft COCO image captioning benchmark, e.g., a CIDEr-D score of 132.2 on the Karpathy split and a CIDEr-D (c40) score of 130.7 on the official online evaluation server.
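
The abstract does not spell out how the semantic attention is computed, so the following is only an illustrative sketch, not the authors' formulation. It assumes a scaled dot-product attention in which the current decoder state attends over the embeddings of every word in a first-pass (global) caption prediction, including words that come after the current position. The names `semantic_attention`, `hidden`, and `global_preds` are hypothetical.

```python
import math

def semantic_attention(hidden, global_preds):
    """Attend over embeddings of a first-pass (global) caption prediction.

    hidden: list[float], current decoder state (hypothetical name).
    global_preds: list[list[float]], one embedding per predicted word,
        covering the whole caption (past and future positions).
    Returns (weights, context).
    """
    d = len(hidden)
    # Assumed scoring: scaled dot product between the state and each word.
    scores = [
        sum(h * g for h, g in zip(hidden, vec)) / math.sqrt(d)
        for vec in global_preds
    ]
    # Numerically stable softmax over all predicted words.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the word embeddings.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, global_preds))
        for i in range(d)
    ]
    return weights, context

# Toy example: three predicted words with 2-dimensional embeddings.
hidden = [1.0, 0.0]
global_preds = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = semantic_attention(hidden, global_preds)
```

In this toy example the word whose embedding aligns best with the current state receives the largest weight. In a captioner, the resulting context vector would then feed a second prediction of the current word; how CAAG does this in detail is described in the paper itself.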
