Paper Title
Improving Image Captioning with Control Signal of Sentence Quality
Authors
Abstract
In image captioning datasets, each image is aligned with several descriptions. Although the quality of these descriptions varies, existing captioning models treat them equally during training. In this paper, we propose a new control signal of sentence quality, which is taken as an additional input to the captioning model. By integrating the control signal, captioning models become aware of the quality level of the target sentences and handle them differently. Moreover, we propose a novel reinforcement training method specially designed for the control signal of sentence quality: Quality-oriented Self-Annotated Training (Q-SAT). Extensive experiments on the MSCOCO dataset show that, without extra information from ground-truth captions, models controlled by the highest quality level outperform baseline models on accuracy-based evaluation metrics, which validates the effectiveness of the proposed methods.