Word-Level Fine-Grained Story Visualization

Paper Title

Word-Level Fine-Grained Story Visualization

Authors

Bowen Li, Thomas Lukasiewicz

Abstract

Story visualization aims to generate a sequence of images that narrates each sentence in a multi-sentence story while maintaining global consistency across dynamic scenes and characters. Current works still struggle with the quality and consistency of output images, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. Extensive experiments on different datasets, together with human evaluation, demonstrate the superior performance of our approach compared to state-of-the-art methods, while using neither segmentation masks nor auxiliary captioning networks.
