MetaCLUE: Towards Comprehensive Visual Metaphors Research

Paper Title

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Authors

Akula, Arjun R., Driscoll, Brendan, Narayana, Pradyumna, Changpinyo, Soravit, Jia, Zhiwei, Damle, Suyash, Pruthi, Garima, Basu, Sugato, Guibas, Leonidas, Freeman, William T., Li, Yuanzhen, Jampani, Varun

Abstract

Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor. We also collect high-quality and rich metaphor annotations (abstract objects, concepts, relationships along with their corresponding object boxes) as there do not exist any datasets that facilitate the evaluation of these tasks. We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations, highlighting strengths and weaknesses of current approaches in visual metaphor Classification, Localization, Understanding (retrieval, question answering, captioning) and gEneration (text-to-image synthesis) tasks. We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
