Paper Title
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Authors
Abstract
Affordance detection, which refers to perceiving objects with potential action possibilities in images, is a challenging task because the possible affordance depends on the person's purpose in real-world application scenarios. Existing works mainly extract the inherent human-object dependencies from images or videos to accommodate affordance properties that change dynamically. In this paper, we approach affordance perception from a vision-language perspective and address the challenging problem of phrase-based affordance detection, i.e., given a set of phrases describing the action purposes, all object regions in a scene that share the same affordance should be detected. To this end, we propose a cyclic bilateral consistency enhancement network (CBCE-Net) that aligns language and vision features progressively. Specifically, CBCE-Net consists of a mutually guided vision-language module, which updates the common features of vision and language in a progressive manner, and a cyclic interaction module (CIM), which facilitates the perception of possible interactions with objects in a cyclic manner. In addition, we extend the public Purpose-driven Affordance Dataset (PAD) by annotating its affordance categories with short phrases. Comparative experiments demonstrate the superiority of our method over nine typical methods from four relevant fields in terms of both objective metrics and visual quality. The related code and dataset will be released at https://github.com/lulsheng/CBCE-Net.
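The abstract does not specify how the mutually guided vision-language module or the CIM is built. To make the problem setting concrete (phrase in, per-pixel affordance mask out), here is a minimal pure-Python sketch. It assumes a generic bidirectional cross-attention loop in which pixel features and word features repeatedly refine each other, then scores each pixel against a pooled phrase embedding. This is a swapped-in, textbook technique, not CBCE-Net. The function names (`cross_attend`, `mutual_update`, `affordance_mask`), the residual weight `alpha`, the number of `steps`, the mean pooling, and the sigmoid `temperature` are all hypothetical choices.

```python
import math
from typing import List, Tuple

Vec = List[float]

# Hypothetical sketch: generic bidirectional cross-attention, NOT the CBCE-Net
# architecture (the abstract does not describe its internals).


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vec) -> Vec:
    # L2-normalise; the `or 1.0` guard leaves an all-zero vector unchanged.
    n = math.sqrt(dot(a, a)) or 1.0
    return [x / n for x in a]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vec], keys: List[Vec], alpha: float = 0.5) -> List[Vec]:
    """Scaled dot-product attention with keys doubling as values, followed by a
    residual update q + alpha * context and L2 normalisation."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        ctx = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(len(q))]
        out.append(normalize([qi + alpha * ci for qi, ci in zip(q, ctx)]))
    return out


def mutual_update(pixels: List[Vec], words: List[Vec], steps: int = 3) -> Tuple[List[Vec], List[Vec]]:
    """Progressively align the two modalities: at every step each side is
    refined by attending to the other side's features from the previous step."""
    pixels = [normalize(p) for p in pixels]
    words = [normalize(w) for w in words]
    for _ in range(steps):
        # The right-hand side is evaluated before assignment, so both updates
        # use the previous step's features.
        pixels, words = cross_attend(pixels, words), cross_attend(words, pixels)
    return pixels, words


def affordance_mask(pixels: List[Vec], words: List[Vec],
                    temperature: float = 4.0, threshold: float = 0.5) -> List[int]:
    """Mean-pool the words into one phrase embedding, score each pixel by a
    sigmoid of its cosine similarity to it, and threshold into a binary mask."""
    phrase = normalize([sum(col) / len(words) for col in zip(*words)])
    probs = [1.0 / (1.0 + math.exp(-temperature * dot(normalize(p), phrase))) for p in pixels]
    return [int(p > threshold) for p in probs]


# Toy usage: two "pixels" and a one-word "phrase" in a 2-D feature space.
mask = affordance_mask(*mutual_update([[1.0, 0.0], [-1.0, 0.0]], [[1.0, 0.0]]))
print(mask)  # [1, 0]
```

In the toy run, the pixel aligned with the phrase direction is marked and the opposite one is not. Because every feature is unit-normalised and `alpha` is 0.5, the residual update `q + alpha * ctx` always has norm at least 0.5, so the features never collapse to zero during the progressive refinement.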