Supplementing Missing Visions via Dialog for Scene Graph Generations

Paper Title

Supplementing Missing Visions via Dialog for Scene Graph Generations

Authors

Zhenghao Zhao, Ye Zhu, Xiaoguang Zhu, Yuzhang Shang, Yan Yan

Abstract

Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks. However, the classic task setup rarely considers the challenging yet common practical situations in which the complete visual data may be inaccessible for various reasons (e.g., restricted view range and occlusions). To this end, we investigate a computer vision task setting with incomplete visual input data. Specifically, we exploit the Scene Graph Generation (SGG) task with various levels of visual data missingness as input. While insufficient visual input intuitively leads to a performance drop, we propose to supplement the missing visions via natural language dialog interactions to better accomplish the task objective. We design a model-agnostic Supplementary Interactive Dialog (SI-Dial) framework that can be jointly learned with most existing models, endowing current AI systems with the ability to conduct question-answer interactions in natural language. Through extensive experiments and analysis, we demonstrate the feasibility of such a task setting with missing visual input and the effectiveness of our proposed dialog module as a supplementary information source, achieving promising performance improvements over multiple baselines.
