Paper Title
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Paper Authors
Paper Abstract
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization. Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. Four factors are considered: visual complexity, question redundancy, concept distribution, and concept compositionality. With controllably generated data, Super-CLEVR enables us to test VQA methods in situations where the test data differs from the training data along each of these axes. We study four existing methods, including two neural-symbolic methods, NSCL and NSVQA, and two non-symbolic methods, FiLM and mDETR, as well as our proposed method, probabilistic NSVQA (P-NSVQA), which extends NSVQA with uncertainty reasoning. P-NSVQA outperforms the other methods on three of the four domain shift factors. Our results suggest that disentangling reasoning and perception, combined with probabilistic uncertainty, forms a strong VQA model that is more robust to domain shifts. The dataset and code are released at https://github.com/Lizw14/Super-CLEVR.
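To make the abstract's distinction concrete, the sketch below contrasts deterministic symbolic execution (NSVQA-style, which commits to hard perception labels) with probabilistic execution (P-NSVQA-style, which propagates perception uncertainty) on a toy "count the red objects" program. The scene representation and function names are illustrative assumptions for this sketch, not the paper's actual API.

```python
# Toy illustration (not the paper's implementation): perception assigns
# each object a distribution over color concepts instead of a hard label.
scene = [
    {"red": 0.9, "blue": 0.1},
    {"red": 0.4, "blue": 0.6},   # ambiguous object
    {"red": 0.1, "blue": 0.9},
]

def count_deterministic(objects, color):
    # NSVQA-style: commit to the argmax label, then reason symbolically.
    return sum(1 for probs in objects
               if max(probs, key=probs.get) == color)

def count_expected(objects, color):
    # P-NSVQA-style: keep per-object uncertainty and propagate it,
    # yielding an expected count rather than a hard one.
    return sum(probs[color] for probs in objects)

print(count_deterministic(scene, "red"))  # 1
print(count_expected(scene, "red"))       # 1.4
```

The ambiguous second object is dropped entirely by the hard-label count, while the probabilistic count retains its 0.4 evidence of being red; this retained uncertainty is what the abstract credits for P-NSVQA's robustness under domain shift.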