Paper Title
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Authors
Abstract
To ensure the stability and reliability of real-world applications, the robustness of DNNs has been evaluated in unimodal tasks. However, few studies consider the abnormal situations that a visual question answering (VQA) model might encounter at test time after deployment in the real world. In this study, we evaluate the robustness of state-of-the-art VQA models to five different anomalies, including worst-case scenarios, the most frequent scenarios, and the current limitation of VQA models. Unlike the results in unimodal tasks, the maximum confidence of answers in VQA models cannot detect anomalous inputs, and post-training of the outputs, such as outlier exposure, is ineffective for VQA models. We therefore propose an attention-based method, which uses the confidence of reasoning between input images and questions and shows much more promising results than previous methods from unimodal tasks. In addition, we show that maximum entropy regularization of attention networks can significantly improve the attention-based anomaly detection of VQA models. Thanks to their simplicity, attention-based anomaly detection and the regularization are model-agnostic methods that can be used with the various cross-modal attention mechanisms in state-of-the-art VQA models. The results imply that cross-modal attention in VQA is important for improving not only VQA accuracy but also robustness to various anomalies.
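The three quantities the abstract contrasts can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how attention confidence is computed, so taking the largest attention weight is an assumption made here for illustration. The function names and the `weight` parameter are likewise hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_confidence(answer_logits):
    """Baseline score: maximum softmax probability over candidate answers."""
    return max(softmax(answer_logits))


def attention_confidence(attention_logits):
    """Assumed attention-based score: the largest cross-modal attention weight."""
    return max(softmax(attention_logits))


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(task_loss, attention_logits, weight=0.1):
    """Task loss minus weighted attention entropy.

    Minimizing this value pushes the attention entropy upward, which is
    one common way to write a maximum entropy regularizer (an assumption
    here, not a detail taken from the abstract).
    """
    return task_loss - weight * entropy(softmax(attention_logits))
```

Under this sketch, an input would be flagged as anomalous when its score falls below a threshold chosen on held-out data. The regularizer is applied only during training.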