Paper Title

Transformer Module Networks for Systematic Generalization in Visual Question Answering

Paper Authors

Moyuru Yamada, Vanessa D'Amario, Kentaro Takemoto, Xavier Boix, Tomotake Sasaki

Paper Abstract

Transformers achieve great performance on Visual Question Answering (VQA). However, their systematic generalization capabilities, i.e., handling novel combinations of known concepts, are unclear. We reveal that Neural Module Networks (NMNs), i.e., question-specific compositions of modules that each tackle a sub-task, achieve better or similar systematic generalization performance than conventional Transformers, even though NMNs' modules are CNN-based. To address this shortcoming of Transformers with respect to NMNs, in this paper we investigate whether and how modularity can bring benefits to Transformers. Namely, we introduce Transformer Module Network (TMN), a novel NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art systematic generalization performance on three VQA datasets, improving by more than 30% over standard Transformers on novel compositions of sub-tasks. We show that not only the module composition but also the specialization of each module for its sub-task is key to this performance gain.
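The abstract describes TMNs only at a high level. As a rough illustration of the idea, the sketch below composes small Transformer modules, one specialized per sub-task, into a question-specific chain. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the sub-task names, the decoder-style cross-attention over visual features, the dimensions, and the assumption that a program generator supplies the sub-task sequence are all illustrative choices of ours.

```python
import torch
import torch.nn as nn


class TransformerModule(nn.Module):
    """One sub-task module: a shallow Transformer block that refines the
    current reasoning state by cross-attending to visual features.
    (Illustrative granularity; not the paper's exact module design.)"""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.block = nn.TransformerDecoder(layer, num_layers=1)

    def forward(self, state, visual_feats):
        # state: (batch, tokens, d_model); visual_feats: (batch, regions, d_model)
        return self.block(state, visual_feats)


class TMNSketch(nn.Module):
    """Question-specific composition of specialized Transformer modules."""

    def __init__(self, sub_task_names, d_model=256):
        super().__init__()
        # One specialized module per sub-task type, mirroring the
        # per-sub-task specialization the abstract highlights.
        self.modules_by_task = nn.ModuleDict(
            {name: TransformerModule(d_model) for name in sub_task_names})

    def forward(self, program, state, visual_feats):
        # `program` is a question-specific sequence of sub-task names,
        # assumed here to come from a program generator (not shown).
        for task in program:
            state = self.modules_by_task[task](state, visual_feats)
        return state


# Toy usage with made-up sub-task names and dimensions.
model = TMNSketch(["filter_color", "relate", "count"])
state = torch.randn(2, 8, 256)    # e.g. question-token embeddings
visual = torch.randn(2, 49, 256)  # e.g. image-region features
out = model(["filter_color", "relate", "count"], state, visual)
print(out.shape)  # torch.Size([2, 8, 256])
```

The sketch mirrors the two factors the abstract credits for the performance gain: question-specific composition (the loop over the program) and module specialization (a distinct module per sub-task type).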
