One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

Paper Title

One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

Authors

Zhipeng Zhang, Zhimin Wei, Zhongzhen Huang, Rui Niu, Peng Wang

Abstract

Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning; it requires a model to detect the target object referred to by a natural language expression. Among the proposed pipelines, one-stage Referring Expression Comprehension (OSREC) has become the dominant trend because it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy because a single expression frequently mentions a sequence of objects, and multi-hop reasoning is needed to analyze the semantic relations among them. However, one unsolved issue with these models is that the number of reasoning steps must be pre-defined and fixed before inference, ignoring the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the number of reasoning steps to be adjusted dynamically based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize and process the reasoning state and a Reinforcement Learning strategy to dynamically infer the reasoning steps. This work achieves state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g), with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.
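
The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea of a variable number of reasoning steps chosen by a learned stopping policy. It is not the authors' network: `update_state` is a toy stand-in for the Transformer module, `halt_probability` is a hypothetical logistic stopping policy, and `reinforce_loss` is the standard REINFORCE surrogate loss, assumed here as one plausible reinforcement-learning choice. All function names, parameters, and the reward are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_state(state, features):
    # Toy stand-in for the Transformer module (assumption):
    # blend the running reasoning state with the input features.
    return [0.5 * s + 0.5 * f for s, f in zip(state, features)]


def halt_probability(state, step, weights, step_bias):
    # Hypothetical stopping policy: a logistic function of the current
    # state plus a bias that grows (or shrinks) with the step index.
    logit = sum(w * s for w, s in zip(weights, state)) + step_bias * step
    return sigmoid(logit)


def dynamic_reasoning(features, weights, step_bias=0.5, max_steps=10, rng=None):
    """Run reasoning steps until the policy samples "stop" or max_steps
    (assumed >= 1) is reached. Returns the final state, the number of
    steps taken, and the log-probabilities of the sampled actions."""
    rng = rng or random.Random(0)
    state = [0.0] * len(features)
    log_probs = []
    for step in range(1, max_steps + 1):
        state = update_state(state, features)
        p_stop = halt_probability(state, step, weights, step_bias)
        stop = rng.random() < p_stop
        log_probs.append(math.log(p_stop if stop else 1.0 - p_stop))
        if stop:
            break
    return state, step, log_probs


def reinforce_loss(log_probs, reward, baseline=0.0):
    # REINFORCE surrogate loss: -(reward - baseline) * sum_t log pi(a_t).
    return -(reward - baseline) * sum(log_probs)


if __name__ == "__main__":
    state, steps, log_probs = dynamic_reasoning([1.0, -0.5], [0.3, 0.2])
    print("steps taken:", steps)
    print("surrogate loss:", reinforce_loss(log_probs, reward=1.0))
```

In this sketch the number of steps depends on the evolving state rather than on a fixed hyperparameter; in a real model the reward would presumably reflect localization quality, which the abstract does not specify.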
