Paper Title
Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Paper Authors
Paper Abstract
Referring image segmentation aims to predict the foreground mask of the object referred to by a natural language sentence. The multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either insufficiently or redundantly model the multimodal context. To tackle this problem, we propose a "gather-propagate-distribute" scheme to model multimodal context by cross-modal interaction and implement this scheme as a novel Linguistic Structure guided Context Modeling (LSCM) module. Our LSCM module builds a Dependency Parsing Tree suppressed Word Graph (DPT-WG) which guides all the words to include valid multimodal context of the sentence while excluding disturbing ones through three steps over the multimodal features, i.e., gathering, constrained propagation and distributing. Extensive experiments on four benchmarks demonstrate that our method outperforms all previous state-of-the-art methods.
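To make the "gather-propagate-distribute" scheme concrete, the sketch below illustrates one plausible reading of the three steps: each word gathers spatial multimodal context via attention, word nodes propagate context along graph edges (here a generic adjacency standing in for the DPT-WG), and the updated context is distributed back to spatial positions. This is a minimal PyTorch sketch; the layer choices, tensor shapes, the `word_graph` construction, and the residual fusion are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatherPropagateDistribute(nn.Module):
    """Minimal sketch of a gather-propagate-distribute step (assumed design)."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # word features -> attention queries
        self.key = nn.Linear(dim, dim)     # multimodal features -> attention keys
        self.update = nn.Linear(dim, dim)  # node update after graph propagation

    def forward(self, mm_feat, word_feat, word_graph):
        # mm_feat:    (B, HW, D) multimodal features over spatial positions
        # word_feat:  (B, T, D)  per-word linguistic features
        # word_graph: (B, T, T)  edge weights; in the paper these would be
        #                        suppressed by dependency-parsing-tree structure
        q = self.query(word_feat)                      # (B, T, D)
        k = self.key(mm_feat)                          # (B, HW, D)
        attn = F.softmax(q @ k.transpose(1, 2), dim=-1)  # (B, T, HW)

        # 1) Gather: each word aggregates spatial multimodal context.
        nodes = attn @ mm_feat                         # (B, T, D)

        # 2) Constrained propagation: exchange context along graph edges.
        nodes = F.relu(self.update(word_graph @ nodes))

        # 3) Distribute: scatter updated word context back to positions.
        out = attn.transpose(1, 2) @ nodes             # (B, HW, D)
        return mm_feat + out                           # residual fusion (assumed)


# Usage with toy shapes (batch 2, 26x26 feature map, 10 words, 512 channels).
module = GatherPropagateDistribute(dim=512)
mm = torch.randn(2, 26 * 26, 512)
words = torch.randn(2, 10, 512)
graph = torch.softmax(torch.randn(2, 10, 10), dim=-1)  # placeholder for DPT-WG
print(module(mm, words, graph).shape)  # torch.Size([2, 676, 512])
```

Using the same attention weights for gathering and distributing keeps the spatial-to-word correspondence consistent across the two directions; whether the original module shares these weights is an assumption here.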