Paper Title
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Paper Authors
Paper Abstract
Since the rise of vision-language navigation (VLN), great progress has been made in instruction following: building a follower that navigates environments under the guidance of instructions. However, far less attention has been paid to the inverse task, instruction generation: learning a speaker to generate grounded descriptions of navigation routes. Existing VLN methods train the speaker independently and often treat it as a data augmentation tool for strengthening the follower, while ignoring rich cross-task relations. Here we describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each: the follower judges whether the speaker-created instruction explains the original navigation route correctly, and vice versa. Because it does not require aligned instruction-path pairs, this cycle-consistent learning scheme is complementary to task-specific training targets defined on labeled data, and can also be applied to unlabeled paths (sampled without paired instructions). Another agent, called the creator, is added to generate counterfactual environments. It greatly changes the current scenes yet leaves novel items, which are vital for executing the original instructions, unchanged. Thus, more informative training scenes are synthesized, and the three agents compose a powerful VLN learning system. Extensive experiments on a standard benchmark show that our approach improves the performance of various follower models and produces accurate navigation instructions.
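The cycle-consistency idea above can be illustrated with a toy sketch. The abstract does not specify the models or the exact training signal, so everything below (the four-action route space, the word tables, `speak`, `follower_log_likelihood`, `cycle_score`) is hypothetical and is not the paper's method. It covers only one direction of the cycle: a speaker describes a route that has no paired instruction, and the follower then judges how well that description explains the original route. The reverse direction, in which the speaker judges the follower's route against the original instruction, would be symmetric.

```python
import math
from typing import Dict, List

# All names below are hypothetical toy stand-ins, not the paper's models.
# A route is a sequence of discrete actions (real VLN routes are viewpoint sequences).
Route = List[str]
Instruction = List[str]

# Toy speaker: deterministically describes each action with one word.
SPEAKER_VOCAB: Dict[str, str] = {
    "forward": "walk",
    "left": "turn-left",
    "right": "turn-right",
    "stop": "halt",
}

# Toy follower: probability of each action given an instruction word (rows sum to 1).
FOLLOWER_PROBS: Dict[str, Dict[str, float]] = {
    "walk":       {"forward": 0.85, "left": 0.05, "right": 0.05, "stop": 0.05},
    "turn-left":  {"forward": 0.05, "left": 0.85, "right": 0.05, "stop": 0.05},
    "turn-right": {"forward": 0.05, "left": 0.05, "right": 0.85, "stop": 0.05},
    "halt":       {"forward": 0.05, "left": 0.05, "right": 0.05, "stop": 0.85},
}


def speak(route: Route, vocab: Dict[str, str] = SPEAKER_VOCAB) -> Instruction:
    """Speaker step: generate an instruction for a (possibly unlabeled) route."""
    return [vocab[action] for action in route]


def follower_log_likelihood(instruction: Instruction, route: Route) -> float:
    """Follower step: log-probability that following `instruction` reproduces `route`."""
    if len(instruction) != len(route):
        raise ValueError("toy model assumes one word per action")
    return sum(math.log(FOLLOWER_PROBS[word][action])
               for word, action in zip(instruction, route))


def cycle_score(route: Route, vocab: Dict[str, str] = SPEAKER_VOCAB) -> float:
    """Route -> speaker -> instruction -> follower judges the original route.

    Returns the mean per-step log-probability; values closer to 0 mean the
    follower finds the speaker's instruction a better explanation of the route.
    No ground-truth instruction is needed, so this also works on unlabeled paths.
    """
    if not route:
        raise ValueError("route must be non-empty")
    instruction = speak(route, vocab)
    return follower_log_likelihood(instruction, route) / len(route)


if __name__ == "__main__":
    route = ["forward", "left", "forward", "stop"]
    # A speaker that mislabels left turns receives a worse cycle score.
    noisy_vocab = dict(SPEAKER_VOCAB, left="turn-right")
    print(cycle_score(route))
    print(cycle_score(route, noisy_vocab))
```

In a learned system, a score like this could plausibly serve as a reward or auxiliary loss for the speaker (and, in the reverse direction, for the follower). The abstract does not state the exact form the paper uses.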