Paper Title
Learning Sample Importance for Cross-Scenario Video Temporal Grounding
Paper Authors
Paper Abstract
The task of temporal grounding aims to locate a video moment in an untrimmed video given a sentence query. This paper is the first to investigate superficial biases specific to the temporal grounding task and to propose a targeted solution. Most alarmingly, we observe that existing temporal grounding models rely heavily on biases in the visual modality (e.g., a strong preference for frequent concepts or for certain temporal intervals). This leads to inferior performance when the model is generalized to a cross-scenario test setting. To this end, we propose a novel method called Debiased Temporal Language Localizer (Debias-TLL) that prevents the model from naively memorizing the biases and forces it to ground the query sentence on the true inter-modal relationship. Debias-TLL trains two models simultaneously. By design, a large discrepancy between the two models' predictions on a sample indicates a high probability that the sample is biased. Harnessing this informative discrepancy, we devise a data re-weighting scheme to mitigate the data biases. We evaluate the proposed model on cross-scenario temporal grounding, where the train/test data are heterogeneously sourced. Experiments show the large-margin superiority of the proposed method over state-of-the-art competitors.
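The abstract describes a co-training-style scheme in which two models are trained in parallel and the discrepancy between their predictions is used to down-weight samples that are likely biased. The snippet below is only a minimal sketch of that idea under our own assumptions: the `discrepancy_weights` helper, the boundary-difference discrepancy measure, and the exponential weighting are illustrative choices, not the actual Debias-TLL formulation, which the abstract does not specify.

```python
import numpy as np

def discrepancy_weights(pred_a, pred_b, temperature=1.0):
    """Hypothetical re-weighting: samples on which the two co-trained models
    disagree most are assumed to be biased and receive smaller training
    weights. pred_a / pred_b are (N, 2) arrays of predicted (start, end)
    moment boundaries from the two models."""
    # Per-sample discrepancy: mean absolute difference of predicted boundaries.
    disc = np.abs(pred_a - pred_b).mean(axis=1)
    # Higher discrepancy -> lower weight (assumed exponential down-weighting).
    weights = np.exp(-disc / temperature)
    # Normalize so the weights average to 1, keeping the loss scale stable.
    return weights * len(weights) / weights.sum()

# Toy usage: boundary predictions (in seconds) from the two models.
pred_a = np.array([[2.0, 7.5], [10.0, 14.0], [3.0, 30.0]])
pred_b = np.array([[2.2, 7.0], [10.5, 13.5], [20.0, 25.0]])  # last sample: large disagreement
per_sample_loss = np.array([0.8, 0.6, 1.2])

w = discrepancy_weights(pred_a, pred_b)
weighted_loss = (w * per_sample_loss).mean()
print(w, weighted_loss)
```

In this toy run, the third sample, where the two models disagree sharply, contributes much less to the weighted loss, which mirrors the abstract's claim that prediction discrepancy can flag and mitigate biased samples.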