Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

Paper Title

Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

Authors

Guanyu Cai, Yixiao Ge, Binjie Zhang, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, Xiaohu Qie, Jianping Wu, Mike Zheng Shou

Abstract

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from raw pixels in an end-to-end manner to achieve strong performance on downstream video-language retrieval. Despite the impressive results, VLP research has become extremely expensive because it requires massive data and long training times, which prevents further exploration. In this work, we revitalize region features of sparsely sampled video clips to significantly reduce both spatial and temporal visual redundancy, democratizing VLP research while still achieving state-of-the-art results. Specifically, to fully explore the potential of region features, we introduce a novel bidirectional region-word alignment regularization that properly optimizes the fine-grained relations between regions and certain words in sentences, eliminating the domain/modality disconnections between pre-extracted region features and text. Extensive results on downstream video-language retrieval tasks across four datasets demonstrate the superiority of our method in both effectiveness and efficiency; e.g., our method achieves competitive results with 80% less data and 85% less pre-training time compared to the most efficient VLP method so far (Lei et al., 2021). The code will be available at https://github.com/showlab/DemoVLP.
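
The abstract does not give the form of the bidirectional region-word alignment term, so the following is only an illustrative sketch of one common way to score alignment in both directions, not the authors' formulation. It assumes each region and each word is already embedded as a vector of the same dimension, matches every region to its most similar word and every word to its most similar region by cosine similarity, and averages the two directions. The function names `cosine` and `bidirectional_alignment` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bidirectional_alignment(regions, words):
    """Assumed max-then-mean alignment score; higher means better aligned.

    regions: list of region feature vectors (e.g. pre-extracted detector features)
    words:   list of word embedding vectors from the sentence
    """
    # Similarity matrix: rows are regions, columns are words.
    sim = [[cosine(r, w) for w in words] for r in regions]

    # Region -> word: each region picks its best-matching word.
    region_to_word = sum(max(row) for row in sim) / len(regions)

    # Word -> region: each word picks its best-matching region.
    word_to_region = sum(
        max(sim[i][j] for i in range(len(regions))) for j in range(len(words))
    ) / len(words)

    # Average the two directions so neither modality dominates.
    return 0.5 * (region_to_word + word_to_region)


if __name__ == "__main__":
    toy_regions = [[1.0, 0.0], [0.0, 1.0]]
    toy_words = [[1.0, 0.0], [0.0, 1.0]]
    print(bidirectional_alignment(toy_regions, toy_words))
```

In a training setting, a score of this kind would typically be turned into a loss (for example by contrasting matched and mismatched video-sentence pairs); that step is omitted here because the abstract does not describe it.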
