Paper Title
Towards Unified Keyframe Propagation Models
Authors
Abstract
Many video editing tasks, such as rotoscoping or object removal, require the propagation of context across frames. While transformers and other attention-based approaches that aggregate features globally have demonstrated great success at propagating object masks from keyframes to the whole video, they struggle to faithfully propagate high-frequency details such as textures. We hypothesize that this is due to an inherent bias of global attention towards low-frequency features. To overcome this limitation, we present a two-stream approach in which high-frequency features interact locally and low-frequency features interact globally. The global interaction stream remains robust in difficult situations such as large camera motions, where explicit alignment fails. The local interaction stream propagates high-frequency details through deformable feature aggregation and, informed by the global interaction stream, learns to detect and correct errors of the deformation field. We evaluate our two-stream approach on inpainting tasks, where experiments show that it improves both the propagation of features within a single frame, as required for image inpainting, and their propagation from keyframes to target frames. Applied to video inpainting, our approach yields improvements of 44% in FID and 26% in LPIPS. Code is available at https://github.com/runwayml/guided-inpainting
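The authors' implementation lives in the linked repository; purely for orientation, the sketch below illustrates the two-stream idea the abstract describes in PyTorch. The module name `TwoStreamBlock`, the 4x average-pooling split into low- and high-frequency features, and the convolutional offset head are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamBlock(nn.Module):
    """Hypothetical block: global attention over low-frequency features,
    deformable sampling of keyframe features for high-frequency detail."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        # Global stream: cross-attention from target to keyframe features.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        # Local stream: predict a per-pixel deformation field, conditioned
        # on the globally aggregated context (illustrative design choice).
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, target: torch.Tensor, keyframe: torch.Tensor) -> torch.Tensor:
        b, c, h, w = target.shape  # assumes h and w are divisible by 4
        # Global interaction on 4x-downsampled (low-frequency) features,
        # then upsample the aggregated context back to full resolution.
        lo_t = F.avg_pool2d(target, 4).flatten(2).transpose(1, 2)   # (B, HW/16, C)
        lo_k = F.avg_pool2d(keyframe, 4).flatten(2).transpose(1, 2)
        glob, _ = self.attn(lo_t, lo_k, lo_k)
        glob = glob.transpose(1, 2).reshape(b, c, h // 4, w // 4)
        glob = F.interpolate(glob, size=(h, w), mode="bilinear", align_corners=False)

        # Local interaction: predict offsets and deformably sample the
        # high-frequency keyframe features at the shifted locations.
        offsets = self.offset_head(torch.cat([target, glob], dim=1))  # (B, 2, H, W)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=target.device),
            torch.linspace(-1, 1, w, device=target.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = base + offsets.permute(0, 2, 3, 1)   # grid_sample wants (B, H, W, 2)
        local = F.grid_sample(keyframe, grid, align_corners=False)

        # Fuse: global context can detect and correct errors in the
        # deformably propagated features.
        return self.fuse(torch.cat([glob, local], dim=1))


if __name__ == "__main__":
    block = TwoStreamBlock(channels=64)
    t, k = torch.randn(2, 64, 64, 64), torch.randn(2, 64, 64, 64)
    print(block(t, k).shape)  # torch.Size([2, 64, 64, 64])
```

Note the key coupling: the offset head sees the globally aggregated features, which is one plausible way for the local stream to be "informed by the global interaction stream" and to correct errors of the deformation field, as the abstract puts it.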