Future Frame Prediction of a Video Sequence

Paper Title

Future Frame Prediction of a Video Sequence

Authors

Jasmeen Kaur, Sukhendu Das

Abstract

Predicting the future frames of a video sequence has long been a problem of high interest in computer vision because it serves a multitude of applications. The ability to predict, anticipate, and reason about future events is the essence of intelligence and one of the main goals of decision-making systems in areas such as human-machine interaction, robot navigation, and autonomous driving. However, the challenge lies in the ambiguous nature of the problem: multiple future sequences may be possible for the same input video shot. A naively designed model averages the possible futures into a single blurry prediction. Recently, two distinct approaches have attempted to address this problem: (a) latent variable models that represent the underlying stochasticity, and (b) adversarially trained models that aim to produce sharper images. A latent variable model often struggles to produce realistic results, while an adversarially trained model underutilizes latent variables and thus fails to produce diverse predictions. The two methods therefore have complementary strengths and weaknesses. Combining them produces predictions that appear more realistic and better cover the range of plausible futures, which forms the basis and objective of this work. In this paper, we propose a novel multi-scale architecture that combines both approaches. We validate the proposed model through a series of experiments and empirical evaluations on the Moving MNIST, UCF101, and Penn Action datasets. Our method outperforms the baseline methods.
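
The claim that a naive model averages several possible futures into one blurry prediction can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a hypothetical 5-pixel, one-dimensional "frame" in which a bright dot ends up either at the far left or at the far right with equal probability, and it assumes the naive model is trained with a pixel-wise mean squared error. All names (`future_left`, `future_right`, `expected_mse`, and so on) are illustrative.

```python
# Toy illustration (assumed setup, not the paper's architecture):
# under a mean squared error loss, the best single deterministic
# prediction for several equally likely futures is their pixel-wise
# mean, which is a blurred mixture of the futures.

def mse(pred, target):
    """Pixel-wise mean squared error between two frames."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_mse(pred, futures):
    """Average MSE of one prediction over equally likely futures."""
    return sum(mse(pred, f) for f in futures) / len(futures)

def mean_frame(futures):
    """Pixel-wise mean of a list of frames."""
    return [sum(pixels) / len(futures) for pixels in zip(*futures)]

# Two hypothetical, equally likely futures for the same input clip:
# the dot ends at the far left or at the far right.
future_left = [1.0, 0.0, 0.0, 0.0, 0.0]
future_right = [0.0, 0.0, 0.0, 0.0, 1.0]
futures = [future_left, future_right]

# The "blurry" prediction: half-intensity dots at both ends.
blurry = mean_frame(futures)

# Expected loss of the blurry frame versus committing to one sharp future.
loss_blurry = expected_mse(blurry, futures)
loss_sharp = expected_mse(future_left, futures)

print("blurry prediction:", blurry)
print("expected MSE (blurry):", loss_blurry)
print("expected MSE (sharp) :", loss_sharp)
```

In this toy setting the averaged frame has a lower expected error than either sharp future, even though it matches neither of them. That is why a purely regression-trained predictor drifts toward blur, and why the abstract turns to latent variables (to select one future) and adversarial training (to keep it sharp).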
