Paper Title
Autoencoding Video Latents for Adversarial Video Generation
Paper Authors
Paper Abstract
Given the three-dimensional complexity of a video signal, training a robust and diverse GAN-based video generative model is onerous due to the large stochasticity involved in the data space. Learning disentangled representations of the data helps improve robustness and provides control during sampling. For video generation, recent progress in this area comes from treating motion and appearance as orthogonal information and designing architectures that efficiently disentangle them. These approaches rely on handcrafted architectures that impose structural priors on the generator to decompose appearance and motion codes in the latent space. Inspired by recent advances in autoencoder-based image generation, we present AVLAE (Adversarial Video Latent AutoEncoder), a two-stream latent autoencoder in which the video distribution is learned by adversarial training. In particular, we propose to autoencode the motion and appearance latent vectors of the video generator in an adversarial setting. We demonstrate that our approach learns to disentangle motion and appearance codes even without explicit structural composition in the generator. Several experiments with qualitative and quantitative results demonstrate the effectiveness of our method.
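The two-stream latent idea in the abstract can be illustrated with a toy sketch: a single appearance code shared across all frames, a per-frame motion code, a stand-in linear "generator" G, and a latent "encoder" E that recovers the codes from the generated video, giving the autoencoding constraint E(G(z)) ≈ z that would sit alongside the adversarial loss. This is not the paper's architecture; all dimensions, the linear maps, and the pseudo-inverse recovery are illustrative assumptions chosen so the toy is exactly invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): frames per video, appearance
# code size, per-frame motion code size, and pixels per frame.
T, D_A, D_M, FRAME = 8, 16, 4, 32

def sample_latents():
    z_a = rng.standard_normal(D_A)        # one appearance code per video
    z_m = rng.standard_normal((T, D_M))   # one motion code per frame
    return z_a, z_m

# Toy linear "generator": each frame mixes the shared appearance code with
# that frame's motion code (a stand-in for the real network G).
W_a = rng.standard_normal((D_A, FRAME))
W_m = rng.standard_normal((D_M, FRAME))

def generate(z_a, z_m):
    return z_a @ W_a + z_m @ W_m          # video of shape (T, FRAME)

# Toy latent "encoder" E: recovers both streams of codes from the video.
# For this linear toy, a least-squares pseudo-inverse is exact because
# FRAME > D_A + D_M and the random maps have full row rank.
def encode(video):
    W = np.vstack([W_a, W_m])             # (D_A + D_M, FRAME)
    sol = video @ np.linalg.pinv(W)       # (T, D_A + D_M), one row per frame
    z_a_hat = sol[:, :D_A].mean(axis=0)   # appearance is shared, so average
    z_m_hat = sol[:, D_A:]                # motion stays per-frame
    return z_a_hat, z_m_hat

z_a, z_m = sample_latents()
video = generate(z_a, z_m)
z_a_hat, z_m_hat = encode(video)
recon_err = np.abs(generate(z_a_hat, z_m_hat) - video).max()
```

In the actual method both G and E are learned networks and the latent reconstruction is enforced as a training objective rather than holding exactly; the toy only shows how a shared appearance stream and a per-frame motion stream can jointly explain a video and be recovered from it.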