NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling

Paper title

NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling

Authors

Shishira R Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava

Abstract

Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited in that they do not explicitly exploit the temporal redundancy in videos, which leads to long encoding times. Additionally, these methods have fixed architectures that do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats a video as groups of frames and fits a separate network to each group, performing patch-wise prediction. This design shares computation within each group, in both the spatial and temporal dimensions, reducing the encoding time of the video. The video representation is modeled autoregressively: the network fit on the current group is initialized with the weights of the previous group's model. To further enhance efficiency, we quantize the network parameters during training, requiring no post-hoc pruning or quantization. Compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works, which struggle with larger resolutions and longer videos, we show that our algorithm is highly flexible and scales naturally due to its patch-wise and autoregressive design. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA achieves 6X decoding speed and scales well with more GPUs, making it practical for various deployment scenarios.
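The autoregressive initialization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "network" here is assumed to be just one free parameter per pixel, fit by plain gradient descent, and the names split_into_groups, fit_group and encode_video are hypothetical. It only shows why starting each group's fit from the previous group's weights can reduce the number of optimization steps when consecutive groups are similar.

```python
def split_into_groups(frames, group_size):
    """Partition a list of frames into consecutive groups of `group_size`."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]


def fit_group(targets, init_weights, lr=0.1, tol=1e-6, max_steps=10_000):
    """Toy stand-in for training one network on one group of frames.

    The "model" is one weight per target value (an assumption made for
    illustration). Returns the fitted weights and the steps needed to
    bring the mean squared error below `tol`.
    """
    weights = list(init_weights)
    for step in range(1, max_steps + 1):
        # Gradient of (w - t)^2 with respect to w is 2 * (w - t).
        weights = [w - lr * 2.0 * (w - t) for w, t in zip(weights, targets)]
        loss = sum((w - t) ** 2 for w, t in zip(weights, targets)) / len(targets)
        if loss < tol:
            return weights, step
    return weights, max_steps


def encode_video(groups, warm_start=True):
    """Fit one model per group, optionally initialised from the previous model."""
    models, total_steps, prev = [], 0, None
    for group in groups:
        targets = [pixel for frame in group for pixel in frame]  # flatten the group
        if warm_start and prev is not None and len(prev) == len(targets):
            init = prev                      # autoregressive initialisation
        else:
            init = [0.0] * len(targets)      # cold start
        weights, steps = fit_group(targets, init)
        models.append(weights)
        total_steps += steps
        prev = weights
    return models, total_steps


# Synthetic "video": 12 frames of 4 pixels that drift slowly over time.
frames = [[0.5 + 0.01 * t + 0.1 * k for k in range(4)] for t in range(12)]
groups = split_into_groups(frames, group_size=3)

_, warm_steps = encode_video(groups, warm_start=True)
_, cold_steps = encode_video(groups, warm_start=False)
print("steps with warm start:", warm_steps)
print("steps with cold start:", cold_steps)
```

On this slowly varying input the warm-started run needs fewer total steps than the cold-started one, because every group after the first begins close to its solution. The actual method additionally uses patch-wise prediction and quantization during training, neither of which is modeled here.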
