Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Paper Title

Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Authors

Guangqi Xie, Xin Li, Shiqi Lin, Li Zhang, Kai Zhang, Yue Li, Zhibo Chen

Abstract

The rapid development of intelligent tasks such as segmentation, detection, and classification has created an urgent need for semantic compression, which aims to reduce the compression cost while preserving the original semantic information. However, it is impractical to integrate a semantic metric directly into traditional codecs, since they cannot be optimized in an end-to-end manner. To solve this problem, some pioneering works have applied reinforcement learning to image-wise semantic compression. Video semantic compression, however, has not yet been explored because of its complex reference architectures and compression modes. In this paper, we take a step toward video semantic compression and propose Hierarchical Reinforcement Learning based task-driven Video Semantic Coding, named HRLVSC. Specifically, to simplify the complex mode decision of video semantic coding, we divide the action space into frame-level and CTU-level spaces in a hierarchical manner, and then progressively explore the best mode selection for each through the cooperation of frame-level and CTU-level agents. Moreover, since the number of modes in video semantic coding grows exponentially with the number of frames in a Group of Pictures (GOP), we carefully investigate the effects of different mode selections on video semantic coding and design a simple but effective mode simplification strategy. We validate HRLVSC on the video segmentation task with the HEVC reference software HM16.19. Extensive experimental results demonstrate that HRLVSC achieves over 39% BD-rate saving for video semantic coding under the Low Delay P configuration.
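The exponential growth mentioned in the abstract, and the motivation for splitting decisions into frame-level and CTU-level steps, can be illustrated with a small counting sketch. This is not the authors' implementation: the function names and the example values for frames per GOP, CTUs per frame, and modes per CTU are hypothetical, chosen only to show the scale of the problem.

```python
# Illustrative counting sketch. All names and numbers here are assumptions
# for illustration; none are taken from the paper.

def flat_joint_mode_count(num_frames: int, ctus_per_frame: int, modes_per_ctu: int) -> int:
    """Number of joint mode assignments if every CTU in every frame of a GOP
    independently picks one of `modes_per_ctu` modes."""
    return modes_per_ctu ** (num_frames * ctus_per_frame)


def hierarchical_decision_count(num_frames: int, ctus_per_frame: int) -> int:
    """Number of sequential decisions when one frame-level choice is made per
    frame, followed by one CTU-level choice per CTU in that frame. Each
    decision is drawn from a small, fixed action set."""
    return num_frames * (1 + ctus_per_frame)


if __name__ == "__main__":
    frames, ctus, modes = 4, 6, 2  # hypothetical toy values
    print("flat joint assignments:", flat_joint_mode_count(frames, ctus, modes))
    print("hierarchical decisions:", hierarchical_decision_count(frames, ctus))
```

The flat count grows exponentially with the number of frames, while the number of hierarchical decisions grows only linearly. The set of reachable joint assignments is unchanged; what shrinks is the action set each agent faces at any single step.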
