Paper Title
The Devil Is in the Details: Window-based Attention for Image Compression
Paper Authors
Abstract
Learned image compression methods have exhibited rate-distortion performance superior to classical image compression standards. Most existing learned image compression models are based on Convolutional Neural Networks (CNNs). Despite their great contributions, a main drawback of CNN-based models is that their structure is not designed for capturing local redundancy, especially non-repetitive textures, which severely affects the reconstruction quality. Therefore, how to make full use of both global structure and local texture becomes the core problem for learning-based image compression. Inspired by recent progress on the Vision Transformer (ViT) and Swin Transformer, we found that combining a local-aware attention mechanism with global-related feature learning can meet this expectation in image compression. In this paper, we first extensively study the effects of multiple kinds of attention mechanisms on local feature learning, then introduce a more straightforward yet effective window-based local attention block. The proposed window-based attention is highly flexible and can work as a plug-and-play component to enhance both CNN and Transformer models. Moreover, we propose a novel Symmetrical TransFormer (STF) framework with absolute transformer blocks in the down-sampling encoder and up-sampling decoder. Extensive experimental evaluations have shown that the proposed method is effective and outperforms state-of-the-art methods. The code is publicly available at https://github.com/Googolxx/STF.
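To make the core idea concrete, below is a minimal NumPy sketch of window-restricted self-attention as described in the abstract: the feature map is split into non-overlapping local windows and scaled dot-product attention is computed only within each window. This is an illustrative simplification, not the paper's implementation — it uses identity Q/K/V projections, a single head, and omits learned weights, relative position bias, and shifted windows; all function names here are hypothetical.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping (w*w, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    # -> (num_windows, w*w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def window_merge(windows, w, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, w):
    """Scaled dot-product self-attention restricted to local w-by-w windows."""
    H, W, C = x.shape
    q = k = v = window_partition(x, w)            # identity projections for brevity
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)  # (num_windows, w*w, w*w)
    out = softmax(scores) @ v                       # attend only inside each window
    return window_merge(out, w, H, W)

feat = np.random.default_rng(0).normal(size=(8, 8, 4))
out = window_attention(feat, w=4)
print(out.shape)  # (8, 8, 4)
```

Because attention is confined to each window, the cost scales linearly with image size rather than quadratically, which is what makes this kind of local attention practical as a plug-and-play block in compression models.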