Paper Title
Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution
Paper Authors
Paper Abstract
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transferring high-quality textures from highly degraded low-quality sequences that suffer from blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches, and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual textures can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture both global and local frequency relations, which can handle the complicated degradation processes found in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a ``divided attention'', which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos by clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
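To make the first step concrete, below is a minimal PyTorch sketch, not the authors' implementation, of per-frequency-band self-attention: each patch is transformed with a 2D DCT (the abstract only says "spectral maps", so the DCT choice is an assumption), each resulting coefficient channel is treated as a frequency band, and attention runs independently within each band across spatial patches. All names (`dct2`, `PerBandAttention`) and hyper-parameters here are illustrative.

```python
# Minimal sketch (not the paper's code) of per-frequency-band self-attention.
# Assumption: the patch-to-spectrum transform is a 2D DCT; FTVSR's actual
# transform, token dimensions, and head counts may differ.
import torch
import torch.nn as nn

def dct2(x: torch.Tensor) -> torch.Tensor:
    """Orthonormal 2D DCT-II over the last two dims (square patches)."""
    n = x.shape[-1]
    k = torch.arange(n, dtype=x.dtype, device=x.device)
    basis = torch.cos(torch.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis = basis * (2.0 / n) ** 0.5
    basis[0] = basis[0] / 2.0 ** 0.5
    return basis @ x @ basis.T

class PerBandAttention(nn.Module):
    """Self-attention across spatial patches, run separately for each band."""
    def __init__(self, dim: int = 32, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(1, dim)   # lift scalar DCT coefficients to tokens
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, 1)

    def forward(self, frame: torch.Tensor, p: int = 8) -> torch.Tensor:
        # frame: (C, H, W) with H and W divisible by the patch size p
        C, H, W = frame.shape
        patches = frame.unfold(1, p, p).unfold(2, p, p)   # (C, h, w, p, p)
        spec = dct2(patches)                              # per-patch spectra
        h, w = spec.shape[1], spec.shape[2]
        # One band per (color, frequency) pair; tokens are the h*w patches,
        # so attention compares the same frequency at different locations.
        bands = spec.permute(0, 3, 4, 1, 2).reshape(C * p * p, h * w, 1)
        tokens = self.embed(bands)                        # (bands, tokens, dim)
        out, _ = self.attn(tokens, tokens, tokens)        # per-band attention
        return self.proj(out).reshape(C, p, p, h, w)      # refined spectra

# Example: refine the spectra of a single 3x64x64 frame.
refined = PerBandAttention()(torch.randn(3, 64, 64))
```

Under this reading, the abstract's ``divided attention'' would extend the same idea along time: first attend jointly over space and frequency within each frame, then attend over the same band at the same location across frames.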