Paper Title
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Paper Authors
Paper Abstract
Quality assessment for User Generated Content (UGC) videos plays an important role in ensuring the viewing experience of end-users. Previous UGC video quality assessment (VQA) studies either use image recognition models or image quality assessment (IQA) models to extract frame-level features of UGC videos for quality regression, which are regarded as sub-optimal solutions because of the domain shift between these tasks and the UGC VQA task. In this paper, we propose a very simple but effective UGC VQA model, which addresses this problem by training an end-to-end spatial feature extraction network to learn quality-aware spatial feature representations directly from the raw pixels of video frames. We also extract motion features to measure the temporal-related distortions that the spatial features cannot model. The proposed model utilizes very sparse frames to extract spatial features and dense frames (i.e., the video chunk) at a very low spatial resolution to extract motion features, and therefore has low computational complexity. With these quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into chunk-level quality scores, and then adopt a temporal average pooling strategy to obtain the video-level quality score. We further introduce a multi-scale quality fusion strategy to solve the problem of VQA across different spatial resolutions, where the multi-scale weights are obtained from the contrast sensitivity function of the human visual system. Experimental results show that the proposed model achieves the best performance on five popular UGC VQA databases, which demonstrates its effectiveness. The code will be publicly available.
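
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the stated flow: sparse key frames for spatial features, a dense low-resolution chunk for motion features, an MLP regressor producing chunk-level scores, temporal average pooling to a video-level score, and a weighted fusion across spatial scales. The class name SimpleUGCVQA, the ResNet-50 spatial backbone, the toy 3D-convolution motion branch, all feature dimensions, and the example fusion weights are illustrative assumptions, not the authors' released architecture.

```python
# Illustrative sketch (not the paper's official code) of the pipeline in the abstract.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SimpleUGCVQA(nn.Module):
    def __init__(self, motion_dim=256):
        super().__init__()
        # End-to-end trainable spatial feature extractor applied to sparse key frames
        # (assumption: ResNet-50 trunk with the classification head removed).
        backbone = resnet50(weights=None)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])  # -> (N, 2048, 1, 1)
        # Stand-in for a motion feature extractor run on dense, low-resolution frames
        # of each chunk; the paper would use a stronger (e.g. action-recognition) network.
        self.motion = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, motion_dim),
        )
        # Simple MLP regressor on the concatenated spatial + motion features.
        self.mlp = nn.Sequential(
            nn.Linear(2048 + motion_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )

    def forward(self, sparse_frames, dense_chunks):
        # sparse_frames: (B, N_chunks, 3, H, W)    one key frame per chunk
        # dense_chunks:  (B, N_chunks, 3, T, h, w) low-resolution clips
        b, n = sparse_frames.shape[:2]
        spat = self.spatial(sparse_frames.flatten(0, 1)).flatten(1)   # (B*N, 2048)
        mot = self.motion(dense_chunks.flatten(0, 1))                 # (B*N, motion_dim)
        chunk_scores = self.mlp(torch.cat([spat, mot], dim=1)).view(b, n)
        # Temporal average pooling of chunk-level scores -> video-level quality score.
        return chunk_scores.mean(dim=1)

def multi_scale_fuse(scores_per_scale, csf_weights):
    # Weighted fusion of scores predicted at different spatial resolutions; in the paper
    # the weights come from the contrast sensitivity function (values here are placeholders).
    w = torch.tensor(csf_weights, dtype=scores_per_scale[0].dtype)
    return (torch.stack(scores_per_scale, dim=-1) * w).sum(dim=-1) / w.sum()
```

As a rough usage pattern, one would run the model on the video resized to several spatial scales and pass the resulting per-scale scores, together with CSF-derived weights, to multi_scale_fuse to obtain the final prediction; the specific backbones, scale set, and weight values are design choices left to the paper itself.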