Scene Text Image Super-Resolution in the Wild

Paper Title

Scene Text Image Super-Resolution in the Wild

Authors

Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai

Abstract

Low-resolution text images are often seen in natural scenes, such as documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An intuitive solution is to introduce super-resolution (SR) techniques as pre-processing. However, previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images (e.g., bicubic down-sampling), which are simple and not suitable for real low-resolution text recognition. To this end, we propose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images captured in the wild by cameras with different focal lengths. It is more authentic and challenging than synthetic data, as shown in Fig. 1. We argue that improving recognition accuracy is the ultimate goal of scene text SR. For this purpose, a new Text Super-Resolution Network, termed TSRN, is developed with three novel modules. (1) A sequential residual block is proposed to extract the sequential information of the text images. (2) A boundary-aware loss is designed to sharpen the character boundaries. (3) A central alignment module is proposed to relieve the misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate that our TSRN improves recognition accuracy by over 13% for CRNN and by nearly 9.0% for ASTER and MORAN, compared to synthetic SR data. Furthermore, our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the recognition accuracy of LR images in TextZoom. For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN, respectively. Our results suggest that low-resolution text recognition in the wild is far from being solved, and thus more research effort is needed.
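The abstract names a boundary-aware loss but does not define it. As a rough illustration only, one common way to make a loss sensitive to character edges is to compare the finite-difference gradients of the super-resolved and high-resolution images. The sketch below assumes grayscale images stored as lists of rows; the function name `gradient_l1_loss` and the exact formulation are hypothetical and may differ from what TSRN actually uses.

```python
def gradient_l1_loss(sr, hr):
    """Mean absolute difference between the finite-difference gradients
    of two equally sized grayscale images (lists of rows of floats).

    Hypothetical illustration of an edge-sensitive loss; not taken
    from the paper.
    """
    if len(sr) != len(hr) or any(len(a) != len(b) for a, b in zip(sr, hr)):
        raise ValueError("images must have the same shape")

    height = len(sr)
    width = len(sr[0]) if height else 0
    total, count = 0.0, 0

    for y in range(height):
        for x in range(width):
            # Horizontal gradient: difference with the right-hand neighbour.
            if x + 1 < width:
                g_sr = sr[y][x + 1] - sr[y][x]
                g_hr = hr[y][x + 1] - hr[y][x]
                total += abs(g_sr - g_hr)
                count += 1
            # Vertical gradient: difference with the neighbour below.
            if y + 1 < height:
                g_sr = sr[y + 1][x] - sr[y][x]
                g_hr = hr[y + 1][x] - hr[y][x]
                total += abs(g_sr - g_hr)
                count += 1

    # Average over all gradient samples; an image with no neighbours has zero loss.
    return total / count if count else 0.0


# A sharp vertical edge (target) versus a blurred version of it (prediction).
sharp = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
blurred = [[0.0, 0.25, 0.75, 1.0],
           [0.0, 0.25, 0.75, 1.0]]

loss_blurred = gradient_l1_loss(blurred, sharp)
loss_identical = gradient_l1_loss(sharp, sharp)
```

In this toy case the blurred prediction is penalized because its edge is spread over several pixels, while a prediction identical to the target incurs zero loss. A pixel-wise loss alone would also penalize the blur, but a gradient term weights the error specifically where intensity changes quickly, which is where character boundaries lie.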
