Paper Title

Enhancing Handwritten Text Recognition with N-gram sequence decomposition and Multitask Learning

Paper Authors

Vasiliki Tassopoulou, George Retsinas, Petros Maragos

Paper Abstract

Current state-of-the-art approaches in the field of Handwritten Text Recognition are predominantly single-task, with unigram, character-level target units. In our work, we utilize a multi-task learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity, from fine to coarse. We consider this method a way to utilize n-gram information implicitly in the training process, while the final recognition is performed using only the unigram output. Unigram decoding of such a multi-task approach highlights the capability of the learned internal representations, imposed by the different n-grams at the training step. We select n-grams as our target units and experiment from unigrams to fourgrams, namely subword-level granularities. These multiple decompositions are learned by the network with task-specific CTC losses. Concerning network architectures, we propose two alternatives, namely the Hierarchical and the Block Multi-task. Overall, our proposed model, even though evaluated only on the unigram task, outperforms its single-task counterpart by an absolute 2.52% WER and 1.02% CER under greedy decoding, without any computational overhead during inference, hinting towards a successfully imposed implicit language model.
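To make the training setup concrete, the following is a minimal PyTorch sketch of the Block Multi-task idea described above, not the authors' released code: a shared convolutional-recurrent encoder feeds one classification head per n-gram granularity, each head is trained with its own CTC loss, and inference greedily decodes only the unigram head. All layer sizes, the encoder shape, and the num_classes_per_task vocabularies (each reserving index 0 for the CTC blank) are illustrative assumptions.

import torch
import torch.nn as nn

class BlockMultiTaskHTR(nn.Module):
    def __init__(self, num_classes_per_task):
        super().__init__()
        # Shared feature extractor over line images of shape (B, 1, H, W).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height into a 1-D sequence
        )
        self.rnn = nn.LSTM(64, 128, num_layers=2,
                           bidirectional=True, batch_first=True)
        # One linear head per decomposition (unigram, bigram, ...),
        # all attached to the same final recurrent output.
        self.heads = nn.ModuleList(
            [nn.Linear(256, n) for n in num_classes_per_task]
        )

    def forward(self, images):
        f = self.encoder(images)          # (B, 64, 1, W')
        f = f.squeeze(2).transpose(1, 2)  # (B, W', 64)
        h, _ = self.rnn(f)                # (B, W', 256)
        # Per-task log-probabilities, time-major as nn.CTCLoss expects.
        return [head(h).log_softmax(-1).transpose(0, 1) for head in self.heads]

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def multitask_ctc_loss(log_probs_per_task, targets_per_task,
                       input_lengths, target_lengths_per_task):
    # Total loss is the sum of the task-specific CTC losses, one per
    # n-gram decomposition of the same target sequence.
    return sum(ctc(lp, t, input_lengths, tl)
               for lp, t, tl in zip(log_probs_per_task, targets_per_task,
                                    target_lengths_per_task))

def greedy_unigram_decode(unigram_log_probs):
    # Best-path decoding on the unigram head only: argmax per frame,
    # collapse repeats, drop blanks (index 0).
    best = unigram_log_probs.argmax(-1).transpose(0, 1)  # (B, T)
    decoded = []
    for seq in best.tolist():
        out, prev = [], 0
        for s in seq:
            if s != prev and s != 0:
                out.append(s)
            prev = s
        decoded.append(out)
    return decoded

In the Hierarchical variant, the coarser n-gram heads would instead branch off intermediate layers of the encoder rather than all sharing the final recurrent output. Either way, the auxiliary heads are simply discarded at test time, which is why inference cost matches the single-task baseline while the shared representation retains the implicitly learned n-gram information.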
