Paper Title
Fully Convolutional Networks for Continuous Sign Language Recognition
Authors
Abstract
Continuous sign language recognition (SLR) is a challenging task that requires learning along both the spatial and temporal dimensions of signing frame sequences. Most recent work accomplishes this with hybrid CNN and RNN networks. However, training these networks is generally non-trivial, and most of them fail to learn unseen sequence patterns, resulting in unsatisfactory performance for online recognition. In this paper, we propose a fully convolutional network (FCN) for online SLR that concurrently learns spatial and temporal features from weakly annotated video sequences, given only sentence-level annotations. A gloss feature enhancement (GFE) module is introduced in the proposed network to enforce better sequence alignment learning. The proposed network is end-to-end trainable without any pre-training. We conduct experiments on two large-scale SLR datasets. The experiments show that our method for continuous SLR is effective and performs well in online recognition.
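The abstract does not say why a fully convolutional design suits online recognition. One plausible reason, offered here as an illustration rather than as the paper's architecture, is that a temporal convolution has a finite receptive field, so each output depends only on a short window of frames and can be emitted as soon as that window has arrived. The sketch below checks this on toy data. The scalar per-frame features, the three-tap kernel, and the function names `temporal_conv` and `streaming_conv` are all hypothetical.

```python
import random


def temporal_conv(frames, kernel):
    """Offline case: valid 1D cross-correlation over the whole sequence."""
    k = len(kernel)
    return [
        sum(kernel[j] * frames[t + j] for j in range(k))
        for t in range(len(frames) - k + 1)
    ]


def streaming_conv(frame_stream, kernel):
    """Online case: emit one output as soon as the last k frames are buffered."""
    k = len(kernel)
    window = []   # holds at most the k most recent frames
    outputs = []
    for frame in frame_stream:
        window.append(frame)
        if len(window) > k:
            window.pop(0)  # drop the oldest frame
        if len(window) == k:
            outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


random.seed(0)
features = [random.random() for _ in range(20)]  # hypothetical per-frame features
kernel = [0.25, 0.5, 0.25]                       # hypothetical temporal filter

offline = temporal_conv(features, kernel)
online = streaming_conv(iter(features), kernel)

# The streaming pass reproduces the full-sequence pass output for output.
print(all(abs(a - b) < 1e-12 for a, b in zip(offline, online)))
```

A recurrent layer, by contrast, carries a hidden state that depends on every earlier frame, so its output on a sliding window generally differs from its output on the full sentence.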