Paper Title
Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique
Paper Authors
Abstract
We present an efficient framework for building a corpus for sign language translation. Aided by a simple but effective data augmentation technique, our method converts text into annotated forms with minimal information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. Taking these linguistic features of sign language into account, our proposed framework is the first attempt to build a multimodal sign language augmentation corpus (hereinafter referred to as the KoSLA corpus) containing both manual and non-manual modalities. The corpus we built yields promising results in the hospital context, with augmented datasets improving performance. To overcome data scarcity, we use data augmentation techniques such as synonym replacement to make better use of the available data and improve our translation model, while maintaining the grammatical and semantic structures of sign language. As experimental support, we verify the effectiveness of the data augmentation technique and the usefulness of our corpus by performing a translation task between normal sentences and sign language annotations with two tokenizers. The results were convincing, with significant BLEU scores obtained with the KoSLA corpus.
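The synonym replacement mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy synonym table, the function names `synonym_replace` and `augment_pair`, the example sentence, and the gloss labels are all hypothetical. The sketch also assumes that only the source sentence is varied while the sign language annotation is kept fixed, which is one simple way to leave the annotation's structure untouched.

```python
import random

# Hypothetical toy synonym table; the lexical resource used for KoSLA is not specified here.
SYNONYMS = {
    "doctor": ["physician"],
    "hurts": ["aches"],
    "medicine": ["medication", "drug"],
}


def synonym_replace(tokens, synonyms, n_replacements=1, seed=0):
    """Return a copy of tokens with up to n_replacements words swapped for a synonym."""
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    out = list(tokens)
    # Positions whose token has at least one known synonym.
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n_replacements]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def augment_pair(source_tokens, gloss_tokens, synonyms, n_variants=3):
    """Build (source, annotation) pairs: vary the source only, keep the annotation fixed.

    Assumption: a synonym in the source sentence maps to the same annotation.
    """
    pairs = [(list(source_tokens), list(gloss_tokens))]
    seen = {tuple(source_tokens)}
    for seed in range(n_variants):
        variant = synonym_replace(source_tokens, synonyms, seed=seed)
        if tuple(variant) not in seen:  # skip duplicates of already generated sentences
            seen.add(tuple(variant))
            pairs.append((variant, list(gloss_tokens)))
    return pairs


if __name__ == "__main__":
    sentence = ["my", "stomach", "hurts", "doctor"]
    gloss = ["STOMACH", "HURT"]  # hypothetical annotation labels
    for src, tgt in augment_pair(sentence, gloss, SYNONYMS):
        print(" ".join(src), "->", " ".join(tgt))
```

Each augmented pair reuses the original annotation, so the training set grows on the source side only. Whether a given synonym truly preserves the meaning of the annotation would still need to be checked against the corpus guidelines.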