Paper Title

Facial Expression Recognition with Swin Transformer

Authors

Jun-Hwa Kim, Namho Kim, Chee Sun Won

Abstract

The task of recognizing human facial expressions plays a vital role in various human-related systems, including health care and medical fields. With the recent success of deep learning and the accessibility of large amounts of annotated data, facial expression recognition research has matured enough to be utilized in real-world scenarios with audio-visual datasets. In this paper, we introduce a Swin Transformer-based facial expression recognition approach for the in-the-wild audio-visual Aff-Wild2 Expression dataset. Specifically, we employ a three-stream network (i.e., a Visual stream, a Temporal stream, and an Audio stream) on the audio-visual videos to fuse multi-modal information for facial expression recognition. Experimental results on the Aff-Wild2 dataset show the effectiveness of our proposed multi-modal approach.
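The abstract describes fusing three streams (visual, temporal, audio) into one expression prediction but does not specify the fusion mechanism. Below is a minimal late-fusion sketch under that reading: each stream is assumed to output per-class scores, which are combined by a weighted average. The function `fuse_streams`, the stream weights, and the choice of 8 expression classes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fuse_streams(visual_scores, temporal_scores, audio_scores,
                 weights=(1.0, 1.0, 1.0)):
    """Weighted-average late fusion of per-stream class scores (illustrative)."""
    w = np.asarray(weights, dtype=float)
    # Shape (3, num_classes): one row of class scores per stream.
    stacked = np.stack([visual_scores, temporal_scores, audio_scores])
    return (w[:, None] * stacked).sum(axis=0) / w.sum()

# Toy example with 8 expression classes (an assumption for illustration).
rng = np.random.default_rng(0)
v, t, a = rng.normal(size=(3, 8))
scores = fuse_streams(v, t, a)
pred = int(np.argmax(scores))  # index of the fused predicted expression
```

With equal weights this reduces to a simple mean of the three score vectors; unequal weights would let a stronger modality (e.g., the visual stream) dominate the final decision.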
