Paper Title

POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition

Authors

Ce Zheng, Matias Mendieta, Chen Chen

Abstract

Facial expression recognition (FER) is an important task in computer vision, having practical applications in areas such as human-computer interaction, education, healthcare, and online monitoring. In this challenging FER task, there are three key issues especially prevalent: inter-class similarity, intra-class discrepancy, and scale sensitivity. While existing works typically address some of these issues, none have fully addressed all three challenges in a unified framework. In this paper, we propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER), that aims to holistically solve all three issues. Specifically, we design a transformer-based cross-fusion method that enables effective collaboration of facial landmark features and image features to maximize proper attention to salient facial regions. Furthermore, POSTER employs a pyramid structure to promote scale invariance. Extensive experimental results demonstrate that our POSTER achieves new state-of-the-art results on RAF-DB (92.05%), FERPlus (91.62%), as well as AffectNet 7 class (67.31%) and 8 class (63.34%). The code is available at https://github.com/zczcwh/POSTER.
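
The abstract names two mechanisms: cross-fusion between a landmark stream and an image stream, and a pyramid over scales. The sketch below is one possible reading of those ideas, not the authors' implementation (see the linked repository for that). It assumes cross-fusion can be approximated by plain scaled dot-product cross-attention in which each stream's tokens act as queries over the other stream, and that the pyramid can be imitated by average-pooling the token sequence. Learned projections and multi-head attention are omitted, and all function names (`cross_attention`, `cross_fuse`, `pool_tokens`, `pyramid_cross_fuse`) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends to all key tokens.

    Simplification (assumption): no learned Q/K/V projections, single head.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def cross_fuse(image_tokens, landmark_tokens):
    """Hypothetical cross-fusion: each stream is re-weighted by the other's queries."""
    # Landmark tokens decide which image tokens matter.
    fused_image = cross_attention(landmark_tokens, image_tokens, image_tokens)
    # Image tokens decide which landmark tokens matter.
    fused_landmark = cross_attention(image_tokens, landmark_tokens, landmark_tokens)
    return fused_image, fused_landmark


def pool_tokens(tokens, factor):
    """Average consecutive groups of `factor` tokens (assumed stand-in for a coarser scale)."""
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def pyramid_cross_fuse(image_tokens, landmark_tokens, factors=(1, 2, 4)):
    """Run cross-fusion at several token resolutions and return one result per level."""
    levels = []
    for f in factors:
        levels.append(cross_fuse(pool_tokens(image_tokens, f),
                                 pool_tokens(landmark_tokens, f)))
    return levels


if __name__ == "__main__":
    # Toy inputs: 4 tokens of dimension 2 per stream (made-up values).
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    landmark = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for fused_image, fused_landmark in pyramid_cross_fuse(image, landmark):
        print(len(fused_image), len(fused_landmark))
```

In a trained model the per-level outputs would typically be combined and passed to a classifier; how POSTER does this is not stated in the abstract, so that step is left out here.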
