COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Paper title

COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Authors

Mani Kumar Tellamekala, Shahin Amiriparian, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar

Abstract

Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors separately, and then constrain the variance vectors of unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
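The abstract names a softmax distributional matching loss but does not give its form. The following is a minimal sketch of one plausible reading, not the authors' implementation: per-frame uncertainty scores and per-frame prediction errors are each turned into a distribution with a softmax, and the two distributions are compared with a KL divergence. The function names, the use of KL divergence, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_matching_loss(uncertainty, error, eps=1e-12):
    """Hypothetical matching loss: KL(softmax(error) || softmax(uncertainty)).

    uncertainty: per-frame scalar uncertainty scores (assumed to be derived
                 from the latent variance vectors, e.g. their norms).
    error:       per-frame distance between prediction and ground truth.
    The loss is near zero when the uncertainties follow the errors (up to an
    additive shift) and grows when their ranking or spread disagrees.
    """
    p = softmax(error)        # target distribution from prediction errors
    q = softmax(uncertainty)  # distribution from uncertainty scores
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Toy example with three frames (values are illustrative only).
errors = np.array([0.1, 0.5, 0.9])
well_ranked = np.array([0.1, 0.5, 0.9])  # uncertainty tracks the error
mis_ranked = np.array([0.9, 0.5, 0.1])   # uncertainty ranks frames backwards

loss_good = softmax_matching_loss(well_ranked, errors)
loss_bad = softmax_matching_loss(mis_ranked, errors)
print(loss_good < loss_bad)
```

Under this reading, a single term penalises both mis-ranked uncertainties (the ordinal constraint) and uncertainties whose relative magnitudes do not follow the errors (the calibration constraint), which is consistent with the abstract's claim that both constraints are imposed jointly.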
