Paper Title
Few-shot Multimodal Sentiment Analysis based on Multimodal Probabilistic Fusion Prompts
Paper Authors
Paper Abstract
Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media. However, existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect. Thus, there is a need to address the challenge of few-shot multimodal sentiment analysis. To tackle this problem, we propose a novel method called Multimodal Probabilistic Fusion Prompts (MultiPoint) that leverages diverse cues from different modalities for multimodal sentiment detection in the few-shot scenario. Specifically, we start by introducing a Consistently Distributed Sampling approach called CDS, which ensures that the few-shot dataset has the same category distribution as the full dataset. Unlike previous approaches primarily using prompts based on the text modality, we design unified multimodal prompts to reduce discrepancies between different modalities and dynamically incorporate multimodal demonstrations into the context of each multimodal instance. To enhance the model's robustness, we introduce a probabilistic fusion method to fuse output predictions from multiple diverse prompts for each input. Our extensive experiments on six datasets demonstrate the effectiveness of our approach. First, our method outperforms strong baselines in the multimodal few-shot setting. Furthermore, under the same amount of data (1% of the full dataset), our CDS-based results significantly outperform those obtained with previously used sampling schemes, which draw an equal number of instances per class.
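The abstract names two mechanisms that lend themselves to a brief illustration: CDS, which samples a few-shot subset whose per-class proportions match the full dataset, and probabilistic fusion, which combines the predicted class distributions from multiple prompts. The sketch below is a minimal interpretation of both ideas, not the authors' implementation; the function names, the use of simple proportional rounding, and the choice of plain averaging as the fusion rule are all assumptions.

```python
import random


def cds_sample(labels, ratio, seed=0):
    """Consistently Distributed Sampling (sketch): draw a subset whose
    per-class proportions mirror the full dataset's label distribution.

    `labels` is the list of class labels for the full dataset; `ratio`
    is the fraction to keep (e.g. 0.01 for a 1% few-shot split).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    subset = []
    for y, idxs in by_class.items():
        # Keep the same share of each class; at least one instance per class.
        k = max(1, round(len(idxs) * ratio))
        subset.extend(rng.sample(idxs, k))
    return sorted(subset)


def probabilistic_fusion(prompt_predictions):
    """Fuse class-probability vectors from multiple prompts by averaging
    (one simple probabilistic fusion; the paper's exact rule may differ)."""
    n = len(prompt_predictions)
    dim = len(prompt_predictions[0])
    return [sum(p[i] for p in prompt_predictions) / n for i in range(dim)]
```

For example, on 100 labels split 70/20/10 across three classes, `cds_sample(labels, 0.1)` returns 7, 2, and 1 indices from those classes respectively, whereas equal-per-class sampling would ignore the imbalance.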