Paper Title

Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings

Authors

Sangaraju, Vasudeva Raju; Bolla, Bharath Kumar; Nayak, Deepak Kumar; Kh, Jyothsna

Abstract

Customer reviews and comments help businesses understand how users feel about their products and services. However, this data must be analyzed to assess the sentiment associated with specific topics or aspects in order to provide efficient customer assistance. Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) fail to capture semantic relationships and are not specific to any domain. In this study, we evaluate BERTopic, a novel method that generates topics from sentence embeddings, on Consumer Financial Protection Bureau (CFPB) data. Our work shows that, compared with LDA and LSA, BERTopic is flexible yet provides meaningful and diverse topics. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics. We evaluated the topics using the c_v and UMass coherence scores.
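UMass coherence, one of the two metrics named above, scores a topic by how often its top words appear together in the same documents. In the formulation commonly attributed to Mimno et al. (2011), for top words w_1, …, w_N ordered by probability, each pair with j < i contributes log((D(w_i, w_j) + 1) / D(w_j)), where D(·) counts the documents containing all of the given words; higher scores indicate more coherent topics. The following is a minimal sketch in Python, assuming pre-tokenized documents. The function name `umass_coherence` and the toy corpus are hypothetical, and the sketch averages over word pairs. Library implementations (for example, gensim's CoherenceModel) differ in smoothing and aggregation details, so scores from this sketch are not expected to match the values reported in the paper.

```python
import math


def umass_coherence(top_words, documents):
    """Mean UMass coherence of a single topic (illustrative sketch).

    top_words: the topic's words, ordered from most to least probable.
    documents: iterable of tokenized documents (each a list of strings).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")
    # Document frequency only needs presence, so store each document as a set.
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                raise ValueError(f"top word {w_j!r} does not occur in any document")
            # Log conditional probability of w_i given the higher-ranked w_j,
            # with +1 smoothing on the co-document count.
            scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    # Average over all N * (N - 1) / 2 word pairs.
    return sum(scores) / len(scores)


# Toy corpus of tokenized complaint snippets (made-up data for illustration).
docs = [
    ["loan", "payment", "late"],
    ["loan", "interest", "rate"],
    ["credit", "card", "payment"],
    ["loan", "payment", "fee"],
]
print(round(umass_coherence(["loan", "payment", "interest"], docs), 3))  # → -0.501
```

In this toy example the pair (payment, loan) contributes log(3/3) = 0, while "interest" never co-occurs with "payment", which pulls the topic's score down.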
