Paper Title
Understanding User Satisfaction with Task-oriented Dialogue Systems
Paper Authors
Paper Abstract
Dialogue systems are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDS), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open-domain chatbots, which are evaluated on the user experience, i.e., based on their ability to engage a person. What is the influence of user experience on the user satisfaction rating of TDS as opposed to, or in addition to, utility? We collect data by providing an additional annotation layer for dialogues sampled from the ReDial dataset, a widely used conversational recommendation dataset. Unlike prior work, we annotate the sampled dialogues at both the turn and dialogue level on six dialogue aspects: relevance, interestingness, understanding, task completion, efficiency, and interest arousal. The annotations allow us to study how different dialogue aspects influence user satisfaction. We introduce a comprehensive set of user experience aspects derived from the annotators' open comments that can influence users' overall impression. We find that the concept of satisfaction varies across annotators and dialogues, and show that a relevant turn is significant for some annotators, while for others, an interesting turn is all they need. Our analysis indicates that the proposed user experience aspects provide a fine-grained analysis of user satisfaction that is not captured by a monolithic overall human rating.
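To make the kind of analysis described in the abstract concrete, below is a minimal illustrative sketch (not the authors' actual code or analysis pipeline) of how one might relate per-dialogue ratings on the six annotated aspects to an overall satisfaction rating via Spearman rank correlation. The column names, the 1-5 rating scale, and the toy data are assumptions introduced purely for illustration.

```python
# Illustrative sketch (not the paper's code): given dialogue-level annotations
# on the six aspects plus an overall satisfaction rating, estimate how strongly
# each aspect relates to satisfaction via Spearman rank correlation.
import pandas as pd
from scipy.stats import spearmanr

# The six aspects named in the abstract; column names are assumed for this sketch.
ASPECTS = ["relevance", "interestingness", "understanding",
           "task_completion", "efficiency", "interest_arousal"]

def aspect_satisfaction_correlations(annotations: pd.DataFrame) -> pd.Series:
    """Assumes one row per annotated dialogue, one column per aspect
    (e.g., 1-5 ratings) and an `overall` satisfaction column."""
    corrs = {}
    for aspect in ASPECTS:
        rho, _p_value = spearmanr(annotations[aspect], annotations["overall"])
        corrs[aspect] = rho
    return pd.Series(corrs).sort_values(ascending=False)

# Toy ratings for three hypothetical dialogues, for demonstration only.
toy = pd.DataFrame({
    "relevance":        [5, 2, 4],
    "interestingness":  [4, 3, 5],
    "understanding":    [5, 2, 4],
    "task_completion":  [5, 1, 4],
    "efficiency":       [4, 2, 3],
    "interest_arousal": [3, 3, 5],
    "overall":          [5, 2, 4],
})
print(aspect_satisfaction_correlations(toy))
```

A correlation of this kind only captures aggregate trends; the abstract's finding that the notion of satisfaction varies across annotators suggests such an analysis would also need to be broken down per annotator rather than pooled.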