Answer ranking in Community Question Answering: a deep learning approach

Paper title

Answer ranking in Community Question Answering: a deep learning approach

Authors

Valentin, Lucas

Abstract

Community Question Answering is the field of computational linguistics that deals with problems derived from the questions and answers posted to websites such as Quora or Stack Overflow. One of these problems is ranking the multiple answers posted in reply to each question by how informative they are in solving the original question. This work tries to advance the state of the art in answer ranking for Community Question Answering by taking a deep learning approach. We started by creating a large data set of questions and answers posted to the Stack Overflow website. We then leveraged the natural language processing capabilities of dense embeddings and LSTM networks to predict the accepted answer attribute, and presented the answers in a ranked form, ordered by how likely they are to be marked as accepted by the question asker. We also produced a set of numerical features to assist with the answer ranking task. These numerical features were either extracted from metadata found in the Stack Overflow posts or derived from the question and answer texts. We compared the performance of our deep learning models against a set of forest and boosted-tree ensemble methods and found that our models could not improve on the best baseline results. We speculate that this lack of improvement over the baseline models may be caused by the large number of out-of-vocabulary words present in the programming code snippets found in the question and answer texts. We conclude that while a deep learning approach may be helpful in answer ranking problems, new methods should be developed to handle the large number of out-of-vocabulary words present in programming code snippets.
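
The abstract gives no implementation details, so the following is only an illustrative sketch of two ideas it mentions: ordering each question's answers by a predicted probability of acceptance, and measuring how many tokens fall outside a vocabulary. The function names, the (question_id, answer_id, p_accepted) layout, and the example scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_answers(predictions):
    """Group (question_id, answer_id, p_accepted) triples by question and
    return, per question, the answer ids sorted by descending probability.
    The triple layout is an assumption made for this sketch."""
    by_question = defaultdict(list)
    for question_id, answer_id, p_accepted in predictions:
        by_question[question_id].append((answer_id, p_accepted))
    return {
        question_id: [
            answer_id
            for answer_id, _ in sorted(
                answers, key=lambda pair: pair[1], reverse=True
            )
        ]
        for question_id, answers in by_question.items()
    }


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not present in the given vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical model outputs: probability that each answer gets accepted.
predictions = [(1, "a", 0.2), (1, "b", 0.9), (2, "c", 0.5), (1, "d", 0.4)]
ranking = rank_answers(predictions)

# Hypothetical tokens from a code snippet checked against a tiny vocabulary.
rate = oov_rate(["for", "i", "in", "xrange", "foo_bar"], {"for", "i", "in"})
```

Here `ranking[1]` lists the answers to question 1 from most to least likely to be accepted, and `rate` is the share of snippet tokens that an embedding layer built on that vocabulary would have to treat as unknown.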
