Paper title
Explaining Differences in Classes of Discrete Sequences
Paper authors
Paper abstract
While many machine learning methods can classify and cluster sequences, they do not explain which differences between groups of sequences make them distinguishable. A black-box model is sufficient in some cases, but research areas focused on human behavior need greater explainability. For example, psychologists are less interested in a model that predicts human behavior with high accuracy than in identifying the differences between actions that lead to divergent behavior. This paper presents techniques for understanding differences between classes of discrete sequences; these techniques can also be used to interpret black-box machine learning models on sequences. The first approach compares k-gram representations of sequences using the silhouette score. The second approach characterizes differences by analyzing the distance matrix of subsequences. In our first case study, we trained black-box supervised learning methods to classify sequences of GitHub teams and then used our sequence analysis techniques to measure and characterize differences between the event sequences of teams with bots and teams without bots. In our second case study, we classified Minecraft event sequences to infer their high-level actions and analyzed differences between the low-level event sequences of those actions.
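To make the first approach concrete, the following is a minimal sketch of scoring class separation on k-gram representations with the silhouette score. It is not the authors' implementation: the function names, the Euclidean distance on raw k-gram counts, the choice k = 2, and the toy event names and labels are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kgram_counts(seq, k):
    """Count every contiguous k-gram (length-k window) in a discrete sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def euclidean(a, b):
    """Euclidean distance between two sparse k-gram count vectors (assumed metric)."""
    keys = set(a) | set(b)
    return sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all samples, using class labels as clusters."""
    n = len(vectors)
    scores = []
    for i in range(n):
        # Group the distances from sample i to every other sample by class.
        by_class = {}
        for j in range(n):
            if j != i:
                dist = euclidean(vectors[i], vectors[j])
                by_class.setdefault(labels[j], []).append(dist)
        own = by_class.pop(labels[i], None)
        if not own or not by_class:
            # Singleton class or a single class overall: score 0 in this sketch.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # mean intra-class distance
        b = min(sum(d) / len(d) for d in by_class.values())  # nearest other class
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical toy data: event names and class labels are invented for illustration.
sequences = [
    ["push", "push", "issue", "push"],
    ["push", "issue", "push", "push"],
    ["fork", "watch", "fork", "watch"],
    ["watch", "fork", "watch", "fork"],
]
labels = ["bot", "bot", "no_bot", "no_bot"]

vectors = [kgram_counts(s, 2) for s in sequences]
score = silhouette_score(vectors, labels)
print(f"silhouette score for k=2: {score:.2f}")
```

A score near 1 suggests that the two classes are well separated in k-gram space, while a score near 0 suggests that they overlap. Repeating the computation for several values of k is one plausible way to see at which subsequence length the classes differ most.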