Paper Title

Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability

Authors

Sambaturu, Prathyush; Gupta, Aparna; Davidson, Ian; Ravi, S. S.; Vullikanti, Anil; Warren, Andrew

Abstract

Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects $S$, a partition $\pi$ of $S$ (into clusters), and a universe $T$ of tags such that each element in $S$ is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
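To make the problem statement concrete, here is a minimal Python sketch. The abstract does not spell out what makes a tag set "representative"; the code assumes the coverage condition from the earlier Davidson et al. formulation, namely that every object in a cluster must carry at least one tag from that cluster's descriptor, alongside the pairwise-disjointness requirement stated above. `is_valid_descriptor` checks those two conditions, and `greedy_descriptors` is a plain greedy set-cover heuristic that tries to keep descriptors small (both names are hypothetical, not from the paper). It is not the authors' approximation algorithm and carries no performance guarantee: because it fixes each cluster's tags before looking at later clusters, it can fail even when a feasible set of disjoint descriptors exists.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Set

Obj = Hashable
Tag = Hashable


def is_valid_descriptor(
    clusters: Dict[str, List[Obj]],
    tags: Dict[Obj, Set[Tag]],
    descriptors: Dict[str, Set[Tag]],
) -> bool:
    """Check the assumed constraints: pairwise-disjoint descriptors, and every
    object shares at least one tag with its own cluster's descriptor."""
    names = list(clusters)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if descriptors[a] & descriptors[b]:
                return False  # some tag appears in two descriptors
    # A non-empty intersection is truthy, so this checks coverage of every object.
    return all(tags[s] & descriptors[c] for c, members in clusters.items() for s in members)


def greedy_descriptors(
    clusters: Dict[str, List[Obj]],
    tags: Dict[Obj, Set[Tag]],
) -> Optional[Dict[str, Set[Tag]]]:
    """Greedy set-cover heuristic (no guarantee): clusters are processed in
    order, and a tag, once chosen, is unavailable to later clusters.
    Returns None if some cluster can no longer be covered."""
    used: Set[Tag] = set()
    result: Dict[str, Set[Tag]] = {}
    for c, members in clusters.items():
        uncovered = set(members)
        chosen: Set[Tag] = set()
        while uncovered:
            # Tags carried by an uncovered member and not claimed by another cluster.
            candidates = {t for s in uncovered for t in tags[s]} - used
            if not candidates:
                return None  # this cluster can no longer be covered
            # Pick the tag covering the most uncovered objects; max keeps the first
            # maximum, so ties go to the smallest tag in string order.
            best = max(sorted(candidates, key=str),
                       key=lambda t: sum(t in tags[s] for s in uncovered))
            chosen.add(best)
            used.add(best)
            uncovered = {s for s in uncovered if best not in tags[s]}
        result[c] = chosen
    return result


if __name__ == "__main__":
    # Hypothetical toy data: two clusters of objects s1..s4 with tags w, x, y, z.
    clusters = {"A": ["s1", "s2"], "B": ["s3", "s4"]}
    tags = {"s1": {"x", "y"}, "s2": {"y"}, "s3": {"y", "z"}, "s4": {"z", "w"}}
    d = greedy_descriptors(clusters, tags)
    print(d)
    print(is_valid_descriptor(clusters, tags, d))
```

Running the file prints `{'A': {'y'}, 'B': {'z'}}` followed by `True`. The order dependence is easy to trigger: with cluster A = {s1: {x, y}, s2: {y, z}} and cluster B = {s3: {y}}, the greedy pass gives y to A and then cannot cover B, although A = {x, z}, B = {y} is feasible.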
