Paper Title
KL Divergence Estimation with Multi-group Attribution
Paper Authors
Paper Abstract
Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations as coming from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates than other popular algorithms when conditioning on sub-populations.
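As a concrete illustration of the estimation setting in the abstract, the following sketch estimates importance weights $w(x) = p(x)/q(x)$ with the standard density-ratio classification trick and plugs them into the identities $\mathrm{KL}(p \| q) = \mathbb{E}_{x \sim p}[\log w(x)]$ and, for a sub-population $c \in \mathcal{C}$, $\mathrm{KL}(p_c \| q_c) = \mathbb{E}_{p}[\log w(x) \mid x \in c] + \log(\Pr_q[c] / \Pr_p[c])$. This is not the paper's multi-calibration algorithm: the logistic-regression weights, the Gaussian toy data, and the `subgroups` dictionary are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's method): estimate KL(p||q)
# and its restriction to overlapping sub-populations via importance weights
# obtained from a probabilistic classifier (the density-ratio trick).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 1-D data: p = N(0.5, 1), q = N(0, 1), so the true KL(p||q) is
# (0.5)^2 / 2 = 0.125.
x_p = rng.normal(loc=0.5, scale=1.0, size=(5000, 1))  # samples from p
x_q = rng.normal(loc=0.0, scale=1.0, size=(5000, 1))  # samples from q

# Train a classifier to distinguish p-samples (label 1) from q-samples
# (label 0); then w(x) = p(x)/q(x) ~ [P(1|x)/P(0|x)] * (n_q/n_p).
X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
clf = LogisticRegression().fit(X, y)

def importance_weights(x):
    proba = clf.predict_proba(x)[:, 1]
    return proba / (1.0 - proba) * (len(x_q) / len(x_p))

# Plug-in estimate of the overall divergence: KL(p||q) = E_{x~p}[log w(x)].
w_p = importance_weights(x_p)
print(f"estimated KL(p||q): {np.mean(np.log(w_p)):.3f}")

# Conditional estimates on overlapping sub-populations (illustrative C):
# KL(p_c||q_c) = E_p[log w(x) | x in c] + log(Pr_q[c] / Pr_p[c]).
subgroups = {
    "x > 0": lambda x: x[:, 0] > 0,
    "|x| < 1": lambda x: np.abs(x[:, 0]) < 1,
}
for name, member in subgroups.items():
    in_p, in_q = member(x_p), member(x_q)
    kl_c = np.mean(np.log(w_p[in_p])) + np.log(in_q.mean() / in_p.mean())
    print(f"estimated KL on sub-population '{name}': {kl_c:.3f}")
```

A weight estimator that is multi-calibrated in the sense of [HKRR18, GRSW21] would make such conditional estimates simultaneously accurate for every $c \in \mathcal{C}$; the plug-in logistic weights above carry no such guarantee.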