Paper Title
Towards causal benchmarking of bias in face analysis algorithms
Authors
Abstract
Measuring algorithmic bias is crucial both to assess algorithmic fairness and to guide the improvement of algorithms. Current methods for measuring algorithmic bias in computer vision, which are based on observational datasets, are inadequate for this task because they conflate algorithmic bias with dataset bias. To address this problem, we develop an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. Our proposed method is based on generating synthetic "transects" of matched sample images that are designed to differ along specific attributes while leaving other attributes constant. A crucial aspect of our approach is relying on the perception of human observers, both to guide manipulations and to measure algorithmic bias. Besides allowing the measurement of algorithmic bias, synthetic transects have other advantages over observational datasets: they sample attributes more evenly, allowing more straightforward bias analysis of minority and intersectional groups; they enable prediction of bias in new scenarios; they greatly reduce ethical and legal challenges; and they are economical and fast to obtain, helping make bias testing affordable and widely available. We validate our method by comparing it to a study that employs the traditional observational method for analyzing bias in gender classification algorithms. The two methods reach different conclusions. While the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair.
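To make the transect idea concrete, the following is a minimal Python sketch of the experimental loop: generate a matched series of images from one starting latent code, varying a single attribute direction while holding everything else fixed, then record how the algorithm's error changes along that direction. The generator, classifier, and latent attribute directions here (generate_face, classify_gender, d_hair_length) are hypothetical stubs standing in for the paper's actual generative model, human-validated manipulations, and the face analysis system under test.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical stand-ins (not the paper's actual models) ---------------
def generate_face(z: np.ndarray) -> np.ndarray:
    """Map a latent vector to a synthetic face image (stub)."""
    return rng.random((256, 256, 3))  # placeholder image

def classify_gender(image: np.ndarray) -> str:
    """The face analysis algorithm under test (stub)."""
    return rng.choice(["male", "female"])

# Assumed latent direction that, per human annotators, changes hair length
# while leaving other perceived attributes constant.
d_hair_length = rng.standard_normal(512)

# --- The transect procedure ------------------------------------------------
def transect(z0: np.ndarray, direction: np.ndarray,
             steps: int = 5, scale: float = 3.0) -> list:
    """Generate a matched series of images from the same starting point z0,
    differing only along one attribute direction."""
    alphas = np.linspace(-scale, scale, steps)
    return [generate_face(z0 + a * direction) for a in alphas]

# Measure the error at each point of the transect. Because only one
# attribute varies, a systematic change in error along the transect is
# evidence of a causal link between that attribute and performance.
z0 = rng.standard_normal(512)
true_label = "female"  # ground truth as judged by human observers
for alpha, img in zip(np.linspace(-3, 3, 5), transect(z0, d_hair_length)):
    pred = classify_gender(img)
    print(f"hair-length offset {alpha:+.1f}: predicted {pred}, "
          f"error={pred != true_label}")
```

In practice one would average such error curves over many starting identities and many attribute directions (hair length, age, facial hair, skin tone, etc.), which is what allows the experimental method to attribute performance differences to specific attributes rather than to correlated dataset factors.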