Paper Title
Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness
Paper Authors
Paper Abstract
Most proposed algorithmic fairness techniques require access to data on a "sensitive attribute" or "protected category" (such as race, ethnicity, gender, or sexuality) in order to make performance comparisons and standardizations across groups; however, this data is largely unavailable in practice, hindering the widespread adoption of algorithmic fairness. In this paper, we consider calls to collect more demographic data to enable algorithmic fairness, and we challenge the notion that discrimination can be overcome with sufficiently smart technical methods and sufficient data alone. We show how these techniques largely ignore broader questions of data governance and systemic oppression when categorizing individuals for the purpose of fairer algorithmic processing. In this work, we explore under what conditions demographic data should be collected and used to enable algorithmic fairness methods by characterizing a range of social risks to individuals and communities. For the risks to individuals, we consider the unique privacy risks associated with sharing sensitive attributes likely to be the target of fairness analysis, the possible harms stemming from miscategorizing and misrepresenting individuals in the data collection process, and the use of sensitive data beyond data subjects' expectations. Looking more broadly, the risks to entire groups and communities include the expansion of surveillance infrastructure in the name of fairness, the misrepresentation and mischaracterization of what it means to be part of a demographic group or to hold a certain identity, and the ceding of communities' ability to define for themselves what constitutes biased or unfair treatment. We argue that, by confronting these questions before and during the collection of demographic data, algorithmic fairness methods are more likely to actually mitigate harmful treatment disparities without reinforcing systems of oppression.