Improving Fairness in Large-Scale Object Recognition by Crowdsourced Demographic Information

Paper Title

Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information

Authors

Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

Abstract

There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have focused on human sensing applications, preventing discrimination based on people's physical attributes such as race, skin color, or age by increasing visual representation for particular demographic groups. We argue that ML fairness efforts should extend to object recognition as well. Buildings, artwork, food, and clothing are examples of objects that define human culture. Representing these objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture and more inclusive of different traditions and values. Many research datasets exist for object recognition, but they have not carefully considered which classes should be included or how much training data should be collected per class. To address this, we propose a simple and general approach based on crowdsourcing the demographic composition of the contributors: we define fair relevance scores, estimate them, and assign them to each class. We showcase its application to the landmark recognition domain, presenting a detailed analysis and the final, fairer landmark rankings. Our analysis shows that this approach leads to much fairer coverage of the world than existing datasets. The evaluation dataset was used for the 2021 Google Landmark Challenges, which were the first of their kind to emphasize fairness in generic object recognition.
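The abstract does not spell out how fair relevance scores are computed. As a rough illustration of the underlying idea, correcting relevance judgments for a demographically skewed contributor pool, here is a minimal sketch using generic post-stratification reweighting: for each class, the mean relevance within each demographic group is combined using target group weights (for example, population shares) instead of raw contributor counts. The function `fair_relevance`, the group labels, the classes, and the weights are all hypothetical assumptions for illustration; this is not the authors' definition, method, or data.

```python
from collections import defaultdict


def fair_relevance(votes, target_weights):
    """Post-stratified mean relevance per class (hypothetical sketch).

    votes: iterable of (group, class_name, score) with score in [0, 1].
    target_weights: dict mapping group -> target weight, e.g. a population share.
    Groups with no votes for a class are skipped and the remaining
    weights are renormalized; groups absent from target_weights are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    classes = set()
    for group, cls, score in votes:
        sums[(cls, group)] += score
        counts[(cls, group)] += 1
        classes.add(cls)

    result = {}
    for cls in classes:
        weighted_sum = 0.0
        weight_total = 0.0
        for group, weight in target_weights.items():
            n = counts.get((cls, group), 0)
            if n == 0:
                continue  # no judgments from this group for this class
            group_mean = sums[(cls, group)] / n
            weighted_sum += weight * group_mean
            weight_total += weight
        # Fall back to 0.0 if no weighted group rated this class at all.
        result[cls] = weighted_sum / weight_total if weight_total > 0 else 0.0
    return result


# Hypothetical data: group "A" is over-represented among contributors.
votes = [
    ("A", "X", 1.0), ("A", "X", 1.0), ("A", "X", 1.0), ("B", "X", 0.0),
    ("A", "Y", 0.0), ("A", "Y", 0.0), ("A", "Y", 0.0), ("B", "Y", 1.0),
]
target_weights = {"A": 0.2, "B": 0.8}  # hypothetical population shares

scores = fair_relevance(votes, target_weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores["X"], scores["Y"], ranking)
```

In this toy example a plain mean over all votes would rank class X above Y (0.75 vs. 0.25) simply because group A supplied more contributors; weighting each group by its hypothetical 20/80 target share reverses the order. Renormalizing over the groups that actually rated a class is one simple choice for handling missing groups; other choices are possible.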