Paper Title

LOGAN: Local Group Bias Detection by Clustering

Authors

Jieyu Zhao, Kai-Wei Chang

Abstract

Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
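The core idea described above — that a model can look fair in aggregate yet behave unfairly inside local regions of the data — can be illustrated with a minimal sketch: cluster the test instances, then measure the per-group performance gap inside each cluster rather than only over the whole corpus. Note this is not the paper's actual method (LOGAN uses a bias-aware clustering objective); the plain k-means step, the two-group 0/1 encoding, and the function names below are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means (illustrative stand-in for the paper's bias-aware clustering)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Distance of every point to every center, then nearest-center assignment.
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def local_bias(X, groups, correct, k=3, min_size=5):
    """Per-cluster accuracy gap between group 0 and group 1.

    X       : (n, d) feature vectors of test instances
    groups  : (n,) array of 0/1 group membership (assumed binary here)
    correct : (n,) array, 1.0 if the model's prediction was correct
    Returns {cluster_id: gap}; a large |gap| flags a locally biased region
    even when the corpus-level gap is near zero.
    """
    assign = kmeans(X, k)
    gaps = {}
    for j in range(k):
        m = assign == j
        if m.sum() < min_size:
            continue  # skip clusters too small to measure reliably
        g0 = correct[m & (groups == 0)]
        g1 = correct[m & (groups == 1)]
        if len(g0) and len(g1):
            gaps[j] = g0.mean() - g1.mean()
    return gaps
```

For example, on synthetic data with two well-separated regions where the model favors group 0 in one region and group 1 in the other, the corpus-level accuracy gap is exactly zero while `local_bias` reports opposite-signed gaps in the two clusters — precisely the situation the abstract argues corpus-level metrics miss.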
