Paper Title

Fair Generative Models via Transfer Learning

Paper Authors

Christopher T.H. Teo, Milad Abdollahzadeh, Ngai-Man Cheung

Paper Abstract

This work addresses fair generative models. Dataset biases have been a major cause of unfairness in deep generative models. Previous work proposed to augment a large, biased dataset with a small, unbiased reference dataset. Under this setup, a weakly-supervised approach has been proposed that achieves state-of-the-art quality and fairness in generated samples. In our work, based on this setup, we propose a simple yet effective approach. Specifically, first, we propose fairTL, a transfer learning approach to learning fair generative models. Under fairTL, we pre-train the generative model on the available large, biased dataset and subsequently adapt the model using the small, unbiased reference dataset. We find that fairTL learns expressive sample generation during pre-training, thanks to the large (biased) dataset. This knowledge is then transferred to the target model during adaptation, which also learns to capture the underlying fair distribution of the small reference dataset. Second, we propose fairTL++, which introduces two additional innovations to improve upon fairTL: (i) multiple feedback and (ii) Linear-Probing followed by Fine-Tuning (LP-FT). Taking one step further, we consider an alternative, challenging setup in which only a pre-trained (potentially biased) model is available but the dataset used to pre-train it is inaccessible. We demonstrate that our proposed fairTL and fairTL++ remain very effective under this setup. We note that previous work requires access to the large, biased dataset and is incapable of handling this more challenging setup. Extensive experiments show that fairTL and fairTL++ achieve state-of-the-art results in both the quality and fairness of generated samples. The code and additional resources can be found at bearwithchris.github.io/fairTL/.
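
To make the two-stage recipe concrete, below is a minimal PyTorch sketch of the adaptation phase, including Linear-Probing followed by Fine-Tuning (LP-FT). All class names, architectures, losses, learning rates, and step counts here are illustrative assumptions rather than the authors' implementation; in the paper the pre-trained model is a full generative adversarial network trained on the large biased dataset, and fairTL++ additionally uses multiple discriminator feedback, which this sketch omits.

```python
# Illustrative sketch (not the authors' code): adapting a pre-trained
# generator to a small, fair reference set via LP-FT.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Stand-in for a pre-trained (potentially biased) generative model."""
    def __init__(self, z_dim=64, out_dim=784):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, out_dim)  # only this layer updates during probing

    def forward(self, z):
        return torch.tanh(self.head(self.backbone(z)))

class Discriminator(nn.Module):
    """Provides feedback from the small, unbiased reference dataset."""
    def __init__(self, in_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x)

def gan_steps(gen, disc, loader, z_dim, steps, g_lr):
    """Standard non-saturating GAN updates on the fair reference data."""
    g_opt = torch.optim.Adam((p for p in gen.parameters() if p.requires_grad), lr=g_lr)
    d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
    it = iter(loader)
    for _ in range(steps):
        try:
            real = next(it)
        except StopIteration:
            it = iter(loader)
            real = next(it)
        fake = gen(torch.randn(real.size(0), z_dim))
        ones = torch.ones(real.size(0), 1)
        zeros = torch.zeros(real.size(0), 1)
        # Discriminator: separate fair real samples from current fakes.
        d_loss = (F.binary_cross_entropy_with_logits(disc(real), ones)
                  + F.binary_cross_entropy_with_logits(disc(fake.detach()), zeros))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        # Generator: move toward the fair distribution the discriminator sees.
        g_loss = F.binary_cross_entropy_with_logits(disc(fake), ones)
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()

def lp_ft_adapt(gen, disc, fair_loader, z_dim=64, lp_steps=200, ft_steps=200):
    # Phase 1 (linear probing): freeze the pre-trained backbone so the
    # expressive representation learned on the large dataset is preserved;
    # only the output head adapts to the fair reference distribution.
    for p in gen.backbone.parameters():
        p.requires_grad = False
    gan_steps(gen, disc, fair_loader, z_dim, lp_steps, g_lr=1e-3)
    # Phase 2 (fine-tuning): unfreeze everything and adapt the whole model
    # at a smaller learning rate.
    for p in gen.backbone.parameters():
        p.requires_grad = True
    gan_steps(gen, disc, fair_loader, z_dim, ft_steps, g_lr=1e-4)

# Toy usage: a random tensor stands in for the small fair reference set.
fair_x = torch.randn(256, 784)
fair_loader = torch.utils.data.DataLoader(fair_x, batch_size=32, shuffle=True)
lp_ft_adapt(Generator(), Discriminator(), fair_loader)
```

The intent of the probing phase is to keep the representation learned during (biased) pre-training intact while the output aligns to the fair reference set; the subsequent low-learning-rate fine-tuning phase then adapts the full model without destroying that representation.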
