Label-Noise Learning with Intrinsically Long-Tailed Data

Paper Title

Label-Noise Learning with Intrinsically Long-Tailed Data

Authors

Yang Lu, Yiliang Zhang, Bo Han, Yiu-ming Cheung, Hanzi Wang

Abstract

Label noise is one of the key factors that lead to poor generalization in deep learning models. Existing label-noise learning methods usually assume that the ground-truth classes of the training data are balanced. However, real-world data is often imbalanced, so under label noise the observed class distribution can differ from the intrinsic one. In this case, with the intrinsic class distribution unknown, it is hard to distinguish clean samples from noisy samples in the intrinsic tail classes. In this paper, we propose a learning framework for label-noise learning with intrinsically long-tailed data. Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples, especially for the tail classes. TABASCO consists of two new separation metrics that complement each other to compensate for the limitations of using a single metric in sample separation. Extensive experiments on benchmarks demonstrate the effectiveness of our method. Our code is available at https://github.com/Wakings/TABASCO.
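The gap between observed and intrinsic class distributions can be illustrated with a small calculation. The sketch below is not part of TABASCO. It assumes symmetric label noise (each sample's label flips to a uniformly chosen other class with probability `noise_rate`), and the class sizes and function names are hypothetical, chosen only for illustration.

```python
def expected_observed_counts(intrinsic_counts, noise_rate):
    """Expected number of samples carrying each observed label.

    Assumes symmetric noise: a label is kept with probability
    1 - noise_rate and otherwise flips uniformly to another class.
    """
    k = len(intrinsic_counts)
    total = sum(intrinsic_counts)
    flip = noise_rate / (k - 1)  # probability of flipping to one specific other class
    observed = []
    for n in intrinsic_counts:
        kept = (1 - noise_rate) * n   # samples of this class that keep their label
        received = flip * (total - n) # samples of other classes mislabeled as this class
        observed.append(kept + received)
    return observed


def clean_fractions(intrinsic_counts, noise_rate):
    """Fraction of samples under each observed label that are correctly labeled."""
    observed = expected_observed_counts(intrinsic_counts, noise_rate)
    return [(1 - noise_rate) * n / o for n, o in zip(intrinsic_counts, observed)]


# Hypothetical long-tailed intrinsic distribution: head, medium, tail.
intrinsic = [5000, 500, 50]
observed = expected_observed_counts(intrinsic, noise_rate=0.2)
clean = clean_fractions(intrinsic, noise_rate=0.2)
```

Under these assumptions the intrinsic tail class has 50 samples, but about 590 samples are expected to carry its label, and fewer than one in ten of those are clean. The observed distribution therefore looks far less imbalanced than the intrinsic one, which is why clean tail samples are hard to pick out.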
