Paper Title

Semi-Supervised Class Discovery

Authors

Jeremy Nixon, Jeremiah Liu, David Berthelot

Abstract

One promising approach to dealing with datapoints that are outside of the initial training distribution (OOD) is to create new classes that capture similarities in the datapoints previously rejected as uncategorizable. Systems that generate labels can be deployed against an arbitrary amount of data, discovering classification schemes that through training create a higher quality representation of data. We introduce the Dataset Reconstruction Accuracy, a new and important measure of the effectiveness of a model's ability to create labels. We introduce benchmarks against this Dataset Reconstruction metric. We apply a new heuristic, class learnability, for deciding whether a class is worthy of addition to the training dataset. We show that our class discovery system can be successfully applied to vision and language, and we demonstrate the value of semi-supervised learning in automatically discovering novel classes.
