关于从标签比例学习的复杂性

论文标题

关于从标签比例学习的复杂性

On the Complexity of Learning from Label Proportions

论文作者

Fish, Benjamin, Reyzin, Lev

论文摘要

在使用标签比例学习的问题（我们称为LLP学习的问题）中，培训数据是未标记的，只有接收每个标签的示例的比例。目的是学习一个假设，以预测样品基础分布的标签比例。这种学习模式适用于各种环境，包括预测民意测验中政治选举中候选人的选票数量。在本文中，我们正式定义了这一类，并解决有关LLP计算复杂性的基础问题，并表征其与PAC学习的关系。在我们的结果中，我们也许令人惊讶地表明，对于有限的VC类，可以有效地学习的内容是在标准复杂性假设下可以有效地倾斜在PAC中的严格子集。我们还表明，存在LLP中的可学习性的函数类别独立于ZFC，ZFC是标准设置的理论公理。这意味着LLP学习不能轻易地表征（例如VC维度的PAC）。

In the problem of learning with label proportions, which we call LLP learning, the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is applicable to a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we formally define this class and resolve foundational questions regarding the computational complexity of LLP and characterize its relationship to PAC learning. Among our results, we show, perhaps surprisingly, that for finite VC classes what can be efficiently LLP learned is a strict subset of what can be leaned efficiently in PAC, under standard complexity assumptions. We also show that there exist classes of functions whose learnability in LLP is independent of ZFC, the standard set theoretic axioms. This implies that LLP learning cannot be easily characterized (like PAC by VC dimension).

下载PDF全文

下载文献需遵守相关版权规定

论文标题