Re-Assessing the "Classify and Count" Quantification Method

Title

Re-Assessing the "Classify and Count" Quantification Method

Authors

Alejandro Moreo, Fabrizio Sebastiani

Abstract

Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy; following this observation, several methods for learning to quantify have been proposed that have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC (and its variants), and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a true quantification loss instead of a standard classification-based loss. Experiments on three publicly available binary sentiment classification datasets support these conclusions.
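
To make the idea concrete, the following is a minimal Python sketch of CC and of choosing a hyperparameter with a quantification loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which loss or which hyperparameters the paper uses, so absolute error of the prevalence estimate and a decision threshold over classifier scores are stand-ins, and the names `classify_and_count`, `absolute_error`, and `select_threshold` are hypothetical.

```python
def classify_and_count(predictions):
    """CC: estimate positive-class prevalence as the fraction of
    items that the classifier labelled positive (1)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(1 for p in predictions if p == 1) / len(predictions)


def absolute_error(true_prevalence, estimated_prevalence):
    """A simple quantification loss (assumed here for illustration)."""
    return abs(true_prevalence - estimated_prevalence)


def select_threshold(scores, labels, thresholds):
    """Pick the decision threshold whose CC estimate on held-out data
    has the lowest quantification loss, rather than the best
    classification accuracy."""
    true_prevalence = sum(labels) / len(labels)

    def loss(threshold):
        predictions = [1 if s >= threshold else 0 for s in scores]
        return absolute_error(true_prevalence, classify_and_count(predictions))

    return min(thresholds, key=loss)


# Toy held-out data (made up): classifier scores and true binary labels.
scores = [0.1, 0.4, 0.45, 0.6, 0.8, 0.9]
labels = [0, 1, 1, 0, 1, 1]
best = select_threshold(scores, labels, [0.3, 0.42, 0.5])
```

On this toy data the default threshold of 0.5 gives a CC estimate of 3/6 against a true prevalence of 4/6, whereas the threshold 0.42 gives an estimate of 4/6, so selecting by quantification loss prefers 0.42. A realistic setup would evaluate the loss over many held-out samples with varying prevalence rather than a single one.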
