Paper Title

Training Neural Networks on Data Sources with Unknown Reliability

Paper Authors

Alexander Capstick, Francesca Palermo, Tianyu Cui, Payam Barnaghi

Paper Abstract

When data is generated by multiple sources, conventional training methods update models assuming equal reliability for each source and do not consider their individual data quality. However, in many applications, sources have varied levels of reliability that can have negative effects on the performance of a neural network. A key issue is that often the quality of the data for individual sources is not known during training. Previous methods for training models in the presence of noisy data do not make use of the additional information that the source label can provide. Focusing on supervised learning, we aim to train neural networks on each data source for a number of steps proportional to the source's estimated reliability by using a dynamic re-weighting strategy motivated by likelihood tempering. This way, we allow training on all sources during the warm-up and reduce learning on less reliable sources during the final training stages, when it has been shown that models overfit to noise. We show through diverse experiments that this can significantly improve model performance when trained on mixtures of reliable and unreliable data sources, and maintain performance when models are trained on reliable sources only.
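Since the abstract only outlines the method, the following is a minimal sketch of what such a dynamic source re-weighting scheme could look like in PyTorch. It assumes batches that carry an integer source id per example; the names and hyper-parameters (`warmup_steps`, `temperature`, `ema_decay`) and the use of a running per-source loss as the reliability proxy are illustrative assumptions, not the authors' implementation.

```python
# A sketch of dynamic source re-weighting, assuming a PyTorch classifier and
# batches that carry an integer source id per example. The EMA loss proxy and
# all hyper-parameters are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, y, source_ids, source_loss_ema,
               step, warmup_steps=1000, temperature=1.0, ema_decay=0.99):
    """One update that down-weights sources with a high running loss."""
    num_sources = source_loss_ema.shape[0]
    logits = model(x)
    per_example_loss = F.cross_entropy(logits, y, reduction="none")

    # Track an exponential moving average of each source's loss, used here
    # as a simple proxy for (un)reliability.
    with torch.no_grad():
        for s in range(num_sources):
            mask = source_ids == s
            if mask.any():
                source_loss_ema[s] = (ema_decay * source_loss_ema[s]
                                      + (1 - ema_decay) * per_example_loss[mask].mean())

    if step < warmup_steps:
        # Warm-up: train on all sources with equal weight.
        weights = torch.ones_like(source_loss_ema)
    else:
        # Temper each source's contribution: a higher running loss gives a
        # lower weight. Softmax over negative losses keeps the weights
        # positive and normalised across sources.
        weights = torch.softmax(-source_loss_ema / temperature, dim=0) * num_sources

    loss = (weights[source_ids] * per_example_loss).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the per-source weight scales each example's gradient contribution rather than literally allotting a number of training steps per source; both are ways of making learning roughly proportional to a source's estimated reliability, as the abstract describes.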
