Paper Title
Pulsar Candidate Identification Using Semi-Supervised Generative Adversarial Networks
Authors
Abstract
Machine learning methods are increasingly helping astronomers identify new radio pulsars. However, they require large amounts of labelled data, which are time-consuming to produce and prone to bias. Here we describe a Semi-Supervised Generative Adversarial Network (SGAN) that achieves better classification performance than standard supervised algorithms when trained on datasets that are mostly unlabelled. Trained on only 100 labelled and 5000 unlabelled candidates, the SGAN achieved an accuracy and mean F-score of 94.9%, compared with 81.1% and 82.7%, respectively, for our standard supervised baseline. Our final model, trained on a much larger labelled dataset, achieved an accuracy and mean F-score of 99.2% and a recall rate of 99.7%. This technique allows high-quality classification during the early stages of pulsar surveys on new instruments, when little labelled data is available. We open-source our work along with a new pulsar-candidate dataset produced from the High Time Resolution Universe - South Low Latitude Survey. This dataset contains more pulsar detections than any other public dataset, and we hope it will be a valuable tool for benchmarking future machine learning models.
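To make the semi-supervised idea concrete, the sketch below shows one common way an SGAN discriminator can learn from labelled and unlabelled candidates at the same time. It assumes the generic K+1-class formulation of Salimans et al. (2016), "Improved Techniques for Training GANs", where the discriminator outputs K real-class logits (here assumed K = 2: non-pulsar and pulsar) and the extra "fake" class has its logit fixed at 0. This is not necessarily the exact loss or architecture used in the paper; the function names are hypothetical, and the logits would in practice come from a neural network rather than being passed in directly.

```python
import numpy as np


def logsumexp(logits):
    """Numerically stable log(sum(exp(logits))) over the last axis."""
    m = np.max(logits, axis=-1, keepdims=True)
    return np.squeeze(m, axis=-1) + np.log(np.sum(np.exp(logits - m), axis=-1))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def sgan_discriminator_loss(logits_labelled, labels, logits_unlabelled, logits_fake):
    """Hypothetical K+1-class SGAN discriminator loss (Salimans et al. 2016 style).

    Each logits_* array has shape (batch, K) and holds the K real-class logits.
    With the fake-class logit fixed at 0, the probability that x is real is
    D(x) = Z(x) / (Z(x) + 1), where Z(x) = sum_k exp(l_k(x)).
    """
    # Supervised term: softmax cross-entropy over the K real classes,
    # computed only on the (few) labelled candidates.
    lse_lab = logsumexp(logits_labelled)
    picked = logits_labelled[np.arange(len(labels)), labels]
    supervised = np.mean(lse_lab - picked)

    # Unsupervised term for real unlabelled candidates:
    # -log D(x) = log(Z + 1) - log Z = softplus(lse) - lse.
    lse_unl = logsumexp(logits_unlabelled)
    real_term = np.mean(softplus(lse_unl) - lse_unl)

    # Unsupervised term for generator samples:
    # -log(1 - D(G(z))) = log(Z + 1) = softplus(lse).
    fake_term = np.mean(softplus(logsumexp(logits_fake)))

    return supervised + real_term + fake_term
```

The design choice behind this formulation is that the unlabelled candidates only have to be recognised as "real" (any of the K classes) rather than assigned a label, so they still shape the discriminator's features. This is why a model trained on 100 labelled and 5000 unlabelled candidates can outperform a purely supervised baseline.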