Title
Unsupervised classification of CIGALE galaxy spectra
Authors
Abstract
Aims. The present study aims to provide deeper insight into the power and limitations of an unsupervised classification algorithm (called Fisher-EM) applied to galaxy spectra. This algorithm uses a Gaussian mixture in a discriminative latent subspace. To this end, we investigate the capacity of the algorithm to segregate the physical parameters used to generate mock spectra, and the influence of noise on the classification.

Methods. With the code CIGALE and different values for nine input parameters characterising the stellar population, we simulated a sample of 11 475 optical galaxy spectra, each containing 496 monochromatic fluxes. In Fisher-EM, the statistical model and the optimum number of clusters are selected with the integrated completed likelihood (ICL) criterion. We repeated the analyses several times to assess the robustness of the results.

Results. Two distinct classifications can be distinguished for the noiseless spectra. The one with more than 13 clusters disappears when noise is added, while the classification with 12 clusters is very robust against noise down to a signal-to-noise ratio (SNR) of 3. At SNR = 1, the optimum is 5 clusters, but the classification is still compatible with the previous one. The distributions of the parameters used for the simulation show excellent discrimination between classes. A higher dispersion, both in the spectra within each class and in the parameter distributions, leads us to conclude that, despite a much higher ICL, the classification with more than 13 clusters in the noiseless case is not physically relevant.

Conclusions. This study yields two conclusions, valid at least for the Fisher-EM algorithm. First, the unsupervised classification of galaxy spectra is both reliable and robust to noise. Second, such analyses are able to extract the useful physical information contained in the spectra and to build highly meaningful classifications. In an epoch of data-driven astrophysics, it is important to be able to trust unsupervised machine learning approaches, which do not require training samples that are unavoidably biased.
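The model-selection step summarised in the abstract (fit a Gaussian mixture for several numbers of clusters, keep the one with the highest ICL, and test it on noisy spectra) can be illustrated with a minimal Python sketch. It is neither the authors' pipeline nor Fisher-EM: the projection onto a discriminative latent subspace is omitted, and a plain diagonal-covariance Gaussian mixture fitted by EM in the full flux space stands in for it. The ICL is written in its "higher is better" form, log L - (p/2) log n + sum_i sum_k t_ik log t_ik, where t_ik are the posterior cluster probabilities; Fisher-EM may use a different scaling. The SNR definition in the hypothetical `add_noise`, the k-means++ initialisation, the variance floor and the toy templates are all illustrative assumptions, and only the number of clusters is selected here, not the statistical model.

```python
import numpy as np


def add_noise(flux, snr, rng):
    """Add white Gaussian noise to each spectrum (one spectrum per row).

    Hypothetical SNR definition: mean flux of a spectrum divided by the noise
    standard deviation, taken constant over wavelength.
    """
    sigma = flux.mean(axis=1, keepdims=True) / snr
    return flux + sigma * rng.standard_normal(flux.shape)


def kmeanspp_init(X, k, rng):
    """k-means++ seeding (an assumption here): spread the initial means."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def fit_diag_gmm(X, k, rng, n_iter=300, tol=1e-8):
    """Plain EM for a diagonal-covariance Gaussian mixture (NOT Fisher-EM).

    Returns the log-likelihood and the posterior probabilities t_ik,
    both evaluated at the same parameter values.
    """
    n = X.shape[0]
    floor = 1e-3 * X.var(axis=0)  # hypothetical variance floor against collapse
    means = kmeanspp_init(X, k, rng)
    var = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log[pi_k * N(x_i | mu_k, diag(var_k))] for every i and k
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
                 - 0.5 * ((X[:, None, :] - means[None]) ** 2 / var[None]).sum(axis=2))
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        loglik = log_norm.sum()
        if loglik - prev <= tol * abs(loglik):
            break  # converged (or stalled): keep the current loglik and resp
        prev = loglik
        # M-step: update weights, means and per-feature variances
        nk = np.maximum(resp.sum(axis=0), 1e-10)
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        var = np.maximum(resp.T @ X**2 / nk[:, None] - means**2, floor)
    return loglik, resp


def icl(loglik, resp, n_params, n):
    """ICL, 'higher is better': log L - (p/2) log n + sum_ik t_ik log t_ik."""
    entropy_term = np.sum(resp * np.log(np.clip(resp, 1e-300, None)))  # <= 0
    return loglik - 0.5 * n_params * np.log(n) + entropy_term


def select_k(X, k_values, rng, n_starts=5):
    """Return the number of clusters with the highest ICL, plus all scores.

    Several random starts per k; the run with the best log-likelihood is kept.
    """
    n, d = X.shape
    scores = {}
    for k in k_values:
        loglik, resp = max((fit_diag_gmm(X, k, rng) for _ in range(n_starts)),
                           key=lambda run: run[0])
        n_params = (k - 1) + 2 * k * d  # weights + means + diagonal variances
        scores[k] = icl(loglik, resp, n_params, n)
    return max(scores, key=scores.get), scores


# Toy usage (hypothetical data): 3 templates of 20 fluxes, 100 noisy copies each
rng = np.random.default_rng(0)
templates = rng.uniform(1.0, 5.0, size=(3, 20))
clean = np.repeat(templates, 100, axis=0)
noisy = add_noise(clean, snr=10.0, rng=rng)
best_k, scores = select_k(noisy, range(1, 6), rng)
print(best_k, {k: round(float(v), 1) for k, v in scores.items()})
```

The (p/2) log n penalty and the entropy term are what discourage extra clusters that merely split one well-defined group in two, while keeping the best of several random starts loosely mirrors the repeated runs used in the study to check robustness.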