Paper title
Improving dermatology classifiers across populations using images generated by large diffusion models
Paper authors
Paper abstract
Dermatological classification algorithms developed without sufficiently diverse training data may generalize poorly across populations. While intentional data collection and annotation offer the best means for improving representation, new computational approaches for generating training data may also aid in mitigating the effects of sampling bias. In this paper, we show that DALL$\cdot$E 2, a large-scale text-to-image diffusion model, can produce photorealistic images of skin disease across skin types. Using the Fitzpatrick 17k dataset as a benchmark, we demonstrate that augmenting training data with DALL$\cdot$E 2-generated synthetic images improves classification of skin disease overall and especially for underrepresented groups.