Paper title
Efficient-VDVAE: Less is more
Paper authors
Paper abstract
Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load, and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAEs' performance, due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions are sufficient to encode most of the image information without loss of performance, opening the door to efficiently leveraging the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .