Paper Title
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence
Paper Authors
Paper Abstract
The Gumbel-softmax distribution, or Concrete distribution, is often used to relax the discrete characteristics of a categorical distribution and enable back-propagation through differentiable reparameterization. Although it reliably yields low variance gradients, it still relies on a stochastic sampling process for optimization. In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution. The proposed metric is easy to implement because it has a closed form solution, and empirical results show that it is close to the actual KLD. Along with this new metric, we propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed discrete latent representations. We implement an emotional text-to-speech synthesis system based on the proposed framework, and show that the proposed system flexibly and stably controls emotion expressions with better speech quality compared to baselines that use stochastic estimation or categorical distribution approximation.
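For readers unfamiliar with the relaxation the abstract refers to, the sketch below shows a minimal, generic Gumbel-Softmax (Concrete) reparameterized sample in Python. It is not the paper's ReCAB computation; the function name, temperature value, and class probabilities are illustrative assumptions.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw one relaxed categorical (Gumbel-Softmax / Concrete) sample."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(low=1e-20, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    # Relaxed one-hot sample: softmax of (logits + noise) / temperature
    z = (np.asarray(logits) + g) / temperature
    z = np.exp(z - z.max())  # numerically stable softmax
    return z / z.sum()

# Example: a 4-class relaxed sample; smaller temperatures push it toward one-hot
probs = np.array([0.1, 0.2, 0.3, 0.4])
y = sample_gumbel_softmax(np.log(probs), temperature=0.5)
print(y, y.sum())  # entries are positive and sum to 1
```

In a standard Gumbel-Softmax VAE, the KL term over such relaxed samples is typically estimated by Monte Carlo from these draws, which is the stochastic sampling the abstract refers to; ReCAB instead provides a closed-form upper bound on that KL, whose exact expression is given in the paper and not reproduced here.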