Paper Title

On the Estimation of Information Measures of Continuous Distributions

Authors

Pichler, Georg, Piantanida, Pablo, Koliander, Günther

Abstract

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
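The "simple histogram based estimation of differential entropy" mentioned in the abstract can be illustrated with a plug-in estimator: partition the known bounded support into equal-width cells, estimate the density in each cell from sample counts, and evaluate the entropy of the resulting piecewise-constant density. The sketch below is an illustrative implementation under these assumptions (unit-cube support, nats, uniform cubic cells); the bin count and function name are our choices, not taken from the paper.

```python
import numpy as np

def histogram_entropy(samples, bins=16, support=(0.0, 1.0)):
    """Plug-in histogram estimate of differential entropy, in nats.

    Assumes the samples lie in the known bounded support
    [support[0], support[1]]^K, as in the paper's setting; the
    default bin count is an illustrative choice.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]
    n, k = samples.shape
    lo, hi = support
    width = (hi - lo) / bins  # side length of each cubic cell
    counts, _ = np.histogramdd(samples, bins=bins,
                               range=[(lo, hi)] * k)
    counts = counts[counts > 0]
    p = counts / n  # empirical probability mass per occupied cell
    # Entropy of the piecewise-constant density p_i / width**k:
    #   -sum_i p_i * log(p_i / width**k)
    return -np.sum(p * np.log(p / width ** k))
```

For a uniform density on the unit interval (differential entropy 0), the estimate converges toward 0 as the sample size grows; the confidence bounds in the paper quantify such deviations under the Lipschitz and bounded-support assumptions.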
