Paper Title
Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu
Paper Authors
Paper Abstract
We provide finite-sample guarantees for the classical Chow-Liu algorithm (IEEE Trans.~Inform.~Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P \,\|\, Q)$ is at most $\varepsilon$ more than $\min_{Q'} D(P \,\|\, Q')$, where the minimum is over all tree-structured distributions $Q'$. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm, using the plug-in estimator for mutual information with $\widetilde{O}(|\Sigma|^3 n\varepsilon^{-1})$ i.i.d.~samples, outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $\Omega(n^2\varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart~(STOC, 2018): we prove that for three random variables $X, Y, Z$, each over $\Sigma$, testing whether $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3/\varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n\varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
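The first guarantee concerns the Chow-Liu algorithm itself: estimate every pairwise mutual information with the plug-in (empirical) estimator and return a maximum-weight spanning tree over those estimates. Below is a minimal sketch, assuming `samples` is a list of length-$n$ tuples over a finite alphabet; the function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def empirical_mi(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) from the rows of `samples`."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    pi = Counter(row[i] for row in samples)
    pj = Counter(row[j] for row in samples)
    # I(X;Y) = sum_{a,b} p(a,b) * log[ p(a,b) / (p(a) p(b)) ]
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in joint.items())

def chow_liu_tree(samples, n_vars):
    """Chow-Liu: maximum-weight spanning tree over pairwise plug-in MI,
    grown here with Prim's algorithm starting from node 0."""
    edges = []
    # best[j] = (weight, endpoint already in the tree) for each j outside the tree
    best = {j: (empirical_mi(samples, 0, j), 0) for j in range(1, n_vars)}
    while best:
        j = max(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.append((i, j))
        for k in best:
            w = empirical_mi(samples, j, k)
            if w > best[k][0]:
                best[k] = (w, j)
    return edges
```

Prim's algorithm is used only for brevity; any maximum spanning tree routine yields the same output up to ties in the estimated mutual informations.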
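The upper bound hinges on deciding whether $I(X; Y \mid Z)$ is $0$ or at least $\varepsilon$. The paper's tester, which attains the stated $\widetilde{O}(|\Sigma|^3/\varepsilon)$ sample complexity, is more delicate than naive plug-in estimation; purely for orientation, here is a sketch of the plug-in conditional mutual information that such a test thresholds, assuming `samples` is a list of $(x, y, z)$ triples.

```python
import math
from collections import Counter

def plug_in_cmi(samples):
    """Plug-in estimate of I(X; Y | Z) from i.i.d. (x, y, z) triples:
    I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]."""
    n = len(samples)
    pxyz = Counter(samples)
    pxz = Counter((x, z) for x, _, z in samples)
    pyz = Counter((y, z) for _, y, z in samples)
    pz = Counter(z for _, _, z in samples)
    return sum((c / n) * math.log(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())
```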
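For the final result, the closest $T$-structured distribution to $P$ in KL divergence factorizes as a root marginal times child-given-parent conditionals along $T$, and each factor is fit with the add-1 (Laplace) estimator. A hedged sketch follows, with an arbitrary root (the factorization does not depend on the choice); the helper names and the `samples` representation are illustrative assumptions.

```python
import math
from collections import Counter

def add1_tree_distribution(samples, tree_edges, alphabet, root=0):
    """Fit a T-structured distribution: add-1 (Laplace) estimates of the root
    marginal and of each child-given-parent conditional along the tree.
    Returns a function computing the log-probability of a full assignment."""
    n, k = len(samples), len(alphabet)
    # Orient the tree away from the root to obtain parent pointers.
    adj = {}
    for i, j in tree_edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    parent, order, stack = {root: None}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                stack.append(v)
    # Add-1 smoothed root marginal.
    root_counts = Counter(row[root] for row in samples)
    root_prob = {a: (root_counts[a] + 1) / (n + k) for a in alphabet}
    # Add-1 smoothed conditionals P(X_v = a | X_parent(v) = b).
    cond = {}
    for v in order:
        u = parent[v]
        if u is None:
            continue
        pair = Counter((row[u], row[v]) for row in samples)
        par = Counter(row[u] for row in samples)
        cond[v] = {(a, b): (pair[(b, a)] + 1) / (par[b] + k)
                   for a in alphabet for b in alphabet}
    def log_prob(x):
        lp = math.log(root_prob[x[root]])
        for v in order:
            if parent[v] is not None:
                lp += math.log(cond[v][(x[v], x[parent[v]])])
        return lp
    return log_prob
```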