强大协方差测试的样本复杂性

论文标题

强大协方差测试的样本复杂性

The Sample Complexity of Robust Covariance Testing

论文作者

Diakonikolas, Ilias, Kane, Daniel M.

论文摘要

我们研究了在强大的环境中测试高维高斯的协方差矩阵的问题，在这种情况下，在Huber的污染模型中，输入分布已损坏。具体而言，我们得到了I.I.D. $ z =（1-ε）x +εb$的分布中的样本，其中$ x $是零均值且未知的协方差高斯$ \ MATHCAL {N}（0，σ）$，$ b $是固定但未知的噪声分布，$ε> 0 $是任意小的代表contaMINISTINAL的任意小常数。我们要区分$σ$是身份矩阵与$γ$ -far的情况与弗罗贝尼乌斯规范中的身份。在没有污染的情况下，先前的工作为使用$ o（d）$样本的假设测试任务提供了一个简单的测试仪。此外，在恒定因素内，该样品上限被证明是最好的。我们的主要结果是，在受污染的设置中，协方差测试的样本复杂性大大增加。特别是，我们证明了$ω（d^2）$的样本复杂性下限$ε$一个任意的小常数和$γ= 1/2 $。最好的下限是$ o（d^2）$样品，甚至足以牢固地{\ em学习}协方差。我们结果的概念含义是，对于我们考虑的自然环境，强大的假设检验至少与可靠的估计一样困难。

We study the problem of testing the covariance matrix of a high-dimensional Gaussian in a robust setting, where the input distribution has been corrupted in Huber's contamination model. Specifically, we are given i.i.d. samples from a distribution of the form $Z = (1-ε) X + εB$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, Σ)$, $B$ is a fixed but unknown noise distribution, and $ε>0$ is an arbitrarily small constant representing the proportion of contamination. We want to distinguish between the cases that $Σ$ is the identity matrix versus $γ$-far from the identity in Frobenius norm. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. Moreover, this sample upper bound was shown to be best possible, within constant factors. Our main result is that the sample complexity of covariance testing dramatically increases in the contaminated setting. In particular, we prove a sample complexity lower bound of $Ω(d^2)$ for $ε$ an arbitrarily small constant and $γ= 1/2$. This lower bound is best possible, as $O(d^2)$ samples suffice to even robustly {\em learn} the covariance. The conceptual implication of our result is that, for the natural setting we consider, robust hypothesis testing is at least as hard as robust estimation.

下载PDF全文

下载文献需遵守相关版权规定

论文标题