论文标题
eptas for $ k $ -means offine子空间聚类
EPTAS for $k$-means Clustering of Affine Subspaces
论文作者
论文摘要
我们考虑对具有不完整或损坏条目的数据的基本$ k $ - 均值聚类的概括。当数据对象由$ \ mathbb {r}^d $中的点表示时,当丢失或未指定的某些条目时,数据点是不完整的。一个不完整的数据点,最多$δ$未指定的条目对应于最多$δ$的轴 - 平行仿射子空间,称为$δ$ - 点。因此,我们寻求$ n $输入$δ$ - $ k $簇的分区,以最大程度地减少$ k $ -MEANS目标。对于$δ= 0 $,当指定每个点的所有坐标时,这是通常的$ k $ -MEANS群集。我们给出了一种算法,该算法在时间$ f(k,ε,δ)\ cdot n^2 \ cdot d $中找到$(1+ε)$ - 近似解决方案,用于某些功能$ f $ of $ k,ε$和$Δ$。
We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $Δ$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $Δ$, called a $Δ$-point. Thus we seek a partition of $n$ input $Δ$-points into $k$ clusters minimizing the $k$-means objective. For $Δ=0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds an $(1+ ε)$-approximate solution in time $f(k,ε, Δ) \cdot n^2 \cdot d$ for some function $f$ of $k,ε$, and $Δ$ only.