论文标题
数据离散和正弦波拟合的核心
Coresets for Data Discretization and Sine Wave Fitting
论文作者
论文摘要
在\ emph {Monitoring}问题中,输入是一个无绑定的流$ p = {p_1,p_2 \ cdots} $ in $ [n]中的整数:= \ {1,\ cdots,n \} $,是从传感器或心脏跳动中获得的)。目标(例如,用于异常检测)是近似于到目前为止以$ p $的$ n $点点,例如单个频率$ \ sin $,例如$\min_{c\in C}cost(P,c)+λ(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2π}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $λ$ is a given regularization function.对于任何近似错误$ \ varepsilon> 0 $,我们证明\ emph {enter} $ n $ integers的$ p $具有加权子集$ s \ subseteq p $(有时称为核心 - 核心)$ | s | s | s | \ in O(n) [n] $)达到$ 1 \ pm \ varepsilon $的乘法系数。使用已知的核心技术,这意味着仅使用$ o((\ log(n)\ log(n))^{o(1)})$ memore的流算法。我们的结果适用于大型功能。提供了实验结果和开源代码。
In the \emph{monitoring} problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+λ(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2π}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $λ$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that \emph{every} set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.