论文标题

在高维度处离群值的几何优化的亚线性时间框架

A Sub-linear Time Framework for Geometric Optimization with Outliers in High Dimensions

论文作者

Ding, Hu

论文摘要

许多现实世界中的问题可以作为高维度中的几何优化问题提出,尤其是在机器学习和数据挖掘领域。此外,在优化目标函数时,我们通常需要考虑异常值。但是,离群值的存在可能使问题比其香草版本更具挑战性。在本文中,我们首先研究了带有异常值问题的基本最小封闭球(MEB)。受Bădoiu和Clarkson的核心方法的启发,我们提出了一种基于两种新型技术,即均匀的自适应采样方法和三明治柠檬剂,提出了一个亚线性时间双标准近似算法。据我们所知,我们的结果是第一个子线性时间算法,它具有样本量({\ em i.e.,}采样点的数量),与输入点$ n $和维度$ d $的数量无关,对于MEB而言,具有高尺寸的Outliers的MEB。此外,我们观察到,可以将这两种技术推广到高维度的异常值,包括固定拟合,$ k $ - 中心聚类和与异常值的SVM,并分别实现这些问题的亚线性时间算法。

Many real-world problems can be formulated as geometric optimization problems in high dimensions, especially in the fields of machine learning and data mining. Moreover, we often need to take into account of outliers when optimizing the objective functions. However, the presence of outliers could make the problems to be much more challenging than their vanilla versions. In this paper, we study the fundamental minimum enclosing ball (MEB) with outliers problem first; partly inspired by the core-set method from Bădoiu and Clarkson, we propose a sub-linear time bi-criteria approximation algorithm based on two novel techniques, the Uniform-Adaptive Sampling method and Sandwich Lemma. To the best of our knowledge, our result is the first sub-linear time algorithm, which has the sample size ({\em i.e.,} the number of sampled points) independent of both the number of input points $n$ and dimensionality $d$, for MEB with outliers in high dimensions. Furthermore, we observe that these two techniques can be generalized to deal with a broader range of geometric optimization problems with outliers in high dimensions, including flat fitting, $k$-center clustering, and SVM with outliers, and therefore achieve the sub-linear time algorithms for these problems respectively.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源