Paper Title
Coresets for Regressions with Panel Data
Paper Authors
Paper Abstract
This paper introduces the problem of coresets for regression problems in panel data settings. We first define coresets for several variants of regression problems with panel data and then present efficient algorithms to construct coresets whose size depends polynomially on $1/\varepsilon$ (where $\varepsilon$ is the error parameter) and the number of regression parameters, and is independent of the number of individuals in the panel data and of the number of time units for which each individual is observed. Our approach is based on the Feldman-Langberg framework, in which a key step is to upper bound the "total sensitivity", which is roughly the sum, over all individual-time pairs, of each pair's maximum influence taken over all possible choices of regression parameters. Empirically, we assess our approach with synthetic and real-world datasets; the coresets constructed using our approach are much smaller than the full dataset, and they reduce the running time of computing the regression objective.
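To make the sensitivity-sampling idea behind the Feldman-Langberg framework concrete, here is a minimal Python sketch for ordinary (cross-sectional) least squares. It is not the paper's panel-data algorithm. It assumes the standard fact that the leverage scores of the stacked matrix $[X, y]$ upper bound each row's sensitivity for the sum-of-squares objective. The function name `sensitivity_coreset` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def sensitivity_coreset(X, y, m, rng):
    """Sample m weighted rows for least squares (illustrative sketch only).

    Assumption: leverage scores of A = [X, y] serve as sensitivity upper
    bounds; this is the classical OLS case, not the paper's panel-data bounds.
    """
    n = X.shape[0]
    # The squared residual of row i equals (A_i @ [beta, -1])**2.
    A = np.column_stack([X, y])
    # Row norms of the thin-QR factor Q give the leverage scores of A.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q ** 2, axis=1)
    # Add a small uniform term so that every row has positive probability.
    s = leverage + 1.0 / n
    total_sensitivity = s.sum()  # equals rank(A) + 1, independent of n
    p = s / total_sensitivity
    # Importance sampling with replacement, with inverse-probability weights
    # so that the weighted objective is an unbiased estimate of the full one.
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return idx, weights, total_sensitivity

# Usage on synthetic data (all values here are made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=2000)
idx, w, total = sensitivity_coreset(X, y, m=500, rng=rng)

beta = np.array([0.5, 0.5, 0.5])  # an arbitrary query parameter vector
full_obj = np.sum((X @ beta - y) ** 2)
coreset_obj = np.sum(w * (X[idx] @ beta - y[idx]) ** 2)
print("total sensitivity bound:", total)
print("relative error:", abs(coreset_obj - full_obj) / full_obj)
```

In this sketch the total sensitivity bound depends only on the number of columns of $[X, y]$, not on the number of rows, which mirrors why the coreset sizes in the abstract can be independent of the number of observations.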