Paper Title
Statistical Properties of Exclusive and Non-exclusive Online Randomized Experiments using Bucket Reuse
Authors
Abstract
Randomized experiments are a key part of product development in the tech industry. It is often necessary to run programs of exclusive experiments, i.e., experiments that cannot be run on the same units at the same time. These programs imply a restriction on the random sampling, as units that are currently in an experiment cannot be sampled into a new one. Moreover, to make this type of coordination technically feasible with large populations, the units in the population are often grouped into 'buckets', and sampling is then performed at the bucket level. This paper investigates some statistical implications of both the restricted sampling and the bucket-level sampling. The contribution of this paper is threefold. First, bucket sampling is connected to the existing literature on randomized experiments in complex sampling designs, which makes it possible to establish properties of the difference-in-means estimator of the average treatment effect. These properties are needed for inference to the population under random sampling of buckets. Second, the bias introduced by the restricted sampling that programs of exclusive experiments impose is derived. Finally, simulation results supporting the theoretical findings are presented, together with recommendations on how to empirically evaluate and handle this bias.
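To make the setup in the abstract concrete, the following is a minimal sketch of bucket-level sampling under an exclusivity restriction, followed by a difference-in-means estimate. It is not the paper's simulation design. The population size, the number of buckets, the set of occupied buckets, the unit-level treatment assignment within sampled buckets, and the constant treatment effect are all assumptions chosen for illustration, and every function name is hypothetical. Because the simulated effect is constant across units, this sketch is not expected to exhibit the bias that the paper derives.

```python
import random
from statistics import mean


def assign_buckets(n_units, n_buckets, rng):
    """Map each unit to a bucket uniformly at random (stand-in for a hash of the unit ID)."""
    return [rng.randrange(n_buckets) for _ in range(n_units)]


def sample_buckets(n_buckets, n_sampled, occupied, rng):
    """Sample buckets for a new experiment, skipping buckets already in use.

    `occupied` holds buckets held by running exclusive experiments; excluding
    them is the restriction on random sampling described in the abstract.
    """
    available = [b for b in range(n_buckets) if b not in occupied]
    return set(rng.sample(available, n_sampled))


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treatment = [y for y, z in zip(outcomes, treated) if z]
    control = [y for y, z in zip(outcomes, treated) if not z]
    return mean(treatment) - mean(control)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_units, n_buckets = 10_000, 100  # assumed sizes
bucket_of = assign_buckets(n_units, n_buckets, rng)

occupied = set(range(20))  # assumed: buckets 0..19 are used by another exclusive experiment
sampled = sample_buckets(n_buckets, 30, occupied, rng)

# Units enter the experiment only if their bucket was sampled.
units = [i for i in range(n_units) if bucket_of[i] in sampled]

# Assumed design: treatment is randomized at the unit level within the sample.
treated = [rng.random() < 0.5 for _ in units]

true_effect = 1.0  # assumed constant additive effect
outcomes = [rng.gauss(0.0, 1.0) + (true_effect if z else 0.0) for z in treated]

estimate = difference_in_means(outcomes, treated)
print(f"{len(units)} units in {len(sampled)} buckets, estimated effect {estimate:.3f}")
```

One way to probe the restricted-sampling bias empirically, in the spirit of the abstract's final point, would be to let outcomes or treatment effects depend on whether a bucket was recently occupied and to compare estimates from restricted and unrestricted bucket samples. That extension is a suggestion, not a procedure taken from the paper.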