Paper title
Combining Cluster Sampling and Link-Tracing Sampling to Estimate Totals and Means of Hidden Populations in Presence of Heterogeneous Probabilities of Links
Paper authors
Paper abstract
We propose Horvitz-Thompson-like and Hajek-like estimators of the total and mean of the values of a variable of interest associated with the elements of a hard-to-reach population sampled by the variant of link-tracing sampling proposed by Felix-Medina and Thompson (2004). Examples of this type of population are drug users, homeless people, and sex workers. In this sampling variant, a frame is constructed of venues where members of the population tend to gather, such as parks and bars. The frame is not assumed to cover the whole population. An initial cluster sample of elements is selected from the frame, with the venues as clusters, and the elements in the initial sample are asked to name their contacts who are also members of the population. The sample size is then increased by adding the named elements who are not in the initial sample. The proposed estimators do not use design-based inclusion probabilities. Instead, they use model-based inclusion probabilities, which are derived from a model proposed by Felix-Medina et al. (2015) and estimated by maximum likelihood. The inclusion probabilities are assumed to be heterogeneous, that is, they depend on the sampled people. Variance estimates for the proposed estimators are obtained by bootstrap and are used to construct confidence intervals for the totals and means. The performance of the proposed estimators and confidence intervals is evaluated in two numerical studies, one of them based on real data, and the results show that their performance is acceptable.
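As a rough illustration of the two estimator forms named in the abstract, the following minimal sketch computes a Horvitz-Thompson-type total and a Hajek-type mean from a sample of values and their inclusion probabilities. It is not the authors' implementation: the function names `ht_total` and `hajek_mean` and the toy data are hypothetical, and the inclusion probabilities are taken as given numbers, whereas in the paper they would be maximum likelihood estimates derived from the Felix-Medina et al. (2015) model.

```python
from typing import Sequence


def _check(y: Sequence[float], pi: Sequence[float]) -> None:
    """Validate that values and inclusion probabilities are usable."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if len(y) == 0:
        raise ValueError("the sample must not be empty")
    if any(not (0.0 < p <= 1.0) for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")


def ht_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson-type total: each sampled value is weighted by 1 / pi_i."""
    _check(y, pi)
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y: Sequence[float], pi: Sequence[float]) -> float:
    """Hajek-type mean: weighted total divided by the sum of the weights."""
    _check(y, pi)
    weights = [1.0 / p for p in pi]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Toy data (assumed for illustration only): a binary variable of interest
# and hypothetical inclusion probabilities for four sampled people.
y = [1.0, 0.0, 1.0, 1.0]
pi = [0.5, 0.25, 0.5, 1.0]

total_hat = ht_total(y, pi)    # 1/0.5 + 0/0.25 + 1/0.5 + 1/1.0
mean_hat = hajek_mean(y, pi)   # total_hat / (2 + 4 + 2 + 1)
print(total_hat, mean_hat)
```

The Hajek-type form divides by the sum of the weights, which acts as an estimate of the population size, so it does not require the true size to be known.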
