Paper title

Sparse network asymptotics for logistic regression

Author

Graham, Bryan S.

Abstract

Consider a bipartite network where $N$ consumers choose to buy or not to buy $M$ different products. This paper considers the properties of the logistic regression of the $N\times M$ array of i-buys-j purchase decisions, $\left[Y_{ij}\right]_{1\leq i\leq N,1\leq j\leq M}$, onto known functions of consumer and product attributes under asymptotic sequences where (i) both $N$ and $M$ grow large and (ii) the average number of products purchased per consumer is finite in the limit. This latter assumption implies that the network of purchases is sparse: only a (very) small fraction of all possible purchases are actually made (concordant with many real-world settings). Under sparse network asymptotics, the first and last terms in an extended Hoeffding-type variance decomposition of the score of the logit composite log-likelihood are of equal order. In contrast, under dense network asymptotics, the last term is asymptotically negligible. Asymptotic normality of the logistic regression coefficients is shown using a martingale central limit theorem (CLT) for triangular arrays. Unlike in the dense case, the normality result derived here also holds under degeneracy of the network graphon. Relatedly, when there happens to be no dyadic dependence in the dataset in hand, it specializes to recently derived results on the behavior of logistic regression with rare events and iid data. Sparse network asymptotics may lead to better inference in practice since they suggest variance estimators which (i) incorporate additional sources of sampling variation and (ii) are valid under varying degrees of dyadic dependence.
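
The sparsity condition in the abstract (average purchases per consumer stay finite while $N$ and $M$ grow) can be illustrated with a small simulation. This is a minimal sketch and not the paper's design: the function name `simulate_sparse_purchases`, the scalar attributes `w` and `x`, the slope of 0.5, and the intercept rule `log(avg_purchases / M)` are all assumptions chosen only to keep the expected number of purchases per consumer roughly constant as $M$ increases.

```python
import numpy as np


def simulate_sparse_purchases(n_consumers, n_products, avg_purchases=3.0, seed=0):
    """Draw an N x M binary purchase array Y_ij from a logit model.

    Illustrative assumption: the intercept drifts like log(avg_purchases / M),
    so each purchase probability shrinks at rate 1/M and the expected number
    of purchases per consumer stays bounded as M grows.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_consumers)  # hypothetical consumer attribute
    x = rng.normal(size=n_products)   # hypothetical product attribute
    intercept = np.log(avg_purchases / n_products)
    # Logit index for every (consumer i, product j) pair via broadcasting.
    index = intercept + 0.5 * w[:, None] + 0.5 * x[None, :]
    prob = 1.0 / (1.0 + np.exp(-index))
    return (rng.random((n_consumers, n_products)) < prob).astype(int)


if __name__ == "__main__":
    for size in (200, 800):
        y = simulate_sparse_purchases(size, size)
        print(
            f"N=M={size}: "
            f"mean purchases per consumer={y.sum(axis=1).mean():.2f}, "
            f"density={y.mean():.4f}"
        )
```

Under these assumptions the mean number of purchases per consumer should stay of similar size across the two runs, while the density (the share of possible purchases actually made) falls as the array grows. A dense design would instead hold the intercept fixed, so the density would stay constant and purchases per consumer would grow with $M$.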
