Paper title
Should univariate Cox regression be used for feature selection with respect to time-to-event outcomes?
Paper authors
Paper abstract
IMPORTANCE: Time-to-event outcomes are commonly used in clinical trials and biomarker discovery studies and have been analyzed primarily with Cox proportional hazards models. However, it is unclear which statistical models should be recommended for feature selection tasks when time-to-event outcomes are of primary interest. OBJECTIVE: To explore whether Gaussian regression of log-transformed survival time can outperform Cox proportional hazards models in feature selection. DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. For all feature selection comparisons, it is assumed that only 5 out of the 10 true features are observed/measured for all model fitting, along with 5 random noise features. Each sample size and censoring rate scenario is explored using 10,000 simulated datasets. Different statistical models are applied to the same dataset to estimate feature effects. Model performance is compared using sensitivity, specificity, and accuracy of effect size ranking. RESULTS: When features are independent and the true models are multivariate Cox proportional hazards models, Gaussian regression of log-transformed survival time (response variable) with only two covariates outperformed both the univariate Cox proportional hazards model and logistic regression in feature selection, achieving higher sensitivity, comparable specificity, and higher accuracy of effect size ranking, regardless of the sample size and censoring rate. CONCLUSIONS AND RELEVANCE: This study demonstrates the importance of including Gaussian regression of log-transformed survival time in feature selection practice for time-to-event outcomes.
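The simulation design in the abstract can be illustrated with a minimal sketch of a single replicate. The abstract does not give the implementation details, so everything specific here is an assumption: the exponential baseline hazard, standard-normal independent features, the coefficient values, independent exponential censoring, reading "two covariates" as the candidate feature plus the event indicator, and the |z| > 1.96 selection rule. All function names are hypothetical, and only the Gaussian-regression arm is shown; the univariate Cox and logistic regression arms are omitted.

```python
import numpy as np

def simulate_cox_dataset(n, beta, n_observed=5, n_noise=5,
                         censor_hazard=0.5, baseline_hazard=1.0, rng=None):
    """Draw one dataset from a Cox PH model (assumed exponential baseline)."""
    rng = np.random.default_rng() if rng is None else rng
    x_true = rng.standard_normal((n, len(beta)))  # independent true features
    # With a constant baseline hazard h0, T ~ Exponential(rate = h0 * exp(x'beta)).
    event_time = rng.exponential(size=n) / (baseline_hazard * np.exp(x_true @ beta))
    # Assumed independent exponential censoring; a larger hazard censors more.
    censor_time = rng.exponential(scale=1.0 / censor_hazard, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(float)
    # Only the first n_observed true features are measured, plus pure noise.
    features = np.hstack([x_true[:, :n_observed],
                          rng.standard_normal((n, n_noise))])
    is_true = np.array([True] * n_observed + [False] * n_noise)
    return time, event, features, is_true

def gaussian_log_time_z(time, event, feature):
    """OLS of log(time) on [1, feature, event]; z-statistic of the feature."""
    n = len(time)
    design = np.column_stack([np.ones(n), feature, event])
    y = np.log(time)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1] / np.sqrt(cov[1, 1])

def sensitivity_specificity(selected, is_true):
    """Share of true features selected, and share of noise features rejected."""
    return selected[is_true].mean(), (~selected[~is_true]).mean()

# One replicate; the study repeats this 10,000 times per scenario.
rng = np.random.default_rng(0)
beta = np.linspace(0.2, 1.0, 10)  # assumed effect sizes, not from the paper
time, event, features, is_true = simulate_cox_dataset(500, beta, rng=rng)
z = np.array([gaussian_log_time_z(time, event, features[:, j])
              for j in range(features.shape[1])])
selected = np.abs(z) > 1.96  # normal approximation to a 5% two-sided test
sens, spec = sensitivity_specificity(selected, is_true)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Averaging `sens` and `spec` over many replicates, and repeating across sample sizes and censoring hazards, would give the kind of comparison the abstract describes. Ranking accuracy could be assessed by comparing the order of `np.abs(z)` with the order of the assumed `beta` values.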