Research on gastric cancer lymph node metastasis prediction model based on machine learning algorithms

施昊旻; 燕速; 乔梦梦; 杨惠莲

doi:10.7619/jcmp.20233076

Abstract:
Objective To establish and validate a prediction model for gastric cancer lymph node metastasis based on four machine learning (ML) algorithms.
Methods Clinical data of 531 patients who underwent radical gastrectomy for gastric cancer were collected retrospectively. Patients were randomly divided into a training set (399 patients) and a test set (132 patients) at a ratio of 3:1. Univariate analysis was used to screen variables associated with lymph node metastasis; logistic regression, random forest, K-nearest neighbor, and support vector machine models were then built, and variable importance was ranked for each model. All ML models were validated in the test set, and receiver operating characteristic (ROC) curves were plotted. The optimal ML model was selected based on the area under the curve (AUC), sensitivity, specificity, and accuracy. A nomogram was constructed based on the variable importance ranking of the optimal ML model, and its discrimination, calibration, and clinical applicability were evaluated using ROC curves, calibration curves, and decision curves.
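As an illustration of the random 3:1 split described in the Methods, the following is a minimal sketch using only the Python standard library. The function name `split_train_test`, the seed, and the choice to size the test set as the integer part of one quarter of the cohort are assumptions made for this example; the abstract does not state how the authors performed the split or which software they used.

```python
import random

def split_train_test(patient_ids, test_fraction=0.25, seed=0):
    """Randomly split IDs into (train, test).

    Assumption for illustration: the test set receives the integer part
    of n * test_fraction, and the training set receives the remainder.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)  # 531 * 0.25 = 132.75 -> 132
    return ids[n_test:], ids[:n_test]

# Hypothetical patient IDs 0..530 stand in for the 531 patients.
train_ids, test_ids = split_train_test(range(531))
print(len(train_ids), len(test_ids))  # 399 132
```

With this rounding rule, 531 patients yield the 399 and 132 group sizes reported above; other rounding rules would give slightly different sizes.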
Results Among the four ML models, the random forest model performed best. In the training set, its accuracy, sensitivity, and specificity were 72.7%, 69.9%, and 75.0%, respectively, with an AUC of 0.803; in the test set, they were 64.4%, 66.7%, and 62.5%, respectively, with an AUC of 0.751. A nomogram was constructed from the variables of the random forest model. ROC curves showed that the AUCs of the nomogram in the training set and test set were 0.721 and 0.776, respectively. Calibration curves and decision curves showed that the nomogram had good calibration and clinical applicability in both sets.
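For readers less familiar with the metrics reported in the Results, the sketch below shows how accuracy, sensitivity, specificity, and AUC are conventionally computed from labels and predicted scores. The toy labels, scores, and the 0.5 cutoff are invented for illustration and are not data from the study; the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the software the authors used.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff of 0.5

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Accuracy, sensitivity, and specificity depend on the chosen cutoff, whereas AUC summarizes discrimination across all cutoffs, which is why the abstract reports both.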
Conclusion The random forest model was the best of the four ML models. The nomogram built on the random forest model can predict the risk of lymph node metastasis in gastric cancer with reasonable accuracy, which can better guide clinical diagnosis and treatment decisions.
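The decision curves mentioned in the abstract assess clinical applicability by plotting net benefit against a threshold probability. The sketch below uses the standard net benefit definition from decision curve analysis, TP/n - (FP/n) * pt/(1 - pt); the toy labels and probabilities are invented for illustration, and the abstract does not specify how the authors implemented the analysis.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)  # 0.25 0.0
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.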
