Abstract:
Objective To establish and validate a prediction model for gastric cancer lymph node metastasis based on four machine learning (ML) algorithms.
Methods Clinical data from 531 patients who underwent radical gastrectomy were analyzed retrospectively. The patients were randomly divided into a training set (399 patients) and a test set (132 patients) at a ratio of 3:1. Univariate analysis was used to screen for variables associated with lymph node metastasis in gastric cancer, and logistic regression, random forest, k-nearest neighbor, and support vector machine models were established to rank the importance of the variables. All ML models were validated in the test set, and receiver operating characteristic (ROC) curves were plotted. The optimal ML model was selected based on the area under the curve (AUC), sensitivity, specificity, and accuracy. A nomogram was constructed from the variable importance ranking of the optimal ML model. The discrimination, calibration, and clinical applicability of the nomogram were evaluated using ROC curves, calibration curves, and decision curves.
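The random 3:1 division described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `split_three_to_one`, the seed, and the use of integer patient indices are assumptions made only for illustration, and the abstract does not state whether the split was stratified.

```python
import random

def split_three_to_one(patient_ids, seed=0):
    """Randomly split IDs into a training and a test set at roughly 3:1.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle so the split is reproducible
    n_test = len(ids) // 4            # one quarter (rounded down) goes to the test set
    return ids[n_test:], ids[:n_test]

# 531 patients, identified here by placeholder indices 0..530
train_ids, test_ids = split_three_to_one(range(531))
print(len(train_ids), len(test_ids))  # 399 132
```

With 531 patients, rounding the test quarter down gives the 399 and 132 set sizes reported above.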
Results Among the four ML models, the random forest model performed best. In the training set, its accuracy, sensitivity, and specificity were 72.7%, 69.9%, and 75.0%, respectively, with an AUC of 0.803. In the test set, its accuracy, sensitivity, and specificity were 64.4%, 66.7%, and 62.5%, respectively, with an AUC of 0.751. A nomogram was constructed from the variables of the random forest model. The AUCs of the nomogram in the training set and test set were 0.721 and 0.776, respectively. Calibration curves and decision curves showed that the nomogram had good calibration and clinical applicability in both the training set and the test set.
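The four performance measures reported in the Results can be computed as in the following minimal Python sketch. The function names, the 0.5 classification threshold, and the six toy labels and scores are all assumptions for illustration; they are not study data, and the abstract does not state which threshold was used. The AUC is computed here as the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (made-up values): 1 = metastasis, 0 = no metastasis
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
acc, sens, spec = binary_metrics(y_true, y_score)
roc_auc = auc(y_true, y_score)
print(round(acc, 3), round(sens, 3), round(spec, 3), round(roc_auc, 3))
# 0.667 0.667 0.667 0.778
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, which is why the models are compared on all four.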
Conclusion Among the four ML models, the random forest model performed best. The nomogram based on the random forest model can predict the risk of lymph node metastasis in gastric cancer (AUC 0.721 in the training set and 0.776 in the test set) and may help guide clinical diagnosis and treatment decisions.