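Construction and validation of machine learning based predictive models for inadequate bowel preparation in patients with chronic constipation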

闻艳; 穆蔚然; 高雪丽; 陈婉珍

doi:10.7619/jcmp.20255957


Abstract:
Objective: To identify the factors associated with inadequate bowel preparation before colonoscopy in patients with chronic constipation (CC), and to construct and validate machine learning models that predict inadequate bowel preparation in this high-risk group.
Methods: A total of 700 CC patients who underwent colonoscopy at Nanjing Hospital of Traditional Chinese Medicine from May 2022 to May 2024 were enrolled by consecutive sampling and divided into a qualified group (560 cases) and an unqualified group (140 cases) according to the quality of bowel preparation. Univariate and multivariate logistic regression analyses were used to identify the factors associated with inadequate bowel preparation before colonoscopy. Using SPSS software, three algorithms, namely logistic regression, decision tree (CRT), and back propagation neural network (BPNN), were applied to construct predictive models for inadequate bowel preparation. A further 250 CC patients who underwent colonoscopy at the same hospital from June to October 2024 served as an independent external validation cohort to assess the generalization ability of the models. The predictive value of the three models was compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, sensitivity, specificity, and other metrics.
Results: Univariate and multivariate logistic regression analyses showed that age, duration of constipation, diabetes, history of abdominal surgery, waiting time for colonoscopy, and simethicone use were independent factors associated with inadequate bowel preparation before colonoscopy in CC patients (P < 0.05). The model built with the CRT module identified duration of constipation, age, and diabetes as the classification factors for inadequate bowel preparation. According to the normalized importance of the independent variables in the BPNN model, the top five factors were duration of constipation, age, waiting time for colonoscopy, simethicone use, and diabetes. The AUCs of all three models exceeded 0.800. The logistic regression model showed the best predictive performance, with an AUC of 0.889 (95%CI, 0.857 to 0.922), a sensitivity of 0.843, and a specificity of 0.821.
Conclusion: All three machine learning models constructed in this study can effectively identify CC patients at high risk of inadequate bowel preparation. The logistic regression model shows the best overall performance and provides a reliable tool for clinical risk stratification and targeted intervention.
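
To make the reported metrics concrete, the sketch below shows how sensitivity, specificity, and AUC are conventionally computed. It is an illustration only, not the authors' SPSS workflow. The function names and toy risk scores are hypothetical, and the confusion-matrix counts (118, 22, 460, 100) are assumed values, chosen because they would reproduce the reported 0.843 and 0.821 if those metrics had been computed on the 700-patient cohort, which the abstract does not state.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly inadequate preparations that the model flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of adequate preparations that the model correctly clears."""
    return tn / (tn + fp)


def youden_index(sens: float, spec: float) -> float:
    """Youden's J, a common single-number summary of a cutoff."""
    return sens + spec - 1.0


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# ASSUMED counts (not reported in the abstract): 140 inadequate and
# 560 adequate preparations, split so the ratios match the reported values.
sens = sensitivity(tp=118, fn=22)
spec = specificity(tn=460, fp=100)
print(round(sens, 3))                      # 0.843
print(round(spec, 3))                      # 0.821
print(round(youden_index(sens, spec), 3))  # 0.664

# Hypothetical predicted risks for a toy sample of 3 positives, 4 negatives.
toy_auc = auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1])
print(round(toy_auc, 3))                   # 0.833
```

The pairwise AUC definition is quadratic in sample size, which is fine for illustration. For real cohorts a rank-based implementation from a statistics library would be the usual choice.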
