Construction and validation of a prediction model for interstitial pneumonia in patients with diffuse large B-cell lymphoma based on the gradient boosting machine algorithm

  • Abstract:
    Objective To construct a prediction model for interstitial pneumonia (IP) in patients with diffuse large B-cell lymphoma (DLBCL) based on the gradient boosting machine (GBM) algorithm and to validate its performance.
    Methods The clinical data of 220 patients with DLBCL were retrospectively analyzed; 51 patients (23.18%) developed IP and 169 did not. The patients were divided into a training set (154 cases) and a test set (66 cases) at a 7:3 ratio. The prediction model was constructed with the GBM algorithm. The receiver operating characteristic (ROC) curve was used to evaluate model discrimination, and the calibration curve was used to evaluate model fit.
    Results After feature selection, five optimal features were included in the GBM model: age, disease stage, International Prognostic Index (IPI) score, smoking history, and lactate dehydrogenase (LDH). In descending order of relative importance they were age, disease stage, LDH, IPI score, and smoking history. The area under the ROC curve (AUC) of the GBM model was 0.872 (95%CI, 0.800 to 0.945) in the training set and 0.891 (95%CI, 0.755 to 1.000) in the test set. The calibration curves showed that the predicted probabilities agreed well with the observed incidence of IP in both the training set and the test set.
    Conclusion The incidence of IP in DLBCL patients after treatment was 23.18% and was mainly related to age, disease stage, IPI score, smoking history, and LDH level. The GBM model built on these factors showed high accuracy and discrimination and could provide a reference for clinical treatment decisions in patients with DLBCL.
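
The two evaluation steps named in the abstract, discrimination by AUC with a 95% confidence interval and calibration by comparing predicted with observed event rates, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the confidence intervals were computed (a percentile bootstrap is used below), the function names `auc`, `bootstrap_ci`, and `calibration_bins` are hypothetical, and the data are synthetic, borrowing only the test-set size of 66 from the study.

```python
import random


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def calibration_bins(labels, probs, n_bins=5):
    """Per non-empty equal-width bin: (mean predicted probability, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[k].append((y, p))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows


# Synthetic stand-in for model output on a 66-case test set (NOT the study's data).
rng = random.Random(42)
labels = [1] * 15 + [0] * 51
probs = [min(1.0, max(0.0, rng.gauss(0.6 if y else 0.25, 0.15))) for y in labels]

point = auc(labels, probs)
lower, upper = bootstrap_ci(labels, probs)
print(f"AUC = {point:.3f} (95%CI, {lower:.3f} to {upper:.3f})")
for mean_pred, observed, count in calibration_bins(labels, probs):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n = {count}")
```

With only 66 test cases the bootstrap interval is wide, which is consistent with the test-set interval in the abstract being much wider than the training-set interval. A well-calibrated model gives bins whose mean predicted probability lies close to the observed rate.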

     
