围产期抑郁症辅助诊断预测模型的构建及机器学习算法的筛选

Construction of a predictive model for auxiliary diagnosis of perinatal depression and screening of machine learning algorithm

  • 摘要:
    目的 基于机器学习(ML)算法构建孕产妇围产期抑郁症(PND)辅助诊断预测模型并评估不同ML算法模型的性能。
    方法 采用9条目患者健康问卷抑郁量表(PHQ-9)对5 814例孕产妇(产前研究对象4 665例, 产后研究对象1 149例)进行评估,收集7种量表的19个量表维度变量和人口学特征作为观察变量。按照年龄分别对产前、产后研究对象进行1∶1倾向性评分匹配,利用单因素分析及Pearson相关系数确定特征选择变量。基于Logistic回归模型、随机森林(RF)、支持向量机(SVM)、极限梯度提升树(XGBoost)、反向传播(BP)神经网络这5种ML算法分别纳入所有变量和特征选择变量集,构建产前、产后抑郁的诊断模型。采用5折交叉验证方法评估模型的预测性能,评价指标包括灵敏度、特异度和曲线下面积(AUC)。
    结果 纳入不同变量的情况下,通过5种ML算法分别基于产前研究对象、产后研究对象构建的预测模型的灵敏度、特异度、AUC均在0.600~0.900范围内; RF算法在产前预测模型(纳入所有变量时, AUC为0.834; 纳入特征选择变量集时, AUC为0.849)和产后预测模型(纳入所有变量时, AUC为0.873; 纳入特征选择变量集时, AUC为0.864)的构建中均为最优算法。
    结论 基于5种ML算法构建的预测模型均可有效预测孕产妇PND风险,其中以RF算法的表现最优,为开发快速筛查和诊断PND的辅助工具提供了参考依据。

     

    Abstract:
    Objective To construct predictive models for the auxiliary diagnosis of perinatal depression (PND) in pregnant and postpartum women based on machine learning (ML) algorithms, and to evaluate the performance of the models built with different ML algorithms.
    Methods A total of 5 814 pregnant and postpartum women (4 665 prenatal study subjects and 1 149 postnatal study subjects) were evaluated using the 9-item Patient Health Questionnaire depression scale (PHQ-9). Nineteen scale-dimension variables from 7 scales, together with demographic characteristics, were collected as observation variables. Prenatal and postnatal subjects separately underwent 1:1 propensity score matching on age. Feature selection variables were determined by univariate analysis and the Pearson correlation coefficient. Diagnostic models for prenatal and postnatal depression were constructed with five ML algorithms: Logistic regression, random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), and backpropagation (BP) neural network. Each algorithm was fitted once with all variables and once with the feature selection variable set. Five-fold cross-validation was used to evaluate the predictive performance of the models; the evaluation metrics were sensitivity, specificity, and area under the curve (AUC).
    Results With either variable set, the sensitivity, specificity, and AUC of the prediction models constructed by the five ML algorithms for prenatal and postnatal subjects all fell within the range of 0.600 to 0.900. RF was the best-performing algorithm for both the prenatal prediction model (AUC 0.834 with all variables; AUC 0.849 with the feature selection variable set) and the postnatal prediction model (AUC 0.873 with all variables; AUC 0.864 with the feature selection variable set).
    Conclusion The prediction models based on the five ML algorithms can all effectively predict the risk of PND in pregnant and postpartum women, with RF performing best. These findings provide a reference for developing auxiliary tools for the rapid screening and diagnosis of PND.
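
The evaluation procedure named in the Methods (5-fold cross-validation reporting sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. This is not the authors' code. It assumes binary labels coded 0/1, uses a hypothetical one-feature stand-in scorer (`fit_centroid_scorer`) in place of the five ML algorithms, and runs on synthetic data. All function names are illustrative.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X_train, y_train):
    """Hypothetical stand-in model (assumption, not one of the paper's algorithms).

    Scores a single numeric feature so that the negative-class mean maps to 0
    and the positive-class mean maps to 1.
    """
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: (x - mean_neg) / (mean_pos - mean_neg)


def cross_validate(X, y, fit, k=5, threshold=0.5):
    """Average sensitivity, specificity, and AUC over k held-out folds."""
    results = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        s_test = [score(X[i]) for i in test_idx]
        y_pred = [1 if s >= threshold else 0 for s in s_test]
        sens, spec = sensitivity_specificity(y_test, y_pred)
        results.append((sens, spec, auc(y_test, s_test)))
    return tuple(sum(r[m] for r in results) / k for m in range(3))


if __name__ == "__main__":
    # Synthetic, class-balanced data for illustration only.
    rng = random.Random(42)
    y = [0] * 100 + [1] * 100
    X = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(1.5, 1.0) for _ in range(100)]
    sens, spec, area = cross_validate(X, y, fit_centroid_scorer)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Stratifying the folds keeps both classes present in every held-out fold, so sensitivity and specificity are always defined. In practice the `fit` argument would wrap a real classifier from an ML library rather than this stand-in.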

     
