Establishment of an artificial neural network model based on mitochondria-associated genes in Crohn's disease

    Abstract:
    Objective To screen mitochondria-related genes in Crohn's disease (CD) using the Gene Expression Omnibus (GEO) database, construct an artificial neural network diagnostic model, and evaluate its performance.
    Methods The CD-related datasets GSE186582 and GSE102133 were downloaded from the GEO database and screened for differentially expressed genes (DEGs). The DEGs were intersected with the mitochondrial genes in the MitoCarta 3.0 database. Least absolute shrinkage and selection operator (LASSO) regression and the random forest algorithm were used to identify feature genes of CD, which were then used to construct an artificial neural network diagnostic model. The model was further validated on the independent dataset GSE95095, and its diagnostic performance was evaluated by the area under the receiver operating characteristic curve (AUC). Immune cell infiltration in CD was assessed with the CIBERSORT algorithm, and the relationship between the biomarkers and the infiltrating immune cells was investigated.
    Results A total of 551 DEGs were obtained, of which 275 were upregulated and 276 were downregulated. Of these, 20 were CD-related mitochondrial genes. The two machine learning algorithms identified 9 mitochondria-related feature genes of CD (SOD2, MTHFD2, BPHL, PXMP2, RMND1, AGXT2, MAOA, HMGCS2, MAOB). An artificial neural network diagnostic model was constructed from these feature genes. The AUC of the model was 0.956 in the training group and 0.736 in the validation group. Immune cell infiltration analysis showed that the feature genes were associated with resting memory CD4+ T cells, activated memory CD4+ T cells, activated dendritic cells, neutrophils, and CD8+ T cells, among others.
    Conclusion The artificial neural network diagnostic model for CD constructed from 9 mitochondria-related genes shows good predictive performance.
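
As an illustration only, the following minimal Python sketch shows two of the steps named in the Methods: intersecting a DEG list with a mitochondrial gene list, and computing an AUC from model scores. It is not the authors' pipeline. The function names, the placeholder genes GENE_X, GENE_Y and GENE_Z, and all labels and scores are hypothetical toy values, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which is an assumption about how it could be calculated rather than a statement of what the study used.

```python
def intersect_genes(degs, mito_genes):
    """Return the DEGs that also appear in the mitochondrial gene list, keeping DEG order."""
    mito_set = set(mito_genes)
    return [gene for gene in degs if gene in mito_set]


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 for cases (e.g. CD), 0 for controls.
    scores: model outputs, where higher means "more likely a case".
    The AUC equals the fraction of case/control pairs in which the case
    scores higher than the control, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs (hypothetical): a short DEG list and a short mitochondrial gene list.
    toy_degs = ["SOD2", "GENE_X", "HMGCS2", "GENE_Y"]
    toy_mito = ["HMGCS2", "MAOA", "SOD2", "GENE_Z"]
    print(intersect_genes(toy_degs, toy_mito))

    # Toy labels and scores (hypothetical), not data from the study.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    print(round(roc_auc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; on real expression datasets a library routine for ROC analysis would be the more practical choice.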

     
