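Screening of genetic markers for diagnosis of nasopharyngeal carcinoma based on machine learning algorithm

WANG Yiren, LIU Aiai, ZHAN Xiang, LUO Yan, ZHOU Ping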

Citation: WANG Yiren, LIU Aiai, ZHAN Xiang, LUO Yan, ZHOU Ping. Screening of genetic markers for diagnosis of nasopharyngeal carcinoma based on machine learning algorithm[J]. Journal of Clinical Medicine in Practice, 2023, 27(7): 6-11. DOI: 10.7619/jcmp.20230091

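Funding:

Sichuan Science and Technology Program, Joint Innovation Special Project (2022YFS0616)

Sichuan Medical Research Project Plan (S21004)

Luzhou Municipal People's Government-Southwest Medical University Science and Technology Strategic Cooperation Project (2020LZXNYDJ2)

Southwest Medical University Innovation and Entrepreneurship Training Program (S202210632248)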

Corresponding author: ZHOU Ping, E-mail: zhouping11@swmu.edu.cn

CLC number: R739.62; R446

    Abstract:
    Objective

    To screen characteristic gene markers for the diagnosis of nasopharyngeal carcinoma (NPC) using the least absolute shrinkage and selection operator (LASSO) algorithm and the support vector machine recursive feature elimination (SVM-RFE) algorithm.

    Methods

    The gene expression microarray datasets GSE53819 and GSE13597 were downloaded from the GEO database as training sets, and the transcriptome sequencing datasets GTEx-NPC and ICGC-NPC were downloaded from the GTEx and ICGC databases as a training set and a validation set, respectively. NPC-related differentially expressed genes (DEGs) were identified by differential expression analysis. The LASSO and SVM-RFE algorithms were then each applied to the three training sets to screen diagnostic feature genes for NPC. Finally, using the external validation set, the diagnostic performance of the feature genes was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.

    Results

    A total of 582 NPC-related DEGs were identified, including 156 upregulated and 426 downregulated DEGs. Based on the LASSO and SVM-RFE algorithms, three key diagnostic feature genes (HOXA10, AFF3 and SHISA3) were identified in each of the GSE53819, GSE13597 and GTEx-NPC datasets, and one additional feature gene, PLAU, was identified in the GTEx-NPC dataset. ROC curve analysis showed that the AUC of HOXA10, AFF3, SHISA3 and PLAU for diagnosing NPC was greater than 0.7 in every dataset, indicating good diagnostic performance.

    Conclusion

    Four potential diagnostic feature gene markers for NPC can be identified with the LASSO and SVM-RFE algorithms, and external validation indicates that they perform well in diagnosing NPC. These findings provide a useful reference for the early diagnosis of NPC and for studying the molecular mechanisms of the related genes.
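The two-algorithm screening step described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the abstract does not state the software, LASSO variant or cross-validation settings, so the use of scikit-learn, a linear `LassoCV` fitted on the 0/1 label, 5-fold cross-validation and the `gene_*` names are all assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a DEG expression matrix: 60 samples x 30 genes.
n_samples, n_genes = 60, 30
genes = [f"gene_{i}" for i in range(n_genes)]          # hypothetical names
y = rng.permutation(np.repeat([0, 1], n_samples // 2))  # 0 = control, 1 = NPC
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :3] += 2.0                 # only the first three genes carry signal
X = StandardScaler().fit_transform(X)  # scaled on all samples for brevity

# LASSO: keep genes whose coefficient is non-zero at the cross-validated penalty.
lasso = LassoCV(cv=5).fit(X, y)
lasso_genes = {g for g, c in zip(genes, lasso.coef_) if c != 0}

# SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight and
# keep the subset size with the best cross-validated accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rfe = RFECV(SVC(kernel="linear"), step=1, cv=cv, scoring="accuracy").fit(X, y)
svm_genes = {g for g, keep in zip(genes, rfe.support_) if keep}

# Candidate feature genes are those selected by both algorithms
# (the overlap a Venn diagram would show).
common = sorted(lasso_genes & svm_genes)
print(common)
```

Taking the intersection is a conservative choice: a gene is retained only if both a sparsity-based and a margin-based selector agree on it.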

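    Figure 1   Visualization of DEGs between NPC patients and normal controls

    A: heat map of DEGs between NPC patients and normal controls (group1, normal controls; group2, NPC patients; red indicates high expression and blue indicates low expression, with deeper color indicating a more extreme expression level); B: volcano plot of DEGs (blue, low expression; red, high expression).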
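The red/blue labelling used in a volcano plot of this kind can be sketched as a simple thresholding rule. The cutoffs below (|log2 fold change| >= 1, adjusted p < 0.05) and the toy genes are assumptions for illustration; the thresholds actually used in the study are not given on this page.

```python
def classify_deg(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene for a volcano plot: 'up' (red), 'down' (blue) or 'ns'."""
    if adj_p >= p_cutoff or abs(log2_fc) < fc_cutoff:
        return "ns"
    return "up" if log2_fc > 0 else "down"

# Hypothetical results: gene -> (log2 fold change NPC vs. control, adjusted p).
toy_results = {
    "gene_a": (2.3, 0.001),
    "gene_b": (-1.8, 0.004),
    "gene_c": (0.4, 0.010),   # significant, but the effect is too small
    "gene_d": (-2.5, 0.200),  # large effect, but not significant
}

labels = {g: classify_deg(fc, p) for g, (fc, p) in toy_results.items()}
counts = {k: list(labels.values()).count(k) for k in ("up", "down", "ns")}
print(labels)
print(counts)
```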

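    Figure 2   GO and KEGG functional enrichment analysis of the 582 DEGs

    A: KEGG analysis of upregulated genes; B: GO analysis of upregulated genes; C: KEGG analysis of downregulated genes; D: GO analysis of downregulated genes. Color indicates the significance of the enrichment result, a larger enrichment ratio corresponds to a smaller FDR value, and circle size indicates the number of enriched genes (the more genes, the larger the circle).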
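The quantities plotted in such enrichment bubble charts (gene count, enrichment ratio, FDR) are commonly derived from an over-representation test. The sketch below assumes a hypergeometric test with Benjamini-Hochberg correction, which the page does not confirm; the background size and the per-term counts are hypothetical, and only the 156 upregulated DEGs come from the abstract.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the term."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

M = 20000   # hypothetical number of background genes
N = 156     # upregulated DEGs reported in the abstract
# Hypothetical terms: name -> (genes annotated to the term, DEGs among them).
terms = {"term_1": (200, 12), "term_2": (500, 5), "term_3": (50, 1)}

pvals = [hypergeom_sf(k, M, n, N) for n, k in terms.values()]
fdr = benjamini_hochberg(pvals)
for (name, (n, k)), p, q in zip(terms.items(), pvals, fdr):
    ratio = (k / N) / (n / M)   # observed over expected fraction
    print(f"{name}: count={k} ratio={ratio:.1f} p={p:.3g} FDR={q:.3g}")
```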

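    Figure 3   Visualization of key feature gene selection by the LASSO and SVM-RFE algorithms, and Venn diagrams of the overlapping feature genes

    A: key feature genes selected by LASSO in the GSE53819 dataset; B: key feature genes selected by LASSO in the GSE13597 dataset; C: key feature genes selected by LASSO in the GTEx-NPC dataset; D: number of key feature genes selected by SVM-RFE in the GSE53819 dataset; E: number of key feature genes selected by SVM-RFE in the GSE13597 dataset; F: number of key feature genes selected by SVM-RFE in the GTEx-NPC dataset; G: Venn diagram of the results of the two algorithms for the GSE53819 dataset; H: Venn diagram of the results of the two algorithms for the GSE13597 dataset; I: Venn diagram of the results of the two algorithms for the GTEx-NPC dataset.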

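    Figure 4   ROC curves of the feature genes for diagnosing NPC in the training and validation sets

    A: training set GSE53819; B: training set GSE13597; C: training set GTEx-NPC; D: validation set ICGC-NPC.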

    Table 1   Diagnostic performance of the feature genes for NPC in the training and validation sets

    Dataset     Feature gene   AUC     95% CI
    GSE53819    HOXA10         0.950   0.924-0.975
    GSE53819    SHISA3         0.979   0.965-0.993
    GSE53819    AFF3           0.985   0.957-1.000
    GSE13597    HOXA10         0.944   0.852-1.000
    GSE13597    SHISA3         0.774   0.702-0.847
    GSE13597    AFF3           0.738   0.675-0.802
    GTEx-NPC    HOXA10         0.950   0.912-0.986
    GTEx-NPC    SHISA3         0.954   0.928-0.992
    GTEx-NPC    AFF3           0.835   0.796-0.861
    GTEx-NPC    PLAU           0.940   0.902-0.968
    ICGC-NPC    HOXA10         0.863   0.822-0.894
    ICGC-NPC    SHISA3         0.814   0.785-0.843
    ICGC-NPC    AFF3           0.798   0.747-0.835
    ICGC-NPC    PLAU           0.841   0.801-0.917
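An AUC and 95% confidence interval of the kind reported in Table 1 could be computed for a single gene as sketched below. The page does not say how the intervals were obtained, so the percentile bootstrap here is an assumption, and the scores are simulated rather than taken from any of the datasets.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    auc(labels, scores)                    # fail early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                # resample must contain both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Simulated expression of one gene: 30 controls (0) and 30 NPC samples (1).
rng = random.Random(1)
toy_labels = [0] * 30 + [1] * 30
toy_scores = [rng.gauss(0.0, 1.0) for _ in range(30)] + \
             [rng.gauss(1.5, 1.0) for _ in range(30)]

toy_auc = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, 95% CI {ci_low:.3f}-{ci_high:.3f}")
```

For a gene that is downregulated in NPC, raw expression gives an AUC below 0.5; negating the score (or reporting 1 minus the AUC) puts it on the same scale as the upregulated markers.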

Publication history
  • Received:  2023-01-11
  • Revised:  2023-02-27
  • Published online:  2023-04-22
