Screening of genetic markers for diagnosis of nasopharyngeal carcinoma based on machine learning algorithm
-
Abstract: Objective To screen feature gene markers for the diagnosis of nasopharyngeal carcinoma (NPC) using the least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVM-RFE) algorithms.
Methods The gene expression microarray data sets GSE53819 and GSE13597 were downloaded from the GEO database as training sets, and the transcriptome sequencing data sets GTEx-NPC and ICGC-NPC were downloaded from the GTEx and ICGC databases as a training set and a validation set, respectively. Differentially expressed genes (DEGs) related to NPC were identified by differential expression analysis. The LASSO and SVM-RFE algorithms were then each applied to the three training sets to screen diagnostic feature genes for NPC. Finally, the diagnostic performance of the feature genes was evaluated against the external validation set using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
Results A total of 582 NPC-related DEGs were identified, comprising 156 up-regulated and 426 down-regulated DEGs. With the LASSO and SVM-RFE algorithms, three key diagnostic feature genes (HOXA10, AFF3 and SHISA3) were selected in each of the GSE53819, GSE13597 and GTEx-NPC data sets, and one additional feature gene, PLAU, was selected in the GTEx-NPC data set. ROC curve analysis showed that the AUC values of HOXA10, AFF3, SHISA3 and PLAU for diagnosing NPC were greater than 0.7 in every data set, indicating good diagnostic performance.
Conclusion Four potential diagnostic feature gene markers for NPC can be identified with the LASSO and SVM-RFE algorithms, and external validation shows that they perform well in diagnosing NPC. These findings provide a valuable reference for the early diagnosis of NPC and for the study of the molecular mechanisms of the related genes.
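The two-algorithm screening step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the software or settings used, so the choice of scikit-learn, `LassoCV` fitted on 0/1 class labels, `RFECV` with a linear-kernel SVM, the five-fold cross-validation, the synthetic data and the gene names `gene_0` ... `gene_29` are all assumptions made for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_feature_genes(X, y, gene_names, seed=0):
    """Return (lasso_genes, svm_rfe_genes, overlap) as sets of gene names.

    X: samples x genes expression matrix (restricted to DEGs in practice).
    y: 0 = normal tissue, 1 = tumour tissue.
    """
    X_std = StandardScaler().fit_transform(X)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    # LASSO: the L1 penalty shrinks most coefficients to exactly zero;
    # genes with a non-zero coefficient at the cross-validated penalty are kept.
    lasso = LassoCV(cv=cv, random_state=seed).fit(X_std, y)
    lasso_genes = {g for g, c in zip(gene_names, lasso.coef_) if c != 0}

    # SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight;
    # cross-validation picks the number of genes to keep.
    rfe = RFECV(SVC(kernel="linear"), step=1, cv=cv, scoring="roc_auc")
    rfe.fit(X_std, y)
    svm_genes = {g for g, keep in zip(gene_names, rfe.support_) if keep}

    # The candidate markers are the genes chosen by both algorithms
    # (the intersection shown in a Venn diagram).
    return lasso_genes, svm_genes, lasso_genes & svm_genes


# Synthetic stand-in for an expression matrix: 80 samples, 30 "genes",
# of which the first 4 carry class signal (shuffle=False keeps them first).
X, y = make_classification(n_samples=80, n_features=30, n_informative=4,
                           n_redundant=0, class_sep=2.0, shuffle=False,
                           random_state=0)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]  # hypothetical names

lasso_genes, svm_genes, overlap = select_feature_genes(X, y, gene_names)
print("LASSO:", sorted(lasso_genes))
print("SVM-RFE:", sorted(svm_genes))
print("Selected by both:", sorted(overlap))
```

Taking the intersection is a conservative design choice: a gene must survive two selection procedures with different inductive biases before it is treated as a candidate marker.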
-
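Figure 3 Visualization of key feature gene screening by the LASSO and SVM-RFE algorithms, and Venn diagrams of the feature gene intersections
A: key feature genes of the GSE53819 data set screened by the LASSO algorithm; B: key feature genes of the GSE13597 data set screened by the LASSO algorithm; C: key feature genes of the GTEx-NPC data set screened by the LASSO algorithm; D: number of key feature genes of the GSE53819 data set screened by the SVM-RFE algorithm; E: number of key feature genes of the GSE13597 data set screened by the SVM-RFE algorithm; F: number of key feature genes of the GTEx-NPC data set screened by the SVM-RFE algorithm; G: Venn diagram of the results of the two algorithms for the GSE53819 data set; H: Venn diagram of the results of the two algorithms for the GSE13597 data set; I: Venn diagram of the results of the two algorithms for the GTEx-NPC data set.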
Table 1 Diagnostic performance of the feature genes for NPC in the training and validation sets
Data set    Feature gene    AUC      95% CI
GSE53819    HOXA10          0.950    0.924~0.975
GSE53819    SHISA3          0.979    0.965~0.993
GSE53819    AFF3            0.985    0.957~1.000
GSE13597    HOXA10          0.944    0.852~1.000
GSE13597    SHISA3          0.774    0.702~0.847
GSE13597    AFF3            0.738    0.675~0.802
GTEx-NPC    HOXA10          0.950    0.912~0.986
GTEx-NPC    SHISA3          0.954    0.928~0.992
GTEx-NPC    AFF3            0.835    0.796~0.861
GTEx-NPC    PLAU            0.940    0.902~0.968
ICGC-NPC    HOXA10          0.863    0.822~0.894
ICGC-NPC    SHISA3          0.814    0.785~0.843
ICGC-NPC    AFF3            0.798    0.747~0.835
ICGC-NPC    PLAU            0.841    0.801~0.917
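An AUC with a 95% confidence interval, as reported for each gene in Table 1, can be computed along the following lines. This is a hedged sketch: the source does not say how its intervals were derived, so the percentile bootstrap used here is an assumption, and the function name `auc_with_bootstrap_ci` and the simulated expression values are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_with_bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (auc, lower, upper) using a percentile bootstrap interval.

    y_true: 0 = normal, 1 = tumour.
    scores: expression of a single gene. For a down-regulated gene,
            pass the negated expression so that higher means "tumour".
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, scores)

    n = len(y_true)
    boot_aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample with replacement
        if np.unique(y_true[idx]).size < 2:   # AUC needs both classes
            continue
        boot_aucs.append(roc_auc_score(y_true[idx], scores[idx]))

    lower, upper = np.quantile(boot_aucs, [alpha / 2, 1 - alpha / 2])
    return auc, lower, upper


# Simulated expression of one up-regulated gene: 40 normal, 40 tumour samples.
rng = np.random.default_rng(1)
labels = np.array([0] * 40 + [1] * 40)
expression = np.concatenate([rng.normal(0.0, 1.0, 40),
                             rng.normal(1.5, 1.0, 40)])

auc, lower, upper = auc_with_bootstrap_ci(labels, expression)
print(f"AUC = {auc:.3f} (95% CI {lower:.3f}~{upper:.3f})")
```

Applied to an external validation set, the same function is called with that set's labels and the gene's expression values, with no refitting involved, which is what makes the validation AUC an independent check.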