Objective To identify diagnostic gene markers for nasopharyngeal carcinoma (NPC) using the Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine Recursive Feature Elimination (SVM-RFE) algorithms.
Methods Gene expression microarray data sets (GSE53819 and GSE13597) were downloaded from the GEO database, together with transcriptome sequencing data from the GTEx and ICGC-NPC databases; these served as training and validation sets. Differentially expressed genes (DEGs) related to NPC were identified by differential expression analysis. LASSO regression and SVM-RFE were then used to screen diagnostic feature genes for NPC in three data sets. Finally, an external validation set was used to evaluate the predictive performance of these diagnostic genes by the area under the receiver operating characteristic (ROC) curve (AUC).
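The screening step described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes scikit-learn, fits LASSO as L1-penalized linear regression on 0/1 labels via `LassoCV` (the study may have used a logistic variant), and takes the intersection of the two selections as the candidate set. The function name `select_feature_genes` and the parameter `n_svm_features` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_feature_genes(X, y, gene_names, n_svm_features=10, seed=0):
    """Return (lasso_genes, svm_genes, shared) as sets of gene names.

    X: samples x genes expression matrix (assumed restricted to DEGs).
    y: binary labels, 1 = NPC, 0 = control.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so the L1 penalty and SVM weights are comparable across genes.
    X_scaled = StandardScaler().fit_transform(X)

    # LASSO: keep genes whose coefficient is non-zero at the cross-validated alpha.
    lasso = LassoCV(cv=5, random_state=seed).fit(X_scaled, y)
    lasso_genes = {g for g, c in zip(gene_names, lasso.coef_) if c != 0}

    # SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_svm_features, step=1)
    rfe.fit(X_scaled, y)
    svm_genes = {g for g, keep in zip(gene_names, rfe.support_) if keep}

    # Candidate diagnostic genes: selected by both methods (an assumption here).
    return lasso_genes, svm_genes, lasso_genes & svm_genes
```

In practice the number of genes retained by SVM-RFE would be chosen by cross-validated error rather than fixed in advance; the fixed `n_svm_features` here only keeps the sketch short.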
Results A total of 582 DEGs related to NPC were identified, comprising 156 highly expressed and 426 lowly expressed genes. Three diagnostic feature genes, HOXA10, AFF3 and SHISA3, were identified by the LASSO regression and SVM-RFE algorithms in the microarray data set. An additional feature gene, PLAU, was identified in the GTEx-NPC data set. ROC curve analysis showed that the AUC values of HOXA10, AFF3, SHISA3 and PLAU for the diagnosis of NPC exceeded 0.7 in all data sets, indicating good diagnostic efficacy.
Conclusion Four potential diagnostic feature genes for NPC were identified using the LASSO and SVM-RFE algorithms. They provide a valuable reference for the early diagnosis of NPC and for studying the molecular mechanisms of the related genes.