Abstract:
Lung cancer remains one of the leading causes of cancer deaths worldwide. Cancer detection is a complex and challenging process; however, when identified at an incipient stage, the disease is amenable to curative interventions. Machine learning is one of the most promising artificial intelligence methods and is widely used in oncological diagnosis and early-stage disease detection. This study therefore assesses the application of machine learning to symptom-based lung cancer prediction. We introduce a novel approach by evaluating deep learning methods, whereas previous research primarily relied on traditional machine learning models. A numerical dataset from the Kaggle repository was preprocessed to ensure the quality of inputs, and the holdout method was used to evaluate model performance. Various models were implemented, including Logistic Regression, Decision Trees, Gradient Boosting, SVM, ANN, Random Forest, XGBoost, Linear Regression, Naive Bayes, and LSTM. The models were evaluated using the Accuracy, Sensitivity, Specificity, and ROC-AUC metrics. The SVM model outperformed all other models with an accuracy of 96.98%, followed by the ANN (96.82%) and LSTM (96.52%) models. SVM recorded higher sensitivity than ANN (98.48%) and LSTM (98.99%). However, SVM and LSTM recorded a lower specificity of 74.24%, whereas ANN recorded the highest value of 81.81%. The ROC-AUC was highest for both LSTM and ANN (99.32%), while SVM reached 98.58%. These results show that machine learning algorithms can classify lung cancer with acceptable accuracy, which opens the way toward improved clinical diagnosis. This study emphasizes the use of machine learning models such as SVM in clinical practice to improve the detection rate of early lung cancer, framed as a binary classification task. In the future, advanced machine learning algorithms can be further fine-tuned and coupled with different cross-validation methods to assess their suitability for detecting lung cancer.
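
The four evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = lung cancer, 0 = no cancer) and a model that outputs a score per patient. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not the study's data or code. The toy set is deliberately imbalanced to show how accuracy and sensitivity can be high while specificity stays low.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count as one half). Requires both classes to be present."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and ROC-AUC on a holdout set."""
    y_pred: List[int] = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "roc_auc": roc_auc(y_true, y_score),
    }


# Hypothetical holdout set: 8 positive cases and 2 negative cases (illustrative only).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_score = [0.90, 0.80, 0.95, 0.70, 0.85, 0.60, 0.75, 0.40, 0.65, 0.20]

metrics = evaluate(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because specificity is computed only over the negative cases, a single misclassified negative in a small negative class lowers it sharply, while accuracy and sensitivity are dominated by the larger positive class.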