IMPROVING THE ACCURACY OF PREDICTING LUNG CANCER PATIENT SURVIVAL TIME USING LSTM NEURAL NETWORK

Nazeer, K.A.A; Sumanathilaka, T.G.D.K; Toppo, E.R; Lakra, R

Digital Library | SUSL Home
→
Research Publications
→
Proceedings
→
Conferences Organized by SUSL
→
University level conferences
→
International Conference of Sabaragamuwa University of Sri Lanka (ICSUSL)
→
7th International Conference of Sabaragamuwa University of Sri Lanka (ICSUSL) 2019
→
View Item

IMPROVING THE ACCURACY OF PREDICTING LUNG CANCER PATIENT SURVIVAL TIME USING LSTM NEURAL NETWORK

Nazeer, K.A.A; Sumanathilaka, T.G.D.K; Toppo, E.R; Lakra, R

URI: http://repo.lib.sab.ac.lk:8080/xmlui/handle/123456789/557

Date: 2019-11-14

Abstract:

Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large data-sets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including Multi-Layer Perception (MLP), Long Short Term Memory (LSTM) and Deep belief model (DBM). Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and the number of primaries, with the goal to enable comparison of predictive power between the various methods. The prediction is treated like a continuous target, rather than a classification into categories as the first step towards improving survival prediction. Several types of research have been performed on the current topic and the best result was achieved using Custom ensemble with RMSE value 15.30. The LSTM Neural Network was trained and tested in order to improve the accuracy of the predication and the below results were achieved. The dataset with more than a hundred thousand instances on lung cancer patients’ details was downloaded from the SEER training model, National Cancer Institute and principal components for training were identified using Principal Component Analysis (PCA). The data was divided into 75% as training data and rest for testing. The best performing technique was Long Short Memory (LSTM) with a Root Mean Square Error (RMSE) value of 10.53 months.10 fold cross-validation was performed over the dataset and results were taken. This results proven that LSTM Neural Network can be successfully used for estimating patient survival time with the ultimate goal to inform patient care decisions.

Show full item record