Sabaragamuwa University of Sri Lanka

Machine learning based detection of software vulnerabilities in C code


dc.contributor.author Junoj, S.
dc.contributor.author Wijeratne, P.M.A.K.
dc.contributor.author Kumara, B.T.G.S.
dc.date.accessioned 2025-12-12T10:24:56Z
dc.date.available 2025-12-12T10:24:56Z
dc.date.issued 2025-02-19
dc.identifier.citation Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka. en_US
dc.identifier.isbn 978-624-5727-57-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4975
dc.description.abstract Software security vulnerability detection is an integral part of creating secure and reliable software. C is used extensively in system-level and embedded applications for its efficiency and direct control over hardware resources, but it lacks inherent security features, which makes it especially prone to common vulnerability classes such as buffer overflows and null pointer dereferences. Classic vulnerability detection methods based on manual code review and static analysis tools often fail to discover complicated security bugs. Given these shortcomings, this research presents a machine learning-based approach to automate the detection and classification of vulnerabilities in C code. Datasets of varied, real-world C code samples were gathered from Kaggle and IEEE DataPort. Feature extraction was performed with the Word2Vec model, which captures semantic and contextual relationships in code better than traditional frequency-based methods. Several machine learning and deep learning models were explored: Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). A hybrid CNN-LSTM model was then proposed to improve results. The models were trained, validated, and tested using an 80-10-10 split and evaluated on accuracy, precision, recall, and F1-score. The Decision Tree model achieved the highest accuracy in vulnerability detection (93.46%), while the hybrid CNN-LSTM model performed best in vulnerability classification, with an accuracy of 94.55%. These results indicate that machine learning can significantly enhance software vulnerability detection compared to traditional methods. The study further discusses how these models can be integrated into real-world software development workflows to improve automated security assessments. Future studies should compare this approach with state-of-the-art vulnerability detection frameworks to further tune the machine learning-based security solution. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka en_US
dc.subject Convolutional Neural Network en_US
dc.subject Vulnerability detection en_US
dc.subject Vulnerability classification en_US
dc.subject Machine learning en_US
dc.subject Deep learning en_US
dc.title Machine learning based detection of software vulnerabilities in C code en_US
dc.type Article en_US
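
The abstract mentions an 80-10-10 train/validation/test split and evaluation by accuracy, precision, recall, and F1-score. The following is a minimal sketch of those two steps in plain Python, not the authors' implementation. The function names `split_80_10_10` and `binary_metrics`, the fixed random seed, and the use of label 1 for "vulnerable" are assumptions made for illustration only.

```python
import random


def split_80_10_10(samples, seed=0):
    """Shuffle samples and split them into train/validation/test (80/10/10).

    Hypothetical helper: the abstract does not say how the split was drawn.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task.

    Assumes label `positive` marks a vulnerable sample (illustrative choice).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `split_80_10_10(range(100))` yields partitions of 80, 10, and 10 items, and `binary_metrics` can then be applied to the true and predicted labels of the test partition. Reporting precision and recall alongside accuracy matters when vulnerable samples are rare, since accuracy alone can look high for a model that rarely flags anything.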

