Sabaragamuwa University of Sri Lanka

An explainable AI-based approach for Java code smell detection to improve software maintainability

dc.contributor.author Wedaarachchi, R.
dc.contributor.author Herath, G.A.C.A.
dc.contributor.author Wasalthilaka, W.V.S.K.
dc.date.accessioned 2026-06-03T09:35:53Z
dc.date.available 2026-06-03T09:35:53Z
dc.date.issued 2026-01-28
dc.identifier.isbn 978-624-5727-44-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5322
dc.description.abstract Code smells are early indicators of design flaws that compromise long-term software maintainability. Traditional static analysis tools often generate false positives or fail to detect context-dependent smells because they rely on fixed thresholds. As software systems become more sophisticated, machine learning (ML) approaches provide a viable alternative for capturing deeper structural and semantic patterns. This study proposes an explainable, ML-driven methodology for detecting three maintainability-critical code smells in Java applications: Complex Method, Long Parameter List, and Multifaceted Abstraction. Using the DACOS dataset, which contains 10,341 method-level and 4,424 class-level samples, a separate binary classification pipeline was developed for each smell to ensure sufficient granularity and domain-specific feature engineering. Models including XGBoost, LightGBM, Random Forest, Gradient Boosting, Decision Tree, SVM, KNN, and Logistic Regression were trained using an 80/20 stratified split to preserve class distribution, SMOTETomek for resampling, and GridSearchCV for hyperparameter tuning, with the aim of an unbiased performance assessment. Based on individual model performance, the top three models for each code smell were combined into a weighted voting ensemble to further improve robustness. The findings suggest relatively consistent performance across the cross-validation folds. For Multifaceted Abstraction, XGBoost achieved an accuracy of 0.9401 (SD = 0.0043) and an F1 score of 0.9073 (SD = 0.0059), while for Complex Method and Long Parameter List, XGBoost achieved accuracies of 0.8222 (SD = 0.0101) and 0.8104 (SD = 0.0085), respectively. Instance-level explanations generated with LIME highlighted the key features behind individual predictions. These insights increase transparency and enable better-informed refactoring decisions.
However, the results are derived from a single dataset and a limited number of code smell types, which may limit their generalizability and external validity. Overall, this study demonstrates the potential of a reliable, precise, and interpretable ML-based technique for automated code smell detection, within the evaluated context, in support of software maintainability. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing. Sabaragamuwa University of Sri Lanka. en_US
dc.subject Code smells en_US
dc.subject Explainable AI en_US
dc.subject Java en_US
dc.subject Machine learning en_US
dc.subject Software maintainability en_US
dc.title An explainable AI-based approach for Java code smell detection to improve software maintainability en_US
dc.type Article en_US
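
The abstract mentions that the top three models per smell were combined into a weighted voting ensemble, but it does not say whether voting was hard or soft, or how the weights were chosen. The sketch below illustrates one possible reading only: soft voting over positive-class probabilities, with hypothetical weights standing in for something like validation scores. The function name, the example probabilities, and the weights are all assumptions made for illustration and are not taken from the study.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-model positive-class probabilities into binary labels.

    probabilities[m][i] is model m's estimated probability that sample i
    exhibits the smell; weights[m] is a non-negative weight for model m.
    Soft voting and the 0.5 threshold are assumptions of this sketch.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probabilities[0])
    if any(len(p) != n_samples for p in probabilities):
        raise ValueError("all models must score the same samples")

    labels = []
    for i in range(n_samples):
        # Weighted average of the models' probabilities for sample i.
        avg = sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        labels.append(1 if avg >= threshold else 0)
    return labels


# Hypothetical probabilities from three models for four Java methods.
model_probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.7],
    [0.7, 0.3, 0.3, 0.3],
]
# Hypothetical weights, e.g. derived from each model's validation score.
model_weights = [0.5, 0.3, 0.2]

predictions = weighted_soft_vote(model_probs, model_weights)
print(predictions)
```

With these illustrative inputs, the second sample is labelled clean even though one model scores it above 0.5, because the two other models outweigh it.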

