Explainable Artificial Intelligence Approach for Agile Story Point Estimation and Issue Type Prediction

De Silva, H.M.C.J.; Wijerathna, P.M.A.K.; Kumara, B.T.G.S.

Collection: Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025

URI: http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4969

Date: 2025-02-19

Abstract:

Agile Software Development (ASD) relies on accurate story point estimation and issue type prediction for effective sprint planning and resource allocation. Despite advancements in Machine Learning (ML), existing approaches lack interpretability and have not yet addressed both tasks together. This study addresses that gap by proposing two specialized deep learning models: a regression model for story point estimation (0-20) and a classification model that predicts the issue type as bug, story, or task. Both models were trained on a subset of the TAWOS dataset containing 65,427 issue reports. Issue titles and descriptions were combined and used as input, and these textual inputs were converted into vector representations using Word2Vec embeddings for feature extraction. To address the class imbalance in the classification task, dynamic augmentation using BERT-based contextual substitutions was applied. Bidirectional Long Short-Term Memory (BiLSTM) networks were selected after evaluating several other models, including Random Forest, XGBoost, Support Vector Machine, and Logistic Regression; BiLSTM outperformed these traditional ML models. To enhance the interpretability of the models, we incorporated Local Interpretable Model-agnostic Explanations (LIME). This method provides transparency by showing which specific words most influenced the prediction for each issue, in both the regression and classification tasks. The regression model achieved a mean absolute error of 2.11 and a root mean square error of 3.46. The classification model achieved 84% accuracy, with F1 scores of 0.85 for bugs, 0.87 for stories, and 0.71 for tasks. For future work, we propose integrating these models into a unified model. This research contributes to ASD practice by introducing an explainable approach that aligns with agile industry norms and fosters trust among practitioners.
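
The evaluation metrics reported in the abstract (mean absolute error and root mean square error for the regression model; accuracy and per-class F1 for the classifier) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names and the toy labels and predictions below are hypothetical and exist only to show how each metric is computed.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted story points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted story points."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def accuracy(y_true, y_pred):
    """Fraction of issues whose predicted type matches the true type."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def per_class_f1(y_true, y_pred):
    """F1 score for each issue type, computed one class at a time."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Toy data (hypothetical, not taken from the TAWOS dataset or the study).
points_true = [1, 3, 5, 8]
points_pred = [2, 3, 3, 8]
types_true = ["bug", "story", "task", "bug"]
types_pred = ["bug", "story", "bug", "bug"]

print("MAE:", mae(points_true, points_pred))
print("RMSE:", rmse(points_true, points_pred))
print("Accuracy:", accuracy(types_true, types_pred))
print("F1 per class:", per_class_f1(types_true, types_pred))
```

Reporting F1 per class, as the abstract does, exposes weaker performance on a minority class (here, tasks) that a single accuracy figure would hide.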
