Abstract:
Bug priority prediction is important in the software development process because it helps teams handle critical issues first. Although previous researchers have developed many machine learning models to predict bug priorities, the black-box nature of these models has limited users’ ability to understand their results.
The objective of this research is to develop an accurate and interpretable model for software bug priority prediction by evaluating multiple ML models, creating an ensemble of the best-performing models, and integrating Explainable Artificial Intelligence (XAI) methods to explain predictions in a human-understandable way. In the proposed work, the XAI technique Local Interpretable Model-agnostic Explanations (LIME) is used to explain model behaviour. To conduct the research,
a dataset of nearly 90,000 Bugzilla bug reports from 2020 to 2024 was used, in which bugs are assigned priority levels from P1 to P5, with P1 being the highest. Bug descriptions were used to predict the priority of each bug. Nine machine learning models were implemented,
including Random Forest (RF), Long Short-Term Memory (LSTM), and Extreme Gradient Boosting (XGBoost). After data preprocessing, the feature extraction methods Word2Vec, Global Vectors for Word Representation (GloVe), and FastText were applied separately, together with the class balancing techniques Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling Approach (ADASYN), and data augmentation. The best individual results were achieved by Random Forest and LSTM, with accuracies of 85% and 81%, respectively. An ensemble of RF and LSTM trained on ADASYN-balanced Word2Vec features then delivered the highest performance, reaching 87% in overall accuracy, precision, recall, and F-score, as well as robust priority-level-wise metrics. LIME offered explanations by identifying the keywords in a bug description that influenced the outcome the most.
For example, a bug predicted as P3 priority was influenced by words like “hang” and “windows”, indicating usability problems, whereas words like “blocks” and “entire” negatively influenced the prediction, as they may suggest more urgent (P1 or P2) or more minor (P5) problems. In conclusion, this study highlights the
importance of using XAI in software projects to understand and trust the results of predictive
models.