Abstract:
Bug priority prediction is important in the software development process because it helps teams handle critical issues first. Although previous researchers have developed many machine learning models to predict bug priorities, the black-box nature of these models has limited users’ ability to understand their results.
The objective of this research is to develop an accurate and interpretable model for software bug priority prediction by evaluating multiple ML models, creating an ensemble of the best-performing models, and integrating Explainable Artificial Intelligence (XAI) methods to explain predictions in a human-understandable way. In the proposed work, the XAI technique Local Interpretable Model-agnostic Explanations (LIME) is used to explain model behaviour. To conduct the research,
a dataset of nearly 90,000 Bugzilla bug reports from 2020 to 2024 was used, in which bugs are assigned priority levels from P1 to P5, with P1 being the highest. Bug descriptions were used to predict the priority of each bug. Nine machine learning models were implemented,
including Random Forest (RF), Long Short-Term Memory (LSTM), and Extreme Gradient Boosting (XGBoost). After data preprocessing, the feature extraction methods Word2Vec, Global Vectors for Word Representation (GloVe), and FastText were applied separately, together with the class balancing techniques Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling Approach (ADASYN), and data augmentation. The best individual results were achieved by Random Forest and LSTM, with accuracies of 85% and 81%, respectively. An ensemble of RF and LSTM trained on ADASYN-balanced Word2Vec features then delivered the highest performance, reaching 87% in overall accuracy, precision, recall, and F-score, as well as robust priority-level-wise metrics. LIME offered explanations by identifying the keywords in a bug description that influenced the outcome the most.
For example, a bug predicted as P3 priority was influenced by words like “hang” and “windows”, indicating usability problems, whereas words like “blocks” and “entire” negatively influenced the prediction, as they may suggest more urgent (P1 or P2) or more minor (P5) problems. In conclusion, this study highlights the
importance of using XAI in software projects to understand and trust the results of predictive
models.