Sabaragamuwa University of Sri Lanka

Semantic Metadata-enhanced Deep Learning Techniques for Duplicate Bug Report Detection


dc.contributor.author Sandunika, D.M.N.
dc.contributor.author Herath, G.A.C.A.
dc.date.accessioned 2026-06-04T08:59:28Z
dc.date.available 2026-06-04T08:59:28Z
dc.date.issued 2026-01-28
dc.identifier.isbn 978-624-5727-44-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5328
dc.description.abstract Duplicate bug reports significantly reduce software development efficiency, with 12-25% of bug reports being duplicates in large projects. Manual identification is both time-consuming and error-prone. This study examines whether semantic metadata that encodes content-level similarities, combined with simplified deep learning architectures, can be more effective than relying on complex models. We aim to examine existing practices and identify their weaknesses, to investigate and demonstrate the usefulness of semantic content-based metadata for robust identification, and to design and test various deep learning models to determine which methods achieve the highest performance. Existing studies reveal that machine learning methods outperform traditional information-retrieval techniques by a significant margin, while deep learning approaches achieve higher accuracy but often suffer from limited feature diversity. We collected 535,477 quality pairs and used a 70%-10%-20% split for training, validation, and testing. Our DistilBERT-based feature engineering yielded a mean correlation of 0.1888, surpassing traditional metadata. We compared six architectures, namely LSTM, CNN, LSTM+Metadata, Hybrid+Attention, Hybrid+Attention with Metadata, and the proposed LSTM+CNN+Metadata, evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. The proposed architecture achieved a 92.53% F1-score, outperforming the more complex attention-based models. LSTM also outperformed CNN, indicating that contextual processing is beneficial. This paper highlights that feature quality is more critical to model performance than model complexity, and presents a cost-effective, accurate, and scalable method for automated duplicate bug detection in real-world applications. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing. Sabaragamuwa University of Sri Lanka. en_US
dc.subject Deep Learning en_US
dc.subject Duplicate Bug Detection en_US
dc.subject LSTM Networks en_US
dc.subject Semantic Metadata en_US
dc.subject Software Maintenance en_US
dc.title Semantic Metadata-enhanced Deep Learning Techniques for Duplicate Bug Report Detection en_US
dc.type Article en_US

