Classifying Code Quality in Java Open Source Software Projects using Machine Learning and Contribution Metrics.

Welagedara, H.T; Wasalthilaka, W.V.S.K.

dc.contributor.author	Welagedara, H.T
dc.contributor.author	Wasalthilaka, W.V.S.K.
dc.date.accessioned	2025-12-12T10:19:11Z
dc.date.available	2025-12-12T10:19:11Z
dc.date.issued	2025-02-19
dc.identifier.citation	Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka.	en_US
dc.identifier.isbn	978-624-5727-57-5
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4973
dc.description.abstract	Open-Source Software Development (OSSD) enables geographically dispersed volunteer teams to work through collaborative and transparent processes. While it outperforms traditional methodologies, challenges remain in preserving code quality, in managing third-party dependencies that lead to compatibility issues, and in handling inconsistent developer contributions that can lead to code redundancies. Java, as the foundation of software development, has fostered numerous open-source projects, enhancing research dependability. This study proposes a machine learning model that classifies code quality in Java-based open-source software projects by analyzing code contributions. The study applies popular machine learning techniques used for software quality prediction, namely Regression, Decision Trees, Random Forest, Support Vector Machine and Bayesian Learning. It measures the developer’s contribution from source code using established software quality metrics: Lines of Code (LOC), Coupling Between Objects (CBO), Response for a Class (RFC), Weighted Methods per Class (WMC), Lack of Cohesion in Methods (LCOM), Depth of Inheritance Tree (DIT) and Number of Children (NOC). The proposed model was evaluated on a dataset containing over 200,000 observations and 53 software metrics extracted from open-source projects. Performance was measured using Precision, Accuracy, Recall and F1-Score. Decision Tree and Random Forest currently show the highest model accuracy, at 58%. Neural networks were not included because of their high computational cost and limited interpretability. This analysis supports Java OSSD projects by evaluating code contributions in order to improve reliability and sustainability. Refining code review, prioritizing refactoring, and leveraging the best ML approach to predict code quality can strengthen development processes and advance OSSD efficiency.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing, Sabaragamuwa University of Sri Lanka	en_US
dc.subject	Code Contributions	en_US
dc.subject	Code Quality	en_US
dc.subject	Machine Learning	en_US
dc.subject	Open-Source Software Development	en_US
dc.subject	Software Quality Metrics	en_US
dc.title	Classifying Code Quality in Java Open Source Software Projects using Machine Learning and Contribution Metrics.	en_US
dc.type	Article	en_US
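
The abstract reports Precision, Accuracy, Recall and F1-Score as its performance measures. The following is a minimal sketch of how those four scores relate to one another for a single "positive" class. It is not the authors' evaluation code: the function name classification_metrics, the binary framing (1 = acceptable quality, 0 = poor quality) and the sample labels are all assumptions made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; the labels are assumed to be
    simple hashable values such as 0/1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels, NOT data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead when the quality classes are imbalanced, which is one reason to report precision, recall and F1 alongside it.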