Sabaragamuwa University of Sri Lanka

Classifying Code Quality in Java Open Source Software Projects using Machine Learning and Contribution Metrics.

dc.contributor.author Welagedara, H.T.
dc.contributor.author Wasalthilaka, W.V.S.K.
dc.date.accessioned 2025-12-12T10:19:11Z
dc.date.available 2025-12-12T10:19:11Z
dc.date.issued 2025-02-19
dc.identifier.citation Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka. en_US
dc.identifier.isbn 978-624-5727-57-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4973
dc.description.abstract Open-Source Software Development (OSSD) lets geographically dispersed volunteer teams work through collaborative and transparent processes. While it outperforms traditional methodologies, challenges remain: preserving code quality, managing third-party dependencies that cause compatibility issues, and handling inconsistencies in developer contributions that can lead to code redundancies. Java, as a foundation of software development, has fostered numerous open-source projects, which enhances research dependability. This study proposes a machine learning model that classifies code quality in Java-based open-source software projects by analyzing code contributions. It applies popular machine learning techniques for software quality prediction, such as Regression, Decision Trees, Random Forest, Support Vector Machine, and Bayesian Learning, together with established software quality metrics that measure a developer's contribution from source code, such as Lines of Code (LOC), Coupling Between Objects (CBO), Response for a Class (RFC), Weighted Methods per Class (WMC), Lack of Cohesion in Methods (LCOM), Depth of Inheritance Tree (DIT), and Number of Children (NOC). The proposed model was evaluated on a dataset containing over 200,000 observations and 53 software metrics extracted from open-source projects. Performance was measured using Precision, Accuracy, Recall, and F1-Score. Decision Tree and Random Forest currently show the highest model accuracy, at 58%. Neural networks were not included because of their high computational cost and limited interpretability. This analysis aims to enhance Java OSSD projects by evaluating code contributions accurately, supporting reliability and sustainability. Refining code review, prioritizing refactoring, and leveraging the best ML approach to predict code quality can strengthen development processes and advance OSSD efficiency. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka en_US
dc.subject Code Contributions en_US
dc.subject Code Quality en_US
dc.subject Machine Learning en_US
dc.subject Open-Source Software Development en_US
dc.subject Software Quality Metrics en_US
dc.title Classifying Code Quality in Java Open Source Software Projects using Machine Learning and Contribution Metrics. en_US
dc.type Article en_US

