Abstract:
Open-Source Software Development (OSSD) enables geographically dispersed volunteer teams to collaborate through transparent processes. Although OSSD can outperform traditional methodologies, challenges remain in preserving code quality, managing third-party dependencies that cause compatibility issues, and handling inconsistent developer contributions that can lead to code redundancy. Java, a cornerstone of software development, has fostered numerous open-source projects, providing a dependable basis for research. This study proposes a machine learning model that classifies code quality in Java-based open-source software projects by analyzing code contributions. The model applies machine learning techniques commonly used for software quality prediction: Regression, Decision Trees, Random Forest, Support Vector Machine, and Bayesian Learning. Developer contributions are measured from source code using established software quality metrics, including Lines of Code (LOC), Coupling Between Objects (CBO), Response for a Class (RFC), Weighted Methods per Class (WMC), Lack of Cohesion in Methods (LCOM), Depth of Inheritance Tree (DIT), and Number of Children (NOC). The proposed model was evaluated on a dataset of over 200,000 observations and 53 software metrics extracted from open-source projects. Performance was measured using Accuracy, Precision, Recall, and F1-Score. Decision Tree and Random Forest currently achieve the highest accuracy, at 58%. Neural networks were not included because of their high computational cost and limited interpretability. This analysis supports Java OSSD projects by systematically evaluating code contributions, promoting reliability and sustainability. Refining code review, prioritizing refactoring, and selecting the best-performing ML approach to predict code quality can strengthen development processes and advance OSSD efficiency.
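For readers unfamiliar with the four evaluation measures named above, the following minimal sketch shows how Accuracy, Precision, Recall, and F1-Score follow from confusion-matrix counts. It assumes a binary quality label for simplicity; the number of classes used in the study is not stated here. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study's dataset or results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 for one positive class.

    tp, fp, fn, tn are confusion-matrix counts (true/false positives/negatives).
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the study).
example = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four measures together is useful because accuracy alone can be misleading when quality classes are imbalanced, whereas precision and recall expose which kind of error a classifier makes.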