Sabaragamuwa University of Sri Lanka

Enhancing student course repetition prediction using ensemble learning and imbalance handling techniques


dc.contributor.author Fernando, J.K.R
dc.date.accessioned 2026-01-17T08:43:08Z
dc.date.available 2026-01-17T08:43:08Z
dc.date.issued 2025-12-03
dc.identifier.issn 2815-0341
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5190
dc.description.abstract Course repetition prediction is an important task in educational data mining: it warns academic institutions of impending risks so that they can intervene and help reduce dropout rates. A similar task was previously studied on the Turkish Student Evaluation dataset with two models, Backpropagation Neural Networks (BPNNs) and Radial Basis Function Networks (RBFNs), which reached recognition rates of 86.3% and 84.8%, respectively. Although these models achieve high overall accuracy, they do not consider minority-class performance, i.e., students who repeat the course one or more times. This limits their practical value, since institutions are most interested in accurately identifying these at-risk students. This work extends the previous studies with modern machine learning methods: Extreme Gradient Boosting (XGBoost), dropout-regularised Multi-Layer Perceptrons (MLPs), and a soft-voting ensemble of the two. SMOTE is applied to address the high level of class imbalance in the dataset and give the minority classes better representation during training. Models were trained with 5-fold cross-validation using an 80/20 stratified split, and evaluated by recognition rate, precision, recall, F1-score, confusion matrices, and class-wise discrimination using ROC and Precision–Recall curves. Statistical robustness was assessed with 95% confidence intervals and paired significance tests. XGBoost alone reached 84.3% accuracy but did not properly identify the minority classes. SMOTE improved recall and F1-scores for repeat students, while MLPs further increased sensitivity to minority classes, albeit at a lower weighted accuracy.
Finally, the ensemble offered the most equitable solution, with an accuracy of 80.6% and a macro-F1 of 0.49, significantly better on fairness metrics than the individual models. ROC and PR curves also showed that the ensemble had superior discrimination, especially for minority classes. The results confirm the importance of integrating imbalance handling with ensemble methods in educational prediction tasks. The framework not only yields competitive overall accuracy but also gives fair predictions that are interpretable and statistically significant. It can contribute to the development of early warning systems in higher education, enabling institutions to proactively identify and intervene with students at risk of course repetition. en_US
dc.language.iso en en_US
dc.publisher Sabaragamuwa University of Sri Lanka en_US
dc.subject Course repetition en_US
dc.subject Ensemble methods en_US
dc.subject Machine learning en_US
dc.subject SMOTE en_US
dc.subject XGBoost en_US
dc.title Enhancing student course repetition prediction using ensemble learning and imbalance handling techniques en_US
dc.type Article en_US
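
The abstract reports both accuracy and macro-F1 and combines models by soft voting. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation: the function names (soft_vote, accuracy, macro_f1) and the toy probabilities are assumptions invented for illustration and are unrelated to the study's data or results. It shows how a model that always predicts the majority class can score well on accuracy while scoring poorly on macro-F1, and how averaging class probabilities from two models produces a combined prediction.

```python
def soft_vote(prob_sets):
    """Average per-class probabilities across models, then take the argmax."""
    n_models = len(prob_sets)
    preds = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        avg = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: class 0 = no repeat (majority), class 1 = repeat (minority).
y_true = [0] * 8 + [1] * 2

# Model A (made-up probabilities): leans towards the majority class everywhere.
model_a = [[0.9, 0.1]] * 8 + [[0.6, 0.4], [0.8, 0.2]]
# Model B (made-up probabilities): more sensitive to the minority class,
# at the cost of some false alarms on majority samples.
model_b = [[0.7, 0.3]] * 5 + [[0.4, 0.6]] * 3 + [[0.1, 0.9], [0.3, 0.7]]

pred_a = soft_vote([model_a])            # a single model is just its own argmax
pred_b = soft_vote([model_b])
pred_ens = soft_vote([model_a, model_b])  # soft-voting ensemble of A and B

for name, pred in [("A", pred_a), ("B", pred_b), ("ensemble", pred_ens)]:
    print(name, round(accuracy(y_true, pred), 3), round(macro_f1(y_true, pred), 3))
```

In this toy setting, model A never predicts the minority class, so its accuracy stays high while its macro-F1 is low, which is the gap between accuracy and minority-class performance that the abstract describes. In practice, library implementations such as a soft-voting classifier and SMOTE resampling would replace these hand-written helpers.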

