| dc.description.abstract |
Course repetition prediction is an important task in educational data mining: it
warns academic institutions of impending risks so that they can intervene and help
reduce dropout rates. A similar task was previously studied on the Turkish Student Evaluation dataset
with two models, namely Backpropagation Neural Networks (BPNNs) and Radial Basis Function
Networks (RBFNs), with recognition rates of 86.3% and 84.8%, respectively. Although
these models achieve high overall accuracy, they fail to account for minority-class performance,
i.e., performance on students who repeat the course one or more times. This limits their practical
value, since institutions are chiefly interested in identifying these at-risk students with
high accuracy. This work extends those studies with modern machine learning
methods: Extreme Gradient Boosting (XGBoost), dropout-regularised
Multi-Layer Perceptrons (MLPs), and a soft-voting ensemble of the two. The Synthetic Minority Over-sampling Technique (SMOTE)
is applied to address the severe class imbalance in the
dataset and give the minority classes better representation during training. Models
were trained with 5-fold cross-validation using an 80/20 stratified split and evaluated
by recognition rate, precision, recall, F1-score, confusion matrices, and class-wise
discrimination using ROC and Precision–Recall (PR) curves. Statistical robustness was assessed with 95% confidence intervals and paired
significance tests. XGBoost alone reached 84.3%
accuracy but did not identify the minority classes properly. SMOTE improved recall and
F1-scores for repeat students, while MLPs further increased sensitivity to minority classes, albeit
at a lower weighted accuracy. The ensemble offered the most equitable solution,
with an accuracy of 80.6% and a macro-F1 of 0.49, significantly better on fairness metrics than
the individual models. ROC and PR curves also showed that the ensemble had superior discrimination,
especially for minority classes. The results confirm the importance of integrating
imbalance handling with ensemble methods in educational prediction tasks. The proposed
framework not only yields competitive overall accuracy but also gives fair predictions
that are interpretable and statistically significant. The framework will contribute to the development
of early warning systems in higher education, enabling institutions to proactively identify
and intervene with students at risk of course repetition. |
en_US |