Abstract:
Accurately predicting sports results is a well-known challenge in the sports industry.
Prediction is now common for individual sports as well as for less predictable team
sports such as football, volleyball, and basketball. Predicting the outcome of a football
match is an active area of research, largely because of the commercial stakes involved in
betting. Conventionally, the final outcome of a match was predicted by
field experts. Today, however, this approach is supplemented by a growing amount
of diverse football-related information that needs to be processed. In this study, we
apply several machine learning (ML) techniques to predict match results in the
German Bundesliga, one of the most popular European leagues. The study
focuses on comparing the performance of ML
models used in previous studies. The data used in this study were collected from season
2008/2009 to season 2022/2023 of the German Bundesliga. To increase the accuracy
of the models, new attributes were introduced by calculating rolling averages
over previous matches. Logistic Regression, Decision Tree, Random Forest, Support
Vector Machine, k-Nearest Neighbor, Gradient Boosting, and Naïve Bayes are the ML
techniques used to predict the results, with the dataset partitioned into training and
testing sets. The training set includes data from season 2008/2009 to 2017/2018 (66.67%),
and the testing set includes data from season 2018/2019 to 2022/2023 (33.33%). The
best-performing model is chosen using several evaluation metrics: accuracy, precision,
sensitivity, F1 score, and mean squared error. The
results show that Random Forest gives the highest accuracy, 0.6146, with precision
and sensitivity of 0.5221 and 0.8495, respectively. It can be concluded that, with the
newly introduced features, Random Forest is the best of the compared methods for
match result prediction.
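
The rolling-average features and the season-based partition described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the field names (`season`, `team`, `goals`), the window size, the toy rows, and the choice to carry a team's history across seasons are all assumptions made for illustration, and seasons are encoded by their starting year (2017 for 2017/2018). The current match is excluded from its own average so that each feature uses only information available before kick-off.

```python
from collections import defaultdict, deque


def add_rolling_average(matches, key="goals", window=3):
    """Attach the mean of `key` over each team's previous `window` matches.

    `matches` must be in chronological order. The current match is not part
    of its own average; a team with no earlier matches gets None.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for m in matches:
        past = history[m["team"]]
        avg = sum(past) / len(past) if past else None
        out.append({**m, f"{key}_rolling_avg": avg})
        past.append(m[key])  # update history only after the feature is computed
    return out


def split_by_season(rows, last_training_season):
    """Chronological split: earlier seasons for training, later ones for testing."""
    train = [r for r in rows if r["season"] <= last_training_season]
    test = [r for r in rows if r["season"] > last_training_season]
    return train, test


# Hypothetical toy rows (one row per team per match), in chronological order.
matches = [
    {"season": 2017, "team": "A", "goals": 2},
    {"season": 2017, "team": "A", "goals": 0},
    {"season": 2017, "team": "A", "goals": 1},
    {"season": 2018, "team": "A", "goals": 3},
    {"season": 2018, "team": "B", "goals": 1},
]

featured = add_rolling_average(matches, key="goals", window=2)
train, test = split_by_season(featured, last_training_season=2017)
```

Splitting by season rather than at random keeps every test match later in time than every training match, which matches the partition stated in the abstract (2008/2009 to 2017/2018 for training, 2018/2019 to 2022/2023 for testing).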