Abstract:
Road traffic accidents remain a major global problem: they cause deaths and non-fatal injuries
that impose economic, healthcare and social burdens on a country. In Sri Lanka, urban centres
such as Colombo are central to accident analysis because of their high traffic density and
complex road networks. The main goal of this study is to build a Random Forest classification
model that predicts accident severity as either “Minor” or “Severe” from variables such as time
of day, road surface condition, weather, lighting, number of vehicles involved and location type.
The study also identifies the variables most influential to the model through variable importance
and provides data-driven policy recommendations to improve traffic safety. The dataset contains
records of accidents that occurred in the Colombo Municipal
Council area from 2019 to 2023. It was sourced from the City Traffic Police Station, Fort.
The model was built with the randomForest package in R. A Random Forest grows many decision trees
on bootstrap samples of the data, considers only a random subset of variables at each split, and
makes the final prediction by majority vote among the individual trees.
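To make the three ingredients above concrete (bootstrap samples, a random subset of variables at each split, majority voting), here is a minimal from-scratch sketch in Python. It is not the randomForest implementation used in the study: the function names, the equality-only splits on categorical variables and the max_depth limit are simplifying assumptions (randomForest grows trees to full size by default), and the toy records are made up. The default mtry of floor(sqrt(p)) mirrors the randomForest default for classification.

```python
import math
import random
from collections import Counter

# Hypothetical from-scratch sketch; NOT the R randomForest implementation.


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features, mtry, depth, max_depth, rng):
    """Grow one tree; rows are dicts of categorical predictors."""
    # Stop when the node is pure or the (simplifying) depth limit is reached.
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    parent, n, best = gini(labels), len(labels), None
    # Only a random subset of mtry predictors is tried at this split.
    for f in rng.sample(features, min(mtry, len(features))):
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, v)
    if best is None or best[0] <= 0:
        return majority(labels)
    _, f, v = best
    go_left = [r[f] == v for r in rows]

    def child_tree(side):
        idx = [i for i, g in enumerate(go_left) if g == side]
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          features, mtry, depth + 1, max_depth, rng)

    # Internal node: (feature, value, subtree if equal, subtree otherwise).
    return (f, v, child_tree(True), child_tree(False))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, v, left, right = node
        node = left if row[f] == v else right
    return node  # a leaf is a class label


def fit_forest(rows, labels, n_trees=100, mtry=None, max_depth=10, seed=0):
    rng = random.Random(seed)
    features = sorted(rows[0])
    # Default mtry = floor(sqrt(p)), as in randomForest for classification.
    mtry = mtry or max(1, math.isqrt(len(features)))
    n, forest = len(rows), []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 features, mtry, 0, max_depth, rng))
    return forest


def predict_forest(forest, row):
    # Final prediction: majority vote among the individual trees.
    return majority([predict_tree(tree, row) for tree in forest])


# Made-up toy records (not the Colombo data): severity depends on location only.
toy_rows = [{"location": loc, "weather": w}
            for loc in ("junction", "straight")
            for w in ("rain", "clear", "rain", "clear")]
toy_labels = ["Severe" if r["location"] == "junction" else "Minor" for r in toy_rows]
forest = fit_forest(toy_rows, toy_labels, n_trees=25, mtry=2, seed=1)
print([predict_forest(forest, r) for r in toy_rows])
```

Using an odd number of trees avoids tied votes in a two-class problem; an individual tree can still be wrong (for example, when its bootstrap sample happens to contain only one class), which is exactly what the majority vote smooths over.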
The mean decrease in Gini impurity identified variables such as “Location Type”, “Day Night”,
“Number of Vehicles” and “Weather Condition” as key predictors of accident severity. This suggests
that the setting in which an accident occurs carries substantial information about its severity,
and that a larger number of vehicles may increase the likelihood of a severe outcome, possibly
because multi-vehicle collisions are more complex.
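The mean decrease in Gini impurity can be illustrated with a short Python sketch. With Gini impurity G = 1 − Σ p_k², each split reduces impurity by G(parent) minus the size-weighted G of its two children; a variable's importance sums these decreases over every node that splits on it and averages the sum over all trees. The record format (variable, left labels, right labels), the helper names and the weighting of each decrease by the node's share of the training cases are assumptions made for illustration (this follows scikit-learn's convention; the exact scaling in R's randomForest may differ). In R, the measure itself is reported by importance(model, type = 2).

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity G = 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_decrease(left, right):
    """Impurity decrease of one binary split: G(parent) - weighted G(children)."""
    parent = left + right
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def mean_decrease_gini(forest_splits, n_train):
    """Average over trees of the summed split decreases per variable.

    forest_splits: hypothetical record, one list per tree of
    (variable, left_labels, right_labels) for every internal node.
    Each decrease is weighted by the node's share of the n_train cases.
    """
    totals = defaultdict(float)
    for tree in forest_splits:
        for var, left, right in tree:
            weight = (len(left) + len(right)) / n_train
            totals[var] += weight * split_decrease(left, right)
    return {var: s / len(forest_splits) for var, s in totals.items()}


# Two made-up single-split trees on 8 cases.
toy_forest = [
    [("Location Type", ["Severe"] * 4, ["Minor"] * 4)],
    [("Weather Condition", ["Severe"] * 3 + ["Minor"], ["Minor"] * 3 + ["Severe"])],
]
print(mean_decrease_gini(toy_forest, n_train=8))
# → {'Location Type': 0.25, 'Weather Condition': 0.0625}
```

Worked through by hand: a node with 4 minor and 4 severe cases has G = 0.5. A perfect split leaves pure children (G = 0), a decrease of 0.5; a 3:1 / 1:3 split leaves G = 1 − (0.75² + 0.25²) = 0.375 in each child, a decrease of only 0.125. Averaging over the two trees gives the printed values.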
Accident severity patterns also vary across the days of the week. The confusion matrix shows that
the model correctly classifies 929 minor and 160 severe accidents, corresponding to a sensitivity
of 86.18% for minor cases but a specificity of only 22.66% for severe cases. In other words, while
the model reliably identifies minor incidents, it struggles to flag the more consequential severe
events, which may be due to the imbalance between the two classes. Future studies could therefore
improve the model's performance by addressing the class imbalance, tuning the ensemble or
exploring alternative algorithms.
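Sensitivity and specificity follow directly from the 2×2 confusion matrix once “Minor” is taken as the positive class, as the wording above implies: sensitivity is the share of actual minor accidents classified as minor, and specificity is the share of actual severe accidents classified as severe. The sketch below computes both, plus balanced accuracy (their mean), which is less flattering than raw accuracy when one class dominates. The function name and the counts in the example are hypothetical, not the study's.

```python
def severity_metrics(true_minor, missed_minor, true_severe, missed_severe):
    """Confusion-matrix metrics with "Minor" as the positive class.

    true_minor:    actual Minor predicted Minor   (TP)
    missed_minor:  actual Minor predicted Severe  (FN)
    true_severe:   actual Severe predicted Severe (TN)
    missed_severe: actual Severe predicted Minor  (FP)
    """
    sensitivity = true_minor / (true_minor + missed_minor)
    specificity = true_severe / (true_severe + missed_severe)
    total = true_minor + missed_minor + true_severe + missed_severe
    accuracy = (true_minor + true_severe) / total
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        # Mean of the per-class recalls; not inflated by the majority class.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical, deliberately imbalanced counts: 900 actual Minor, 100 actual Severe.
m = severity_metrics(true_minor=850, missed_minor=50, true_severe=20, missed_severe=80)
print(m)
```

In this made-up case the overall accuracy is 0.87 even though only one severe accident in five is caught, which is why specificity (or balanced accuracy) is the more telling figure for the severe class.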
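One common way to address class imbalance in a Random Forest is to down-sample the majority class inside each tree's bootstrap, so that every tree sees equal numbers of minor and severe cases (often called a balanced random forest). The sketch below shows only that sampling step; the function name balanced_bootstrap and its default of drawing as many cases per class as the rarest class contains are illustrative assumptions. In R's randomForest, a similar per-tree stratified sample can be requested through the strata and sampsize arguments.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, per_class=None, seed=0):
    """Row indices for one tree: equal-size draw (with replacement) per class.

    Hypothetical helper. per_class defaults to the size of the rarest class,
    i.e. the majority class is down-sampled.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    k = per_class or min(len(ix) for ix in by_class.values())
    sample = []
    for ix in by_class.values():
        sample.extend(rng.choice(ix) for _ in range(k))
    rng.shuffle(sample)  # mix the classes; order does not matter for tree growing
    return sample


# Made-up imbalanced labels (8 Minor : 2 Severe).
toy_labels = ["Minor"] * 8 + ["Severe"] * 2
idx = balanced_bootstrap(toy_labels, seed=42)
print(Counter(toy_labels[i] for i in idx))
```

Each tree in the forest would call this with a different seed in place of the plain bootstrap; the trade-off is that each tree sees fewer of the minor cases.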