Abstract:
Most anti-corruption laws exist today mainly because of an inability to address root
causes. Abnormal behavior manifests as robbery, corruption, murder, threats, and similar
acts. Countermeasures are usually put in place only after such incidents occur. Some
CCTV cameras support object detection but nothing beyond that, and manually monitoring
CCTV footage for abnormal events is laborious and time-consuming.
Therefore, this study aimed to develop a new method for real-time identification of
abnormal behavior in fighting scenes using a spatiotemporal autoencoder based on a 3D
Convolutional Neural Network (CNN). Initially, the study proposed an intelligent video
surveillance system that uses deep learning techniques, including facial expression
detection with a CNN and YOLO v7. However, facial expression detection alone has limited
accuracy under real-world conditions. The proposed video surveillance system
detects fights as abnormal events by comparing a specially prepared video stream with
frames generated by the autoencoder. The model was built with TensorFlow and other
libraries to identify fighting scenes in a video stream through spatiotemporal encoders.
The proposed method was examined in three successive case studies, and the last case
study reached the desired result. The case studies were also tested on three different publicly
available datasets: the fer2013.csv facial expression dataset, the
emotion-facial-expression dataset in the Roboflow library, and the CUHK Avenue dataset.
All three case studies aimed to detect abnormal behavior in real time, and the last
proposed method achieved
72.56% accuracy in identifying fighting scenes. Future research could extend this
approach by studying areas with frequently reported fighting incidents and developing
new models specifically for those areas. The proposed system has the potential to detect
abnormal activities in real time, which can be useful in addressing the problem of
abnormal behavior in both public and private environments.
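
The comparison between input frames and autoencoder-generated frames can be illustrated
with a minimal sketch. The abstract does not state the scoring rule, so the per-frame
squared-error measure, the min-max "regularity score", the 0.5 threshold, and all
function names below are assumptions chosen for illustration, not the study's actual
implementation. Random arrays stand in for real video frames and for the autoencoder
output.

```python
import numpy as np

def reconstruction_errors(frames, reconstructed):
    """Per-frame squared L2 distance between input and reconstruction.

    Both arrays are assumed to have shape (num_frames, height, width).
    """
    diff = frames.astype(np.float64) - reconstructed.astype(np.float64)
    return np.sum(diff ** 2, axis=(1, 2))

def regularity_scores(errors):
    """Min-max normalise the errors and invert them: 1 = most regular frame."""
    span = errors.max() - errors.min()
    if span == 0:
        # All frames reconstructed equally well; treat them all as regular.
        return np.ones_like(errors)
    return 1.0 - (errors - errors.min()) / span

def flag_abnormal(scores, threshold=0.5):
    """Mark frames whose regularity score falls below a hypothetical threshold."""
    return scores < threshold

# Toy data: six 8x8 "frames"; frames 3 and 4 are deliberately reconstructed badly,
# standing in for a scene the autoencoder has not learned to reproduce.
rng = np.random.default_rng(0)
frames = rng.random((6, 8, 8))
reconstructed = frames.copy()
reconstructed[3:5] += 0.5

errors = reconstruction_errors(frames, reconstructed)
flags = flag_abnormal(regularity_scores(errors))
print(flags)
```

In a real pipeline the `reconstructed` array would come from the trained autoencoder,
and the threshold would need to be tuned on validation footage.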