Advanced Hate Speech Detection in Social Media Content Using LSTM and CNN

Varatharaj, A.; Ahangama, S.

dc.contributor.author	Varatharaj, A.
dc.contributor.author	Ahangama, S.
dc.date.accessioned	2025-02-25T09:39:45Z
dc.date.available	2025-02-25T09:39:45Z
dc.date.issued	2025-02-19
dc.identifier.issn	3084-8911
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4869
dc.description.abstract	Hate speech is offensive communication aimed at a particular group. It is commonly characterized as "a deliberate act of assault directed against a specific group to harm them due to specific characteristics of their identity". Social media are online mass media through which people share information, ideas, messages, and other content. Social media platforms and online forums let users interact with user-generated material, which has made them indispensable in everyday life. Individuals must be protected from harmful behavior through enhanced surveillance and effective policies. This study addresses three research gaps. First, existing techniques for identifying and classifying hate speech are insufficient, which highlights the need for improved methods that keep pace with the evolving nature of hate speech. Second, existing techniques have limited adaptability. Finally, established models struggle with complex social media terminology. This study seeks to enhance English hate speech detection using advanced deep learning techniques. The research aimed to build models with deep neural networks and word embeddings. Our approach uses transformer-based models with hyperparameter tuning and generative configurations to enhance precision and efficacy. GPU acceleration is used for efficient model training and execution. The research proposes methods such as replacing emojis with text descriptions and removing special characters while retaining emojis, in order to improve interpretation and preserve context. Data is acquired from social media via APIs and data providers and then preprocessed through noise removal, deduplication, normalization, tokenization, stop word removal, and lemmatization. Once text features have been designed for analysis, the data is split into training, validation, and testing sets. Numerical representations are constructed using TF-IDF and word embeddings such as Word2Vec and GloVe. 
Convolutional Neural Networks (CNNs) were used to detect specific sentences, and Long Short-Term Memory networks (LSTMs) to capture context. Both models are trained and optimized, and their performance is measured using accuracy, precision, recall, and F1-score. NLTK (Natural Language Toolkit) is a Python library suited to developing NLP baseline models. The baseline model, a Gradient Boosting Classifier, achieved an accuracy of 0.93, demonstrating strong performance for a traditional machine learning technique. We plan to extend this approach to hate speech identification in code-mixed languages.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing, Sabaragamuwa University of Sri Lanka, P.O. Box 02, Belihuloya, 70140, Sri Lanka.	en_US
dc.subject	Hate Speech	en_US
dc.subject	Deep Learning	en_US
dc.subject	LSTM	en_US
dc.subject	Text Embedding	en_US
dc.title	Advanced Hate Speech Detection in Social Media Content Using LSTM and CNN	en_US
dc.type	Article	en_US
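
Illustrative note on the preprocessing described in the abstract: the following is a minimal sketch, not the authors' implementation. It shows one way to replace emojis with text descriptions, remove special characters, normalize case, tokenize, and drop stop words using only the Python standard library. The function names, the URL-stripping step, and the tiny stop-word list are assumptions made for illustration; lemmatization and deduplication are omitted, and the abstract does not say which tools were used for each step.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list; the abstract does not name one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}


def describe_emojis(text: str) -> str:
    """Replace symbol characters (Unicode category 'So', which covers most
    emojis) with their Unicode names, e.g. U+1F621 -> ' pouting_face '."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            out.append(f" {name.lower().replace(' ', '_')} " if name else " ")
        else:
            out.append(ch)
    return "".join(out)


def preprocess(text: str) -> list[str]:
    """Turn one raw post into a list of cleaned tokens."""
    text = describe_emojis(text)                # emoji -> text description
    text = re.sub(r"https?://\S+", " ", text)   # assumed noise-removal step: drop URLs
    text = text.lower()                         # normalization
    text = re.sub(r"[^a-z0-9_\s]", " ", text)   # remove special characters
    tokens = text.split()                       # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("You are the WORST!!! 😡 http://example.com"))
# -> ['worst', 'pouting_face']
```

Keeping the emoji as a named token, rather than deleting it, preserves a signal that later feature extraction (TF-IDF or word embeddings) can still use.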