Sabaragamuwa University of Sri Lanka

Advanced Hate Speech Detection in Social Media Content Using LSTM and CNN


dc.contributor.author Varatharaj, A.
dc.contributor.author Ahangama, S.
dc.date.accessioned 2025-02-25T09:39:45Z
dc.date.available 2025-02-25T09:39:45Z
dc.date.issued 2025-02-19
dc.identifier.issn 3084-8911
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4869
dc.description.abstract Hate speech is offensive communication aimed at a particular group. Social media refers to online mass media through which people communicate information, ideas, messages, and other content. Social media platforms and online forums enable interaction around user-generated material, making them indispensable in everyday life. Individuals must be protected from harmful behavior through enhanced surveillance and effective policies. Hate speech is commonly characterized as "a deliberate act of assault directed against a specific group to harm them due to specific characteristics of their identity". This study addresses three research gaps. First, existing techniques for identifying and classifying hate speech are insufficient, which highlights the need for improved methods that address the evolving nature of hate speech. Second, existing techniques have limited adaptability. Finally, established models struggle with complex social media terminology. This study seeks to enhance English hate speech detection using advanced deep learning techniques. The research aimed to build models with deep neural networks and word embeddings. Our approach uses transformer-based models with hyperparameter tuning and generative configurations to enhance precision and efficacy. GPU acceleration is used for efficient model training and execution. This research proposes preprocessing methods, such as replacing emojis with text descriptions and removing special characters while retaining emojis, to improve interpretation and context preservation. Data is acquired from social media via APIs and data providers and then preprocessed through noise removal, deduplication, normalization, tokenization, stop word removal, and lemmatization. With text features prepared for analysis, the data is split into training, validation, and testing sets. Numerical representations are constructed using TF-IDF and word embeddings such as Word2Vec and GloVe.
Convolutional Neural Networks (CNNs) are used to detect specific sentences and Long Short-Term Memory networks (LSTMs) to capture context. Both models are trained and optimized, and their performance is measured using accuracy, precision, recall, and F1-score. NLTK (Natural Language Toolkit) is a Python library suited to developing NLP baseline models. The baseline model, a Gradient Boosting Classifier, achieved an accuracy of 0.93, demonstrating strong performance for traditional machine learning techniques. In future work, this approach is expected to be extended to hate speech identification in code-mixed languages. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka, P.O. Box 02, Belihuloya, 70140, Sri Lanka. en_US
dc.subject Hate Speech en_US
dc.subject Deep Learning en_US
dc.subject LSTM en_US
dc.subject Text Embedding en_US
dc.title Advanced Hate Speech Detection in Social Media Content Using LSTM and CNN en_US
dc.type Article en_US
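
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, the stop word list and the function names (`emoji_to_text`, `preprocess`, `binary_metrics`) are hypothetical, emoji descriptions are taken from Unicode character names as one possible choice, and lemmatization is omitted.

```python
import re
import unicodedata

# Hypothetical stop word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}


def emoji_to_text(text):
    """Replace emoji-like symbols with their lowercase Unicode names."""
    out = []
    for ch in text:
        # Category "So" (Symbol, other) covers most emoji code points.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            out.append(" " + name.lower() + " " if name else " ")
        else:
            out.append(ch)
    return "".join(out)


def preprocess(text):
    """Emoji replacement, noise removal, normalization, tokenization, stop word removal."""
    text = emoji_to_text(text).lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # drop special characters and digits
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels where 1 marks hate speech."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(preprocess("You are the WORST \U0001F620 http://x.co @bob!!!"))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In practice, the token lists produced this way would be passed to a vectorizer (TF-IDF or pretrained embeddings) before training a classifier, and the metric function would be applied to predictions on the held-out test set.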

