Sabaragamuwa University of Sri Lanka

Improving the Performance of Machine Learning Classifiers for Imbalanced Multiclass Reputation Analysis

dc.contributor.author Nugekotuwa, C.K.
dc.contributor.author Ishanka, U.A.P.
dc.date.accessioned 2025-12-12T09:08:08Z
dc.date.available 2025-12-12T09:08:08Z
dc.date.issued 2025-02-19
dc.identifier.citation Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka. en_US
dc.identifier.isbn 978-624-5727-57-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4955
dc.description.abstract Class imbalance in multiclass datasets remains a persistent challenge, often producing biased models that generalize poorly, with minority classes affected most. This research proposes a hybrid approach that integrates data-level techniques (custom oversampling and noise cleaning) with algorithm-level techniques (cost-sensitive self-paced learning and a deep neural network architecture). Custom oversampling balances class distributions by generating synthetic samples for minority classes, while noise cleaning identifies and removes outliers to improve data quality. Cost-sensitive self-paced learning assigns dynamic weights to samples based on their difficulty and rarity, enabling the model to focus on underrepresented classes while mitigating overfitting. The dataset comprises 11,276 mobile application reviews used for brand reputation analysis and exhibits a highly imbalanced class distribution: 4,981 negative reviews (44.16%), 3,658 neutral reviews (32.45%), and 2,637 positive reviews (23.39%), making it a suitable benchmark for evaluating this approach. The classifier is a multi-layer perceptron feedforward network trained on this dataset, incorporating progressive neuron reduction, batch normalization, and dropout layers to improve learning efficiency and prevent overfitting. The proposed method extends the baseline model, Random Oversampling with Neighborhood Cleaning Rule (ROS-NCL), which effectively balances datasets but does not optimize the decision boundary for minority classes. Experimental results show that the proposed method outperforms the baseline, improving accuracy from 91.62% to 94.47% while achieving class-wise precision scores of 95.1% and 91.3% for the two minority classes and 96.2% for the majority class. The results indicate that this hybrid approach performs well on imbalanced multiclass datasets and has strong implications for real-world applications, particularly brand reputation monitoring, customer sentiment analysis, and business decision-making, where accurate classification of opinions is crucial for maintaining a competitive advantage. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka en_US
dc.subject Class Imbalance en_US
dc.subject Cost-Sensitive Learning en_US
dc.subject Noise Cleaning en_US
dc.subject Oversampling en_US
dc.subject Reputation Analysis en_US
dc.title Improving the Performance of Machine Learning Classifiers for Imbalanced Multiclass Reputation Analysis en_US
dc.type Article en_US
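
The abstract describes cost-sensitive self-paced learning as assigning dynamic weights based on sample difficulty and class rarity, but does not give the formulation. The sketch below is one common way to combine the two ideas and is an assumption, not the authors' method: rarity is modelled as an inverse-frequency class cost, and difficulty as a hard self-paced rule that admits a sample only once its loss falls below a threshold that grows during training. The function names, the example losses, and the threshold values are hypothetical; only the class counts come from the abstract.

```python
from collections import Counter


def class_costs(labels):
    """Inverse-frequency cost per class (assumed scheme).

    Scaled so the most frequent class has cost 1.0 and rarer
    classes have proportionally larger costs.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


def self_paced_weights(losses, labels, costs, threshold):
    """Per-sample weights combining rarity and difficulty.

    A sample whose current loss is at or above `threshold` is treated
    as too hard for this stage and gets weight 0. Admitted samples are
    weighted by the cost of their class.
    """
    weights = []
    for loss, label in zip(losses, labels):
        admitted = 1.0 if loss < threshold else 0.0
        weights.append(costs[label] * admitted)
    return weights


# Class counts as reported in the abstract.
all_labels = ["negative"] * 4981 + ["neutral"] * 3658 + ["positive"] * 2637
costs = class_costs(all_labels)

# Hypothetical mini-batch: per-sample losses from the current model.
batch_labels = ["negative", "neutral", "positive", "positive"]
batch_losses = [0.2, 0.9, 0.4, 1.5]

# Raising the threshold over training gradually admits harder samples.
for threshold in (0.5, 1.0, 2.0):
    weights = self_paced_weights(batch_losses, batch_labels, costs, threshold)
    print(threshold, [round(w, 2) for w in weights])
```

In a training loop, these weights would multiply the per-sample loss before averaging, so early epochs are dominated by easy samples and later epochs include the hard ones, with minority-class samples counting more throughout.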

