Abstract:
Class imbalance in multiclass datasets remains a persistent challenge: models trained on skewed data tend to be biased and generalize poorly, with minority classes affected most. This research proposes a hybrid approach that integrates data-level techniques (custom oversampling and noise cleaning) with algorithm-level techniques (cost-sensitive self-paced learning and a deep neural network architecture). Custom oversampling balances class distributions by generating synthetic samples for minority classes, while noise cleaning identifies and removes outliers to improve data quality. Cost-sensitive self-paced learning assigns dynamic weights to samples based on their difficulty and rarity, enabling the model to focus on underrepresented classes while mitigating overfitting. The dataset comprises 11,276 mobile application reviews collected for brand reputation analysis and has an imbalanced class distribution: 4,981 negative reviews (44.16%), 3,658 neutral reviews (32.45%), and 2,637 positive reviews (23.39%), which makes it a suitable benchmark for evaluating this approach. The model is a multi-layer perceptron feedforward network trained on this dataset, using progressively fewer neurons per layer, batch normalization, and dropout to improve learning efficiency and prevent overfitting. The proposed method extends the baseline, Random Oversampling with Neighborhood Cleaning Rule (ROS-NCL), which balances datasets effectively but does not optimize the decision boundary for minority classes. Experimental results show that the proposed method outperforms the baseline, improving accuracy from 91.62% to 94.47% and achieving class-wise precision of 95.1% and 91.3% for the two minority classes and 96.2% for the majority class.
The results indicate that this hybrid approach performs well on imbalanced multiclass datasets and has strong implications for real-world applications, particularly brand reputation monitoring, customer sentiment analysis, and business decision-making, where accurate classification of opinions is crucial for maintaining a competitive advantage.
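
To make the idea of weighting samples by both difficulty and rarity concrete, the following is a minimal sketch and not the formulation used in this work. It assumes inverse-frequency class weights and a hard self-paced threshold scaled by those weights; the function names, the threshold `lam`, and the per-sample losses are hypothetical, while the class counts are those reported above.

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency weights N / (K * n_c): rarer classes weigh more."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def self_paced_weights(losses, labels, cls_w, lam):
    """Hard self-paced selection scaled by class cost (illustrative assumption).

    A sample is kept when its loss is below lam times its class weight, so
    minority-class samples pass a looser threshold. Kept samples contribute
    with their class weight; all others get weight 0 for this round.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    per_sample_cost = cls_w[labels]
    keep = losses < lam * per_sample_cost
    return keep * per_sample_cost

# Class counts from the abstract: negative, neutral, positive.
counts = [4981, 3658, 2637]
cls_w = class_weights(counts)

# Hypothetical per-sample losses for two negative (0) and two positive (2) reviews.
losses = np.array([0.2, 1.0, 1.0, 2.5])
labels = np.array([0, 0, 2, 2])
weights = self_paced_weights(losses, labels, cls_w, lam=0.9)
print(cls_w, weights)
```

In this sketch, a loss of 1.0 is excluded for a negative review but admitted for a positive one, because the rarer class has a looser threshold. Raising `lam` over training epochs would gradually admit harder samples, which is the usual self-paced schedule.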