Sabaragamuwa University of Sri Lanka

Augmentation Techniques for Personality Type Classification Using Social Media Text

dc.contributor.author Dambawinna, W.R.P.W.M.A.K.B.
dc.contributor.author Malkanthi, A.M.C.
dc.date.accessioned 2023-09-14T06:33:03Z
dc.date.available 2023-09-14T06:33:03Z
dc.date.issued 2023-05-31
dc.identifier.isbn 978-624-5727-36-0
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/3860
dc.description.abstract Social media platforms have become a ubiquitous hub where individuals convey their thoughts, emotions, and behaviours, making it possible to derive insights into their personalities. However, the available social media datasets are not diverse enough, as they tend to over-represent certain groups such as young people. Furthermore, the class distribution is disproportionate, which can restrict the generalizability and accuracy of personality analysis. To address these challenges, this study proposes a method that combines text augmentation techniques with machine learning to expand the dataset and improve the effectiveness of the analysis, using the Myers-Briggs Type Indicator (MBTI) dataset of 430,000 posts by 8,600 individuals from the PersonalityCafe forum. The dataset was split into 80% training and 20% testing sets, and the training set was then augmented and fed into the models. All models were evaluated using standard metrics such as accuracy, precision, and recall. Among the evaluated models, the Linear Regression classifier outperformed the other three machine learning algorithms (Random Forest, Gradient Descent, and XGBoost) with an accuracy of 68.41%, and its results were more uniformly distributed across the classes. Additionally, text augmentation employing BERT contextual word embeddings improved model accuracy by 0.2%. This improvement was meagre because of the low quality of the dataset and the lack of contextual understanding in the augmented data, and computational cost hindered further improvement. Synonym-based augmentation performed poorly due to a lack of contextual understanding, whereas BERT-based augmentation produced semantically and contextually relevant data, resulting in improved performance.
For future work, self-training reinforced models and transfer learning should be investigated to increase model performance. en_US
dc.language.iso en en_US
dc.publisher Sabaragamuwa University of Sri Lanka en_US
dc.subject Deep Learning en_US
dc.subject MBTI en_US
dc.subject Personality Classification en_US
dc.subject Text Classification en_US
dc.subject Transfer Learning en_US
dc.title Augmentation Techniques for Personality Type Classification Using Social Media Text en_US
dc.type Book en_US
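
The workflow described in the abstract (an 80/20 split, augmentation applied to the training portion only, then evaluation with accuracy, precision, and recall) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the SYNONYMS table is a hypothetical toy stand-in for the synonym-based augmenter, the function names are invented, and the BERT-based augmentation and the classifiers used in the study are not reproduced.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration); the
# study's actual synonym and BERT-based augmenters are not reproduced here.
SYNONYMS = {"happy": "glad", "quiet": "calm", "talk": "chat"}


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def augment(text):
    """Replace each word that has an entry in SYNONYMS; keep other words."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


def augment_training_set(train):
    """Append one augmented copy per training row whose text actually changed.

    Only the training rows are passed in, so the test set stays untouched.
    """
    extra = [(augment(text), label) for text, label in train
             if augment(text) != text]
    return train + extra


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Per-class (precision, recall); 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    return (tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0)
```

Augmenting after the split, rather than before it, keeps augmented variants of a post from leaking into the test set, which would otherwise inflate the reported metrics.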

