Augmentation Techniques for Personality Type Classification Using Social Media Text

Dambawinna, W.R.P.W.M.A.K.B.; Malkanthi, A.M.C.

Venue: 2nd Applied Sciences Undergraduate Research Symposium (APSURS) 2023, Faculty of Applied Sciences


URI: http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/3860

Date: 2023-05-31

Abstract:

Social media platforms have become a ubiquitous hub where individuals express their thoughts, emotions, and behaviours, making it possible to derive insights into their personalities. However, the available social media datasets lack diversity, as they tend to over-represent certain groups of individuals, such as young people. Furthermore, the class distribution is imbalanced, which can limit the generalizability and accuracy of personality analysis. To address these challenges, this study proposes a method that combines text augmentation techniques with machine learning to expand the dataset and improve the effectiveness of the analysis. The method was applied to the Myers-Briggs Type Indicator (MBTI) dataset of 430,000 posts from 8,600 individuals on the PersonalityCafe forum. The dataset was split into training (80%) and testing (20%) sets; the training set was then augmented and fed into the models. All models were evaluated using standard metrics such as accuracy, precision, and recall. Among the four evaluated models, the Linear Regression classifier achieved the highest accuracy (68.41%), and its results were more uniformly distributed across the classes than those of the other three algorithms: Random Forest, Gradient Descent, and XGBoost. Additionally, text augmentation using BERT contextual word embeddings improved model accuracy by 0.2%. This meagre improvement was attributed to the low quality of the dataset and the lack of contextual understanding in the augmented data, and computational cost prevented further improvement. Synonym-based augmentation performed poorly because it lacks contextual understanding, whereas BERT-based augmentation produced semantically and contextually relevant data, resulting in improved performance.
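The abstract splits the data before augmenting only the training portion; this ordering keeps paraphrased variants of test posts out of the training set. The sketch below illustrates that ordering with synonym-based augmentation. It is a minimal, stdlib-only illustration under stated assumptions, not the authors' code: the `SYNONYMS` table, the `copies` parameter, the stratified split, and the helper names `synonym_augment`, `stratified_split`, and `augment_training_set` are all hypothetical, and the abstract does not say how many augmented variants were generated or whether rare types were favoured. A BERT-based augmenter would instead mask a word and let a masked language model propose a context-aware replacement (for example through the Hugging Face `transformers` `fill-mask` pipeline); that variant needs a downloaded model and is not shown.

```python
import random
from collections import defaultdict

# Hypothetical toy synonym table standing in for a real thesaurus such as WordNet.
SYNONYMS = {
    "happy": ["glad", "cheerful"],
    "think": ["believe", "reckon"],
    "friends": ["mates", "pals"],
    "quiet": ["calm", "silent"],
}


def synonym_augment(text, rng, p=0.5):
    """Return a copy of `text` with some known words swapped for a listed synonym.

    The lookup ignores the surrounding words, which is the lack of contextual
    understanding the abstract attributes to synonym-based augmentation."""
    out = []
    for word in text.split():
        choices = SYNONYMS.get(word.lower())
        if choices and rng.random() < p:
            out.append(rng.choice(choices))
        else:
            out.append(word)
    return " ".join(out)


def stratified_split(samples, test_frac=0.2, seed=0):
    """Split (text, label) pairs into train/test, keeping each label's share (assumed stratified)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def augment_training_set(train, copies=1, seed=0):
    """Keep the original training posts and append `copies` augmented variants of each.

    Only the training split is passed in, so the test split is never augmented."""
    rng = random.Random(seed)
    augmented = list(train)
    for text, label in train:
        for _ in range(copies):
            augmented.append((synonym_augment(text, rng), label))
    return augmented


# Usage: split first, then augment the training portion only.
posts = [("I think quiet evenings are happy ones", "INFP")] * 5 + [
    ("Going out with friends makes me happy", "ESFP")
] * 5
train, test = stratified_split(posts)
train_aug = augment_training_set(train, copies=1)
print(len(train), len(test), len(train_aug))  # 8 2 16
```

Generating variants after the split means a test post and its paraphrase can never end up on opposite sides of the split, so test accuracy is not inflated by near-duplicates.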
Future work should investigate self-training reinforced models and transfer learning to further improve model performance.
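The abstract reports accuracy, precision, and recall, and notes that the best model's results were more uniformly distributed across classes. With imbalanced personality-type classes, accuracy alone can hide poor performance on rare types, so per-class and macro-averaged scores are the usual complement. The sketch below computes them from scratch; the function name `classification_metrics` is hypothetical, and macro averaging is an assumption because the abstract does not state which averaging was used. scikit-learn's `classification_report` reports the same per-class quantities.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, macro precision, macro recall, and per-class (precision, recall)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    per_class = {}
    for c in labels:
        # A class that is never predicted (or never occurs) gets 0.0 instead of dividing by zero.
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        per_class[c] = (precision, recall)
    accuracy = sum(tp.values()) / len(y_true)
    macro_p = sum(p for p, _ in per_class.values()) / len(labels)
    macro_r = sum(r for _, r in per_class.values()) / len(labels)
    return accuracy, macro_p, macro_r, per_class


# A classifier that always predicts the majority type looks fine on accuracy
# but scores poorly once every class counts equally.
y_true = ["INTP"] * 8 + ["ESFJ"] * 2
y_pred = ["INTP"] * 10
acc, macro_p, macro_r, _ = classification_metrics(y_true, y_pred)
print(acc, macro_p, macro_r)  # 0.8 0.4 0.5
```

The gap between 0.8 accuracy and 0.5 macro recall is the kind of uneven per-class behaviour that "more uniformly distributed across the classes" guards against.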


Files in this item

Name: 2nd APSURS_Book_o ...

Size: 1.100 MB

Format: PDF

Description: APSURS - 63


