Augmentation Techniques for Personality Type Classification Using Social Media Text

Dambawinna, W.R.P.W.M.A.K.B.; Malkanthi, A.M.C.

Faculty of Applied Sciences
2nd Applied Sciences Undergraduate Research Symposium (APSURS) 2023

dc.contributor.author	Dambawinna, W.R.P.W.M.A.K.B.
dc.contributor.author	Malkanthi, A.M.C.
dc.date.accessioned	2023-09-14T06:33:03Z
dc.date.available	2023-09-14T06:33:03Z
dc.date.issued	2023-05-31
dc.identifier.isbn	978-624-5727-36-0
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/3860
dc.description.abstract	Social media platforms have become a ubiquitous hub where individuals convey their thoughts, emotions, and behaviours, which makes it possible to draw insights into their personalities. However, the available social media datasets are not diverse enough, as they tend to over-represent certain groups such as young people. Furthermore, the class distribution is imbalanced, which can restrict the generalizability and accuracy of personality analysis. To address these challenges, this study proposes a method that uses text augmentation techniques and machine learning to expand the dataset and improve the effectiveness of the analysis. It uses the Myers-Briggs Type Indicator (MBTI) dataset of 430000 posts by 8600 individuals from the PersonalityCafe forum. The dataset was split into an 80% training set and a 20% testing set; the training set was then augmented and fed into the models. All models were evaluated using standard metrics such as accuracy, precision, and recall. Among the evaluated models, the Linear Regression classifier outperformed the other three machine learning algorithms (Random Forest, Gradient Descent, and XGBoost) with an accuracy of 68.41%, and its results were more uniformly distributed across the classes. Additionally, the text augmentation strategy employing BERT contextual word embeddings improved model accuracy by 0.2%. This improvement was meagre because of the low quality of the dataset and the lack of contextual understanding in the augmented data, and computational cost hindered further improvement. Synonym-based augmentation performed poorly due to a lack of contextual understanding, whereas BERT-based augmentation produced semantically and contextually relevant data, resulting in improved performance. 
For future work, self-training reinforced models and transfer learning should be investigated to increase model performance.	en_US
dc.language.iso	en	en_US
dc.publisher	Sabaragamuwa University of Sri Lanka	en_US
dc.subject	Deep Learning	en_US
dc.subject	MBTI	en_US
dc.subject	Personality Classification	en_US
dc.subject	Text Classification	en_US
dc.subject	Transfer Learning	en_US
dc.title	Augmentation Techniques for Personality Type Classification Using Social Media Text	en_US
dc.type	Book	en_US
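
The workflow described in the abstract (split 80/20, augment only the training portion, then score with accuracy, precision, and recall) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the tiny `SYNONYMS` table, and the sample rows are all hypothetical, and a simple synonym swap stands in for the augmentation step. The BERT-based variant mentioned in the abstract would replace `synonym_augment` with a contextual model.

```python
import random

# Hypothetical synonym table, for illustration only. A real system would
# draw on a thesaurus or on contextual embeddings such as BERT.
SYNONYMS = {"happy": ["glad", "cheerful"], "think": ["believe", "reckon"]}


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def synonym_augment(text, rng):
    """Swap every word that has a known synonym for a randomly chosen one."""
    words = text.split()
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)


def augment_training_set(train, seed=0):
    """Return the training rows plus augmented copies that differ from the original."""
    rng = random.Random(seed)
    extra = []
    for text, label in train:
        new_text = synonym_augment(text, rng)
        if new_text != text:
            extra.append((new_text, label))
    return list(train) + extra


def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Made-up sample posts; the labels are MBTI type codes.
    rows = [("i think this is fine", "INTJ"), ("so happy today", "ENFP")] * 5
    train, test = split_train_test(rows)
    train_aug = augment_training_set(train)
    print(len(train), len(test), len(train_aug))
```

Augmenting after the split, as the abstract describes, keeps the test set free of synthetic text, so the reported metrics reflect only original posts.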