A Study of Machine Learning Models for Text-Based Mental Health Prediction in Sri Lanka

Vitharana, K.S.N.; Kumara, P.G.P.

ComURS2026 Computing Undergraduate Research Symposium : Abstracts

dc.contributor.author	Vitharana, K.S.N.
dc.contributor.author	Kumara, P.G.P.
dc.date.accessioned	2026-06-02T05:00:56Z
dc.date.available	2026-06-02T05:00:56Z
dc.date.issued	2026-01-28
dc.identifier.isbn	978-624-5727-44-5
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5312
dc.description.abstract	The widespread use of social media has changed the way people share personal problems, opening a new avenue for identifying mental illness at an early stage. This approach is especially valuable in contexts such as Sri Lanka, where cultural stigma is a real obstacle to seeking help. To address the lack of machine learning applications in this field, this paper explores the automated detection of mental health conditions in Sinhalese Facebook posts. A mental health expert annotated a corpus of 3,096 posts using a multi-label classification schema (Anxiety, Depression, Suicidal Ideation, Irrelevant) so that possible comorbidities could be indicated. A traditional Random Forest classifier and the newer transformer-based models BERT and RoBERTa, with explicit hyperparameter settings, were tested and compared on this multi-label classification task. The performance analysis revealed substantial gaps between the models. The Random Forest model performed poorly, with a macro F1-score of 0.45, and was weak at predicting the critical Suicidal Ideation class (F1-score: 0.33). This baseline scored well below the transformer models: the BERT model achieved a strong macro F1-score of 0.83, and the RoBERTa model had the best overall score of 0.85. These findings indicate the superiority of transformer-based models, RoBERTa in particular, for this sensitive classification task. The analysis shows that natural language processing can be used successfully to detect indicators of mental distress in the specific sociolinguistic environment of Sri Lanka. Limitations include reliance on a single annotator and on platform-specific data, as well as ethical issues associated with real-world application. This study serves as a foundation for developing proactive digital solutions that can support mental health surveillance and early intervention, potentially overcoming the stigmatization barrier.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing. Sabaragamuwa University of Sri Lanka.	en_US
dc.subject	Mental Health Detection	en_US
dc.subject	Multi-Label Classification	en_US
dc.subject	Natural Language Processing (NLP)	en_US
dc.subject	RoBERTa	en_US
dc.subject	Transformer Models	en_US
dc.title	A Study of Machine Learning Models for Text-Based Mental Health Prediction in Sri Lanka	en_US
dc.type	Article	en_US
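
The abstract reports macro F1-scores for a multi-label task, so a small worked example may help readers interpret those numbers. The sketch below is not the authors' evaluation code, which the record does not include. It assumes that macro F1 is the unweighted mean of per-label F1-scores over the four labels named in the abstract, and the toy `y_true` and `y_pred` rows are invented purely for illustration. The function names `f1_per_label` and `macro_f1` are hypothetical.

```python
# Labels taken from the annotation schema described in the abstract.
LABELS = ["Anxiety", "Depression", "Suicidal Ideation", "Irrelevant"]


def f1_per_label(y_true, y_pred, label_index):
    """F1-score for one label, treating it as a binary decision per post."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        t, p = truth[label_index], pred[label_index]
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the label never occurs
    # in either the annotations or the predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = [f1_per_label(y_true, y_pred, i) for i in range(len(LABELS))]
    return sum(scores) / len(scores)


# Invented toy data: one row per post, one 0/1 indicator per label.
# A post may carry several labels at once (multi-label).
y_true = [
    [1, 1, 0, 0],  # Anxiety + Depression
    [0, 1, 1, 0],  # Depression + Suicidal Ideation
    [0, 0, 0, 1],  # Irrelevant
    [1, 0, 0, 0],  # Anxiety
]
y_pred = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],  # misses Suicidal Ideation
    [0, 0, 0, 1],
    [1, 0, 1, 0],  # false alarm for Suicidal Ideation
]

for i, name in enumerate(LABELS):
    print(f"{name}: F1 = {f1_per_label(y_true, y_pred, i):.2f}")
print(f"Macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case three labels are predicted perfectly, yet the single failing label pulls the macro average down to 0.75. Under the same assumption, this is why a weak score on one class, such as the 0.33 reported for Suicidal Ideation with Random Forest, weighs heavily on a macro F1-score.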