A Study of Machine Learning Models for Text-Based Mental Health Prediction in Sri Lanka

Vitharana, K.S.N.; Kumara, P.G.P.

URI: http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5312

Date: 2026-01-28

Abstract:

The widespread use of social media has changed the way people share personal problems, opening a new avenue for identifying mental illness at an early stage. This approach is especially valuable in the Sri Lankan context, where cultural stigma is a real obstacle to seeking help. To address the lack of machine learning applications in this field, this paper explores the automated detection of mental health conditions in Facebook posts written in Sinhalese. A mental health expert annotated a corpus of 3,096 posts using a multi-label classification schema (Anxiety, Depression, Suicidal Ideation, Irrelevant) so that possible comorbidities could be captured. A traditional Random Forest classifier and the more recent transformer-based models, BERT and RoBERTa with explicit hyperparameter settings, were evaluated and compared on this multi-label classification task. The performance analysis revealed substantial gaps between the models. The Random Forest model performed poorly, with a macro F1-score of 0.45, and was particularly weak at predicting the critical Suicidal Ideation class (F1-score: 0.33). This baseline fell well below the transformer models: BERT achieved a strong macro F1-score of 0.83, and RoBERTa achieved the best overall score of 0.85. These findings indicate the superiority of transformer-based models, particularly RoBERTa, for this sensitive classification task. The analysis shows that natural language processing has the potential to be used successfully to detect indicators of mental distress in the specific sociolinguistic environment of Sri Lanka. Limitations include reliance on a single annotator and platform-specific data, as well as ethical issues associated with real-world application. This study serves as a foundation for developing proactive digital solutions that can support mental health surveillance and early intervention, potentially overcoming the stigma barrier.
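
The macro F1-score reported in the abstract averages the per-label F1-scores with equal weight, so a rare label such as Suicidal Ideation counts as much as a frequent one. The sketch below illustrates that calculation for a multi-label setting. It is not the authors' evaluation code: the function names and the five toy posts are hypothetical, and only the four label names come from the abstract.

```python
# Label names taken from the abstract; everything else here is illustrative.
LABELS = ["Anxiety", "Depression", "Suicidal Ideation", "Irrelevant"]


def f1_per_label(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each label as its own binary problem."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-label F1-scores."""
    scores = f1_per_label(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Invented toy data: each post carries a set of labels (multi-label).
y_true = [
    {"Anxiety", "Depression"},
    {"Depression"},
    {"Suicidal Ideation", "Depression"},
    {"Irrelevant"},
    {"Anxiety"},
]
y_pred = [
    {"Anxiety"},
    {"Depression"},
    {"Depression"},  # the Suicidal Ideation label is missed here
    {"Irrelevant"},
    {"Anxiety", "Depression"},
]

per_label = f1_per_label(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_label)
print(round(overall, 3))
```

In this toy example, missing the single Suicidal Ideation post gives that label an F1 of 0 and pulls the macro average down even though two other labels are predicted perfectly. This is why a low score on one critical class weighs heavily in a macro-averaged result.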

This item appears in the following Collection(s)

ComURS2026 Computing Undergraduate Research Symposium : Abstracts [54]
"Next-Gen Solutions for a Digitally Connected World"