Abstract:
The widespread use of social media has changed the way people share personal problems,
opening a new avenue for identifying mental illness at an early stage. This approach
is especially valuable in settings such as Sri Lanka, where cultural
stigma is a real obstacle to seeking help. To address the lack of machine
learning applications in this field, this paper explores the automated detection of mental health
conditions in Sinhalese Facebook posts. A mental health expert annotated a corpus of 3,096
posts with a multi-label classification schema (Anxiety, Depression, Suicidal Ideation, Irrelevant)
to capture possible comorbidities. A traditional Random Forest classifier and two
transformer-based models, BERT and RoBERTa with explicit hyperparameter settings, were
evaluated and compared on this multi-label classification task. The performance analysis revealed
substantial gaps between the approaches. The Random Forest model performed poorly, with
a macro F1-score of 0.45, and was particularly weak on the critical Suicidal Ideation class
(F1-score: 0.33). This baseline fell well below the transformer models.
The BERT model achieved a strong macro F1-score of 0.83, and the RoBERTa model achieved the
best overall score of 0.85. These findings indicate the superiority of transformer-based models,
particularly RoBERTa, in this sensitive classification task. The analysis shows that natural language
processing has the potential to detect indicators of mental distress
in the specific sociolinguistic environment of Sri Lanka. Although limitations remain, including
reliance on a single annotator, platform-specific data, and ethical issues associated with
real-world application, this study serves as a foundation for developing proactive digital solutions
that can support mental health surveillance and early intervention, potentially overcoming the
stigmatization barrier.
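
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1-score can be computed for a multi-label task with the four labels named in the abstract. The toy posts and predictions are invented for illustration and are not drawn from the study's corpus; the function names are hypothetical and do not reflect the authors' evaluation code.

```python
from typing import List, Set

# Label names taken from the annotation schema described in the abstract.
LABELS = ["Anxiety", "Depression", "Suicidal Ideation", "Irrelevant"]


def per_label_f1(y_true: List[Set[str]], y_pred: List[Set[str]], label: str) -> float:
    """F1 for one label, treating it as a binary present/absent decision per post."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the label never occurs or is predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-label F1 scores, so rare labels count equally."""
    return sum(per_label_f1(y_true, y_pred, label) for label in LABELS) / len(LABELS)


# Invented toy example: each post may carry more than one label (comorbidity).
y_true = [{"Anxiety", "Depression"}, {"Suicidal Ideation"}, {"Irrelevant"}, {"Depression"}]
y_pred = [{"Anxiety"}, {"Suicidal Ideation", "Depression"}, {"Irrelevant"}, {"Depression"}]

print(per_label_f1(y_true, y_pred, "Depression"))  # 0.5
print(macro_f1(y_true, y_pred))                    # 0.875
```

Because the macro average weights every label equally, a weak score on a rare but critical class such as Suicidal Ideation pulls the overall figure down noticeably, which is why the per-class score is reported alongside the macro score.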