Sabaragamuwa University of Sri Lanka

A Comparative Analysis of Deep Learning Algorithms for Formality Classification in Texts Using Linguistic Features

dc.contributor.author Karunarathna, K.M.G.S.
dc.contributor.author Rupasingha, R.A.H.M.
dc.contributor.author Kumara, B.T.G.S.
dc.date.accessioned 2025-02-25T09:45:35Z
dc.date.available 2025-02-25T09:45:35Z
dc.date.issued 2025-02-19
dc.identifier.issn 3084-8911
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4871
dc.description.abstract The rapid growth of digital communication has produced a wide variety of formal and informal writing styles, which makes classifying documents by formality a challenging task. This work seeks to improve the accuracy of formality classification by using a variety of linguistic features. Writing styles are defined by stylistic components such as grammar, vocabulary, punctuation, and sentence structure, and traditional approaches have difficulty distinguishing between styles on this basis. Differentiating between formal and informal language is becoming increasingly important in applications involving research papers, legal documents, informal letters, and news. The objective of this study is to examine how well deep learning algorithms classify documents as formal or informal using linguistic features. The study collected a dataset of 5,000 text samples: 2,500 formal documents (formal letters and news items), with the remaining samples being informal documents (personal blogs and personal letters). All data were then preprocessed using stop-word removal, lemmatization, tokenization, and lowercasing. Seven linguistic features that distinguish the formal and informal categories were targeted and extracted: pronouns, grammar, vocabulary, slang, acronyms, language, and initialisms. These seven features were then combined to generate a feature vector for each document. Using these feature vectors, three deep learning models were trained to classify the documents: Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. The models were selected based on a literature review: the ANN learns nonlinear patterns in the data, the CNN identifies relevant sections of text, and the LSTM takes word position into account. The performance of the models was compared using different train-test splits and cross-validation techniques. 
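The preprocessing and feature-extraction steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract names the seven features but not how each is measured, so every lexicon (`STOP_WORDS`, `LEMMAS`, `PRONOUNS`, `SLANG`, `INITIALISMS`, `FORMAL_WORDS`) and every per-feature definition below (for example, contraction rate as a proxy for grammar and type-token ratio for vocabulary) is an assumption. The toy lemma lookup stands in for a real lemmatizer.

```python
import re

# HYPOTHETICAL, deliberately tiny resources for illustration only. A real
# pipeline would use a full stop-word list and a proper lemmatizer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "and", "in", "it", "you", "we"}
LEMMAS = {"letters": "letter", "guys": "guy", "went": "go", "documents": "document"}

# HYPOTHETICAL lexicons used as proxies for the seven features.
PRONOUNS = {"i", "me", "my", "you", "your", "we", "us", "our"}
SLANG = {"gonna", "wanna", "cool", "awesome", "guys"}
INITIALISMS = {"lol", "btw", "omg", "idk", "brb"}
FORMAL_WORDS = {"therefore", "furthermore", "regarding", "sincerely", "hereby"}

TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    """Split text into word tokens made of letters and apostrophes."""
    return TOKEN_RE.findall(text)


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then map tokens to lemmas."""
    tokens = tokenize(text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def feature_vector(text):
    """Return seven per-document rates, one per (assumed) feature definition.

    Case- and pronoun-based features use the raw tokens, because lowercasing
    and stop-word removal would erase exactly those cues.
    """
    raw = tokenize(text)
    lower = [t.lower() for t in raw]
    lemmas = preprocess(text)
    n = max(len(raw), 1)  # avoid division by zero on empty documents

    def rate(lexicon):
        # Fraction of tokens that appear in the given lexicon.
        return sum(t in lexicon for t in lower) / n

    return [
        rate(PRONOUNS),                                    # pronouns
        sum("'" in t for t in raw) / n,                    # grammar: contractions
        len(set(lemmas)) / max(len(lemmas), 1),            # vocabulary: type-token ratio
        rate(SLANG),                                       # slang
        sum(t.isupper() and len(t) > 1 for t in raw) / n,  # acronyms: all-caps tokens
        rate(FORMAL_WORDS),                                # language: formal connectives
        rate(INITIALISMS),                                 # initialisms
    ]
```

Stacking one such vector per document yields an N × 7 matrix that could serve as model input; how the study shaped this input for the CNN and LSTM is not stated in the abstract.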
According to the experimental results, the LSTM model outperforms the ANN and CNN in terms of precision, recall, and F-measure, achieving the highest classification accuracy of 89.4% with 100 epochs and a batch size of 32, as well as the lowest Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results highlight the ability of the LSTM to detect linguistic subtleties and suggest ways to improve formality recognition in Natural Language Processing applications, supporting more context-sensitive text classification. en_US
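The evaluation summarized in this abstract (cross-validation, then precision, recall, F-measure, accuracy, MAE, and RMSE) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `k_fold_indices` and `evaluate` are hypothetical, and the fold count `k`, the shuffling seed, the 0.5 decision threshold, and the choice to compute MAE and RMSE on predicted probabilities rather than on hard labels are all assumptions, since the abstract does not specify them.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds.

    HYPOTHETICAL helper: k and seed are assumptions, not values from the study.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def evaluate(y_true, y_prob, threshold=0.5):
    """Score binary predictions where 1 = formal and 0 = informal.

    Precision, recall, F1 and accuracy use thresholded labels; MAE and RMSE
    use the predicted probabilities (an assumption about the study's setup).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    errors = [t - p for t, p in zip(y_true, y_prob)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mae": mae, "rmse": rmse}
```

For example, `evaluate([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])` gives precision, recall, F1, and accuracy of 0.5 each, an MAE of 0.375, and an RMSE of about 0.44. Shuffling before splitting avoids folds that contain only one class when the samples are stored with all formal documents first; a stratified split that preserves the formal/informal balance in every fold would be a reasonable alternative.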
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka, P.O. Box 02, Belihuloya, 70140, Sri Lanka. en_US
dc.subject Classification en_US
dc.subject Deep Learning en_US
dc.subject Formal documents en_US
dc.subject Informal documents en_US
dc.subject Linguistic Features en_US
dc.title A Comparative Analysis of Deep Learning Algorithms for Formality Classification in Texts Using Linguistic Features en_US
dc.type Article en_US

