Deep Learning Approach for Classifying AI-generated and Human-written Sinhala Answers

Ranathunga, R.A.D.K; Rupasingha, R.A.H.M.; Kumara, B.T.G.S.

dc.contributor.author	Ranathunga, R.A.D.K
dc.contributor.author	Rupasingha, R.A.H.M.
dc.contributor.author	Kumara, B.T.G.S.
dc.date.accessioned	2025-12-15T07:47:35Z
dc.date.available	2025-12-15T07:47:35Z
dc.date.issued	2025-02-19
dc.identifier.citation	Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka.	en_US
dc.identifier.isbn	978-624-5727-57-5
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4986
dc.description.abstract	With the growing use of AI to answer academic questions, cheating has become a real concern, especially in languages such as Sinhala that are poorly supported by effective detection systems. As a result, it is difficult to distinguish AI-generated content from human-written content, which undermines fair evaluation and originality in education. To address this issue, this study introduces a deep learning model that differentiates AI-generated from human-written Sinhala answers. This helps to prevent cheating in academic settings while addressing the shortage of resources for the Sinhala language. A stepwise methodology is used, beginning with data gathering: 1000 questions from the academic areas of history, science, business studies and Buddhism, together with human-provided answers and AI-generated responses. Text pre-processing, including stemming, tokenization and stop-word removal, is applied to the data. Term frequency-inverse document frequency (TF-IDF) then transforms the textual data into numerical form that can be fed to the learning algorithms. Two algorithms were then applied: an Artificial Neural Network (ANN) and a Long Short-Term Memory (LSTM) network. Based on the results, the LSTM, with 86% accuracy, outperforms the ANN, so it can be concluded that the LSTM is the better of the two. The recall, F1-score and error values are also better for the LSTM. Different hyperparameters and percentage splits were used for the evaluations. Data collection and computing issues were among the challenges faced during this research. This research provides a solution for cheating detection in low-resource languages. It forms the basis for subsequent work that aims to detect such content in languages underrepresented in AI; further work will use cosine similarity to explore the relationship between lecturer and AI responses.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing, Sabaragamuwa University of Sri Lanka	en_US
dc.subject	AI-generated	en_US
dc.subject	Classification	en_US
dc.subject	Deep learning	en_US
dc.subject	Human-written	en_US
dc.subject	Sinhala language	en_US
dc.title	Deep Learning Approach for Classifying AI-generated and Human-written Sinhala Answers	en_US
dc.type	Article	en_US
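
The TF-IDF step described in the abstract, and the cosine-similarity comparison it proposes as future work, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula and the toy inputs are all assumptions made for illustration, and stemming is omitted because the record does not say which Sinhala stemmer was used.

```python
import math
from collections import Counter


def tokenize(text, stop_words=frozenset()):
    """Whitespace tokenization plus stop-word removal (assumed; no stemming)."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def tfidf_vectors(docs, stop_words=frozenset()):
    """Return (vocabulary, TF-IDF vectors) for a list of text strings."""
    tokenized = [tokenize(doc, stop_words) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (an assumed variant) so common terms keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy placeholder strings, not data from the study.
    answers = ["x y y", "x z"]
    vocab, vectors = tfidf_vectors(answers)
    print(vocab)
    print(cosine_similarity(vectors[0], vectors[1]))
```

In a pipeline like the one the abstract outlines, vectors of this kind would be the numerical input to a classifier; the ANN and LSTM models themselves are not sketched here because the record gives no architectural details.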