Abstract:
As AI is increasingly used to answer academic questions, cheating becomes a real concern, especially for languages such as Sinhala that are poorly supported by effective detection systems. It is difficult to distinguish AI-generated content from human-written content, which compromises fair evaluation and originality in education. To address this issue, this study introduces a deep learning model that differentiates AI-generated Sinhala answers from human-written ones. This helps to prevent cheating in academic settings while addressing the shortage of resources for the Sinhala language. A stepwise methodology is used. The first stage is data gathering, which covers 1000 questions from the academic areas of history, science, business studies, and Buddhism, together with human-written answers and AI-generated responses. Text pre-processing steps, namely tokenization, stemming, and stop-word removal, are then applied to the data. Term frequency-inverse document frequency (TF-IDF) transforms the text into numerical form that can be fed to the learning algorithms. Two algorithms were then applied: an Artificial Neural Network (ANN) and a Long Short-Term Memory (LSTM) network. With 86% accuracy, the LSTM outperforms the ANN, and its recall, F1-score, and error values are also better, so it can be concluded that the LSTM is the better of the two models. Different hyperparameters and percentage splits were used in the evaluations. Data collection and computing issues were among the challenges faced during this research. This research provides a solution for cheating detection in low-resource languages.
It forms a basis for subsequent work on detecting AI-generated content in underrepresented languages; further work will use cosine similarity to explore the relationship between lecturer and AI responses.
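
To illustrate the TF-IDF step mentioned in the abstract, the following is a minimal sketch in plain Python. The abstract does not state which TF-IDF variant was used, so the weighting here (term count divided by document length, multiplied by the natural log of the number of documents over the document frequency) is an assumption, and the function name `tfidf_vectors` and the toy tokens are hypothetical. Documents are assumed to be already tokenized, stemmed, and stripped of stop-words.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumed weighting (not taken from the study):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


# Toy placeholder tokens standing in for pre-processed Sinhala answers.
docs = [["a", "b", "a"], ["b", "c"]]
vocab, vectors = tfidf_vectors(docs)
```

Each resulting vector has one entry per vocabulary term and could serve as a numerical input to a classifier; a term that occurs in every document (here `"b"`) receives zero weight under this variant.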