Multilayer Perceptron-based Source Code Classification

Mohamed, I.; Kumara, B.T.G.S.; Banujan, K.

Applied Sciences Undergraduate Research Symposium (APSURS) 2022, Faculty of Applied Sciences

dc.contributor.author	Mohamed, I.
dc.contributor.author	Kumara, B.T.G.S.
dc.contributor.author	Banujan, K.
dc.date.accessioned	2023-09-16T06:38:05Z
dc.date.available	2023-09-16T06:38:05Z
dc.date.issued	2022-04-06
dc.identifier.isbn	978-624-5727-21-6
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/3937
dc.description.abstract	The implementation stage is one of the most crucial stages in the software development life cycle, and source code is the most critical component of a software application. Developers either write new source code from scratch or reuse existing code according to the project's requirements. Rather than developing functionality, most programmers devote considerable time to searching old source files. An effective and efficient way of searching for source code functions is therefore critical. Topic modeling is one way of extracting topics from source code. Although several topic modeling approaches have been implemented with statistical modeling techniques, they have several limitations. These approaches rely on non-formal code components such as method names, identifiers, and comments. The syntax of a language is the set of rules that define its structure, and without syntax the semantics of a language are nearly impossible to comprehend. To address these concerns, the authors used a machine-learning algorithm to predict source code functionality names, with results that depend solely on the syntax or algorithm of the source code. The study focuses on three Java functionalities: prime number, selection sort, and Fibonacci number. The data set was acquired from the Git open-source repository, an open-source platform supported by developers worldwide. Four hundred and fifty software projects were analyzed, and 23 variables were considered. Source code components were extracted using the Java parser library, which builds an abstract syntax tree so that source code features can be extracted precisely. An algorithm was then developed to compute the count matrices of the source code features. The data set was fed into an Artificial Neural Network machine learning model, which yielded an accuracy of 95.4%, a precision of 95.5%, a recall of 95.4%, and an F1-score of 95.4%, with a low error rate of 0.033.	en_US
dc.language.iso	en	en_US
dc.publisher	Sabaragamuwa University of Sri Lanka	en_US
dc.subject	Artificial Neural Network	en_US
dc.subject	Source Code	en_US
dc.subject	Java Parser library	en_US
dc.subject	Abstract Syntax Tree	en_US
dc.title	Multilayer Perceptron-based Source Code Classification	en_US
dc.type	Book	en_US
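
The feature-counting step described in the abstract can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it approximates syntax features with regular expressions instead of an abstract syntax tree built by the Java parser library, and the seven feature names in `FEATURES` are hypothetical stand-ins, since the record does not list the 23 variables that were used. The helper names (`strip_noise`, `count_features`, `count_matrix`) and the two Java snippets are likewise invented for illustration.

```python
import re
from collections import Counter

# Hypothetical feature set; the record does not name the 23 variables.
FEATURES = ["for", "while", "if", "else", "return", "array_access", "assignment"]

_KEYWORD = re.compile(r"\b(for|while|if|else|return)\b")
# An identifier character followed by a non-empty index, e.g. a[i].
_ARRAY_ACCESS = re.compile(r"\w\s*\[[^\]]+\]")
# A lone '=' that is not part of ==, !=, <=, >= or a compound assignment.
_ASSIGNMENT = re.compile(r"(?<![=!<>+\-*/%])=(?!=)")


def strip_noise(source):
    """Drop comments and blank out string literals (approximate)."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    return re.sub(r'"(?:\\.|[^"\\])*"', '""', source)


def count_features(source):
    """Return one row of counts, ordered as in FEATURES."""
    code = strip_noise(source)
    counts = Counter(_KEYWORD.findall(code))
    counts["array_access"] = len(_ARRAY_ACCESS.findall(code))
    counts["assignment"] = len(_ASSIGNMENT.findall(code))
    return [counts[name] for name in FEATURES]


def count_matrix(sources):
    """Stack one count row per source file into a matrix (list of lists)."""
    return [count_features(source) for source in sources]


# Invented sample inputs, one per functionality class.
FIB_SNIPPET = """
int fib(int n) {
    if (n <= 1) return n;   // base case
    int a = 0, b = 1;
    for (int i = 2; i <= n; i++) {
        int t = a + b;
        a = b;
        b = t;
    }
    return b;
}
"""

SORT_SNIPPET = """
void sort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        int m = i;
        for (int j = i + 1; j < a.length; j++)
            if (a[j] < a[m]) m = j;
        int t = a[m]; a[m] = a[i]; a[i] = t;
    }
}
"""

if __name__ == "__main__":
    for row in count_matrix([FIB_SNIPPET, SORT_SNIPPET]):
        print(dict(zip(FEATURES, row)))
```

Each row of the resulting matrix describes one source file by syntax alone, without using method names, identifiers, or comments. Rows of this kind, paired with functionality labels, are the sort of input a multilayer perceptron classifier would be trained on. A real pipeline would take the counts from parsed syntax tree nodes, which avoids the false matches a regular expression can produce.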