Sabaragamuwa University of Sri Lanka

Enhanced U-Net Architecture for Analyzing Sinhala Document Layouts and Styles

dc.contributor.author Hulathdoowage, S.K.D.
dc.contributor.author Kumara, B.T.G.S.
dc.date.accessioned 2025-12-12T09:47:26Z
dc.date.available 2025-12-12T09:47:26Z
dc.date.issued 2025-02-19
dc.identifier.citation Abstracts of the ComURS2025 Computing Undergraduate Research Symposium 2025, Faculty of Computing, Sabaragamuwa University of Sri Lanka. en_US
dc.identifier.isbn 978-624-5727-57-5
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/4966
dc.description.abstract Document layout analysis is the process of identifying and segmenting the elements of a document. Accurate digitization and information extraction require effective analysis of complex layouts, particularly in documents with diverse elements such as titles, images, paragraphs, tables, and mathematical expressions. Layout analysis alone is insufficient for document digitization; style analysis plays a critical role in preserving structural and typographical integrity, which is essential for accurate text recognition in Sinhala script. This research proposes an enhanced U-Net architecture for the semantic segmentation of Sinhala document layouts and font styles. A dataset of 600 manually annotated Sinhala document images with 27 labels, including Title Level 1, Title Level 2, Title Level 3, Paragraph, Table, Image, Text Bold, and Text Italic, was used. The initial convolutional layers of the U-Net encoder were integrated with a vision transformer block: the input image was divided into patches, which were flattened and processed by the vision transformer block with adaptive positional embedding. These modifications enabled the model to capture long-range dependencies and global context in the input images, potentially improving feature extraction. On the test dataset, the accuracy, precision, recall, and F1-score were 79.27%, 71.12%, 69.85%, and 70.48%, respectively. Compared to conventional U-Net models, this approach demonstrated superior segmentation accuracy, particularly in complex document structures. The proposed approach also improves optical character recognition (OCR) performance through element-wise integration of OCR technologies, yielding more accurate text extraction. Finally, this study contributes to Sinhala document digitization by providing a comprehensive framework for layout and style analysis, enhancing OCR performance, and offering adaptability for multilingual document processing in real-world applications such as automated archiving and digital library systems. en_US
dc.language.iso en en_US
dc.publisher Faculty of Computing, Sabaragamuwa University of Sri Lanka en_US
dc.subject Document layout analysis en_US
dc.subject Optical character recognition en_US
dc.subject Semantic segmentation en_US
dc.subject U-Net architecture en_US
dc.subject Vision transformers en_US
dc.title Enhanced U-Net Architecture for Analyzing Sinhala Document Layouts and Styles en_US
dc.type Article en_US
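
The abstract describes dividing the input image into patches that are flattened before entering the vision transformer block. The record gives no patch size, projection, or definition of the "adaptive" positional embedding, so the following is only a minimal, dependency-free sketch of the general patch-and-flatten step. The function names, the 4 x 4 toy image, the patch size of 2, and the constant position vectors are all hypothetical, and a real vision transformer would normally apply a learned linear projection to each patch before adding positions.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping square patches.

    Each patch is flattened in row-major order, and patches are returned
    left to right, top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def add_positional_embedding(
    patches: List[List[float]], embedding: List[List[float]]
) -> List[List[float]]:
    """Add one position vector to each flattened patch, element by element."""
    return [
        [value + offset for value, offset in zip(patch, position)]
        for patch, position in zip(patches, embedding)
    ]


# Hypothetical 4 x 4 toy image with pixel values 0..15.
toy_image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(toy_image, patch_size=2)

# Placeholder positions (assumption): one constant vector per patch index.
# This is NOT the adaptive scheme from the abstract, which is not specified.
positions = [[0.1 * i] * 4 for i in range(len(patches))]
tokens = add_positional_embedding(patches, positions)
```

With these toy values the 4 x 4 image yields four tokens of length four, one per 2 x 2 patch. The position vectors are what let a transformer distinguish otherwise identical patches at different locations on the page.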

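The reported precision, recall, and F1-score can be checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below uses only the figures from the abstract; the function and variable names are hypothetical. It assumes the F1-score was computed from the aggregate precision and recall, which the record does not state (per-class averaging would not, in general, satisfy this identity).

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set figures as reported in the abstract, in percent.
reported_precision = 71.12
reported_recall = 69.85
reported_f1 = 70.48

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%")  # prints "computed F1 = 70.48%"
```

Under that assumption the three reported values are mutually consistent to two decimal places.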
