Sabaragamuwa University of Sri Lanka

Entropy-driven attribute selection with SVM for robust breast cancer classification across continuous and discrete datasets

dc.contributor.author Subawickrama, H.D.A.W.
dc.contributor.author Udagedara, U.G.I.G.K.
dc.contributor.author Nishantha, S.A.A.
dc.contributor.author Abeysundara, S.P.
dc.date.accessioned 2026-01-02T09:11:03Z
dc.date.available 2026-01-02T09:11:03Z
dc.date.issued 2025-12-01
dc.identifier.issn 2815-0341
dc.identifier.uri http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5111
dc.description.abstract Breast cancer is the second most common type of cancer among women. According to the World Health Organisation, over 2.3 million new cases and 600,000 deaths occur worldwide each year, and approximately 3,000 people develop the disease each year in Sri Lanka. These figures highlight the importance of highly sensitive testing methods for improving early detection and survival. This study proposes a hybrid method that combines Shannon entropy-based attribute selection with a Support Vector Machine (SVM) classifier for the detection of breast cancer. The aim is to enhance classification performance while preserving the comprehensibility of the original clinical features, thereby providing a robust tool that can be applied in real-world medical settings. Unlike dimensionality reduction techniques such as Principal Component Analysis (PCA), which transform the original features into new ones, the proposed entropy-based approach retains the original features, so that model output is easier to interpret and more clinically relevant. Two benchmark datasets were used to evaluate the methodology: the Wisconsin Diagnostic Breast Cancer (WDBC) dataset with 30 continuous features, and the Wisconsin Breast Cancer (WBC) dataset with 9 discrete features. Dataset-specific preprocessing techniques were applied. Shannon entropy was used to calculate the information gain of every feature, and informative features were selected visually from bar plots and cumulative curves. Thirteen features were selected for the WDBC dataset and five for the WBC dataset. The selected features were used to train and test the SVM classifiers. The models achieved accuracies of 94.17% on the WDBC dataset and 96.10% on the WBC dataset.
Precision, recall, and F1-scores for the benign and malignant classes demonstrated strong classification performance. The proposed method is a highly accurate, explainable, and computationally lightweight technique for the diagnosis of breast cancer. Moreover, the Shannon entropy-based method is applicable to continuous as well as discrete datasets. en_US
dc.language.iso en en_US
dc.publisher Sabaragamuwa University of Sri Lanka en_US
dc.subject Attribute selection en_US
dc.subject Breast cancer detection en_US
dc.subject Classification en_US
dc.subject Shannon entropy en_US
dc.subject Support vector machine en_US
dc.title Entropy-driven attribute selection with SVM for robust breast cancer classification across continuous and discrete datasets en_US
dc.type Article en_US
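
The entropy-based ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the equal-width binning of continuous values are all assumptions made for illustration, since the record does not say how continuous features were discretised or where the visual cut-off was placed. The sketch only ranks features by information gain; in the study the top-ranked features were then chosen by eye from plots and passed to an SVM classifier.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


def equal_width_bins(values, n_bins=4):
    """Assumed discretisation for continuous features: equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: B = benign, M = malignant.
labels = ["B", "B", "B", "B", "M", "M", "M", "M"]
columns = {
    # Discrete feature (WBC-style score) that separates the classes.
    "clump_thickness": [1, 1, 2, 2, 8, 9, 9, 8],
    # Discrete feature that carries no information about the class.
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
    # Continuous feature (WDBC-style measurement), binned before scoring.
    "mean_radius": equal_width_bins([10.1, 11.3, 12.0, 12.8, 17.5, 18.2, 19.9, 20.4]),
}

ranking = rank_features(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

On this toy data the two informative features receive the maximum gain of one bit and the uninformative one receives zero. Because the selected columns are original features and not transformed components, they can be handed unchanged to any SVM implementation, which is the interpretability argument the abstract makes against PCA.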

