Sentiment Analysis of Self-Published vs. Traditionally Published Books using Machine Learning

Jayasundara, J.M.G.N.; Adeeba, S.

ComURS2026 Computing Undergraduate Research Symposium : Abstracts

dc.contributor.author	Jayasundara, J.M.G.N.
dc.contributor.author	Adeeba, S.
dc.date.accessioned	2026-05-27T05:40:49Z
dc.date.available	2026-05-27T05:40:49Z
dc.date.issued	2026-01-28
dc.identifier.isbn	9786245727445
dc.identifier.uri	http://repo.lib.sab.ac.lk:8080/xmlui/handle/susl/5306
dc.description.abstract	The rapid growth of self-publishing channels such as Amazon Kindle Direct Publishing (KDP) has reshaped how books are distributed worldwide, because authors can now bypass the traditional publishing framework. Traditional publishers have well-established editorial and marketing procedures, whereas self-published authors have full creative freedom but varying degrees of quality control. Despite this shift, few academic studies have used computational sentiment analysis to compare systematically how readers perceive books under these two models. This work addresses that gap by comparing reader attitudes toward self-published and traditionally published books using a large dataset of Goodreads reviews. The study aims to (1) determine the patterns of sentiment across the two publishing models, (2) identify the key themes that shape reader perceptions, and (3) assess the role of platform visibility and metadata in modulating sentiment trends. The reviews were preprocessed with text normalization, tokenization, and metadata-based publisher classification. A TF-IDF vectorizer and a Logistic Regression classifier were used to classify reviews into the self-published and traditionally published classes. This model achieved an accuracy of 0.80 on a test sample of 125,757, with a precision of 0.82 and a recall of 0.78 for self-published books and a precision of 0.79 and a recall of 0.82 for traditionally published books. A DistilBERT model was also used as an additional robustness check. The findings indicate that reader sentiment is broadly similar across both publishing models; however, self-published books show a wider spread in their sentiment distribution. The greater consistency of traditionally published books is probably due to professional editing and a more organized publication process. The research has implications for publishers and authors regarding marketing approaches, content recommendations, and the choice of publication pathway.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing. Sabaragamuwa University of Sri Lanka.	en_US
dc.subject	Goodreads	en_US
dc.subject	Machine Learning	en_US
dc.subject	Sentiment Analysis	en_US
dc.subject	Self-Publishing	en_US
dc.subject	Traditional Publishing	en_US
dc.title	Sentiment Analysis of Self-Published vs. Traditionally Published Books using Machine Learning	en_US
dc.type	Article	en_US
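
The classification step named in the abstract (TF-IDF features fed to a Logistic Regression classifier) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the record gives no code, hyperparameters, or preprocessing details, so the tokenizer, the smoothed IDF formula, the learning rate, the epoch count, and all function names below are assumptions made for illustration. The six training reviews are invented toy examples, not Goodreads data, and the label convention (1 = self-published, 0 = traditionally published) is likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep letter runs (a stand-in for normalization + tokenization)."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(text, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen in training are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(vec, weights, bias):
    """Predicted probability of the positive class (label 1)."""
    return sigmoid(bias + sum(weights.get(tok, 0.0) * v for tok, v in vec.items()))


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression fitted by plain stochastic gradient descent on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(vec, weights, bias) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            for tok, v in vec.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * v
    return weights, bias


def predict(text, idf, weights, bias):
    """Return 1 (self-published) or 0 (traditionally published) for a review."""
    return int(score(tfidf(text, idf), weights, bias) >= 0.5)


# Invented toy reviews; 1 = self-published, 0 = traditionally published (assumed labels).
train = [
    ("great story but full of typos and formatting errors", 1),
    ("indie gem, needs an editor but very original", 1),
    ("typos everywhere yet a fresh original voice", 1),
    ("polished prose and a well edited plot", 0),
    ("professionally edited, predictable but polished", 0),
    ("a well paced, polished bestseller", 0),
]
texts = [t for t, _ in train]
labels = [y for _, y in train]

idf = fit_idf(texts)
weights, bias = train_logreg([tfidf(t, idf) for t in texts], labels)

for review in ("original story with typos", "polished and well edited"):
    print(review, "->", predict(review, idf, weights, bias))
```

On a corpus of the size reported in the abstract, a library implementation with sparse matrices and a proper solver would be the practical choice. The sketch only shows how the two components fit together, and how per-class precision and recall could then be computed from the predictions on a held-out test split.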