A Study on Movie Users’ Perception Dimensions and Sentiment Orientation Based on LDA Topic Modeling and SnowNLP Sentiment Analysis: Evidence from Douban Movie Reviews

Abstract

As the Chinese film market continues to grow and internet social platforms mature, Douban Movie, one of the most influential film review platforms in China, has accumulated a vast body of user-generated content (UGC) that serves as a critical data source for understanding audiences’ film consumption behavior and emotional attitudes. Traditional manual analysis, however, is limited in both efficiency and depth when confronted with massive volumes of unstructured review text. This study applies text mining, combining Latent Dirichlet Allocation (LDA) topic modeling with SnowNLP sentiment analysis, to 4,306 user reviews posted on the Douban Movie platform between 2020 and 2025. The original dataset covers reviews of 484 films (412 after preprocessing). After text cleaning, jieba word segmentation, stopword filtering, and synonym normalization, the optimal number of topics was determined using the C_V coherence metric, yielding five core thematic dimensions: Narrative Structure and Plot Evaluation, Acting Performance and Character Portrayal, Audio-Visual Effects and Production Quality, Social Value and Cultural Expression, and Viewing Experience and Platform Interaction. Sentiment analysis indicates that audiences are most emotionally engaged with the narrative structure and social value themes, whose positive review proportions are 42.3% and 38.7%, respectively. The audio-visual effects theme has the highest proportion of negative reviews, at 58.2%, reflecting elevated audience expectations for the technical quality of domestic Chinese films. Annual trend analysis shows that audience sentiment fluctuated from 2020 to 2025, with a notable increase in review activity during 2023–2024.
This study constructs a text mining analytical framework applicable to Chinese-language film reviews, providing data-driven empirical support for content creation and marketing decision-making in the film industry.
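
The per-topic sentiment proportions reported in the abstract can be illustrated with a minimal sketch. It assumes each review has already been assigned a dominant LDA topic and a sentiment score in [0, 1], which is the range SnowNLP returns. The cut-offs (0.6 and 0.4), the function names, and the toy records below are illustrative assumptions, not the study's actual settings or data.

```python
from collections import Counter, defaultdict

# Assumed cut-offs on a [0, 1] sentiment score; hypothetical, not taken from the study.
POSITIVE_MIN = 0.6
NEGATIVE_MAX = 0.4


def label_sentiment(score):
    """Map a score in [0, 1] to 'positive', 'neutral', or 'negative'."""
    if score >= POSITIVE_MIN:
        return "positive"
    if score <= NEGATIVE_MAX:
        return "negative"
    return "neutral"


def sentiment_share_by_topic(records):
    """Return {topic: {label: percent}} for an iterable of (topic, score) pairs."""
    counts = defaultdict(Counter)
    for topic, score in records:
        counts[topic][label_sentiment(score)] += 1

    shares = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        shares[topic] = {
            label: round(100.0 * counter[label] / total, 1)
            for label in ("positive", "neutral", "negative")
        }
    return shares


# Toy (topic, score) records, made up purely for illustration.
toy_records = [
    ("Narrative Structure and Plot Evaluation", 0.91),
    ("Narrative Structure and Plot Evaluation", 0.35),
    ("Narrative Structure and Plot Evaluation", 0.72),
    ("Narrative Structure and Plot Evaluation", 0.50),
    ("Audio-Visual Effects and Production Quality", 0.12),
    ("Audio-Visual Effects and Production Quality", 0.30),
    ("Audio-Visual Effects and Production Quality", 0.88),
]

shares = sentiment_share_by_topic(toy_records)
for topic, share in shares.items():
    print(topic, share)
```

In a real pipeline the scores would come from a sentiment model and the topic labels from the fitted topic model. The choice of thresholds directly changes the reported proportions, so it would need to be stated alongside any results.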

https://doi.org/10.70693/itphss.v3i2.367

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Siyuan Wang, Liyao Xiao