
Massive amounts of unstructured textual data, such as large volumes of online consumer reviews, “provide an unprecedented opportunity for firms to understand consumer word-of-mouth, forecast product sales, and monitor product defects,” say HKUST’s Yi Yang and a colleague. Topic modeling is a popular technique for analyzing such data. However, topic models are often unstable, and their results can be difficult to replicate. The researchers propose a more stable approach to topic modeling that will benefit a wide range of scholars and practitioners.
Topic modeling is widely used to extract insights from textual data. “In real-world settings, companies collect massive amounts of text data from multiple online platforms,” the authors say. “Topic models are often used in such commercial contexts to extract insights from user-generated content.” These insights can improve companies’ marketing, operations, and profitability.
However, topic models may produce different results each time they are run on the same data, because their estimation typically starts from a random initialization. This instability “can undermine their usability and potentially decrease scholars’ trust in modeling outcomes.” The researchers therefore propose an approach called Stable LDA, which builds on latent Dirichlet allocation (LDA), a widely used topic model, and leverages word association relationships to make topic modeling more stable. This approach, they say, is “unsupervised, reliable, and minimizes the overall effort of researchers.”
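To make the idea of instability concrete, one simple way to quantify it is to run a topic model twice, take the top words of each topic, and check how well the topics from one run match those from the other. The sketch below is an illustration under stated assumptions, not the researchers’ Stable LDA method or their stability metric: it assumes each run is summarized as a list of top-word lists, and it scores agreement with a hypothetical `topic_stability` helper that averages, for each topic in the first run, its best Jaccard overlap with any topic in the second run. A score of 1.0 means every topic was reproduced exactly.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topic_stability(run_a, run_b):
    """Hypothetical run-to-run agreement score (illustrative assumption).

    For each topic (a list of top words) in run_a, find its best-matching
    topic in run_b by Jaccard similarity, then average those best scores.
    """
    if not run_a or not run_b:
        raise ValueError("both runs must contain at least one topic")
    best_matches = [max(jaccard(t, u) for u in run_b) for t in run_a]
    return sum(best_matches) / len(best_matches)


# Two made-up runs on the same review corpus; topic order differs between runs.
run1 = [["battery", "charge", "life"], ["screen", "display", "bright"]]
run2 = [["screen", "display", "color"], ["battery", "life", "charge"]]

print(topic_stability(run1, run2))  # 0.75: one topic matches fully, one half
```

Because topic order is arbitrary across runs, the sketch matches topics by content rather than by position; a stricter variant would enforce a one-to-one matching between the two runs.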
The researchers validated their approach on real-world data and found that it could “significantly improve model stability while maintaining or even improving topic model quality.” Case studies confirmed that their method reduced estimation inconsistencies. The researchers have made their implementation publicly available so that others can replicate and extend their work, and to encourage further research.
This work has valuable practical implications. “The stability of textual analysis is crucial,” the researchers say, because “consistent and accurate results help deliver actionable insights.” Their readily applicable method will benefit scholars and practitioners working in disciplines ranging from management science to marketing and political science.