With the rise of search engines such as Google, it is becoming increasingly important to determine how consumers use these tools to find products, services, or information. Companies that understand their customers’ search behavior can better tailor the content of their websites and pay for the right advertising on search engines. To help firms make the best use of these techniques, HKUST’s Professor Jia Liu and her co-author developed a novel mathematical model to enhance the match between webpages and the terms customers use to search for specific goods and services.

The search queries that customers construct can tell us a lot about their content preferences. In turn, this information can help companies enhance their visibility by predicting search behavior. However, obstacles remain. “Despite the importance of being able to infer consumers’ content preferences from their queries,” say the researchers, “very little research has been done in this area.” Text-based search behavior remains particularly unclear.

With this in mind, the researchers sought to address two key challenges. First, search queries tend to be very short, generally fewer than five words. Second, the way people choose the words they type into a search engine tends to be complex. “Consumers may not necessarily formulate queries that exactly and directly reflect their content preferences,” the researchers explain.

Overcoming these challenges required a new kind of probabilistic topic model. The technique developed by the authors, called hierarchically dual latent Dirichlet allocation (HDLDA), builds on algorithms that can extract topics from text. This model, say the researchers, can combine “information from multiple sparse search queries and their associated search results and explicitly quantify the mapping between queries and results.” This is particularly “useful for firms interested in customizing content (e.g., display or search advertising) based on a consumer’s query.”
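
For readers curious about the kind of topic-extraction algorithm HDLDA builds on, the following is a minimal sketch of standard latent Dirichlet allocation fitted with collapsed Gibbs sampling. It is not the authors' HDLDA model and does not reproduce its hierarchical mapping between queries and results. The function name `fit_lda`, the toy corpus, and all parameter values are assumptions made purely for illustration; the corpus naively pools each short query with a made-up snippet of result text, which is only a crude stand-in for the idea of combining sparse queries with their search results.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit plain LDA by collapsed Gibbs sampling (illustrative, not HDLDA).

    docs is a list of token lists. Returns (theta, vocab, topic_word), where
    theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions per document.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab, topic_word


# Hypothetical toy corpus: each short query pooled with invented result text.
docs = [
    "cheap flights paris airline tickets fares flights".split(),
    "paris flights deals airline fares tickets".split(),
    "hotel rooms rome booking hotel reviews rooms".split(),
    "rome hotel booking reviews rooms deals".split(),
]

theta, vocab, topic_word = fit_lda(docs)
for doc, mix in zip(docs, theta):
    print(" ".join(doc[:3]), [round(p, 2) for p in mix])
```

Each printed row gives one document's estimated mix of topics. Pooling a query with longer text gives the sampler more words to work with than a query of fewer than five words would on its own, which is the sparsity problem the researchers describe; how HDLDA actually quantifies the link between queries and results is specified in their paper, not here.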

To assess the novel HDLDA model, the authors performed two rigorous tests. First, they ran a laboratory experiment to explore how real-life consumers construct search queries. They set the participants specific tasks involving finding information online, and tracked their search behavior. Encouragingly, the researchers found that the model correctly estimated content preferences from queries.

Second, the authors analyzed field data from a large online travel company that advertises extensively on Google. The results illustrated the practical relevance of the new model. The advantages of HDLDA for companies lie in its ability to analyze the links between queries and webpages “on the fly”—in real time and without human intervention. “By empowering marketers/advertisers to infer consumers’ content preferences from queries,” the researchers explain, “our work can help them create more relevant content and promote that content more effectively.”

Perhaps most significantly, the researchers tell us, HDLDA can be “applied to any context in which one type of documents [is] semantically linked to another type of document.” Because online search advertising is a US$90 billion industry, the researchers’ pioneering model has the potential to have a significant financial impact in a wide range of fields.