A closer look into three main kinds of data-driven marketing analytics – descriptive, predictive and prescriptive.

Assistant Professor LIU Jia
Department of Marketing, HKUST Business School

Data-driven marketing, simply put, means using quantitative methods to derive meaning from data in order to make informed marketing decisions. Thanks to the availability of massive amounts of data, data analytics in marketing has become more important than ever. According to a BMO Capital Markets report, marketers spend US$50 billion per year on big data and advanced analytics to improve marketing’s impact on business. Research by McKinsey [1] shows that companies that invest in big data and analytics see an average increase in profits of five to six percent, which rises to nine percent for investments spanning five years.

In many cases, companies have focused on more open-ended efforts to gain novel insights from big data. These efforts were fuelled by analytics vendors and data scientists who were eager to take data and run all types of analyses in the hope of finding diamonds. Depending on the stage of the workflow and the requirements of the data analysis, there are three main kinds of data-driven marketing analytics: descriptive, predictive and prescriptive. Their underlying techniques include, but are not limited to, statistical modelling, data mining, machine learning, and AI.

In this article, I will present some applications of each of the three kinds of analytics using recent research in marketing, most of which draws on my own published and working papers on marketing analytics.

Descriptive Analytics

Descriptive analytics refers to the interpretation of historical data to identify trends and patterns, answering the question “What has happened?”. This means descriptive analytics can help identify potential problems or future opportunities for business.

In my collaboration with the Microsoft Bing search engine [2], we developed an interpretable machine learning model that can track, quantify, and interpret users’ topical preferences underlying each search query and across search contexts (e.g., time, location, and demographics). The proposed model leverages data on user queries, subsequent click-through on search results, and all textual information encountered on the search engine. The outputs of the model can help search engines improve their search results and help advertisers improve their keyword and ad copy strategies.

A group of researchers at Humboldt University Berlin [3] proposed a methodology to analyze the co-occurrences of products in consumer shopping baskets by leveraging recent advances in natural language processing and machine learning. Their method is well suited to retailers because it relies on data that is readily available from their checkout systems and facilitates the analysis of cross-category product complementarity, in addition to within-category substitution. The approach is also highly usable because it is automated and scalable.
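To make the idea of basket co-occurrence concrete, here is a minimal sketch that counts how often product pairs appear together and computes their lift. It is not the method proposed in [3], which builds on natural language processing techniques; it is a much simpler stand-in. The function name `cooccurrence_lift` and the toy baskets are assumptions made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lift(baskets):
    """Return {(a, b): lift} for every product pair bought together at least once."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # count each product once per basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    lift = {}
    for (a, b), n_ab in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)); values above 1 mean the pair
        # co-occurs more often than independent purchases would suggest
        lift[(a, b)] = (n_ab / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    return lift


# Hypothetical baskets, invented for illustration only
baskets = [
    ["chips", "salsa"],
    ["chips", "salsa", "soda"],
    ["soda", "water"],
    ["chips", "soda"],
]
lift = cooccurrence_lift(baskets)
for pair, value in sorted(lift.items()):
    print(pair, round(value, 2))
```

A lift above 1 is only suggestive of complementarity, and a lift below 1 only suggestive of substitution; in practice both would need to be checked against larger samples and richer models.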

In a similar spirit, my working paper on the vending market [4] aims to help retailers improve their product assortment strategies both across vending locations and within each location. We propose an automatic machine learning model that uses consumer transaction data from different vending locations. Our proposed model can (i) profile different consumer segments driven by location preferences, (ii) quantify the differences in product preferences across these consumer segments, and (iii) uncover the relationships (e.g., substitution and complementarity) across products and vending locations.

Predictive Analytics

Predictive analytics refers to the process of using current and/or historical data, combined with statistical techniques, to assess the likelihood of a certain event happening in the future, answering the questions: “What will happen?” or “When will it happen?”. Predictive analytics has been used for a variety of prediction problems in marketing, such as credit scoring, new customer acquisition, customer churn, customer lifetime value, click-through on ads, purchase incidence, and product recommendations. As a result, predictive analytics can help improve many areas of a business, including efficiency, customer service, and risk reduction.
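As a rough illustration of what "assessing the likelihood of an event" can look like, the sketch below fits a small logistic regression for customer churn using plain gradient descent. It is a toy example under stated assumptions, not a method taken from any of the cited papers: the two features (months since last purchase, number of support tickets), the data, and the function names are all hypothetical.

```python
import math


def predict_proba(w, b, x):
    """Predicted probability of the event (here, churn) for one customer."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # per-customer log-loss gradient
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical customers: [months since last purchase, support tickets]
X = [[1, 0], [2, 1], [1, 1], [6, 3], [8, 2], [7, 4], [3, 0], [9, 5]]
y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = churned (made-up labels)

w, b = fit_logistic(X, y)
p_recent = predict_proba(w, b, [1, 0])   # recently active customer
p_lapsed = predict_proba(w, b, [8, 4])   # long-inactive customer
print(round(p_recent, 3), round(p_lapsed, 3))
```

In a real application the model would be evaluated on held-out customers, and features would typically be scaled before fitting.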

For example, our proposed method in [2] is useful to search engines and search advertisers because it can lead to improved predictions of user click-through rates on search results. Gharibshah and Zhu [5] provide an excellent review of existing methodologies for user response prediction in online advertising. These methods can also be applied to prediction problems in other contexts.

In particular, deep learning (DL) has received significant attention for prediction tasks in marketing over the past few years. DL is a subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain, in which multiple layers of processing are used to extract progressively higher-level features from data. In addition to offering superior prediction accuracy, DL can process much more information from many different sources. Not only can it do this on an unparalleled scale, but it can also bring together disparate types of information (images, audio, app data, clickstream data, location data, and social network data) in a way that other systems cannot handle.

Nevertheless, a common complaint about existing predictive analytics is that the tools are too much like black boxes, offering no insight into how inputs are turned into outputs. Hence, blindly relying on algorithms to optimize predictions may have negative consequences. For example, a recent study [6] shows that the algorithms underpinning Facebook’s advertising platform can discriminate by showing different types of housing or employment ads to different people. In one example, the algorithm showed more ads for secretarial jobs to women and more ads for jobs in the lumber industry to men. Therefore, it is important for companies to decide how to trade off accuracy against interpretability and transparency for the problem at hand.

Prescriptive Analytics

Prescriptive analytics refers to the application of testing and other techniques to determine which course of action will yield the best result in a given scenario, answering the questions: “Why will it happen?” or “What should I do?”. Prescriptive analytics often extends beyond predictive analytics by specifying both the actions necessary to achieve predicted outcomes and the interrelated effects of each decision. Please see Diagram 1 for an illustration of the relationship between predictive and prescriptive analytics.

Diagram 1: The Relationship between Predictive and Prescriptive Analytics

Randomized experiments, such as A/B testing, are commonly used by companies to determine the best ad copy, price point, targeting strategy, etc. The basic idea is to randomly allocate the experimental units across different treatment conditions, and then identify the treatment condition that yields the best outcome of interest. The biggest advantage of randomization is that it reduces bias by equalising other factors that have not been explicitly accounted for in the study design. However, randomized experiments are either too costly or infeasible for most business problems. Therefore, a major class of prescriptive analytics generates proactive decisions on the basis of predictive analytics outcomes using quasi-experimental or observational data.
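The two steps of an A/B test described above, random allocation and comparison of outcomes, can be sketched as follows. This is a minimal illustration that assumes a binary outcome (e.g., conversion) and uses a standard two-proportion z-test; the function names and the conversion counts are hypothetical and chosen only for the example.

```python
import math
import random


def randomize(units, seed=0):
    """Randomly split experimental units into control and treatment groups."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of conditions A and B with a pooled z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value


# Step 1: allocate ten hypothetical user IDs at random
control, treatment = randomize(range(10))

# Step 2: compare outcomes (made-up counts: conversions out of users exposed)
diff, z, p_value = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"uplift={diff:.3f}, z={z:.2f}, p={p_value:.4f}")
```

Because assignment is random, a difference in conversion rates can be attributed to the treatment rather than to unobserved differences between the groups, which is exactly the property that observational data lacks.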

In my recent publication [7], we aim to help practitioners understand the value of investing in “moment marketing”, which entails the ability to synchronize online advertising (e.g., sponsored search) in real time with relevant offline events such as TV ads. We conduct causal estimation by leveraging large variations in TV advertising expenditure over a long period for a major brand in the U.S. fast food industry. Based on statistical analysis, we show that TV-moment-based search advertising could be effective for optimizing sponsored search advertising for both TV-advertised brands and their competitors. We also document the mechanisms driving such cross-channel advertising effects. Specifically, TV advertising can change the quality of online search traffic (e.g., who searches, where they search, and how they search) in the moments following a TV ad, so that an average searcher responds differently to subsequent search results.

In a similar project [8], we measure the effects of a major TV advertiser’s temporary discontinuation of TV advertising on consumer keyword search behavior. We leverage a field experiment in the U.S. wireless industry in which the focal brand stopped its TV advertising for one randomly chosen week. We develop a statistical method to simulate the search volume for different topical search keywords under different levels of TV advertising expenditure at a given point in time. These simulations can provide insights that help advertisers improve their cross-channel advertising strategies.

In my working paper on livestreaming markets [9], we estimate how consumer price elasticity (or willingness-to-pay) for knowledge goods varies over the entire product life-cycle. We achieve this by leveraging the most recent advances in machine learning and statistics. Our proposed model is trained using large-scale historical transaction records, along with high-dimensional information about products, sellers, consumers, and the platform. Our proposed methodology can not only provide descriptive insights into consumer purchase patterns, but also derive the optimal price point for a given product over each time period of the product life-cycle.
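To show how an elasticity estimate can feed into a pricing decision, here is a deliberately simplified sketch. It does not reproduce the approach in [9]; instead it assumes a textbook constant-elasticity demand curve, estimates the elasticity with a log-log least-squares fit, and applies the standard markup rule p* = c · e / (1 + e), which holds only for elasticity e < -1 and constant unit cost c. The data are synthetic and the function names are hypothetical.

```python
import math


def loglog_elasticity(prices, quantities):
    """Estimate price elasticity as the OLS slope of log(quantity) on log(price)."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def optimal_price(elasticity, unit_cost):
    """Profit-maximising price under constant-elasticity demand (assumption)."""
    if elasticity >= -1:
        raise ValueError("markup rule requires elastic demand (elasticity < -1)")
    return unit_cost * elasticity / (1 + elasticity)


# Synthetic data generated from q = 1000 * p^(-1.5), for illustration only
prices = [5.0, 6.0, 8.0, 10.0, 12.0]
quantities = [1000 * p ** -1.5 for p in prices]

elasticity = loglog_elasticity(prices, quantities)
best_price = optimal_price(elasticity, unit_cost=2.0)
print(round(elasticity, 3), round(best_price, 2))
```

With observational transaction data, a simple regression like this is usually confounded (prices are not set at random), which is one reason more careful causal machine learning methods are needed in practice.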

Looking Forward

Businesses are increasingly utilising data to discover insights that can aid them in creating business strategy, making decisions, and delivering better products, services and personalised online experiences. The three kinds of marketing analytics are complementary, and each is valuable in this process.

While models and algorithms often outperform humans, domain knowledge, reasoning, and, ultimately, decision making rest in the hands of the end user. Since many of the algorithms underlying the analytics take a black-box approach, they often leave little room for injecting domain expertise and can frustrate analysts when results seem spurious or confusing. Therefore, it is very important for business leaders to create interdisciplinary teams that continuously monitor and evaluate data for bias.

Last but not least, with growing concern about privacy, new regulations have driven companies to examine how they can make their data more transparent to customers, as well as more reliable and relevant. This calls for techniques that do not need a wealth of data to function but can still process large volumes of data whenever possible.


[1] Gordon, Jonathan, Jesko Perrey, and Dennis Spillecke. "Big data, analytics and the future of marketing and sales." McKinsey: Digital Advantage (2013).
[2] Liu, Jia, Olivier Toubia, and Shawndra Hill. "Content-based Model of Web Search Behavior: An Application to TV Show Search." Management Science, forthcoming (2020).
[3] Gabel, Sebastian, Daniel Guhl, and Daniel Klapper. "P2V-MAP: Mapping market structures for large retail assortments." Journal of Marketing Research 56.4 (2019): 557-580.
[4] Liu, Jia, and Kohei Kawaguchi. "Location-Based Market Structure: A Dynamic Analysis of Product Assortment and Consumer Purchases in Panel Data." Working paper.
[5] Gharibshah, Zhabiz, and Xingquan Zhu. "User Response Prediction in Online Advertising." ACM Computing Surveys (CSUR) 54.3 (2021): 1-43.
[6] Ali, Muhammad, et al. "Discrimination through optimization: How Facebook's Ad delivery can lead to biased outcomes." Proceedings of the ACM on Human-Computer Interaction 3.CSCW (2019): 1-30.
[7] Liu, Jia, and Shawndra Hill. "Moment Marketing: Measuring Dynamics in Cross-channel Ad Effectiveness." Marketing Science 40.1 (2021): 13-22.
[8] Liu, Jia, Shawndra Hill, and David Rothschild. "The Impact of Temporally Turning off TV Ad on Search: A Generalized Synthetic Control Estimator under Interference." Working paper (2021).
[9] Cong, Ziwei, Jia Liu, and Puneet Manchanda. "The Role of 'Live' in Livestreaming Markets: Evidence Using Orthogonal Random Forest." Working paper.