Data-driven technology has the potential not only to predict how people will behave but also to identify which actions might encourage desirable behaviors. For example, the music streaming service Spotify uses this technology to encourage its users to upgrade their accounts and engage with the platform. Focusing on Spotify’s playlist generation systems, HKUST’s Carlos Fernández-Loría and co-authors used data from a massive-scale experiment to predict which playlist generation system would lead each user to listen to the most songs.

One of their key findings is that “choosing [for everyone] the system that performs best, on average, does not significantly increase the total number of song streams.” In contrast, when data-driven technology is used to predict what would work best for each user, the number of song streams increases by up to 3.7%. As the authors explain, “different variants may work better for different users: If system A is best for new users and system B is best for more experienced users, then deploying the same system for all users would lead to suboptimal [system] assignments.” Results are therefore better when system A is deployed for new users and system B for experienced users.
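
To see why personalization can outperform the best-on-average system, consider a minimal sketch in Python; the segment sizes and per-system stream counts are hypothetical numbers chosen for illustration, not figures from the study.

```python
# Hypothetical average streams per user under each playlist system, by segment.
streams = {
    "new":         {"A": 12.0, "B": 9.0},   # system A works best for new users
    "experienced": {"A": 8.0,  "B": 11.0},  # system B works best for experienced users
}
n_users = {"new": 400, "experienced": 600}

def total_streams(policy):
    """Total streams when each segment is assigned the system policy[segment]."""
    return sum(n_users[s] * streams[s][policy[s]] for s in streams)

# One-size-fits-all: deploy the single system that is best on average.
best_uniform = max("AB", key=lambda sys: total_streams({s: sys for s in streams}))
uniform_total = total_streams({s: best_uniform for s in streams})

# Personalized: deploy, for each segment, whichever system is best for it.
personalized = {s: max(streams[s], key=streams[s].get) for s in streams}
personalized_total = total_streams(personalized)

print(f"best single system: {best_uniform}, total streams = {uniform_total:,.0f}")
print(f"personalized policy: {personalized}, total streams = {personalized_total:,.0f}")
```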

The researchers describe how the task of deploying the best system for each user is an instance of a “treatment assignment problem,” where each possible playlist generation system corresponds to a different “treatment,” and ideally each individual is assigned to the treatment associated with the most beneficial outcome, such as the number of song streams in their study. The authors categorize existing data-driven algorithms for individualized treatment assignment into three types of “metalearners”: the Outcome Learner (O-learner), which learns to predict the outcome of each treatment; the Effect Learner (E-learner), which learns to estimate the causal effect of each treatment; and the Assignment Learner (A-learner), which directly learns which treatment is most likely to yield the best outcome.
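
To make the three categories concrete, here is a minimal sketch in Python using scikit-learn on simulated data. The simulation, the model choices, and the outcome-weighted construction of the A-learner are illustrative assumptions, not the authors’ implementation.

```python
# A minimal sketch of the three metalearner categories on simulated data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(size=(n, 1))        # user feature, e.g., experience level
t = rng.integers(0, 2, size=n)      # randomly assigned system: 0 = A, 1 = B
# Simulated streams: system B helps high-x users and hurts low-x users.
y = 10 + 5 * x[:, 0] + np.where(t == 1, 4 * (x[:, 0] - 0.5), 0.0) + rng.normal(size=n)

x_new = np.linspace(0, 1, 5).reshape(-1, 1)  # users to assign

# O-learner: predict the outcome under each treatment, assign the argmax.
o_model = GradientBoostingRegressor().fit(np.column_stack([x, t]), y)
o_pred = np.column_stack(
    [o_model.predict(np.column_stack([x_new, np.full(len(x_new), k)])) for k in (0, 1)]
)
o_policy = o_pred.argmax(axis=1)

# E-learner: estimate the causal effect of B vs. A (here as the difference
# between two per-treatment outcome models), assign B when the effect > 0.
e_models = [GradientBoostingRegressor().fit(x[t == k], y[t == k]) for k in (0, 1)]
effect = e_models[1].predict(x_new) - e_models[0].predict(x_new)
e_policy = (effect > 0).astype(int)

# A-learner: directly learn the best treatment. One simple construction:
# with randomized 50/50 assignment, a classifier that predicts the received
# treatment, weighting each observation by its (shifted, nonnegative)
# outcome, favors whichever treatment tends to produce high outcomes.
a_model = GradientBoostingClassifier().fit(x, t, sample_weight=y - y.min())
a_policy = a_model.predict(x_new)

print("user feature x:   ", x_new.ravel())
print("O-learner assigns:", o_policy)  # 0 = system A, 1 = system B
print("E-learner assigns:", e_policy)
print("A-learner assigns:", a_policy)
```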

“At a first glance,” the authors state, “the policies estimated by the metalearners […] may look the same.” In practice, however, these types of algorithms display two important differences. First, they differ in their level of generality: O-learners are the most general (and therefore useful for multiple purposes), while A-learners are the least general, useful only for predicting optimal treatment assignments. Second, the metalearners vary in how they learn. Specifically, when O-learners and E-learners improve their outcome and effect predictions, the researchers warn, “the improvements may occur at the expense of worse treatment assignments!”
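
A small numeric illustration of this warning, with hypothetical numbers: a model whose outcome predictions are more accurate overall can still flip the ranking of treatments for a user and thus make the worse assignment.

```python
# Hypothetical expected streams for one user under systems A and B.
true_outcomes = {"A": 10.0, "B": 11.0}  # system B is truly best

# Model 1: large but rank-preserving errors -- poor outcome predictions,
# yet it still ranks B above A, so its treatment assignment is correct.
model_1 = {"A": 7.0, "B": 8.0}          # total absolute error = 6.0

# Model 2: small errors overall -- better outcome predictions, but the
# errors flip the ranking, so its treatment assignment is wrong.
model_2 = {"A": 10.4, "B": 10.2}        # total absolute error = 1.2

best = max(true_outcomes, key=true_outcomes.get)
for name, preds in [("model 1", model_1), ("model 2", model_2)]:
    error = sum(abs(preds[k] - true_outcomes[k]) for k in preds)
    choice = max(preds, key=preds.get)
    print(f"{name}: prediction error = {error:.1f}, "
          f"assigns {choice}, correct = {choice == best}")
```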

O-learners and E-learners are well suited to settings where precise estimates of a treatment’s outcome or effect are valuable in their own right. However, the researchers point out, “[algorithms] that optimize things other than treatment-assignment prediction (i.e., better outcome or causal effect predictions) do not necessarily favor better treatment assignments.” In their massive-scale experiment, the researchers found that the A-learner policy produced the largest number of streams. “This is the case even with training data consisting of more than half a billion observations,” they note, “a surprising finding given that, in theory, all metalearners should converge to the same treatment-assignment policy with large enough data.”

“To our knowledge,” say the researchers, “no prior study has compared the three metalearners as defined in our study, either analytically or on a real-world application at scale.” Based on these novel findings, the authors encourage researchers and practitioners to pay more attention to the power of A-learners across fields ranging from computer science to economics and the social sciences.