Existing studies of ordinal optimization fall into one of two categories: fixed budget or fixed confidence. In the fixed-budget setting, the aim is to minimize the probability of false selection given a fixed sampling budget. In the fixed-confidence setting, the target is to design a sampling procedure that satisfies a desired guarantee on the probability of false selection while taking as few samples as possible.
A recent study by Shin, Broadie, and Zeevi examines an ordinal optimization problem in which, given a finite number of populations or ‘systems’, the objective is to dynamically learn the statistical characteristics of the systems in order to select the one with the highest mean, i.e., the best of several competing alternatives. This is a setting where the systems cannot be evaluated analytically, but it is possible to sample sequentially from each system subject to a given sampling budget; in other words, the probability distributions governing each system’s performance are unknown but can be learned through sampling. A commonly used performance measure for sampling policies is the probability of false selection, the chance that a policy mistakenly selects a suboptimal system. This objective is not analytically tractable in general.
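To make the performance measure concrete, the sketch below estimates the probability of false selection by Monte Carlo for a naive equal-allocation policy over Gaussian systems. The systems, parameters, and policy here are illustrative assumptions, not taken from the paper:

```python
import random

def false_selection_prob(means, stds, budget, trials=2000, seed=0):
    """Monte Carlo estimate of the probability of false selection
    under a naive equal-allocation policy (illustrative example:
    Gaussian systems, equal split of the budget)."""
    rng = random.Random(seed)
    k = len(means)
    best = max(range(k), key=lambda i: means[i])  # true best system
    n = budget // k  # samples allocated to each system
    errors = 0
    for _ in range(trials):
        sample_means = [
            sum(rng.gauss(means[i], stds[i]) for _ in range(n)) / n
            for i in range(k)
        ]
        # False selection: the empirically best system is not the true best.
        if max(range(k), key=lambda i: sample_means[i]) != best:
            errors += 1
    return errors / trials
```

With a wide gap in means and a generous budget, the estimate is close to zero; shrinking the budget or the gap drives it up, which is exactly the trade-off a good dynamic allocation is meant to manage.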
To overcome the challenges faced by previous studies, the research focuses on sampling procedures that can be implemented in practice, that do not rely on restrictive distributional assumptions, and that simultaneously carry asymptotic performance guarantees. First, the authors show that when the difference in means between the best and second-best systems is sufficiently small, the probability of false selection can be approximated via the rate function corresponding to a Gaussian distribution, which depends only on the first two moments of the underlying probability distributions. Second, based on this two-moment approximation, the study proposes a dynamic sampling policy called the Welch Divergence (WD) and analyses its asymptotic performance as the sampling budget grows large. Finally, building on the structural properties of WD, the authors provide an adaptive variant of WD that is more efficient and exhibits attractive numerical performance when the number of systems is large.
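The two-moment approximation can be illustrated with the standard Gaussian large-deviations rate at which a suboptimal system is mistaken for the best one under given sampling fractions. This is the generic textbook form of that rate, offered as a sketch rather than the paper's exact development:

```python
def gaussian_rate(mu_b, var_b, alpha_b, mu_i, var_i, alpha_i):
    """Two-moment (Gaussian) approximation of the exponential rate
    at which system i is confused with the best system b, given the
    fractions alpha_b and alpha_i of the budget spent on each.
    Standard Gaussian rate-function form; a sketch under that
    assumption, not the paper's exact expression."""
    gap = mu_b - mu_i
    # Larger gaps and larger sampling fractions both increase the rate,
    # i.e., make false selection exponentially less likely.
    return gap ** 2 / (2.0 * (var_b / alpha_b + var_i / alpha_i))
```

A WD-style policy uses quantities of this kind to steer sampling: it allocates the next samples so that the smallest pairwise rate (the most likely confusion) is pushed up, using only estimated means and variances.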
As in most of the extant literature, the study’s analysis is performed in a setting where samples are taken independently within a system and across systems; methodologies such as common random numbers are outside its scope. Nevertheless, the authors allow sampling policies to take multiple samples in each stage, so the study’s algorithms can be used in parallel computing environments. From a theoretical perspective, the authors characterize the class of problem instances for which misspecification due to the Gaussian assumption is not a major concern. Practically, they address the implementation challenges surrounding the rate function: it is approximated by estimating the first two moments of the underlying probability distributions, thereby removing the need to estimate cumulant generating functions.
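As a sketch of the plug-in quantities such a two-moment approach needs, the following tracks a running mean and variance per system using Welford's online algorithm, so estimates can be refreshed after every stage of sampling. The class name and interface are illustrative assumptions, not the paper's:

```python
class MomentTracker:
    """Online estimates of the first two moments of one system's
    samples (Welford's algorithm): the only statistics a two-moment
    rate approximation requires, in place of a cumulant generating
    function estimate."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Incorporate one new sample in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Unbiased sample variance (0.0 until two samples arrive)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

One tracker per system suffices; each stage's new samples update the trackers, and the refreshed means and variances feed the rate-function approximation that guides the next allocation.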
Ultimately, by analyzing the asymptotics of the probability of false selection, the study obtains a close approximation to the large deviations rate function and leverages it to design dynamic sampling policies that perform well against benchmark policies. An important conclusion is that the rate function for general distributions can be closely approximated using only the first two moments, which allowed the authors to construct WD and adaptive WD policies that are straightforward to implement. While the study focuses primarily on the probability of false selection for a single best system, the authors are confident that their methodology can be extended to other criteria.