The Lai-Robbins lower bound (Lai and Robbins 1985, *Adv Appl Math* 6, 4) is the asymptotic, information-theoretic floor on the cumulative regret R_T = T mu^* - E[sum_t mu_{A_t}] in stochastic multi-armed bandit problems. For any uniformly good policy…
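
As a rough illustration of the quantities involved, the sketch below evaluates the regret definition R_T = T mu^* - sum_t mu_{A_t} for one fixed sequence of pulls (the expectation in R_T would average this over the policy's randomness). It also evaluates the constant that multiplies log T in the bound, under the assumption of Bernoulli rewards with all means strictly between 0 and 1, where the bound is commonly stated as liminf R_T / log T >= sum_{a: mu_a < mu^*} (mu^* - mu_a) / KL(mu_a, mu^*). The function names `bernoulli_kl`, `regret`, and `lai_robbins_constant` are hypothetical and chosen only for this example.

```python
import math


def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats. Assumes 0 < q < 1 (illustrative helper)."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def regret(means, pulls):
    """T * mu^* minus the summed means of the pulled arms, for one pull sequence."""
    mu_star = max(means)
    return len(pulls) * mu_star - sum(means[a] for a in pulls)


def lai_robbins_constant(means):
    """Sum over suboptimal arms of gap / KL(arm, best arm), Bernoulli case (assumed)."""
    mu_star = max(means)
    return sum(
        (mu_star - mu) / bernoulli_kl(mu, mu_star)
        for mu in means
        if mu < mu_star
    )


if __name__ == "__main__":
    means = [0.5, 0.4]  # hypothetical two-armed Bernoulli instance
    print(regret(means, [0, 1, 1, 0]))
    print(lai_robbins_constant(means))
```

Smaller gaps between mu_a and mu^* shrink the KL term faster than the gap itself, so the constant grows as a suboptimal arm becomes harder to distinguish from the best one.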