Allerton 2015 Paper Abstract


Paper ThD3.2

Vakili, Sattar (Cornell University), Zhao, Qing (Cornell University)

Mean-Variance and Value at Risk in Multi-Armed Bandit Problems

Scheduled for presentation during the Regular Session "Machine Learning II" (ThD3), Thursday, October 1, 2015, 15:50−16:10, Butternut

53rd Annual Allerton Conference on Communication, Control, and Computing, Sept 29-Oct 2, 2015, Allerton Park and Retreat Center, Monticello, IL, USA

This information is tentative and subject to change. Compiled on December 5, 2021

Keywords: Statistical Signal Processing, Universal Algorithms and Machine Learning, Optimization


We study risk-averse multi-armed bandit problems under different risk measures. We consider three risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk and the objective is to minimize the mean-variance of the observed rewards. In the second and the third models, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance and maximize the value at risk of the total reward, respectively. We develop risk-averse online learning policies and analyze their regret performance. We also provide tight lower bounds on regret under the model of mean-variance of observations.
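The abstract names the two risk measures but does not give their formulas. As a rough illustration only, the sketch below assumes one common convention from the risk-averse bandit literature: mean-variance taken as variance minus a risk-tolerance weight times the mean (lower is better), and value at risk taken as an empirical lower quantile of sampled total rewards (higher is better). The function names, the parameters `rho` and `alpha`, and the quantile rule are assumptions made for this example and are not taken from the paper.

```python
import math
import statistics


def mean_variance(rewards, rho=1.0):
    """Empirical mean-variance of a reward sequence (assumed form).

    Computes variance - rho * mean, where rho is a hypothetical
    risk-tolerance weight. Lower values are better under this convention.
    """
    mean = statistics.fmean(rewards)
    var = statistics.pvariance(rewards, mu=mean)  # population variance
    return var - rho * mean


def value_at_risk(totals, alpha=0.05):
    """Empirical alpha-quantile of sampled total rewards (assumed form).

    Returns the smallest sampled total such that at least a fraction
    alpha of the samples are less than or equal to it.
    """
    ordered = sorted(totals)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


# Two reward sequences with the same mean but different variability.
steady = [1.0, 1.0, 1.0, 1.0]
volatile = [0.0, 2.0, 0.0, 2.0]
mv_steady = mean_variance(steady)      # variance 0, mean 1
mv_volatile = mean_variance(volatile)  # variance 1, mean 1

# Value at risk over a handful of hypothetical total-reward samples.
var_quarter = value_at_risk([40.0, 10.0, 30.0, 20.0], alpha=0.25)
```

Under these assumed definitions, the steady sequence has the lower mean-variance even though both sequences have the same mean, which is the sense in which the first model penalizes variation across time.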


