University of California San Diego

****************************

Math 288 - Stochastic Systems Seminar

Angela Yu

UCSD

Three wrongs make a right: reward underestimation mitigates idiosyncrasies in human bandit behavior

Abstract:

Combining a multi-armed bandit task and Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, and one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics as well as the adoption of a simplistic decision policy. In particular, reward rate underestimation discourages the decision-maker from switching away from a "good" option, thus achieving near-optimal behavior (the optimal policy never switches away after a win). Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular class of reinforcement learning models, thus giving statistical, normative grounding to phenomenological models of human behavior.

Host: Ruth Williams

January 23, 2020

2:00 PM

AP&M 7218

****************************