University of California San Diego
****************************
Math 288 - Stochastic Systems Seminar
Angela Yu
UCSD
Three wrongs make a right: Reward underestimation mitigates idiosyncrasies in human bandit behavior
Abstract:
Combining a multi-armed bandit task and Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, and one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics as well as the adoption of a simplistic decision policy. In particular, reward rate underestimation discourages the decision-maker from switching away from a "good" option, thus achieving near-optimal behavior (which never switches away after a win). Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular class of reinforcement learning models, thus giving statistical, normative grounding to phenomenological models of human behavior.
Host: Ruth Williams
January 23, 2020
2:00 PM
AP&M 7218
****************************