University of California San Diego
****************************
Center for Computational Mathematics Seminar
Yuhua Zhu
UCSD
A PDE-Based Bellman Equation for Continuous-Time Reinforcement Learning
Abstract:
In this paper, we address continuous-time reinforcement learning in settings where the dynamics follow a stochastic differential equation. When the underlying dynamics are unknown and only discrete-time observations are available, how can we perform policy evaluation effectively? We first show that the commonly used Bellman equation is only a first-order approximation to the true value function. We then introduce PhiBE, a higher-order, PDE-based Bellman equation, and show that the solution to the i-th order PhiBE is an i-th order approximation to the true value function. Moreover, even the first-order PhiBE outperforms the Bellman equation in approximating the true value function when the system dynamics change slowly. We develop a numerical algorithm based on the Galerkin method to solve PhiBE when only discrete-time trajectory data are available. Numerical experiments are provided to validate the theoretical guarantees.
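For readers unfamiliar with the setting, the following is a minimal sketch of the standard continuous-time policy-evaluation setup the abstract refers to; the symbols b, sigma, beta, r, and Delta t are illustrative notation and are not taken from the talk.

% Minimal sketch (requires amsmath, amssymb); notation is illustrative, not taken from the abstract.
\begin{align*}
  \text{Dynamics (fixed policy):}\quad & dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t,\\
  \text{Value function:}\quad & V(x) = \mathbb{E}\!\left[\int_0^\infty e^{-\beta t}\, r(X_t)\,dt \;\middle|\; X_0 = x\right],\\
  \text{Discrete-time Bellman equation:}\quad & V(x) = \mathbb{E}\!\left[\, r(X_0)\,\Delta t + e^{-\beta \Delta t}\, V(X_{\Delta t}) \;\middle|\; X_0 = x\right].
\end{align*}

The talk's claim, in these terms, is that solving the discrete-time Bellman equation above recovers V only to first order in the observation spacing, while PhiBE achieves higher-order accuracy from the same discrete-time data.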
February 6, 2024
11:00 AM
Zoom only, ID 990 3560 4352
****************************