University of California San Diego

****************************

Center for Computational Mathematics Seminar

Yuhua Zhu

UCSD

A PDE Based Bellman Equation for Continuous-Time Reinforcement Learning

Abstract:

In this paper, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and we have access only to discrete-time information, how can we effectively perform policy evaluation? We first demonstrate that the commonly used Bellman equation is a first-order approximation to the true value function. We then introduce a higher-order PDE-based Bellman equation called PhiBE. We show that the solution to the i-th order PhiBE is an i-th order approximation to the true value function. Additionally, even the first-order PhiBE outperforms the Bellman equation in approximating the true value function when the system dynamics change slowly. We develop a numerical algorithm based on the Galerkin method to solve PhiBE when we possess only discrete-time trajectory data. Numerical experiments are provided to validate the theoretical guarantees we propose.
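
As a rough illustration of the first-order claims in the abstract, the sketch below compares the discrete Bellman equation with a first-order PDE-based equation on a toy problem. Everything specific here is an assumption made for illustration and does not come from the announcement: the dynamics are deterministic (ds/dt = lam * s, no noise), the reward is r(s) = s, the parameter values are hypothetical, and the first-order PhiBE is assumed to take the form beta * V = r + mu_hat * V', with the drift mu_hat estimated from a single discrete step. In this setting every value function is linear in s, so it is enough to compare coefficients.

```python
import math

# Toy setting (an assumption for illustration, not from the announcement):
# deterministic dynamics ds/dt = lam * s, reward r(s) = s, discount rate beta.
# Every value function below is linear, V(s) = c * s, so we compare coefficients c.

def true_coeff(beta, lam):
    # Exact value: V(s) = integral_0^inf exp(-beta*t) * s*exp(lam*t) dt = s / (beta - lam)
    return 1.0 / (beta - lam)

def bellman_coeff(beta, lam, dt):
    # Discrete Bellman equation: V(s) = r(s)*dt + exp(-beta*dt) * V(s_dt),
    # with s_dt = s*exp(lam*dt); substitute V = c*s and solve for c.
    return dt / (1.0 - math.exp((lam - beta) * dt))

def phibe1_coeff(beta, lam, dt):
    # Assumed first-order PhiBE form: beta*V = r + mu_hat * V', where the drift is
    # estimated from one discrete step, mu_hat(s) = (s_dt - s) / dt.
    mu_hat_slope = (math.exp(lam * dt) - 1.0) / dt
    return 1.0 / (beta - mu_hat_slope)

beta, lam = 1.0, 0.05  # hypothetical parameters; small lam = slowly changing dynamics
exact = true_coeff(beta, lam)
bellman_err, phibe_err = {}, {}
for dt in (0.1, 0.05):
    bellman_err[dt] = abs(bellman_coeff(beta, lam, dt) - exact)
    phibe_err[dt] = abs(phibe1_coeff(beta, lam, dt) - exact)
    print(f"dt={dt}: bellman error={bellman_err[dt]:.2e}, phibe error={phibe_err[dt]:.2e}")
```

In this toy case both errors shrink roughly in proportion to the time step, which is what a first-order approximation means, and the PDE-based error is the smaller of the two because lam is small relative to beta. It says nothing about the stochastic setting or the higher-order equations discussed in the talk.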

February 6, 2024

11:00 AM

Zoom only, ID 990 3560 4352

****************************