Title: Kernel-based Decentralized Policy Evaluation for Reinforcement Learning
Speaker: Prof. Heng Lian, City University of Hong Kong
Time: Tuesday, May 14, 2024, 10:00 a.m.
Venue: Conference Room 2-1, Mathematics Building, Xingqing Campus
Abstract: We investigate the decentralized nonparametric policy evaluation problem in reinforcement learning, focusing on settings in which multiple agents collaborate to learn the state-value function from sampled state transitions and privately observed rewards. Our approach centers on a regression-based multi-stage iteration technique that performs infinite-dimensional gradient descent in a reproducing kernel Hilbert space (RKHS). To make computation and communication tractable, we use the Nyström approximation to project this space onto a finite-dimensional subspace. We establish statistical error bounds that characterize the convergence of the value function estimate, the first such analysis in a fully decentralized nonparametric framework. In numerical studies, we compare the regression-based method with the kernel temporal difference (TD) method.
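The abstract mentions using the Nyström approximation to replace the infinite-dimensional RKHS with a finite-dimensional one. Below is a minimal, generic sketch of that idea, not the speaker's algorithm. It assumes a Gaussian kernel, a hand-picked set of landmark points, and a small diagonal jitter, all of which are illustrative choices. The kernel k(x, y) is approximated by k_m(x)^T W^{-1} k_m(y), where W is the kernel matrix on the m landmarks, so every computation involves only m-dimensional vectors.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on vectors given as lists of floats (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def solve(matrix, rhs):
    """Solve matrix @ sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / aug[r][r]
    return sol


class NystromKernel:
    """Hypothetical helper: approximate k(x, y) by k_m(x)^T W^{-1} k_m(y).

    Assumption: landmarks are chosen by hand; this is illustrative only.
    """

    def __init__(self, landmarks, bandwidth=1.0, jitter=1e-10):
        self.landmarks = landmarks
        self.bandwidth = bandwidth
        # W[i][j] = k(z_i, z_j); a small jitter on the diagonal keeps W invertible.
        self.W = [
            [gaussian_kernel(zi, zj, bandwidth) + (jitter if i == j else 0.0)
             for j, zj in enumerate(landmarks)]
            for i, zi in enumerate(landmarks)
        ]

    def features(self, x):
        """k_m(x): kernel evaluations between x and each landmark (an m-vector)."""
        return [gaussian_kernel(x, z, self.bandwidth) for z in self.landmarks]

    def approx(self, x, y):
        """Nystrom approximation of k(x, y) using only m-dimensional quantities."""
        kx = self.features(x)
        coef = solve(self.W, self.features(y))  # W^{-1} k_m(y)
        return sum(a * b for a, b in zip(kx, coef))


if __name__ == "__main__":
    nys = NystromKernel(landmarks=[[0.0], [1.0], [2.0]])
    print(nys.approx([0.5], [1.5]), gaussian_kernel([0.5], [1.5]))
```

The approximation is exact between landmark points (up to the jitter) and never exceeds the true kernel on the diagonal, but it degrades for points far from every landmark, which is why the choice of landmarks matters in practice.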
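Biography: Heng Lian is a Professor in the Department of Mathematics at City University of Hong Kong. He received a bachelor's degree in mathematics and computer science from the University of Science and Technology of China in 2000, and master's degrees in computer science and economics and a Ph.D. in applied mathematics from Brown University in 2007. He has worked at Nanyang Technological University in Singapore, the University of New South Wales in Australia, and City University of Hong Kong. He has published more than 30 papers in leading international journals, including the Annals of Statistics, the Journal of the Royal Statistical Society, Series B, the Journal of the American Statistical Association, the Journal of Machine Learning Research, and IEEE Transactions on Pattern Analysis and Machine Intelligence. His research interests include high-dimensional data analysis, functional data analysis, and machine learning.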