A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents
Published in Advances in Neural Information Processing Systems (NeurIPS), 2018
This paper studies efficient policy detection and reuse techniques for playing against non-stationary agents in Markov games. We propose a new deep BPR+ algorithm that extends the recent BPR+ algorithm with a neural network as the value-function approximator.
Recommended citation: Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, and Changjie Fan (2018). “A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents.” Advances in Neural Information Processing Systems. 962–972.