Papers and Books
- Sutton, Barto, Reinforcement Learning: An Introduction (classic textbook)
- White, Real applications of Markov decision processes
- Kober, Bagnell, Peters, Reinforcement learning in robotics: a survey, 2013
Policy gradient
- Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992
- Sutton et al. Policy gradient methods for reinforcement learning with function approximation, 2000
- Kakade, A natural policy gradient, 2001**
- Kakade, Langford, Approximately optimal approximate reinforcement learning, 2002
- Schulman et al. Trust region policy optimization, 2015**
- Schulman et al. High-dimensional continuous control using generalized advantage estimation, 2016
- Rajeswaran et al. Towards generalization and simplicity in continuous control, 2017**
- Schulman et al. Proximal Policy Optimization Algorithms, 2017
- Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016
- Toussaint, Gradient descent lecture notes, 2012**
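For orientation before reading Williams (1992), here is a minimal sketch of the likelihood-ratio (REINFORCE) gradient estimator. It is applied to a hypothetical three-armed bandit. The softmax parameterization, the running-mean reward baseline, the step size, and the reward means are all illustrative assumptions, not taken from any paper above.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(true_means, steps=2000, lr=0.1, noise_sd=0.1, seed=0):
    """REINFORCE on a hypothetical k-armed bandit with a softmax policy.

    For a softmax policy, d log pi(a) / d theta_b = 1[a == b] - pi(b),
    so each step moves theta along (reward - baseline) * grad log pi(a).
    """
    rng = random.Random(seed)
    k = len(true_means)
    theta = [0.0] * k   # action preferences (the policy parameters)
    baseline = 0.0      # running mean of past rewards (variance reduction)
    for t in range(1, steps + 1):
        pi = softmax(theta)
        a = rng.choices(range(k), weights=pi)[0]   # sample a ~ pi
        r = rng.gauss(true_means[a], noise_sd)     # noisy scalar reward
        # Baseline uses only past rewards, so it does not bias the estimate.
        advantage = r - baseline
        for b in range(k):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += lr * advantage * grad_log_pi
        baseline += (r - baseline) / t
    return softmax(theta)


probs = reinforce_bandit([0.0, 0.5, 1.0])
print([round(p, 3) for p in probs])  # most mass should end up on the last arm
```

The baseline leaves the expected gradient unchanged but reduces its variance. Deep variants such as A3C, GAE, TRPO and PPO keep this same score-function form and replace the baseline with a learned value estimate.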
MCTS (Monte Carlo tree search)
- Browne et al. A Survey of Monte Carlo Tree Search Methods, 2012
- Kocsis, Szepesvári, Bandit Based Monte-Carlo Planning, 2006
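The core of UCT, from Bandit Based Monte-Carlo Planning, is to treat child selection at each tree node as a UCB1 bandit problem. Below is a minimal sketch of that selection rule only; it omits expansion, rollout, and backup. Two details are assumptions made for illustration: the input format of (total value, visit count) pairs, and taking the parent's visit count as the sum of its children's visits. The default c = sqrt(2) corresponds to the UCB1 constant for rewards in [0, 1].

```python
import math


def uct_select(children, c=math.sqrt(2)):
    """Return the index of the child maximizing the UCB1 score used in UCT.

    children: list of (total_value, visit_count) pairs for one node
    (a hypothetical representation). Unvisited children are tried first,
    i.e. their score is treated as infinite.
    """
    parent_visits = sum(n for _, n in children)  # assumption: N = sum of child visits
    best, best_score = None, -math.inf
    for i, (w, n) in enumerate(children):
        if n == 0:
            return i
        # mean value (exploitation) + exploration bonus shrinking with visits
        score = w / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = i, score
    return best


print(uct_select([(3.0, 4), (1.0, 1)]))  # → 1: the rarely visited child gets the larger bonus
```

Setting c = 0 reduces the rule to greedy selection on mean value. A larger c spreads simulations more evenly across children.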
Courses
- DeepRL course at UC Berkeley, Fall 2017
- DeepRL course at CMU, Spring 2017
- Intelligent Control course at UW, Spring 2015
- RL course at Stanford, Winter 2017
- RL course at IIT Madras, Fall 2016
- RL course at UCL, 2015