This course is an introduction to reinforcement learning algorithms and their applications. Topics include multi-armed bandits; finite Markov decision processes; dynamic programming; Monte Carlo methods; temporal-difference learning; actor-critic methods; off-policy learning; and an introduction to deep variants of these algorithms, including deep Q-learning, policy gradient methods, and actor-critic methods.
Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning: An Introduction, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has been significantly expanded and updated, adding new topics and revising coverage of existing ones.