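(S, A, P, R, gamma) tuple defining a Markov decision process: state set S, action set A, transition probabilities P(s'|s,a), reward function R(s,a), and discount factor gamma. Bellman optimality equation: V*(s) = max_a [R(s,a) + gamma sum_s' P(s'|s,a) V*(s')]. Basis of reinforcement learning.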
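
One common way to solve the Bellman equation numerically is value iteration: apply the right-hand side as an update until V stops changing. The sketch below is a minimal illustration, not a reference implementation. The function name value_iteration, the nested-list layout of P and R, the tolerance, and the two-state example are all assumptions made for this example, and it assumes 0 <= gamma < 1.

```python
def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Approximate V* by repeatedly applying the Bellman optimality update.

    Assumed layout (hypothetical, chosen for this sketch):
      P[s][a] is a list of (next_state, probability) pairs,
      R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        # V_new(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once the largest change is negligible
            break
    return V


# Hypothetical two-state, two-action MDP with deterministic transitions.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],  # rewards in state 0 for actions 0 and 1
    [2.0, 0.0],  # rewards in state 1 for actions 0 and 1
]
gamma = 0.9

V = value_iteration(P, R, gamma)
print(V)
```

In this toy example the best policy is to move to state 1 and stay there, so V*(1) = 2 / (1 - 0.9) = 20 and V*(0) = 1 + 0.9 * 20 = 19; the returned values should agree with these up to the tolerance.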