Markov decision process

Layer 0: Mathematics, in the Stochastic Processes subtree

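A Markov decision process (MDP) is a tuple (S, A, P, R, gamma): a set of states S, a set of actions A, transition probabilities P(s'|s,a), a reward function R(s,a), and a discount factor gamma, usually taken in [0, 1). The optimal value function V* satisfies the Bellman optimality equation V*(s) = max_a [R(s,a) + gamma sum_s' P(s'|s,a) V*(s')]. MDPs are the formal basis of reinforcement learning.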
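The Bellman equation can be solved numerically by applying its right-hand side repeatedly until the values stop changing, a procedure known as value iteration. The following is a minimal sketch under stated assumptions, not a reference implementation: the two-state MDP, the nested-list encoding of P and R, and the names `value_iteration`, `tol`, and `max_iters` are all hypothetical choices made for illustration.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Approximate V* by repeated Bellman optimality backups.

    P[s][a] is a list of (next_state, probability) pairs (assumed encoding).
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        # One backup: V(s) <- max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V


# Hypothetical toy MDP: two states, two actions (0 = stay, 1 = move).
# Transitions are deterministic here, but the encoding allows probabilities.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # from state 0: stay -> 0, move -> 1
    [[(1, 1.0)], [(0, 1.0)]],  # from state 1: stay -> 1, move -> 0
]
R = [
    [0.0, 1.0],  # state 0: staying pays 0, moving pays 1
    [2.0, 0.0],  # state 1: staying pays 2, moving pays 0
]

V = value_iteration(P, R, gamma=0.9)
print(V)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving from state 0 to state 1 is worth 1 + 0.9 * 20 = 19, so the iteration should converge to approximately V*(0) = 19 and V*(1) = 20.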

Related concepts

Matrix and determinant
