Foundations of probability and statistics: sigma-algebras, probability measures (Kolmogorov axioms), random variables, expectation, and the two pillar limit theorems (LLN and CLT). Plus a bridge to statistical inference (MLE).
probability-statistics
Sample space Ω
The set of all possible outcomes of a random experiment.
σ-algebra F
A collection of subsets of Ω closed under complement and countable union (and containing Ω). The measurable events.
Probability measure P
A function P: F → [0,1] with P(Ω) = 1 and countable additivity (P(⋃ Aᵢ) = ΣP(Aᵢ) for pairwise disjoint events). Kolmogorov's 1933…
Conditional probability
P(A|B) = P(A ∩ B) / P(B) for P(B) > 0. Operationalises updating in light of partial information.
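A minimal sketch of the definition on a toy sample space (the two-dice events A = "sum is 8" and B = "first die is even" are illustrative choices, not from the deck):

```python
from fractions import Fraction
from itertools import product

# Toy sample space: two fair dice, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))  # uniform probability measure

A = {w for w in omega if w[0] + w[1] == 8}   # event: the sum is 8
B = {w for w in omega if w[0] % 2 == 0}      # event: the first die is even

# P(A|B) = P(A ∩ B) / P(B), valid since P(B) > 0
p_A_given_B = (len(A & B) * p) / (len(B) * p)
print(p_A_given_B)  # 1/6
```

Exact rational arithmetic via `Fraction` keeps the conditional probability free of float round-off.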
Bayes' theorem
P(H|D) = P(D|H) P(H) / P(D). Bridges prior to posterior given a likelihood. Central engine of Bayesian inference and of agent belief…
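A worked prior-to-posterior update, with hypothetical screening-test numbers chosen only for illustration:

```python
# Hypothetical numbers (not from the deck):
prior = 0.01        # P(H): base rate of the condition
sens = 0.90         # P(D|H): likelihood of a positive test given H
fpr = 0.05          # P(D|¬H): false-positive rate

# Law of total probability for the evidence P(D)
p_D = sens * prior + fpr * (1 - prior)

# Bayes' theorem: posterior P(H|D)
posterior = sens * prior / p_D
print(round(posterior, 4))  # 0.1538
```

Even with a 90%-sensitive test, the low prior keeps the posterior near 15% — the standard base-rate illustration.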
Random variable
A measurable function X: Ω → ℝ. Pushes a probability measure on Ω to a distribution on ℝ (F_X(x) = P(X ≤ x)).
Expectation E[X]
∫_Ω X dP — the Lebesgue integral of a random variable against its probability measure. Linear functional and workhorse of probability.
Law of large numbers
For iid X₁,…,Xₙ with E[Xᵢ] = μ, the sample mean X̄ₙ → μ as n → ∞ (in probability for the weak law, almost surely for the strong law). Justifies…
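A quick simulation sketch of the law (fair-coin flips and the sample size are illustrative choices):

```python
import random

random.seed(0)  # reproducibility for this sketch

# Simulate iid Bernoulli(1/2) flips; the LLN says the sample mean tends to μ = 0.5.
n = 100_000
s = sum(random.random() < 0.5 for _ in range(n))
sample_mean = s / n
print(sample_mean)
```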
Central limit theorem
For iid X₁,…,Xₙ with mean μ and finite variance σ², the normalised sum √n (X̄ₙ - μ)/σ converges in distribution to N(0,1). Universality…
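A simulation sketch of the normalised sum (Uniform(0,1) summands, n = 30, and 2000 replications are illustrative choices):

```python
import math
import random
import statistics

random.seed(1)

# Uniform(0,1) has mean 1/2 and variance 1/12.
mu, sigma = 0.5, math.sqrt(1 / 12)
n, reps = 30, 2000

# Normalised sums √n (X̄ₙ − μ)/σ over many replications
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu) / sigma)

# CLT: the z-values should look approximately N(0, 1)
print(statistics.mean(zs), statistics.stdev(zs))
```

The printed empirical mean and standard deviation should sit near 0 and 1 respectively, despite the summands being far from Gaussian.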
Gaussian (normal) distribution
Continuous distribution with density (2πσ²)^(-1/2) exp(−(x−μ)²/(2σ²)). Attractor of the CLT and maximum-entropy distribution for fixed mean…
Maximum-likelihood estimation
Point estimation rule: given iid sample x from parametric family f(·|θ), estimate θ̂ = argmax_θ L(θ; x) = argmax_θ ∏ f(xᵢ|θ). Consistent…
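A sketch for a Bernoulli(θ) family, where the closed-form MLE is the sample mean; the data and the grid search are illustrative:

```python
import math

# Hypothetical Bernoulli sample; closed-form MLE is k/n.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
n, k = len(data), sum(data)

def log_lik(theta):
    # ∑ log f(xᵢ|θ) = k log θ + (n − k) log(1 − θ)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# Brute-force argmax over a fine grid, for illustration only
theta_hat = max((i / 1000 for i in range(1, 1000)), key=log_lik)
print(theta_hat)  # 0.7, matching k/n
```

Maximising the log-likelihood (a monotone transform of ∏ f(xᵢ|θ)) recovers the analytic estimate k/n = 0.7.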
Moment generating function
M_X(t) = E[e^{tX}]. When finite on an open neighbourhood of 0, generates all moments and uniquely determines the distribution.
Characteristic function
φ_X(t) = E[e^{itX}]. Always finite, uniquely determines the distribution, and its pointwise convergence characterises convergence in…
Martingale
A process (X_n) adapted to a filtration (ℱ_n) with E[|X_n|]<∞ and E[X_{n+1}|ℱ_n]=X_n. Abstracts 'fair game'; Doob's theorems drive much of…
Stopping time
A random time τ with {τ ≤ n} ∈ ℱ_n for all n. Encodes decisions that depend only on observed history; enables optional-stopping arguments.
Doob's maximal inequality
For a non-negative submartingale (X_n), P(max_{k≤n} X_k ≥ λ) ≤ E[X_n]/λ, and the L^p version ‖max_{k≤n} X_k‖_p ≤ (p/(p−1)) ‖X_n‖_p for p>1.
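An empirical check of the first inequality for the non-negative submartingale |S_n| built from a symmetric ±1 walk (walk length, λ, and replication count are illustrative choices):

```python
import random

random.seed(2)

# Symmetric ±1 random walk S_n; X_n = |S_n| is a non-negative submartingale.
n_steps, reps, lam = 50, 4000, 15
exceed = 0
abs_end = []
for _ in range(reps):
    s, m = 0, 0
    for _ in range(n_steps):
        s += random.choice((-1, 1))
        m = max(m, abs(s))
    exceed += m >= lam
    abs_end.append(abs(s))

emp_prob = exceed / reps            # empirical P(max_{k≤n} |S_k| ≥ λ)
bound = sum(abs_end) / reps / lam   # Doob bound E|S_n| / λ
print(emp_prob, bound)
```

The empirical exceedance probability stays below the Doob bound, as the inequality guarantees.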
Markov chain
A sequence (X_n) on state space S satisfying the Markov property P(X_{n+1}=y | X_n=x, …) = P(X_{n+1}=y | X_n=x). Stationary distributions,…
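A sketch of finding a stationary distribution by iterating π ← πP (the two-state transition matrix is a hypothetical example):

```python
# Hypothetical two-state chain; power-iterate the row vector πP.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = [0.5, 0.5]  # any starting distribution works for this ergodic chain
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

# The stationary distribution solves π = πP; here π = (5/6, 1/6)
print(pi)
```

Convergence is geometric at the rate of the second eigenvalue (0.4 here), so 200 iterations are far more than enough.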
Brownian motion (Wiener process)
A continuous-time stochastic process B_t with B_0=0, independent increments, B_t−B_s ∼ N(0, t−s), and continuous sample paths. Canonical…
Itô integral
The stochastic integral ∫_0^t H_s dB_s defined for adapted H ∈ L²([0,T]×Ω) as an L²-limit of left-endpoint Riemann sums. Basis of…
Itô's lemma
For f ∈ C², df(t,B_t) = ∂_t f dt + ∂_x f dB_t + ½ ∂_{xx} f dt — the chain rule of stochastic calculus, correcting classical calculus by ½…
Stochastic differential equation (SDE)
An equation dX_t = μ(t,X_t)dt + σ(t,X_t)dB_t interpreted via the Itô integral; under Lipschitz/growth conditions admits unique strong…
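A minimal Euler–Maruyama sketch for an Ornstein–Uhlenbeck SDE (the coefficients θ = σ = 1, step size, and horizon are illustrative choices; its stationary variance is σ²/(2θ) = 0.5):

```python
import math
import random
import statistics

random.seed(3)

# Euler–Maruyama for dX = −θ X dt + σ dB (Ornstein–Uhlenbeck).
theta, sigma = 1.0, 1.0
dt, t_end, paths = 0.01, 5.0, 2000
steps = int(t_end / dt)

finals = []
for _ in range(paths):
    x = 0.0
    for _ in range(steps):
        dB = random.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x += -theta * x * dt + sigma * dB
    finals.append(x)

print(statistics.variance(finals))  # ≈ 0.5
```

By t = 5 the process is effectively stationary, so the cross-path variance approximates σ²/(2θ).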
Statistical hypothesis testing
Neyman–Pearson framework: design a decision rule between H₀ and H₁ controlling the type-I error at level α while maximising power. …
Bayesian inference
Posterior ∝ Prior × Likelihood (Bayes' rule): updating subjective belief over parameters θ given data x. Conjugate priors, MCMC, and…
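A conjugate-prior sketch for the Beta-Bernoulli model (the prior pseudo-counts and data are hypothetical): a Beta(a, b) prior updated on k successes in n trials yields a Beta(a + k, b + n − k) posterior.

```python
# Hypothetical prior and data for a Beta-Bernoulli update.
a, b = 2, 2          # prior pseudo-counts
k, n = 7, 10         # observed successes / trials

# Posterior ∝ prior × likelihood collapses to adding counts
a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 9 5 0.6428571428571429
```

The posterior mean 9/14 sits between the prior mean (1/2) and the sample frequency (0.7), shrinking toward the prior.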
Shannon entropy
For a discrete random variable X with probability mass p, the Shannon entropy is H(X) = −Σ p(x) log p(x). Measures average uncertainty /…
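The definition computes directly (base-2 logs give entropy in bits; the example distributions are illustrative):

```python
import math

def entropy_bits(p):
    # H(X) = −Σ p(x) log₂ p(x); terms with p(x) = 0 contribute 0 by convention
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0 (fair coin: one bit of uncertainty)
print(entropy_bits([0.25] * 4))   # 2.0 (uniform over four outcomes)
```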
Kolmogorov probability axioms
Triple (Ω, ℱ, P) with P(Ω)=1, countably additive on σ-algebra ℱ. Foundation 1933; random variables = measurable functions.
Strong / weak law of large numbers
IID X_n with E|X| < ∞: (1/n) Σ X_i → E X in probability (weak law, Khinchin) and a.s. (strong law, Kolmogorov). Ergodic analogue: Birkhoff.
Martingale & Doob convergence
(X_n, ℱ_n) with E[X_{n+1}|ℱ_n] = X_n. Doob: sup E|X_n| < ∞ ⟹ a.s. convergence. Optional stopping theorem.
Conditional expectation E[X|𝒢]
The (a.s. unique) 𝒢-measurable r.v. Y with ∫_G Y dP = ∫_G X dP for all G ∈ 𝒢. Existence via Radon-Nikodym; for X ∈ ℒ², the orthogonal projection onto ℒ²(𝒢).
Characteristic function
φ_X(t) = E[e^{itX}]. Uniquely determines distribution; Lévy continuity thm; φ_{X+Y} = φ_X φ_Y for independent.
Poisson process
Counting process N_t with independent increments, N_{t+s}−N_t ∼ Poisson(λs). Exponential waiting times; compound Poisson jump processes.
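A simulation sketch using the exponential-waiting-time characterisation (rate, horizon, and replication count are illustrative choices):

```python
import random

random.seed(4)

# Simulate a rate-λ Poisson process on [0, t] via Exp(λ) inter-arrival times.
lam, t, reps = 2.0, 10.0, 2000

counts = []
for _ in range(reps):
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)  # Exp(λ) waiting time
        if clock > t:
            break
        n += 1
    counts.append(n)

# N_t ~ Poisson(λt), so the mean count should be close to λt = 20
print(sum(counts) / reps)
```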
Law of iterated logarithm
For IID mean 0, var 1: limsup S_n/√(2n log log n) = 1 a.s. Refines CLT fluctuations; Hartman-Wintner.
Large deviations & Cramér theorem
IID X_n, rate function I(x) = sup_λ[λx − log E e^{λX}]: P(S_n/n ≈ x) ~ e^{−n I(x)}. Sanov, Freidlin-Wentzell for SDEs.
Maximum likelihood estimation
θ̂ = argmax L(θ; x) = argmax Π f(x_i;θ). Asymptotically normal, efficient (Cramér-Rao) under regularity. Fisher information.
Neyman-Pearson lemma
Most powerful test of simple H_0 vs H_1 at level α is likelihood ratio test L_1/L_0 > k. UMP existence via monotone LR.
Borel-Cantelli lemmas
(First) If Σ P(A_n) < ∞ then almost surely only finitely many of the A_n occur. (Second) If the A_n are independent and Σ P(A_n) = ∞ then…
Kolmogorov's zero-one law
Every event in the tail σ-algebra of an independent sequence of random variables has probability 0 or 1. Kolmogorov 1928. Examples:…
Jensen's inequality (probability)
For any convex function φ and integrable random variable X, φ(E[X]) ≤ E[φ(X)]; equality iff X is almost surely constant on the interior…
Markov's inequality
For any non-negative random variable X and a > 0, P(X ≥ a) ≤ E[X]/a. A. Markov 1884. Immediate from E[X] ≥ E[X·1_{X≥a}] ≥ a·P(X ≥ a). …
Chebyshev's inequality (probability)
Any random variable X with finite variance σ² satisfies P(|X − E[X]| ≥ kσ) ≤ 1/k². Chebyshev 1867. Distribution-free; loose for…
Hoeffding's inequality
For independent bounded random variables X_i ∈ [a_i, b_i], sums deviate sub-Gaussianly: P(|S_n − E S_n| ≥ nt) ≤ 2·exp(−2n²t²/Σ(b_i−a_i)²). …
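An empirical check of the bound for Uniform(0,1) summands, which are bounded in [0, 1] so each b_i − a_i = 1 (sample size, t, and replication count are illustrative):

```python
import math
import random

random.seed(5)

# X_i iid Uniform(0, 1) ⊂ [0, 1], so Hoeffding applies with b_i − a_i = 1.
n, t, reps = 100, 0.1, 5000
bound = 2 * math.exp(-2 * n * t * t)   # 2·exp(−2n²t²/Σ(bᵢ−aᵢ)²) with Σ = n

hits = 0
for _ in range(reps):
    mean = sum(random.random() for _ in range(n)) / n
    hits += abs(mean - 0.5) >= t       # |S_n − E S_n| ≥ nt ⇔ |X̄ − μ| ≥ t
emp = hits / reps

print(emp, bound)
```

The empirical tail probability falls well under the bound of roughly 0.27, consistent with sub-Gaussian concentration.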
Cramér-Rao lower bound
Under regularity, the variance of any unbiased estimator θ̂ of a parameter θ based on n i.i.d. samples is bounded below by 1/(n·I(θ)),…
Slutsky's theorem
If X_n converges in distribution to X and Y_n converges in probability to a constant c, then X_n + Y_n ⇒ X + c and X_n · Y_n ⇒ c · X (and…