Foundations of probability and statistics: sigma-algebras, probability measures (Kolmogorov axioms), random variables, expectation, and the two pillar limit theorems (LLN and CLT). Plus a bridge to statistical inference (MLE).
probability-statistics
Sample space Ω
The set of all possible outcomes of a random experiment.
σ-algebra F
A collection of subsets of Ω closed under complement and countable union (and containing Ω). The measurable events.
Probability measure P
A function P: F → [0,1] with P(Ω) = 1 and countable additivity (P(⋃ Aᵢ) = ΣP(Aᵢ) for pairwise disjoint events). Kolmogorov's 1933…
Conditional probability
P(A|B) = P(A ∩ B) / P(B) for P(B) > 0. Operationalises updating in light of partial information.
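A minimal sketch of the definition on a toy sample space (the two-dice events A = "sum is 8" and B = "first die is even" are illustrative choices, not from the deck):

```python
from fractions import Fraction
from itertools import product

# Toy sample space: two fair dice, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))  # uniform probability measure

A = {w for w in omega if w[0] + w[1] == 8}   # event: the sum is 8
B = {w for w in omega if w[0] % 2 == 0}      # event: the first die is even

# P(A|B) = P(A ∩ B) / P(B), valid since P(B) > 0
p_A_given_B = (len(A & B) * p) / (len(B) * p)
print(p_A_given_B)  # 1/6
```

Exact rational arithmetic via `Fraction` keeps the conditional probability free of float round-off.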
Bayes' theorem
P(H|D) = P(D|H) P(H) / P(D). Bridges prior to posterior given a likelihood. Central engine of Bayesian inference and of agent belief…
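A worked prior-to-posterior update, with hypothetical screening-test numbers chosen only for illustration:

```python
# Hypothetical numbers (not from the deck):
prior = 0.01        # P(H): base rate of the condition
sens = 0.90         # P(D|H): likelihood of a positive test given H
fpr = 0.05          # P(D|¬H): false-positive rate

# Law of total probability for the evidence P(D)
p_D = sens * prior + fpr * (1 - prior)

# Bayes' theorem: posterior P(H|D)
posterior = sens * prior / p_D
print(round(posterior, 4))  # 0.1538
```

Even with a 90%-sensitive test, the low prior keeps the posterior near 15% — the standard base-rate illustration.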
Random variable
A measurable function X: Ω → ℝ. Pushes a probability measure on Ω to a distribution on ℝ (F_X(x) = P(X ≤ x)).
Expectation E[X]
∫_Ω X dP — the Lebesgue integral of a random variable against its probability measure. Linear functional and workhorse of probability.
Law of large numbers
For iid X₁,…,Xₙ with E[Xᵢ] = μ, the sample mean X̄ₙ → μ as n → ∞ (in probability for the weak law, almost surely for the strong law). Justifies…
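A quick simulation sketch of the law (fair-coin flips and the sample size are illustrative choices):

```python
import random

random.seed(0)  # reproducibility for this sketch

# Simulate iid Bernoulli(1/2) flips; the LLN says the sample mean tends to μ = 0.5.
n = 100_000
s = sum(random.random() < 0.5 for _ in range(n))
sample_mean = s / n
print(sample_mean)
```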
Central limit theorem
For iid X₁,…,Xₙ with mean μ and finite variance σ², the normalised sum √n (X̄ₙ - μ)/σ converges in distribution to N(0,1). Universality…
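A simulation sketch of the normalised sum (Uniform(0,1) summands, n = 30, and 2000 replications are illustrative choices):

```python
import math
import random
import statistics

random.seed(1)

# Uniform(0,1) has mean 1/2 and variance 1/12.
mu, sigma = 0.5, math.sqrt(1 / 12)
n, reps = 30, 2000

# Normalised sums √n (X̄ₙ − μ)/σ over many replications
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu) / sigma)

# CLT: the z-values should look approximately N(0, 1)
print(statistics.mean(zs), statistics.stdev(zs))
```

The printed empirical mean and standard deviation should sit near 0 and 1 respectively, despite the summands being far from Gaussian.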
Gaussian (normal) distribution
Continuous distribution with density (2πσ²)^(-1/2) exp(−(x−μ)²/(2σ²)). Attractor of the CLT and maximum-entropy distribution for fixed mean…
Maximum-likelihood estimation
Point estimation rule: given iid sample x from parametric family f(·|θ), estimate θ̂ = argmax_θ L(θ; x) = argmax_θ ∏ f(xᵢ|θ). Consistent…
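A sketch for a Bernoulli(θ) family, where the closed-form MLE is the sample mean; the data and the grid search are illustrative:

```python
import math

# Hypothetical Bernoulli sample; closed-form MLE is k/n.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
n, k = len(data), sum(data)

def log_lik(theta):
    # ∑ log f(xᵢ|θ) = k log θ + (n − k) log(1 − θ)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# Brute-force argmax over a fine grid, for illustration only
theta_hat = max((i / 1000 for i in range(1, 1000)), key=log_lik)
print(theta_hat)  # 0.7, matching k/n
```

Maximising the log-likelihood (a monotone transform of ∏ f(xᵢ|θ)) recovers the analytic estimate k/n = 0.7.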
Moment generating function
M_X(t) = E[e^{tX}]. When finite on an open neighbourhood of 0, generates all moments and uniquely determines the distribution.
Characteristic function
φ_X(t) = E[e^{itX}]. Always finite, uniquely determines the distribution, and its pointwise convergence characterises convergence in…
Martingale
A process (X_n) adapted to a filtration (ℱ_n) with E[|X_n|]<∞ and E[X_{n+1}|ℱ_n]=X_n. Abstracts 'fair game'; Doob's theorems drive much of…
Stopping time
A random time τ with {τ ≤ n} ∈ ℱ_n for all n. Encodes decisions that depend only on observed history; enables optional-stopping arguments.
Doob's maximal inequality
For a non-negative submartingale (X_n), P(max_{k≤n} X_k ≥ λ) ≤ E[X_n]/λ, and the L^p version ‖max_{k≤n} X_k‖_p ≤ (p/(p−1)) ‖X_n‖_p for p>1.
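An empirical check of the first inequality for the non-negative submartingale |S_n| built from a symmetric ±1 walk (walk length, λ, and replication count are illustrative choices):

```python
import random

random.seed(2)

# Symmetric ±1 random walk S_n; X_n = |S_n| is a non-negative submartingale.
n_steps, reps, lam = 50, 4000, 15
exceed = 0
abs_end = []
for _ in range(reps):
    s, m = 0, 0
    for _ in range(n_steps):
        s += random.choice((-1, 1))
        m = max(m, abs(s))
    exceed += m >= lam
    abs_end.append(abs(s))

emp_prob = exceed / reps            # empirical P(max_{k≤n} |S_k| ≥ λ)
bound = sum(abs_end) / reps / lam   # Doob bound E|S_n| / λ
print(emp_prob, bound)
```

The empirical exceedance probability stays below the Doob bound, as the inequality guarantees.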
Markov chain
A sequence (X_n) on state space S satisfying the Markov property P(X_{n+1}=y | X_n=x, …) = P(X_{n+1}=y | X_n=x). Stationary distributions,…
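A sketch of finding a stationary distribution by iterating π ← πP (the two-state transition matrix is a hypothetical example):

```python
# Hypothetical two-state chain; power-iterate the row vector πP.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = [0.5, 0.5]  # any starting distribution works for this ergodic chain
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

# The stationary distribution solves π = πP; here π = (5/6, 1/6)
print(pi)
```

Convergence is geometric at the rate of the second eigenvalue (0.4 here), so 200 iterations are far more than enough.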
Brownian motion (Wiener process)
A continuous-time stochastic process B_t with B_0=0, independent increments, B_t−B_s ∼ N(0, t−s), and continuous sample paths. Canonical…
Itô integral
The stochastic integral ∫_0^t H_s dB_s defined for adapted H ∈ L²([0,T]×Ω) as an L²-limit of left-endpoint Riemann sums. Basis of…
Itô's lemma
For f ∈ C², df(t,B_t) = ∂_t f dt + ∂_x f dB_t + ½ ∂_{xx} f dt — the chain rule of stochastic calculus, correcting classical calculus by ½…
Stochastic differential equation (SDE)
An equation dX_t = μ(t,X_t)dt + σ(t,X_t)dB_t interpreted via the Itô integral; under Lipschitz/growth conditions admits unique strong…
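A minimal Euler–Maruyama sketch for an Ornstein–Uhlenbeck SDE (the coefficients θ = σ = 1, step size, and horizon are illustrative choices; its stationary variance is σ²/(2θ) = 0.5):

```python
import math
import random
import statistics

random.seed(3)

# Euler–Maruyama for dX = −θ X dt + σ dB (Ornstein–Uhlenbeck).
theta, sigma = 1.0, 1.0
dt, t_end, paths = 0.01, 5.0, 2000
steps = int(t_end / dt)

finals = []
for _ in range(paths):
    x = 0.0
    for _ in range(steps):
        dB = random.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x += -theta * x * dt + sigma * dB
    finals.append(x)

print(statistics.variance(finals))  # ≈ 0.5
```

By t = 5 the process is effectively stationary, so the cross-path variance approximates σ²/(2θ).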
Statistical hypothesis testing
Neyman–Pearson framework: design a decision rule between H₀ and H₁ controlling the type-I error at level α while maximising power. …
Bayesian inference
Posterior ∝ Prior × Likelihood (Bayes' rule): updating subjective belief over parameters θ given data x. Conjugate priors, MCMC, and…
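A conjugate-prior sketch for the Beta-Bernoulli model (the prior pseudo-counts and data are hypothetical): a Beta(a, b) prior updated on k successes in n trials yields a Beta(a + k, b + n − k) posterior.

```python
# Hypothetical prior and data for a Beta-Bernoulli update.
a, b = 2, 2          # prior pseudo-counts
k, n = 7, 10         # observed successes / trials

# Posterior ∝ prior × likelihood collapses to adding counts
a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 9 5 0.6428571428571429
```

The posterior mean 9/14 sits between the prior mean (1/2) and the sample frequency (0.7), shrinking toward the prior.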
Shannon entropy
For a discrete random variable X with probability mass p, the Shannon entropy is H(X) = −Σ p(x) log p(x). Measures average uncertainty /…
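The definition computes directly (base-2 logs give entropy in bits; the example distributions are illustrative):

```python
import math

def entropy_bits(p):
    # H(X) = −Σ p(x) log₂ p(x); terms with p(x) = 0 contribute 0 by convention
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0 (fair coin: one bit of uncertainty)
print(entropy_bits([0.25] * 4))   # 2.0 (uniform over four outcomes)
```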
Kolmogorov probability axioms
Triple (Ω, ℱ, P) with P(Ω)=1, countably additive on σ-algebra ℱ. Foundation 1933; random variables = measurable functions.
Strong / weak law of large numbers
IID X_n with E|X| < ∞: (1/n) Σ X_i → E X in probability (weak law, Khinchin) and a.s. (strong law, Kolmogorov). Ergodic analogue: Birkhoff.
Martingale & Doob convergence
(X_n, ℱ_n) with E[X_{n+1}|ℱ_n] = X_n. Doob: sup E|X_n| < ∞ ⟹ a.s. convergence. Optional stopping theorem.
Conditional expectation E[X|𝒢]
The (a.s. unique) 𝒢-measurable r.v. Y with ∫_G Y dP = ∫_G X dP for all G ∈ 𝒢. Existence via Radon-Nikodym; for X ∈ ℒ², the orthogonal projection onto ℒ²(𝒢).
Characteristic function
φ_X(t) = E[e^{itX}]. Uniquely determines distribution; Lévy continuity thm; φ_{X+Y} = φ_X φ_Y for independent.
Poisson process
Counting process N_t with independent increments, N_{t+s}−N_t ∼ Poisson(λs). Exponential waiting times; compound Poisson jump processes.
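A simulation sketch using the exponential-waiting-time characterisation (rate, horizon, and replication count are illustrative choices):

```python
import random

random.seed(4)

# Simulate a rate-λ Poisson process on [0, t] via Exp(λ) inter-arrival times.
lam, t, reps = 2.0, 10.0, 2000

counts = []
for _ in range(reps):
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)  # Exp(λ) waiting time
        if clock > t:
            break
        n += 1
    counts.append(n)

# N_t ~ Poisson(λt), so the mean count should be close to λt = 20
print(sum(counts) / reps)
```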
Law of iterated logarithm
For IID mean 0, var 1: limsup S_n/√(2n log log n) = 1 a.s. Refines CLT fluctuations; Hartman-Wintner.
Large deviations & Cramér theorem
IID X_n, rate function I(x) = sup_λ[λx − log E e^{λX}]: P(S_n/n ≈ x) ~ e^{−n I(x)}. Sanov, Freidlin-Wentzell for SDEs.
Maximum likelihood estimation
θ̂ = argmax L(θ; x) = argmax Π f(x_i;θ). Asymptotically normal, efficient (Cramér-Rao) under regularity. Fisher information.
Neyman-Pearson lemma
Most powerful test of simple H_0 vs H_1 at level α is likelihood ratio test L_1/L_0 > k. UMP existence via monotone LR.
Borel-Cantelli lemmas
(First) If Σ P(A_n) < ∞ then almost surely only finitely many of the A_n occur. (Second) If the A_n are independent and Σ P(A_n) = ∞ then…
Kolmogorov's zero-one law
Every event in the tail σ-algebra of an independent sequence of random variables has probability 0 or 1. Kolmogorov 1928. Examples:…
Jensen's inequality (probability)
For any convex function φ and integrable random variable X, φ(E[X]) ≤ E[φ(X)]; equality iff X is almost surely constant on the interior…
Markov's inequality
For any non-negative random variable X and a > 0, P(X ≥ a) ≤ E[X]/a. A. Markov 1884. Immediate from E[X] ≥ E[X·1_{X≥a}] ≥ a·P(X ≥ a). …
Chebyshev's inequality (probability)
Any random variable X with finite variance σ² satisfies P(|X − E[X]| ≥ kσ) ≤ 1/k². Chebyshev 1867. Distribution-free; loose for…
Hoeffding's inequality
For independent bounded random variables X_i ∈ [a_i, b_i], sums deviate sub-Gaussianly: P(|S_n − E S_n| ≥ nt) ≤ 2·exp(−2n²t²/Σ(b_i−a_i)²). …
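An empirical check of the bound for Uniform(0,1) summands, which are bounded in [0, 1] so each b_i − a_i = 1 (sample size, t, and replication count are illustrative):

```python
import math
import random

random.seed(5)

# X_i iid Uniform(0, 1) ⊂ [0, 1], so Hoeffding applies with b_i − a_i = 1.
n, t, reps = 100, 0.1, 5000
bound = 2 * math.exp(-2 * n * t * t)   # 2·exp(−2n²t²/Σ(bᵢ−aᵢ)²) with Σ = n

hits = 0
for _ in range(reps):
    mean = sum(random.random() for _ in range(n)) / n
    hits += abs(mean - 0.5) >= t       # |S_n − E S_n| ≥ nt ⇔ |X̄ − μ| ≥ t
emp = hits / reps

print(emp, bound)
```

The empirical tail probability falls well under the bound of roughly 0.27, consistent with sub-Gaussian concentration.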
Cramér-Rao lower bound
Under regularity, the variance of any unbiased estimator θ̂ of a parameter θ based on n i.i.d. samples is bounded below by 1/(n·I(θ)),…
Slutsky's theorem
If X_n converges in distribution to X and Y_n converges in probability to a constant c, then X_n + Y_n ⇒ X + c and X_n · Y_n ⇒ c · X (and…