Computational biology — NSF BIO / DBI biological-informatics grain; computational methods applied to biological data (sequences, structures, networks, cellular state). Foundations: (1) Dynamic-programming sequence alignment:…
computational-biology
Needleman-Wunsch DP: dp[i][j] = max(dp[i−1,j−1]+s, dp[i−1,j]+g, dp[i,j−1]+g)
Needleman-Wunsch (1970) is the foundational global-alignment algorithm for biological-sequence analysis. Inputs: two sequences a (length…
Levenshtein edit distance: min{ins, del, sub} ops; metric on Σ*
Levenshtein distance (Levenshtein 1966) counts the minimum number of single-character edits — insertions, deletions, or substitutions — to…
Hamming distance: d_H(x,y) = #{i: x_i ≠ y_i}; upper-bounds Levenshtein
Hamming distance (Hamming 1950, in the context of error-correcting codes) counts the positions at which two equal-length strings differ. …
NW canonical anchors: NW(AG,AG)=2, NW(AG,AC)=0, NW(ATGC,ATGC)=4
Sympy/stdlib-exact witness of Needleman-Wunsch scoring on three small DNA anchor pairs, under the standard scoring scheme (match = +1,…
Levenshtein metric on canonical anchors: d(kitten,sitting) = 3, d(x,x)=0, symmetric
Sympy/stdlib-exact witness of the Levenshtein metric on the canonical 'kitten' → 'sitting' anchor (Levenshtein 1966 original paper…
Hamming bound: d_L(AACT,AGCT) = d_H(AACT,AGCT) = 1; d_L ≤ d_H for |x|=|y|
Sympy/stdlib-exact witness of the Hamming-upper-bounds-Levenshtein inequality on DNA anchor pairs. Setup: (i) AACT vs AGCT — differ only…
BLAST Karlin-Altschul: E = K·m·n·exp(−λ·S); local-alignment-score EVT (Gumbel) framework
BLAST Karlin-Altschul extreme-value framework — the canonical local-alignment significance theory underlying BLAST (Altschul, Gish, Miller,…
Jukes-Cantor JC69: dP/dt = Q·P; p(t) = 3/4·(1−e^{−4αt}); first-order Markov substitution model
Jukes-Cantor 1969 (JC69) model — the foundational molecular-evolution substitution framework. Physical setup: DNA sites evolve…
Nussinov 1978 RNA folding: M(i,j) = max{M(i+1,j), M(i,j−1), M(i+1,j−1)+δ_{ij}, max_k M(i,k)+M(k+1,j)}
Nussinov 1978 RNA-secondary-structure folding — the canonical dynamic-programming (DP) framework for maximum-base-pair RNA folding. …
BLAST: E=K·m·n·exp(−λS); doubling n doubles E; S'=(λS−lnK)/ln2; anchor K=0.1,m=1000,n=10⁶,λ=1,S=20
Sympy-exact symbolic witness of the BLAST Karlin-Altschul E-value canonical form together with its database-size-doubling invariant and…
JC69: p(t)=3/4·(1−exp(−4αt)); d(p)=−3/4·ln(1−4p/3); p_∞=3/4; dp/dt|₀=3α; d(p=3/8)=(3/4)ln2
Sympy-exact symbolic witness of the Jukes-Cantor 1969 (JC69) transition-probability formula, its inverse distance estimator, the 3/4…
Nussinov: ACGU→2 pairs, GCAU→2, AAUU→2, GGGAAAUCC→3; O(n³) DP verifiable byte-exact
Integer-anchor witness of the Nussinov 1978 RNA-folding maximum-base-pair DP recursion on five canonical sequences. Setup: Nussinov…
SpliceAI (Jaganathan 2019)
Jaganathan 2019 SpliceAI; modern Enformer + Borzoi + DNABERT genomic-language-models.
AlphaFold-2 (Jumper 2021)
DeepMind 2021 AlphaFold-2; modern AF3 multi-modal 2024 + RoseTTAFold + ESMFold; revolution in computational-biology.
HMM (Rabiner 1989)
L Rabiner 1989 HMM tutorial; modern HMMER + GENSCAN + Pfam HMM-profiles; foundational sequence-analysis.
Smith-Waterman (1981)
T Smith-M Waterman 1981 local alignment; modern SSW-vector + GPU-accelerated; basis of all sequence-similarity tools.
Network biology (Barabasi-Oltvai 2004)
Barabasi-Oltvai 2004 'Network biology'; modern systems-biology + interactomics + co-essentiality maps.
DeepSEA (Zhou 2015)
Zhou-Troyanskaya 2015 DeepSEA non-coding variant function; modern Sei + Enformer + DeepSEA-2 cell-type-specific.
Needleman-Wunsch (1970)
S Needleman-C Wunsch 1970 dynamic-programming global-alignment; modern modern foundational text + Smith-Waterman 1981 local + 2024…
Profile HMM (Krogh 1994)
A Krogh-D Haussler 1994 profile-HMM; modern modern foundational text + HMMer + Pfam + 2024 protein-language-models ESM-cluster.
PSI-BLAST (Altschul 1997)
S Altschul 1997 PSI-BLAST iterative; modern modern foundational text + 2024 deep-learning protein-sequence-search FoldSeek + ProtTrans.
Genome assembly (Pevzner 2001)
P Pevzner 2001 + Compeau-Pevzner 2011 de-Bruijn-graph; modern modern foundational text + 2024 nanopore-Long-read T2T human-genome 2022.
ML phylogeny (Felsenstein 1981)
J Felsenstein 1981 + 1985 bootstrap; modern modern foundational text + RAxML + IQ-TREE + 2024 BEAST3 + tree-likelihood-supercomputer…
AlphaFold2 (Jumper 2021)
Jumper-Hassabis 2021 AlphaFold2 (Nobel 2024); modern modern foundational text + AF3 + RoseTTAFold-AllAtom 2024 + ESMFold…