Computational biology — NSF BIO / DBI biological-informatics grain; computational methods applied to biological data (sequences, structures, networks, cellular state). Foundations: (1) Dynamic-programming sequence alignment:…
computational-biology
Needleman-Wunsch DP: dp[i][j] = max(dp[i−1,j−1]+s, dp[i−1,j]+g, dp[i,j−1]+g)
Needleman-Wunsch (1970) is the foundational global-alignment algorithm for biological-sequence analysis. Inputs: two sequences a (length…
Levenshtein edit distance: min{ins, del, sub} ops; metric on Σ*
Levenshtein distance (Levenshtein 1966) counts the minimum number of single-character edits — insertions, deletions, or substitutions — to…
Hamming distance: d_H(x,y) = #{i: x_i ≠ y_i}; upper-bounds Levenshtein
Hamming distance (Hamming 1950, in the context of error-correcting codes) counts the positions at which two equal-length strings differ. …
NW canonical anchors: NW(AG,AG)=2, NW(AG,AC)=0, NW(ATGC,ATGC)=4
Sympy/stdlib-exact witness of Needleman-Wunsch scoring on three small DNA anchor pairs, under the standard scoring scheme (match = +1,…
Levenshtein metric on canonical anchors: d(kitten,sitting) = 3, d(x,x)=0, symmetric
Sympy/stdlib-exact witness of the Levenshtein metric on the canonical 'kitten' → 'sitting' anchor (Levenshtein 1966 original paper…
Hamming bound: d_L(AACT,AGCT) = d_H(AACT,AGCT) = 1; d_L ≤ d_H for |x|=|y|
Sympy/stdlib-exact witness of the Hamming-upper-bounds-Levenshtein inequality on DNA anchor pairs. Setup: (i) AACT vs AGCT — differ only…