Sequence alignment, BLAST, HMMs, genome assembly, variant calling, RNA-seq, ChIP-seq, proteomics.
Omics And Bioinformatics
Sequence alignment
Arrangement of biological sequences (DNA, RNA, protein) to identify homologous residues. Global (Needleman-Wunsch 1970) or local…
BLAST (Basic Local Alignment Search Tool)
Altschul-Gish-Miller-Myers-Lipman 1990 heuristic sequence-similarity search. Seeds high-scoring words, extends ungapped, reports…
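The seed-and-extend idea can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: it assumes exact-match words on nucleotide sequences, a flat match/mismatch score, and an X-drop stopping rule. The function names and default parameters (`w`, `x_drop`, `min_score`) are hypothetical.

```python
from collections import defaultdict


def index_words(query, w):
    """Map every length-w word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def extend_ungapped(query, subject, qi, si, w, match=1, mismatch=-2, x_drop=4):
    """Extend a seed hit right, then left, without gaps.

    Extension in each direction stops once the running score has fallen
    x_drop below the best score seen so far.
    Returns (best_score, query_start, query_end) with query_end exclusive.
    """
    best = score = w * match
    left = right = 0
    k = 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        score += match if query[qi + w + k] == subject[si + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= x_drop:
            break
    score = best  # restart from the best right-hand extension
    k = 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        score += match if query[qi - k - 1] == subject[si - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score >= x_drop:
            break
    return best, qi - left, qi + w + right


def find_hsps(query, subject, w=4, min_score=6):
    """Report ungapped segment pairs as (score, q_start, q_end, s_start)."""
    index = index_words(query, w)
    hsps = set()  # seeds on the same diagonal collapse to one segment pair
    for si in range(len(subject) - w + 1):
        for qi in index.get(subject[si:si + w], ()):
            score, q_start, q_end = extend_ungapped(query, subject, qi, si, w)
            if score >= min_score:
                hsps.add((score, q_start, q_end, si - (qi - q_start)))
    return sorted(hsps, reverse=True)
```

For example, `find_hsps("ACGTACGTTT", "GGACGTACGTCC")` finds the shared segment `ACGTACGT` once, even though several seeds fall on the same diagonal.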
Hidden Markov model (bio applications)
Probabilistic model with hidden states emitting observations. Profile HMMs (HMMER) for family membership; gene prediction (GeneMark,…
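As an illustration of hidden states emitting observations, here is a minimal Viterbi decoder in log space. It is a sketch under stated assumptions: the two-state model in the usage note (AT-rich versus GC-rich regions) and all of its probabilities are invented for the example and are not taken from HMMER or GeneMark.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # Scores for the first observation.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    backpointers = []
    for obs in observations[1:]:
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in s.
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs]))
            pointers[s] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: every number here is illustrative only.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
         "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
EMIT = {"AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}
```

Calling `viterbi("AAAATTTTGCGCGCGC", STATES, START, TRANS, EMIT)` labels the first eight bases AT-rich and the last eight GC-rich. Profile HMMs use the same decoding idea with match, insert, and delete states per alignment column.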
Genome assembly
Reconstruction of a genome from sequencing reads. De novo (no reference) via OLC or de Bruijn graph; reference-based (mapping). Challenges:…
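The de Bruijn approach can be sketched as follows: every k-mer in the reads becomes an edge between its (k-1)-mer prefix and suffix, and a contig is read off by walking the graph. This is a toy sketch that assumes error-free reads and no repeats; the function names are hypothetical, and real assemblers add error correction, coverage tracking, and bubble/tip removal.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph):
    """Follow the single unbranched path from the unique source node.

    Returns the spelled-out contig, or None when there is not exactly one
    node without incoming edges (for example, in a circular graph).
    """
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        return None
    node = starts[0]
    contig, visited = node, {node}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in visited:  # stop instead of looping around a cycle
            break
        visited.add(node)
        contig += node[-1]
    return contig
```

With the overlapping reads `["ATGGCG", "GGCGTG", "CGTGCA"]` and `k=4`, the walk spells out `ATGGCGTGCA`.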
Variant calling
Detection of SNVs, indels, structural variants from aligned reads. Bayesian genotype likelihoods (GATK, DeepVariant). Filters: allele…
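A Bayesian genotype likelihood calculation for a single biallelic diploid site can be sketched like this. The model is a deliberately simplified assumption: one uniform per-base error rate, independent reads, and made-up genotype priors. It does not reproduce the actual GATK or DeepVariant models, and `genotype_posteriors` is a hypothetical name.

```python
import math


def genotype_posteriors(n_ref, n_alt, error=0.01, priors=(0.998, 0.001, 0.001)):
    """Posterior probabilities for (hom-ref, het, hom-alt) genotypes.

    n_ref / n_alt are counts of reads supporting each allele. A read shows
    the wrong allele with probability error / 3 (one of three other bases).
    """
    log_post = []
    for alt_copies, prior in zip((0, 1, 2), priors):
        f = alt_copies / 2  # fraction of chromosomes carrying the alt allele
        p_alt = f * (1 - error) + (1 - f) * error / 3
        p_ref = (1 - f) * (1 - error) + f * error / 3
        log_post.append(math.log(prior)
                        + n_ref * math.log(p_ref)
                        + n_alt * math.log(p_alt))
    # Normalise in log space to avoid underflow at high depth.
    top = max(log_post)
    weights = [math.exp(x - top) for x in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

With 10 reference and 10 alternate reads the heterozygous genotype dominates despite its low prior; with 20 reference reads and none alternate, hom-ref does.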
RNA-seq
Transcriptome sequencing: fragment cDNA → sequence → align → count. Differential expression by DESeq2, edgeR (negative binomial).…
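The negative binomial is used for read counts because it allows variance larger than the mean. A minimal sketch of its log-probability in the mean/dispersion parameterisation, where variance = mean + dispersion × mean², is shown below. The function name and the example values are assumptions for illustration; DESeq2 and edgeR estimate dispersion from the data rather than taking it as input.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of observing count k under a negative binomial.

    Parameterised by mean and dispersion (both > 0), so that
    variance = mean + dispersion * mean**2.
    """
    r = 1.0 / dispersion      # "size" parameter
    p = r / (r + mean)        # success probability
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))
```

As the dispersion approaches zero the distribution approaches a Poisson with the same mean, which is why the dispersion can be read as the amount of extra, beyond-Poisson variability between replicates.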
ChIP-seq
Genome-wide protein-DNA-binding assay: crosslink → ChIP with antibody → sequence → align → peak call (MACS). Maps TF binding sites, histone…
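The peak-calling step can be illustrated with a Poisson tail test: a window is a candidate peak when its read count is improbably high given a background rate. The sketch below assumes a MACS-like choice of the largest of several background estimates as the local rate; the function names, the window handling, and the example rates are hypothetical and much simpler than MACS itself.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > total * 1e-16:  # stop once further terms are negligible
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def window_p_value(chip_count, background_lam, local_lams=()):
    """p-value for a window's ChIP read count.

    The expected count is the largest of the genome-wide background rate
    and any local estimates (for example from a control sample), which
    keeps locally noisy regions from being called as peaks.
    """
    lam = max([background_lam, *local_lams])
    return poisson_upper_tail(chip_count, lam)
```

A window with 30 reads is far more significant against an expected count of 8 than against 20, so a high local estimate suppresses the call.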
Single-cell RNA-seq (scRNA-seq)
Transcriptome profiling at single-cell resolution via droplet (10x Chromium) or plate (Smart-seq2) methods. Analysis: doublet detection,…
Multiple sequence alignment
Progressive alignment (ClustalW), consistency-based (T-Coffee, MAFFT), profile methods. Foundation for phylogenetic inference, motif…
Functional annotation (GO, KEGG)
Gene Ontology: structured vocabulary of biological process, molecular function, cellular component. KEGG pathways. Enrichment analysis…
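A common form of enrichment analysis is the over-representation test: given a gene list, ask whether a GO term or KEGG pathway appears in it more often than expected by chance. The sketch below assumes the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the function and parameter names are hypothetical, and multiple-testing correction is left out.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_genes:    size of the background gene universe
    n_in_term:  background genes annotated with the term
    n_selected: size of the gene list being tested
    n_overlap:  genes in the list that carry the annotation
    """
    upper = min(n_in_term, n_selected)
    tail = sum(comb(n_in_term, k) * comb(n_genes - n_in_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_genes, n_selected)
```

For a universe of 10 genes with 3 annotated, selecting exactly those 3 has probability 1/120, so `enrichment_p_value(10, 3, 3, 3)` is about 0.0083.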
Smith-Waterman / Needleman-Wunsch dynamic programming
Needleman-Wunsch 1970 (global) and Smith-Waterman 1981 (local) dynamic-programming alignment with score recurrence H(i,j) =…
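The recurrence is cut off in the entry above, so the sketch below assumes the common linear-gap form, which may differ from the elided formula: each cell takes the best of a diagonal step (match or mismatch) and two gap steps, and the local variant additionally floors the score at 0. Scoring values and function names are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score; returns (best_score, (end_i, end_j))."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score over the full length of both sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # leading gaps are penalised in global alignment
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[-1][-1]
```

The two functions differ only in the boundary conditions and the floor at 0: `smith_waterman("TTTACGTAAA", "GGACGTCC")` scores the shared `ACGT` core alone, while `needleman_wunsch("ACGT", "AGT")` must pay for one gap.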
Karlin-Altschul BLAST significance statistics
Karlin-Altschul 1990: number of HSPs with score ≥ S follows Poisson with mean E = K m n e^{-λS}; λ and K depend only on substitution matrix…
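The formula E = K m n e^{-λS} translates directly into code. The sketch below also includes the usual bit-score normalisation and the Poisson probability of seeing at least one such HSP. The values passed for `lam` and `k` in the usage note are placeholders only; real values depend on the scoring system.

```python
import math


def e_value(score, m, n, lam, k):
    """Expected number of HSPs with score >= S: E = K * m * n * exp(-lam * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score, lam, k):
    """Normalised score S' = (lam * S - ln K) / ln 2, so E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


def p_value(e):
    """P(at least one HSP) under a Poisson with mean E: 1 - exp(-E)."""
    return -math.expm1(-e)
```

With placeholder parameters `lam=0.3` and `k=0.1`, a raw score of 100 for a query of length 300 against a database of 10^6 residues gives `e_value(100, 300, 10**6, 0.3, 0.1)`, roughly 2.8e-6. Doubling the database size doubles E, while each additional bit of score halves it.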
BLAST (Altschul 1990)
Altschul-Gish-Miller-Myers-Lipman 1990 BLAST; among the most-cited bioinformatics tools (100k+ citations).
ClustalW (Thompson 1994)
Thompson-Higgins-Gibson 1994 ClustalW progressive alignment; modern competitors: MUSCLE, MAFFT, T-Coffee.
GENSCAN (Burge-Karlin 1997)
Burge-Karlin 1997 GENSCAN HMM gene prediction; modern successors: AUGUSTUS, StringTie, transformer-based tools.
Microarray (DeRisi 1997)
DeRisi-Iyer-Brown 1997 cDNA microarray study of the diauxic (glucose-depletion) shift in S. cerevisiae; since largely replaced by Illumina arrays and RNA-seq.
Proteomics (Aebersold-Mann 2003)
Aebersold-Mann 2003 'Mass spectrometry-based proteomics'; modern methods: label-free quantification, DIA, cross-linking MS.
scRNA-seq (Tang 2009)
Tang 2009, then Drop-seq (Macosko 2015) and 10x Chromium; modern cell-atlas projects: Human Cell Atlas, Mouse Cell Atlas.
BLAST detail (Altschul 1990)
Altschul 1990 BLAST with Karlin-Altschul 1990 statistics; still the foundational alignment search, now used alongside UniRef databases and the faster DIAMOND aligner (as of 2024).
Smith-Waterman (1981)
Smith-Waterman 1981 local-alignment dynamic programming; now a foundational textbook algorithm, with GPU-accelerated implementations (as of 2024).
HMM (Baum-Welch 1972)
Baum 1966-1972 and Krogh 1994 hidden Markov models; modern HMMER, Pfam, and (as of 2024) protein-profile classification.
NGS (Shendure 2008)
Shendure-Ji 2008 next-generation sequencing; modern Illumina, Oxford Nanopore, PacBio, and (as of 2024) Element AVITI at about $200/genome.
PCA in genomics (Pearson 1901+)
Cavalli-Sforza-Edwards 1964 and Patterson 2006 (EIGENSOFT); modern PCA-based clustering, UMAP (McInnes 2018), scVI.
AlphaFold (Jumper 2021)
Jumper-Hassabis 2021 AlphaFold2; modern AF3, Boltz-1, ESMFold, RoseTTAFold All-Atom (2024), and design pipelines.