"생명 = 매 우주 의 가장 복잡한 algorithm". 매 DNA / RNA / protein 의 computable model. AlphaFold 의 protein folding 의 50 year problem 의 solve. 매 신약 개발 / disease research / synthetic biology 의 가속화.
📖 Key points
Sub-domains
Sequence alignment: infers evolutionary relationships among DNA, RNA, and protein sequences.
Protein folding: predicts 3D structure from the amino acid sequence.
Genome assembly: reconstructs a genome from short reads, like solving a puzzle.
Phylogenetics: builds the evolutionary tree of species.
Systems biology: models gene regulatory networks and metabolic pathways.
Cellular automata: simple rules → complex patterns (Conway's Life).
Synthetic biology: designs genetic circuits.
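The "simple rules → complex patterns" point can be made concrete with Conway's Life. Below is a minimal sketch, assuming an unbounded grid stored as a set of live `(row, col)` cells; the function name `life_step` and the `blinker` pattern are illustrative choices, not part of the original note.

```python
from collections import Counter

def life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count live neighbors for every cell adjacent to at least one live cell.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A horizontal "blinker" flips to vertical and back every two generations.
blinker = {(1, 0), (1, 1), (1, 2)}
next_gen = life_step(blinker)
```

Two rules (birth and survival) are the whole model; oscillators, gliders, and more elaborate structures emerge from repeated application.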
ML applications
AlphaFold (DeepMind): near-atomic-accuracy protein structure prediction (AlphaFold2 won CASP14).
ESMFold (Meta): structure prediction built on a large protein language model.
RoseTTAFold (Baker lab): three-track (multi-track) architecture.
AlphaMissense: predicts the pathogenicity of missense variants.
Geneformer / scGPT: foundation models for single-cell transcriptomics.
Algorithm basics
Sequence alignment
Needleman-Wunsch (global): dynamic programming over the full length of both sequences.
Smith-Waterman (local): dynamic programming for the best-matching local region.
BLAST (heuristic): fast database search.
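The two dynamic-programming algorithms share one recurrence and differ only in initialization and in whether scores are clamped at zero. A minimal sketch of the scoring pass follows, assuming a simple linear gap penalty; the function name `align_score` and the default scores (`match=1`, `mismatch=-1`, `gap=-1`) are illustrative assumptions, and traceback is omitted.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global).
    local=True:  Smith-Waterman (local).
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else dp[-1][-1]

global_score = align_score("ACGT", "AGT")
local_score = align_score("TTTACGTTTT", "GGACGGG", local=True)
```

Both variants cost O(len(a) × len(b)) time, which is why heuristic tools such as BLAST are used for database-scale search.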
Phylogenetics
UPGMA / Neighbor-joining: distance-based.
Maximum likelihood / Bayesian: model-based.
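As an illustration of the distance-based family, here is a minimal UPGMA sketch. It assumes pairwise distances are supplied as a dict keyed by `frozenset({a, b})`; the function name `upgma`, the nested-tuple tree representation, and the toy distances are all hypothetical choices for this example. Branch lengths are not computed.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster by UPGMA. `dist` maps frozenset({a, b}) -> distance.

    Returns the tree as nested tuples (topology only).
    """
    sizes = {name: 1 for name in names}  # active cluster -> leaf count
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        # Distance to the merged cluster is the size-weighted average.
        for z in sizes:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
    return next(iter(sizes))

# Toy distances (made up for illustration): A and B are closest, D is the outlier.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma("ABCD", {frozenset(k): v for k, v in pairs.items()})
```

With these distances A and B join first, then C, then D. UPGMA assumes a constant evolutionary rate (an ultrametric tree); Neighbor-joining relaxes that assumption.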
Folding
Energy minimization: force fields (AMBER, CHARMM).
Molecular dynamics: atomistic simulation.
Deep learning: sequence → structure (AlphaFold).
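To show what energy minimization means in the smallest possible setting, the sketch below runs gradient descent on a single Lennard-Jones pair interaction, one of the non-bonded terms that force fields of this kind typically include. This is a toy under stated assumptions (reduced units with `epsilon = sigma = 1`, a fixed step size, one degree of freedom), not how AMBER or CHARMM are actually driven; all function names are illustrative.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r

def minimize(r, step=0.01, iterations=2000):
    """Plain gradient descent on the pair distance r."""
    for _ in range(iterations):
        r -= step * lj_gradient(r)
    return r

# Start from a stretched pair; the analytic minimum is r = 2^(1/6) * sigma.
r_min = minimize(1.5)
```

Real minimizers work over thousands of coordinates and use methods such as steepest descent with line search, conjugate gradient, or L-BFGS, but the idea is the same: follow the negative gradient of the force-field energy to a nearby local minimum.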
Data challenges
Noise (sequencing errors, batch effects).
High dimensionality (10K+ genes).
Small sample sizes (rare diseases).
No ground truth (in vivo measurement is difficult).
Ethics (germline editing).
→ Mitigation: inject physical or biological priors, e.g. with PINNs (physics-informed neural networks).
When to use: applying ML to biological data; protein, sequence, or genome analysis; drug discovery pipelines.
When not to use: clinical diagnosis (use FDA-approved tools only); as a substitute for wet-lab experiments.
❌ Anti-patterns
Data leakage: sequence similarity between train and test sets → inflated (fake) performance.
No biological prior: wet-lab scientists reject black-box ML predictions.
Single dataset overfitting: fails to generalize across populations.
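One common guard against the data-leakage anti-pattern is to cluster sequences by similarity and assign whole clusters to either train or test. The sketch below is a simplified illustration: it assumes k-mer Jaccard similarity as a cheap stand-in for real sequence identity, uses greedy clustering against a single representative per cluster, and all names and thresholds are hypothetical. In practice, dedicated tools such as MMseqs2 or CD-HIT are typically used for the clustering step.

```python
def kmers(seq, k=3):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_split(seqs, threshold=0.5, test_fraction=0.2, k=3):
    """Greedy similarity clustering, then split by whole clusters.

    Returns (train_indices, test_indices). Caveat: comparing only to each
    cluster's representative does not guarantee that every cross-split
    pair is below the threshold.
    """
    profiles = [kmers(s, k) for s in seqs]
    clusters = []  # lists of indices; the first index is the representative
    for i, profile in enumerate(profiles):
        for cluster in clusters:
            if jaccard(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Fill the test set with the smallest clusters until the target is met.
    target = test_fraction * len(seqs)
    train, test = [], []
    for cluster in sorted(clusters, key=len):
        (test if len(test) < target else train).extend(cluster)
    return sorted(train), sorted(test)

seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "GAGAGAGAGA"]
train_idx, test_idx = cluster_split(seqs)
```

Near-duplicates (indices 0 and 1, and 2 and 3) stay on the same side of the split, so the test score reflects generalization to unseen sequence families rather than memorized near-copies.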