---
id: wiki-2026-0508-algorithmic-biology
title: Algorithmic Biology
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [알고리즘 생물학, computational biology, bioinformatics, algorithmic life]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [biology, bioinformatics, computational-biology, alphafold, sequence-alignment, cellular-automata, ml-bio]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R / C++
  framework: Biopython / AlphaFold / Rosetta
---

# Algorithmic Biology

## 📌 One-line insight

> **"Life = the most complex algorithm in the universe."** DNA, RNA, and proteins can be treated as computable models. AlphaFold made decisive progress on the 50-year-old protein folding problem by predicting structures at near-experimental accuracy, which accelerates drug development, disease research, and synthetic biology.

## 📖 Core

### Sub-domains

1. **Sequence alignment**: infers evolutionary relationships between DNA, RNA, or protein sequences.
2. **Protein folding**: predicts 3D structure from an amino acid sequence.
3. **Genome assembly**: reconstructs a genome from short reads, like solving a puzzle.
4. **Phylogenetics**: builds the evolutionary tree of species.
5. **Systems biology**: models gene regulatory networks and metabolic pathways.
6. **Cellular automata**: simple rules → complex patterns (Conway's Life).
7. **Synthetic biology**: designs genetic circuits.

### ML applications

- **AlphaFold (DeepMind)**: near-atomic-accuracy protein structure prediction (AlphaFold2 won CASP14).
- **ESMFold (Meta)**: structure prediction built on a large protein language model.
- **RoseTTAFold (Baker lab)**: multi-track (three-track) architecture.
- **AlphaMissense**: predicts the pathogenicity of missense variants.
- **Geneformer / scGPT**: foundation models for single-cell transcriptomics.

### Algorithm basics

#### Sequence alignment

- **Needleman-Wunsch** (global): dynamic programming over the full length of both sequences.
- **Smith-Waterman** (local): dynamic programming for the best local match.
- **BLAST** (heuristic): fast database search.

#### Phylogenetics

- **UPGMA / Neighbor-joining**: distance-based.
- **Maximum likelihood / Bayesian**: model-based.

#### Folding

- **Energy minimization**: force fields (Amber, CHARMM).
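- **Molecular dynamics**: atom-level simulation.
- **Deep learning**: sequence → structure (AlphaFold).

### Data challenges

- Noise (sequencing errors, batch effects).
- High dimensionality (10K+ genes).
- Small samples (rare diseases).
- Scarce ground truth (in vivo measurement is difficult).
- Ethics (germline editing).

→ One mitigation is to inject physical priors, for example with PINNs (Physics-Informed Neural Networks).

## 💻 Patterns

### Sequence alignment (BioPython)

```python
# Note: Bio.pairwise2 is deprecated in recent Biopython releases in favor of
# Bio.Align.PairwiseAligner; it is kept here as in the original note.
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

# globalxx: match = 1, mismatch = 0, no gap penalty
alignments = pairwise2.align.globalxx('ACGTACGT', 'ACGTGCGT')
print(format_alignment(*alignments[0]))
# The best score is 7 (seven matched positions, gaps are free);
# several equally scoring alignments exist.
```

### AlphaFold inference

```python
# ColabFold (open-source AlphaFold2 pipeline)
from colabfold.batch import run

run(
    queries=[('my_protein', 'MKTAYIAKQRQISFVKSHFSRQ...', None)],  # sequence is truncated in this note
    result_dir='./results',
    num_models=5,       # assumption: required by colabfold.batch.run; check your installed version
    is_complex=False,   # assumption: single-chain query
    use_templates=False,
    num_recycles=3,
)
# Output: PDB file + confidence (pLDDT).
```

### Genome assembly (de Bruijn graph)

```python
def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = {}
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i+k]
            prefix, suffix = kmer[:-1], kmer[1:]
            # Each k-mer adds one edge prefix -> suffix (duplicates are kept).
            graph.setdefault(prefix, []).append(suffix)
    return graph
```

### Cellular automata (Conway's Life)

```python
import numpy as np

def step(grid):
    """Advance a 0/1 grid by one generation (edges wrap around)."""
    # Count the eight neighbors by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(grid, (i, j), (0, 1))
        for i in (-1, 0, 1) for j in (-1, 0, 1)
        if (i, j) != (0, 0)
    )
    alive = grid.astype(bool)
    # A live cell survives with 2 or 3 neighbors; any cell with exactly 3 is alive.
    return ((alive & (neighbors == 2)) | (neighbors == 3)).astype(int)
```

### Single-cell analysis (scanpy)

```python
import scanpy as sc

adata = sc.read_h5ad('data.h5ad')             # placeholder path to an AnnData file
sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
sc.pp.log1p(adata)                            # log(1 + x) transform
sc.pp.highly_variable_genes(adata)            # flag informative genes
sc.pp.pca(adata)                              # reduce dimensionality before the kNN graph
sc.pp.neighbors(adata)                        # kNN graph on the PCA representation
sc.tl.umap(adata)
sc.tl.leiden(adata)                           # needs the optional leidenalg package
sc.pl.umap(adata, color='leiden')
```

## 🤔 Decision criteria

| Problem | Tool |
|---|---|
| Protein structure | AlphaFold / RoseTTAFold |
| Sequence search | BLAST |
| Genome assembly | SPAdes / Canu |
| RNA-seq | DESeq2 / edgeR |
| Single-cell | scanpy / Seurat |
| Phylogenetics | RAxML / BEAST |
| Synthetic biology | SBOL / Cello |

**Default**: Biopython + scanpy + AlphaFold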
(Colab) as the entry stack.

## 🔗 Graph

- Parents: [[Bioinformatics]] · [[Computational-Biology]] · [[Systems Biology]]
- Variants: [[AlphaFold]]
- Adjacent: [[Computational-Neuroscience-RL|Computational-Neuroscience]] · [[Cellular Automata]] · [[Physics-Informed-Neural-Networks]]

## 🤖 LLM usage

**When**: applying ML to biological data; protein, sequence, or genome analysis; drug discovery pipelines.

**When not**: clinical diagnosis (use FDA-approved tools only); as a substitute for wet-lab experiments.

## ❌ Anti-patterns

- **Data leakage**: sequence similarity between train and test sets → inflated performance.
- **No biological prior**: black-box ML results get rejected by wet-lab practitioners.
- **Single dataset overfitting**: the model does not generalize across populations.
- **Ignoring batch effect**: batch confounds the biological signal.
- **No reproducibility**: seeds and package versions are not locked.

## 🧪 Verification / duplicates

- Verified (concept-level).
- Trust level B (rapidly evolving field).
- Related: [[Bioinformatics]] · [[AlphaFold]] · [[Synthetic-Biology]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: sub-domains, ML applications, algorithms, code |
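
## 🧮 Supplement: Needleman-Wunsch sketch

The algorithm basics in this note list Needleman-Wunsch as the dynamic-programming method for global alignment. The following is a minimal sketch of its scoring recurrence, not a reference implementation. The function name `needleman_wunsch_score` and the scores (match +1, mismatch -1, linear gap -1) are illustrative assumptions that do not come from this note. It returns only the optimal score and keeps two rows of the DP table instead of the full matrix, so it cannot trace back the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Scores are illustrative assumptions: match +1, mismatch -1, gap -1.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Three ways to reach cell (i, j):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGTACGT", "ACGTGCGT"))  # 7 matches + 1 mismatch = 6
```

With a nonzero gap penalty the single-mismatch alignment (score 6) beats the two-gap alignment (score 5). This differs from the `globalxx` pattern in this note, where gaps are free and the gapped alignment scores higher.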