2nd/10_Wiki/Topics/AI_and_ML/Algorithmic-Biology.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
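Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections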

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-algorithmic-biology
title: Algorithmic Biology
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: 알고리즘 생물학, computational biology, bioinformatics, algorithmic life
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: biology, bioinformatics, computational-biology, alphafold, sequence-alignment, cellular-automata, ml-bio
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R / C++
  framework: Biopython / AlphaFold / Rosetta

Algorithmic Biology

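📌 One-line insight

"Life is the most complex algorithm in the universe." DNA, RNA, and proteins can all be treated as computable models. AlphaFold's structure predictions are widely regarded as a solution to the 50-year-old protein folding problem, and this computational approach is accelerating drug discovery, disease research, and synthetic biology.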

📖 Core concepts

Sub-domains

  1. Sequence alignment: infers evolutionary relationships among DNA, RNA, and protein sequences.
  2. Protein folding: predicts 3D structure from an amino acid sequence.
  3. Genome assembly: reconstructs a genome from short reads, like solving a jigsaw puzzle.
  4. Phylogenetics: builds evolutionary trees of species.
  5. Systems biology: models gene regulatory networks and metabolic pathways.
  6. Cellular automata: simple rules that produce complex patterns (Conway's Life).
  7. Synthetic biology: designs genetic circuits.

ML applications

  • AlphaFold (DeepMind): predicts protein structures at near-atomic accuracy (top-ranked at CASP14).
  • ESMFold (Meta): structure prediction built on a large protein language model.
  • RoseTTAFold (Baker lab): a multi-track (three-track) network architecture.
  • AlphaMissense: predicts the pathogenicity of missense variants.
  • Geneformer / scGPT: foundation models for single-cell transcriptomics.

Algorithm fundamentals

Sequence alignment

  • Needleman-Wunsch (global): dynamic programming over the full length of both sequences.
  • Smith-Waterman (local): dynamic programming that finds the best-matching local region.
  • BLAST (heuristic): fast database search.
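
To make the dynamic-programming idea behind Needleman-Wunsch concrete, here is a minimal pure-Python sketch. The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b).

    Illustrative sketch: the scoring parameters are assumed defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


result = needleman_wunsch('ACGT', 'AGT')
print(result)  # (2, 'ACGT', 'A-GT')
```

Smith-Waterman differs mainly in clamping every cell at zero and starting the traceback from the highest-scoring cell rather than the corner.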

Phylogenetics

  • UPGMA / Neighbor-joining: distance-based.
  • Maximum likelihood / Bayesian: model-based.
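
As an illustration of the distance-based family, the following is a minimal UPGMA sketch. The function name `upgma`, the nested-tuple tree representation, and the example distance matrix are all assumptions made for this example; branch lengths are omitted to keep it short.

```python
def upgma(names, dist):
    """Build a tree (nested tuples) from a symmetric distance matrix.

    Illustrative sketch: repeatedly merges the two closest clusters and
    recomputes distances as size-weighted averages (average linkage).
    """
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)

    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to every remaining cluster.
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[k] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        for k, v in merged.items():
            d[(k, next_id)] = v  # k is always smaller than next_id
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical distances: A and B are closest, then C, then D.
names = ['A', 'B', 'C', 'D']
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(names, dist)
print(tree)  # ('D', ('C', ('A', 'B')))
```

UPGMA assumes a constant evolutionary rate across lineages; neighbor-joining relaxes that assumption, which is one reason it is often preferred for real data.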

Folding

  • Energy minimization: uses a force field (AMBER, CHARMM).
  • Molecular dynamics: atom-level simulation.
  • Deep learning: sequence → structure (AlphaFold).

Data challenges

  • Noise (sequencing errors, batch effects).
  • High dimensionality (10K+ genes).
  • Small sample sizes (rare diseases).
  • No ground truth (in vivo measurement is difficult).
  • Ethics (germline editing).

→ One mitigation is to inject prior knowledge into the model, as PINNs (physics-informed neural networks) do.

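💻 Patterns

Sequence alignment (Biopython)

from Bio import Align

# Bio.Align.PairwiseAligner replaces the deprecated Bio.pairwise2 module.
aligner = Align.PairwiseAligner()
aligner.mode = 'global'  # end-to-end alignment (Needleman-Wunsch style)
# Score matches as 1 and everything else as 0, like pairwise2's globalxx.
aligner.match_score = 1
aligner.mismatch_score = 0
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align('ACGTACGT', 'ACGTGCGT')
best = alignments[0]
print(best.score)  # number of matched positions under this scoring
print(best)        # one of several equally scored alignments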

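AlphaFold inference

# ColabFold (an open-source AlphaFold2 pipeline)
from colabfold.batch import run

run(
    # Each query is (job name, sequence, optional a3m lines).
    # The sequence below is truncated in the source; supply the full sequence.
    queries=[('my_protein', 'MKTAYIAKQRQISFVKSHFSRQ...', None)],
    result_dir='./results',
    num_models=5,       # assumed value; check run()'s signature in your ColabFold version
    is_complex=False,   # assumed: a single chain, not a multimer
    use_templates=False,
    num_recycles=3,
)
# Output: PDB file + confidence (pLDDT).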

Genome assembly (de Bruijn graph)

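def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = {}
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i+k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            prefix, suffix = kmer[:-1], kmer[1:]
            # A k-mer seen in several reads adds a parallel edge each time.
            graph.setdefault(prefix, []).append(suffix)
    return graph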
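
Building the graph is only half of assembly; a contig is read off by walking the edges. The sketch below shows one way to do that with Hierholzer's algorithm for an Eulerian path. It assumes error-free reads, uses each distinct k-mer once (unlike the snippet above, which keeps duplicates as parallel edges), and assumes the graph has a single unambiguous path. The names `de_bruijn_from_kmers`, `eulerian_path`, and `assemble`, and the example reads, are hypothetical.

```python
from collections import defaultdict


def de_bruijn_from_kmers(kmers):
    """One edge per distinct k-mer: prefix (k-1)-mer -> suffix (k-1)-mer."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    edges = {u: list(vs) for u, vs in graph.items()}  # copy, consumed below
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if edges.get(u):
            stack.append(edges[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())      # dead end: emit node
    return path[::-1]


def assemble(reads, k):
    """Reconstruct a sequence from reads via distinct k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    path = eulerian_path(de_bruijn_from_kmers(sorted(kmers)))
    # The first node contributes k-1 characters, each later node one more.
    return path[0] + ''.join(node[-1] for node in path[1:])


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCA.
reads = ['ATGGC', 'GGCGT', 'CGTGC', 'GTGCA']
contig = assemble(reads, k=4)
print(contig)  # ATGGCGTGCA
```

With a smaller k (for example 3) the same reads produce a graph with more than one Eulerian path, which is how repeats make assembly ambiguous in practice.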

Cellular automata (Conway's Life)

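import numpy as np

def step(grid):
    # grid: 2D integer array of 0s and 1s. np.roll wraps around the edges,
    # so the board behaves like a torus.
    # Count the eight neighbors of every cell by summing shifted copies.
    neighbors = sum(np.roll(grid, (i, j), (0, 1))
                    for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if (i, j) != (0, 0))
    # A live cell with two neighbors survives; any cell with three is alive.
    return ((grid & (neighbors == 2)) | (neighbors == 3)).astype(int)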
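
A quick way to check an update rule like this is to run it on a pattern with known behavior. The sketch below repeats the rule under the assumed name `life_step` so that it is self-contained, then applies it to a "blinker", which should flip between a horizontal and a vertical bar with period 2. The 5x5 board size is an arbitrary choice.

```python
import numpy as np


def life_step(grid):
    """One Game of Life generation on a wrap-around (toroidal) board."""
    # Sum the eight shifted copies of the grid to count neighbors.
    neighbors = sum(
        np.roll(grid, (i, j), axis=(0, 1))
        for i in (-1, 0, 1) for j in (-1, 0, 1)
        if (i, j) != (0, 0)
    )
    # Survive with two neighbors; be alive with three.
    return (((grid == 1) & (neighbors == 2)) | (neighbors == 3)).astype(int)


# Horizontal blinker in the middle of a 5x5 board.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1

after_one = life_step(blinker)    # becomes a vertical bar
after_two = life_step(after_one)  # returns to the starting pattern

print(after_one)
print(np.array_equal(after_two, blinker))  # True
```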

Single-cell analysis (scanpy)

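import scanpy as sc

adata = sc.read_h5ad('data.h5ad')   # cells x genes AnnData object
sc.pp.normalize_total(adata)        # scale each cell to a comparable total count
sc.pp.log1p(adata)                  # log(1 + x) transform
sc.pp.neighbors(adata)              # k-nearest-neighbor graph between cells
sc.tl.umap(adata)                   # 2D embedding for visualization
sc.tl.leiden(adata)                 # graph-based clustering
sc.pl.umap(adata, color='leiden')   # plot the embedding colored by cluster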

🤔 Decision criteria

| Problem | Tool |
| --- | --- |
| Protein structure | AlphaFold / RoseTTAFold |
| Sequence search | BLAST |
| Genome assembly | SPAdes / Canu |
| RNA-seq | DESeq2 / edgeR |
| Single-cell | scanpy / Seurat |
| Phylogenetics | RAxML / BEAST |
| Synthetic biology | SBOL / Cello |

Default: Biopython + scanpy + AlphaFold (via Colab) as an entry-level stack.

🔗 Graph

🤖 LLM usage

When to use: applying ML to biological data; protein, sequence, and genome analysis; drug discovery pipelines.
When not to use: clinical diagnosis (use FDA-approved tools only); as a substitute for wet-lab experiments.

Anti-patterns

  • Data leakage: sequence similarity between the train and test sets inflates measured performance.
  • No biological prior: black-box ML models get rejected by wet-lab scientists.
  • Single-dataset overfitting: the model fails to generalize across populations.
  • Ignoring batch effects: batch becomes a confounder.
  • No reproducibility: seeds and versions are not locked.
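
The data-leakage item above is usually addressed by splitting at the level of sequence clusters rather than individual sequences. The following is a minimal sketch of that idea, using k-mer Jaccard similarity and greedy clustering as a crude stand-in for dedicated tools such as MMseqs2 or CD-HIT. All function names, the similarity threshold, and the example sequences are assumptions for illustration.

```python
import random


def kmer_set(seq, k=3):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sequences(seqs, threshold=0.5, k=3):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    reps, clusters = [], []
    for idx, seq in enumerate(seqs):
        ks = kmer_set(seq, k)
        for c, rep in enumerate(reps):
            if jaccard(ks, rep) >= threshold:
                clusters[c].append(idx)
                break
        else:
            # No similar representative: start a new cluster.
            reps.append(ks)
            clusters.append([idx])
    return clusters


def cluster_split(seqs, test_fraction=0.2, threshold=0.5, seed=0):
    """Assign whole clusters to train or test so similar sequences stay together."""
    clusters = cluster_sequences(seqs, threshold)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(clusters)
    target = test_fraction * len(seqs)
    train, test = [], []
    for members in clusters:
        (test if len(test) < target else train).extend(members)
    return sorted(train), sorted(test)


# Hypothetical sequences: indices 0/1 and 2/3 are near-duplicates.
seqs = [
    'ACGTACGTAC', 'ACGTACGTAA',
    'TTTTGGGGCC', 'TTTTGGGGCA',
    'GAGAGAGAGA', 'CCCCCCCCCC',
]
train_idx, test_idx = cluster_split(seqs)
print(train_idx, test_idx)
```

Because the split is made per cluster, the test fraction is only approximate; that trade-off is generally accepted in exchange for a performance estimate that is not inflated by near-duplicates.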

🧪 Verification / duplicates

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: sub-domains, ML applications, algorithms, code |