Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

Algorithmic Biology

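📌 One-line insight

"Life is the most complex algorithm in the universe." Algorithmic biology builds computable models of DNA, RNA, and proteins. The landmark example is AlphaFold, which largely solved the 50-year-old problem of predicting protein structure from sequence. Such models accelerate drug development, disease research, and synthetic biology.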

📖 Core

Sub-domains

Sequence alignment: infers evolutionary relationships between DNA, RNA, and protein sequences.
Protein folding: predicts 3D structure from the amino acid sequence.
Genome assembly: reconstructs a genome from short reads, like solving a puzzle.
Phylogenetics: builds the evolutionary tree of species.
Systems biology: models gene regulatory networks and metabolic pathways.
Cellular automata: simple rules that produce complex patterns (Conway's Life).
Synthetic biology: designs genetic circuits.

ML applications

AlphaFold (DeepMind): predicts protein structure at near-atomic accuracy (won CASP14).
ESMFold (Meta): structure prediction built on a large protein language model.
RoseTTAFold (Baker lab): multi-track architecture.
AlphaMissense: predicts the pathogenicity of missense variants.
Geneformer / scGPT: foundation models for single-cell transcriptomics.

Algorithm fundamentals

Sequence alignment

Needleman-Wunsch (global): dynamic programming over the full length of both sequences.
Smith-Waterman (local): dynamic programming that finds the best-scoring local match.
BLAST (heuristic): fast database search.
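The dynamic programming behind global alignment can be sketched in a few lines. This is a minimal illustration of Needleman-Wunsch with a linear gap penalty, not a production aligner; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align two residues
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


print(needleman_wunsch('ACGT', 'ACT'))  # (2, 'ACGT', 'AC-T')
```

Smith-Waterman uses the same recurrence but clamps every cell at zero and starts the traceback from the highest-scoring cell, which is what makes the match local.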

Phylogenetics

UPGMA / Neighbor-joining: distance-based.
Maximum likelihood / Bayesian: model-based.
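To make the distance-based approach concrete, here is a small UPGMA sketch. It repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. The function name, the nested-tuple tree representation, and the `frozenset`-keyed distance dictionary are illustrative assumptions; branch lengths are omitted for brevity.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)                       # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Toy distances: A and B are close, C is further away, D is an outgroup.
pairs = {('A', 'B'): 2, ('A', 'C'): 6, ('B', 'C'): 6,
         ('A', 'D'): 10, ('B', 'D'): 10, ('C', 'D'): 10}
distances = {frozenset(p): v for p, v in pairs.items()}
tree = upgma(['A', 'B', 'C', 'D'], distances)
print(tree)  # ('D', ('C', ('A', 'B')))
```

UPGMA assumes a constant rate of evolution (a molecular clock); neighbor-joining relaxes that assumption, which is one reason it is often preferred for real data.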

Folding

Energy minimization: force fields (AMBER, CHARMM).
Molecular dynamics: atomic-level simulation.
Deep learning: sequence → structure (AlphaFold).

Data challenges

Noise (sequencing errors, batch effects).
High dimensionality (10K+ genes).
Small sample sizes (rare diseases).
No ground truth (in vivo measurement is difficult).
Ethics (germline editing).

→ Inject prior knowledge into the model, for example with PINNs (physics-informed neural networks).

💻 Patterns

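Sequence alignment (Biopython)

from Bio import Align

# Bio.pairwise2 is deprecated; Bio.Align.PairwiseAligner is its replacement.
aligner = Align.PairwiseAligner()
aligner.mode = 'global'
# Same scoring as pairwise2's globalxx: match = 1, mismatch = 0, no gap penalties.
aligner.match_score = 1
aligner.mismatch_score = 0
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

alignments = aligner.align('ACGTACGT', 'ACGTGCGT')
print(alignments[0])        # one of several co-optimal alignments
print(alignments[0].score)  # 7.0: the sequences differ at a single position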

AlphaFold inference

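# ColabFold (open-source AlphaFold2 pipeline)
from colabfold.batch import run

run(
    # Each query is (job name, sequence, optional precomputed MSA).
    # The sequence is elided here; supply the full sequence.
    queries=[('my_protein', 'MKTAYIAKQRQISFVKSHFSRQ...', None)],
    result_dir='./results',
    num_models=1,       # assumed required by run(); number of AlphaFold2 models
    is_complex=False,   # assumed required by run(); False for a single chain
    use_templates=False,
    num_recycles=3,
)
# Output: PDB files plus per-residue confidence (pLDDT).
# run()'s signature differs between ColabFold releases; check the installed version.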

Genome assembly (de Bruijn graph)

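def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = {}
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            prefix, suffix = kmer[:-1], kmer[1:]
            graph.setdefault(prefix, []).append(suffix)
    return graph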
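Building the graph is only half of assembly: the contig is recovered by walking every edge exactly once (an Eulerian path). The sketch below repeats the graph builder so it runs on its own and adds Hierholzer's algorithm. The helper names `eulerian_path` and `spell` are illustrative, and the example assumes error-free reads in which every k-mer occurs once; real assemblers such as SPAdes also handle sequencing errors, repeats, and coverage.

```python
def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph.setdefault(kmer[:-1], []).append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    edges = {node: list(succ) for node, succ in graph.items()}  # consumable copy
    # balance = out-degree minus in-degree; the start node has balance +1.
    balance = {}
    for node, succ in edges.items():
        balance[node] = balance.get(node, 0) + len(succ)
        for nxt in succ:
            balance[nxt] = balance.get(nxt, 0) - 1
    start = next((n for n, b in balance.items() if b == 1), next(iter(edges)))

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())         # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Concatenate overlapping nodes: first node plus the last base of each next node."""
    return path[0] + ''.join(node[-1] for node in path[1:])


reads = ['ACGTT', 'GTTGCA']  # two reads that together cover ACGTTGCA
contig = spell(eulerian_path(build_de_bruijn(reads, k=4)))
print(contig)  # ACGTTGCA
```

If reads overlap by a full k-mer, the same edge is added twice and the walk will traverse it twice, so deduplicating k-mers (or tracking their counts) is a necessary step before using this on real data.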

Cellular automata (Conway's Life)

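import numpy as np

def step(grid):
    """Advance a 0/1 integer grid by one generation (edges wrap around)."""
    # Count the eight neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(np.roll(grid, (i, j), (0, 1))
                    for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if (i, j) != (0, 0))
    # A live cell with 2 neighbors survives; any cell with 3 neighbors is alive.
    return ((grid & (neighbors == 2)) | (neighbors == 3)).astype(int)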
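A quick way to check the update rule is the "blinker", a three-cell bar that flips between horizontal and vertical with period 2. The sketch below repeats `step` so it is self-contained; the grid size and variable names are arbitrary choices for the example.

```python
import numpy as np


def step(grid):
    """Advance a 0/1 integer grid by one generation (edges wrap around)."""
    neighbors = sum(np.roll(grid, (i, j), (0, 1))
                    for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if (i, j) != (0, 0))
    return ((grid & (neighbors == 2)) | (neighbors == 3)).astype(int)


# A horizontal bar of three live cells in the middle of a 5x5 grid.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1

after_one = step(blinker)    # the bar becomes vertical
after_two = step(after_one)  # and returns to its starting state

print(after_one)
print(np.array_equal(after_two, blinker))  # True
```

Because `np.roll` wraps around, the grid is a torus: patterns that reach one edge reappear on the opposite side. The grid must have an integer dtype, since the bitwise `&` in `step` does not accept floats.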

Single-cell analysis (scanpy)

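import scanpy as sc

adata = sc.read_h5ad('data.h5ad')  # cells x genes count matrix
sc.pp.normalize_total(adata)       # normalize sequencing depth per cell
sc.pp.log1p(adata)                 # log-transform to stabilize variance
sc.pp.pca(adata)                   # reduce dimensionality before building the graph
sc.pp.neighbors(adata)             # k-nearest-neighbor graph on the PCA embedding
sc.tl.umap(adata)                  # 2D embedding for visualization
sc.tl.leiden(adata)                # graph clustering; needs the leidenalg or igraph package
sc.pl.umap(adata, color='leiden')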

🤔 Decision criteria

Problem	Tool
Protein structure	AlphaFold / RoseTTAFold
Sequence search	BLAST
Genome assembly	SPAdes / Canu
RNA-seq	DESeq2 / edgeR
Single-cell	scanpy / Seurat
Phylogenetics	RAxML / BEAST
Synthetic biology	SBOL / Cello

Default: Biopython + scanpy + AlphaFold (via Colab) as the entry stack.

🔗 Graph

Parents: Bioinformatics · Computational-Biology · Systems-Biology
Variants: AlphaFold · Genomics · Proteomics · Synthetic-Biology
Applications: Drug-Discovery · Personalized-Medicine · Phylogenetics
Adjacent: Computational-Neuroscience · Cellular-Automata · Physics-Informed-Neural-Networks

🤖 LLM usage

When: applying ML to biological data; protein, sequence, or genome analysis; drug discovery pipelines.
When not: clinical diagnosis (use FDA-approved tools only); as a substitute for wet lab experiments.

❌ Anti-patterns

Data leakage: sequence similarity between the train and test sets inflates reported performance.
No biological prior: black-box ML predictions get rejected by the wet lab.
Single-dataset overfitting: the model fails to generalize across populations.
Ignoring batch effects: the batch becomes a confounder.
No reproducibility: lock seeds and package versions.
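The data-leakage item is the easiest to get wrong, so here is a sketch of a similarity-aware split: sequences are grouped into clusters first, and whole clusters are assigned to train or test so that near-duplicates never straddle the boundary. The k-mer Jaccard similarity, the 0.5 threshold, and all function names are simplifying assumptions for illustration; in practice a dedicated tool such as MMseqs2 or CD-HIT is the usual way to cluster by sequence identity.

```python
import random


def kmers(seq, k=3):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sequences(seqs, threshold=0.5, k=3):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative k-mer set, member indices)
    for idx, seq in enumerate(seqs):
        ks = kmers(seq, k)
        for rep, members in clusters:
            if jaccard(ks, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((ks, [idx]))
    return [members for _, members in clusters]


def cluster_split(seqs, test_fraction=0.2, threshold=0.5, seed=0):
    """Assign whole clusters to train or test; returns (train_indices, test_indices)."""
    clusters = cluster_sequences(seqs, threshold)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(clusters)
    train, test = [], []
    for members in clusters:
        # Fill the test set first, then send remaining clusters to train.
        target = test if len(test) < test_fraction * len(seqs) else train
        target.extend(members)
    return sorted(train), sorted(test)


seqs = ['ACGTACGTAC', 'ACGTACGTAA', 'TTTTGGGGCC', 'TTTTGGGGCA', 'GAGAGAGAGA']
train_idx, test_idx = cluster_split(seqs)
print(cluster_sequences(seqs))  # [[0, 1], [2, 3], [4]]
```

Because clusters are kept intact, the test fraction is approximate. Passing the seed explicitly also addresses the reproducibility item: the same inputs always give the same split.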

🧪 Verification / duplicates

Verified (concept-level).
Confidence: B (rapidly evolving field).
Related: Bioinformatics · AlphaFold · Synthetic-Biology.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: sub-domains + ML applications + algorithms + code
