2nd/10_Wiki/Topics/DevOps_and_Security/Bioinformatics-Structure-Prediction.md
2026-05-10 22:08:15 +09:00

id: wiki-2026-0508-bioinformatics-structure-predict
title: Bioinformatics Structure Prediction
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Protein Structure Prediction, AlphaFold, ESM
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: bioinformatics, ml, protein, structure
raw_sources:
last_reinforced: 2026-05-10
github_commit: applied
tech_stack: language Python; framework AlphaFold3/ESM3/ColabFold

Bioinformatics Structure Prediction

In one line

"From sequence to 3D structure: a 50-year grand challenge, largely solved by 2021." AlphaFold2 reached near-experimental accuracy at CASP14 (held in 2020; paper published in 2021), AlphaFold3 (2024) extended prediction to protein-ligand-nucleic acid complexes, and ESM3 (2024) opened the era of generative protein design. The 2026 standard stack: AF3 + ESMFold + RoseTTAFold All-Atom + ColabFold pipeline.

Key points

Method Lineage

  • Homology modeling (1990s): MODELLER; depends on a known template.
  • Threading / fold recognition (2000s).
  • Ab initio physics (Rosetta).
  • Coevolution + DL (2018+): trRosetta, AlphaFold1.
  • Attention-based (2021+): AlphaFold2, built from the Evoformer and the Structure module.
  • All-atom diffusion (2024+): AlphaFold3, which handles protein, DNA, RNA, and ligands in one model.
  • Single-sequence (protein language model): ESMFold, ESM3; fast because no MSA is needed.

AlphaFold3 Capability (2024)

  • Protein-protein, protein-nucleic acid, and protein-ligand complexes.
  • Covalent modifications and ions.
  • Diffusion-based all-atom output.
  • License: non-commercial use only, either through AlphaFold Server or with the model weights, which are released on request.

Applications

  1. Drug discovery: target-ligand docking, hit triage.
  2. Protein engineering: enzyme design, antibody.
  3. Disease mechanism: variant effect (Missense3D, AlphaMissense).
  4. Structural biology: cryo-EM model building.
  5. De novo design: RFdiffusion + ProteinMPNN.

💻 Patterns

ColabFold one-liner

# Fast MSA via MMseqs2, then AF2 inference
colabfold_batch input.fasta out_dir/ \
  --num-recycle 3 --model-type alphafold2_multimer_v3
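
For a multimer run, ColabFold expects the chains of one complex in a single FASTA record, joined with ":". A minimal sketch of preparing that input follows; the helper names `complex_fasta_record` and `write_complex_fasta` and the example sequences are hypothetical, not part of ColabFold.

```python
from pathlib import Path

# Standard amino-acid letters plus X for unknown residues
VALID_AA = set("ACDEFGHIKLMNPQRSTVWYX")


def complex_fasta_record(name, chains):
    """Return one FASTA record with chains joined by ':' (ColabFold's complex syntax)."""
    if not chains:
        raise ValueError("at least one chain is required")
    cleaned = []
    for chain in chains:
        # Drop whitespace/newlines and normalise case before validating
        seq = "".join(chain.split()).upper()
        bad = set(seq) - VALID_AA
        if not seq or bad:
            raise ValueError(f"invalid chain, unexpected characters: {sorted(bad)}")
        cleaned.append(seq)
    return f">{name}\n{':'.join(cleaned)}\n"


def write_complex_fasta(path, name, chains):
    """Write the record to disk so it can be passed to colabfold_batch."""
    Path(path).write_text(complex_fasta_record(name, chains))


record = complex_fasta_record("dimer_001", ["MKTAYIAK", "gshm leer"])
print(record)
```

The resulting file can then be passed as the first argument of `colabfold_batch` in place of `input.fasta`.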

ESMFold (single-sequence, no MSA)

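import torch
from transformers import EsmForProteinFolding

# Load ESMFold weights from the Hugging Face Hub; .cuda() requires a GPU
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1").cuda().eval()
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVK"
with torch.no_grad():
    pdb_str = model.infer_pdb(seq)  # prediction returned as PDB-format text
# Write with a context manager so the file handle is closed
with open("pred.pdb", "w") as fh:
    fh.write(pdb_str)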

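AlphaFold3 input JSON (local run_alphafold.py)

# AF3 job spec in the "alphafold3" dialect used by the open-source run_alphafold.py.
# AlphaFold Server documents no public REST endpoint; its jobs are submitted through the web UI.
import json

job = {
  "name": "complex_001",
  "modelSeeds": [42],
  "sequences": [
    {"protein": {"id": "A", "sequence": "MKTA..."}},
    {"ligand":  {"id": "L", "ccdCodes": ["ATP"]}}
  ],
  "dialect": "alphafold3",
  "version": 1
}
with open("complex_001.json", "w") as fh:
    json.dump(job, fh, indent=2)
# Then: python run_alphafold.py --json_path=complex_001.json --model_dir=<weights dir> --output_dir=out/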

RFdiffusion de novo binder design

# Design an 80-aa binder against hotspot residues on target chain A
# target.pdb is a placeholder for the target structure file
python scripts/run_inference.py \
  inference.input_pdb=target.pdb \
  inference.output_prefix=binders/run \
  'contigmap.contigs=[A1-150/0 80-80]' \
  'ppi.hotspot_res=[A30,A33,A56]' \
  inference.num_designs=100
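
The contig and hotspot strings are easy to mistype, so it can help to build them programmatically. The sketch below assembles the Hydra-style overrides for a binder run and checks that each hotspot lies inside the target segment. The function name `binder_overrides` and the file names are hypothetical; the "/0 " chain-break syntax follows the RFdiffusion README.

```python
import re


def binder_overrides(input_pdb, output_prefix, chain, start, end,
                     binder_len, hotspots, num_designs=100):
    """Build RFdiffusion command-line overrides for a fixed-length binder."""
    if start > end or binder_len <= 0:
        raise ValueError("need start <= end and a positive binder length")
    for hs in hotspots:
        # Hotspots look like 'A30': chain letter followed by residue number
        m = re.fullmatch(r"([A-Za-z])(\d+)", hs)
        if not m or m.group(1) != chain or not start <= int(m.group(2)) <= end:
            raise ValueError(f"hotspot {hs!r} is not inside {chain}{start}-{end}")
    # '/0 ' marks a chain break between the target segment and the new binder
    contig = f"{chain}{start}-{end}/0 {binder_len}-{binder_len}"
    return [
        f"inference.input_pdb={input_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"contigmap.contigs=[{contig}]",
        f"ppi.hotspot_res=[{','.join(hotspots)}]",
        f"inference.num_designs={num_designs}",
    ]


args = binder_overrides("target.pdb", "binders/run", "A", 1, 150, 80,
                        ["A30", "A33", "A56"])
print(" ".join(args))
# To launch (assumes an RFdiffusion checkout in the working directory):
# subprocess.run(["python", "scripts/run_inference.py", *args], check=True)
```

Passing the overrides as a list to `subprocess.run` avoids shell quoting problems with the brackets and the space inside the contig string.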

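Confidence (pLDDT) filtering

import numpy as np
from Bio.PDB import PDBParser

# Predicted structures store per-residue pLDDT in the B-factor column
structure = PDBParser(QUIET=True).get_structure("pred", "pred.pdb")
# pLDDT > 90 = very high; 70-90 = confident; 50-70 = low; < 50 = very low, often disordered
plddt = np.array([atom.get_bfactor() for atom in structure.get_atoms() if atom.get_name() == "CA"])
mean_conf = plddt.mean()
disordered_frac = (plddt < 50).mean()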
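
Where Biopython and NumPy are not available, the same filter can be sketched with the standard library by reading the fixed-width B-factor column directly. The helper names below are hypothetical, and the rescaling step rests on an assumption: some exporters write pLDDT on a 0-1 scale instead of 0-100, so values are multiplied by 100 when none exceeds 1.

```python
BANDS = ("very high", "confident", "low", "very low")


def atom_line(serial, name, resname, chain, resseq, bfactor):
    """Format one fixed-width PDB ATOM record (coordinates zeroed for the demo)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{bfactor:>6.2f}")


def ca_plddt(pdb_text):
    """Collect the B-factor (pLDDT) of every C-alpha atom in PDB-format text."""
    values = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def band(p):
    """Map one pLDDT value (0-100 scale) to its confidence band."""
    if p > 90:
        return "very high"
    if p >= 70:
        return "confident"
    if p >= 50:
        return "low"
    return "very low"


def summarize(values):
    """Mean pLDDT, fraction below 50, and per-band residue counts."""
    if not values:
        raise ValueError("no C-alpha atoms found")
    if max(values) <= 1.0:  # assumption: file uses a 0-1 scale
        values = [v * 100 for v in values]
    counts = {name: 0 for name in BANDS}
    for v in values:
        counts[band(v)] += 1
    return {
        "mean": sum(values) / len(values),
        "disordered_frac": counts["very low"] / len(values),
        "bands": counts,
    }


demo_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 10.0),   # not a C-alpha, ignored
    atom_line(2, " CA ", "MET", "A", 1, 95.0),
    atom_line(3, " CA ", "LYS", "A", 2, 72.5),
    atom_line(4, " CA ", "THR", "A", 3, 40.0),
])
summary = summarize(ca_plddt(demo_pdb))
print(summary)
```

A pipeline might discard models whose `disordered_frac` exceeds a project-specific cutoff before moving on to docking or design.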

Decision criteria

| Situation | Tool |
|---|---|
| Single protein, fast | ESMFold |
| Single protein, accurate | AlphaFold2 (ColabFold) |
| Multimer / complex | AlphaFold3 / AF-Multimer |
| Protein + ligand | AlphaFold3 / Boltz-1 |
| De novo design | RFdiffusion + ProteinMPNN |
| Variant effect | AlphaMissense |

Default: ColabFold AF2-multimer first, then AF3 when ligands or nucleic acids are involved.
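
The criteria above can be written down as a small lookup function, for example to route jobs in a batch pipeline. This is only a sketch: the function name `choose_tool`, its parameters, and the order in which the checks are applied are assumptions, not part of any of the tools named.

```python
def choose_tool(n_chains=1, has_ligand=False, has_nucleic_acid=False,
                de_novo=False, variant_effect=False, need_speed=False):
    """Return the tool suggested by the decision criteria for one job."""
    if n_chains < 1:
        raise ValueError("n_chains must be at least 1")
    # Task-type checks come first: they are not structure prediction proper
    if de_novo:
        return "RFdiffusion + ProteinMPNN"
    if variant_effect:
        return "AlphaMissense"
    # Non-protein components rule out the AF2-based tools
    if has_ligand:
        return "AlphaFold3 / Boltz-1"
    if has_nucleic_acid:
        return "AlphaFold3"
    if n_chains > 1:
        return "AlphaFold3 / AF-Multimer"
    # Single protein: trade accuracy for speed when asked to
    return "ESMFold" if need_speed else "AlphaFold2 (ColabFold)"


print(choose_tool(need_speed=True))
print(choose_tool(n_chains=2))
print(choose_tool(n_chains=2, has_ligand=True))
```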

🔗 Graph

🤖 LLM usage

When to use: protein language model embeddings, binder search, paper summaries, ranking mutation scans. When not to: final pose prediction; dedicated physics or structure models are better suited to it.

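Anti-patterns

  • Ignoring pLDDT: using low-confidence regions as-is, even though they may be disordered.
  • Single seed: multi-seed sampling is recommended for AF3 because it yields more diverse predictions.
  • Large complex without an MSA: ESMFold is strong on single chains and weak on multimers.
  • License violation: AF3 weights and AlphaFold Server are both limited to non-commercial use.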
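
To avoid the single-seed anti-pattern, an AF3 job spec can be expanded to several seeds before it is written out. A minimal sketch, assuming the `modelSeeds` field of the AF3 input JSON; the helper name `with_seeds` and the seed values are hypothetical.

```python
import copy
import json


def with_seeds(job, n_seeds, first_seed=1):
    """Return a copy of an AF3 job spec that samples n_seeds consecutive seeds."""
    if n_seeds < 1:
        raise ValueError("n_seeds must be at least 1")
    # Deep copy so the caller's job dict is left untouched
    expanded = copy.deepcopy(job)
    expanded["modelSeeds"] = list(range(first_seed, first_seed + n_seeds))
    return expanded


base_job = {
    "name": "complex_001",
    "modelSeeds": [42],
    "sequences": [{"ligand": {"id": "L", "ccdCodes": ["ATP"]}}],
}
multi = with_seeds(base_job, n_seeds=5)
print(json.dumps(multi["modelSeeds"]))
```

Each seed produces its own set of samples, so the downstream ranking step has more candidates to choose from.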

🧪 Verification / Duplicates

  • Verified: Jumper et al. 2021 Nature (AF2); Abramson et al. 2024 Nature (AF3); Lin et al. 2023 Science (ESM-2 / ESMFold).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: AF3/ESM3/RFdiffusion 2026 stack |