[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,97 +1,250 @@
 ---
 id: wiki-2026-0508-deepcode-ai
-title: DeepCode AI
+title: DeepCode AI (Snyk Code)
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-A75F29]
+aliases: [DeepCode AI, Snyk Code, symbolic AI security, neuro-symbolic SAST, AI Fix]
 duplicate_of: none
-source_trust_level: A
-confidence_score: 0.9
-tags: [auto-reinforced]
+source_trust_level: B
+confidence_score: 0.85
+verification_status: applied
+tags: [security, sast, snyk, deepcode, neuro-symbolic, ml-security, autofix, ai-code-analysis]
 raw_sources: []
-last_reinforced: 2026-04-20
-github_commit: "[P-Reinforce] Continuous Worker - DeepCode AI"
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+last_reinforced: 2026-05-10
+github_commit: pending
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: SaaS
+  framework: Snyk Code / DeepCode
 ---

-# [[DeepCode AI|DeepCode AI]]
+# DeepCode AI (Snyk Code)

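-## 📌 One-Line Insight (The Karpathy Summary)
-> DeepCode AI is a purpose-built, machine learning (ML) based security AI engine that learns from millions of open-source code commits to detect vulnerabilities and suggest fixes [1-3]. In 2020 the security company Snyk acquired the Swiss AI startup DeepCode and integrated it as the core intelligence layer of Snyk Code, its static application security testing ([[SAST|SAST]]) tool [1, 2, 4]. The engine goes beyond simple rule-based pattern matching, combining symbolic AI with neural networks to understand code semantics and data flow in depth [4, 5].
+## One-line summary
+> **"Not an LLM: symbolic and neural analysis combined."** Trained on 25M+ data flow cases, covers 19+ languages, performs interfile analysis, and proposes fixes drawn from verified fix patterns in real commits. An example of the modern hybrid approach (vs. the LLM-only Corgea).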

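-## 📖 Structured Knowledge (Synthesized Content)
- **How the engine works and its analysis techniques**
-  The DeepCode AI engine does not generate code as text the way a general-purpose large language model (LLM) does; instead it combines symbolic reasoning (symbolic [[Reasoning|Reasoning]]) with neural networks to build a semantic representation of code [1, 5]. It draws on more than 25 million data flow cases, supports 19+ programming languages, and tracks data flow across files (interfile dataflow [[Analysis|Analysis]]) to catch complex vulnerabilities that cross file or module boundaries [4, 6, 7]. Because it understands the intent of the code rather than fixed patterns, it is very strong at large-scale variant detection [6, 8].
+## Key differentiators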

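- **Automated remediation with DeepCode AI Fix**
-  One of the engine's most notable features is 'DeepCode AI Fix', which proposes a remedy when a vulnerability is found [9]. Unlike typical LLM-generated fixes, DeepCode AI Fix was trained specifically on verified patterns that developers used to resolve the same vulnerabilities in real open-source projects [9]. This reduces the risk of hallucination and yields more trustworthy, context-appropriate fixes [7, 9].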
+### Hybrid AI (vs LLM-only)
+- Symbolic reasoning combined with neural networks.
+- Builds a semantic representation of the code.
+- Lower hallucination risk.
+- Interpretable results.

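- **Low false positive rate and developer friendliness**
-  Because it was trained on vulnerability patterns from a large number of real open-source commits and on the corresponding fix data, it accurately distinguishes code that merely looks suspicious from code that is actually exploitable [3, 10]. As a result, it greatly reduces the false positives and noise common with traditional rule-based SAST tools [5, 10]. Scans are also fast enough to run in real time inside the IDE without interrupting the development workflow [2].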
+### Interfile dataflow
+- Follows data flow across file boundaries.
+- Catches vulnerabilities that span multiple modules.
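+
+A toy sketch of what crossing file boundaries means, assuming a hand-written flow graph. The `file:function` labels, `FLOWS`, and `find_tainted_paths` are all hypothetical; this illustrates source-to-sink reachability in general and is not how the DeepCode engine is implemented.
+
+```python
+from collections import deque
+
+# Hypothetical flow graph: data leaving the key flows into each listed node.
+FLOWS = {
+    "routes.py:get_user": ["services.py:find_user"],
+    "services.py:find_user": ["db.py:run_query"],
+    "routes.py:health": ["services.py:ping"],
+}
+SOURCES = {"routes.py:get_user"}  # where untrusted input enters
+SINKS = {"db.py:run_query"}       # where tainted data becomes dangerous
+SANITIZERS = set()                # nodes that neutralise the taint
+
+
+def find_tainted_paths(flows, sources, sinks, sanitizers):
+    """Breadth-first search from each source; return every source-to-sink path."""
+    paths = []
+    for source in sorted(sources):
+        queue = deque([[source]])
+        while queue:
+            path = queue.popleft()
+            node = path[-1]
+            if node in sanitizers:
+                continue  # taint stops at a sanitizer
+            if node in sinks:
+                paths.append(path)
+                continue
+            for nxt in flows.get(node, []):
+                if nxt not in path:  # skip cycles
+                    queue.append(path + [nxt])
+    return paths
+
+
+paths = find_tainted_paths(FLOWS, SOURCES, SINKS, SANITIZERS)
+```
+
+Here the only reported path starts in `routes.py`, passes through `services.py`, and ends in `db.py`; a scan limited to any single file would see no source-to-sink flow at all.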

-## ⚠️ Contradictions & Updates
- **Conflict with earlier data:** Knowledge mapped by the automation engine; needs detailed verification later.
- **Policy change:** Automatic assetization performed for the AI domain.
+### Commit-based fix pattern
+- Trained on actual fix commits from open-source projects.
+- Proposes verified patterns.
+- Reduces the risk of LLM hallucination.

-## 🔗 Knowledge Links (Graph)
- **Related Topics:** Snyk Code, [[SAST (Static Application Security Testing)|SAST (Static Application Security [[Testing]])]], Symbolic AI, Machine Learning
- **Projects/Contexts:** Integrated security review projects across IDE and CI/CD pipelines via the Snyk platform
- **Contradictions/Notes:** DeepCode AI detects vulnerabilities and proposes fixes automatically, but it is noted that some results still need manual verification and that analysis depth can vary by language [6]. 
+### History
+- 2017: DeepCode founded as an ETH spinoff.
+- 2020: acquired by Snyk.
+- 2024: DeepCode AI Fix.

---
-*Last updated: 2026-04-19*
+### The Snyk stack
+- **Snyk Code** (DeepCode-powered SAST).
+- **Snyk Open Source** (SCA).
+- **Snyk Container** (image scan).
+- **Snyk IaC** (Terraform / K8s).

---
+### Compared with alternatives
+| Tool | Approach | Strength |
+|---|---|---|
+| Snyk Code (DeepCode) | Hybrid neuro-symbolic | Verified fix + low FP |
+| Corgea | LLM-native | Business logic + autofix |
+| Semgrep | Pattern + custom | Speed + control |
+| SonarQube | Rule-based + AI | Quality gate |
+| GitHub Advanced Security | Code scanning + Copilot Autofix | GitHub integration |

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+### Limitations
+- Fewer of the emerging LLM-native features found in tools such as Corgea.
+- Enterprise SaaS pricing.
+- Depth of analysis varies by language.

-**When to use this knowledge:**
- *(TODO)*
+## 💻 Patterns (applied: Snyk integration)

-**When not to use it:**
- *(TODO)*
-
-## 🧪 Validation Status (Validation)
-
- **Information status:** needs_review
- **Source trust level:** A
- **Reason for review:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
-
-## 🧬 Duplicate Check
-
- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: fill in the old template and missing fields.
-
-## 🕓 Changelog
-
-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
-
-## 💻 Code Patterns
-
-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
-
-```text
-# TODO
+### CLI scan
+```bash
+npm install -g snyk
+snyk auth
+snyk code test                  # SAST scan
+snyk code test --json           # JSON output
+snyk code test --severity-threshold=high
 ```
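+
+Scripts that wrap the CLI usually branch on its exit code. A minimal sketch, assuming the Snyk CLI exit code convention (0 = no issues, 1 = issues found, 2 = failure, 3 = no supported projects detected); `classify_exit` and `run_snyk_code` are hypothetical helper names, and the wrapper assumes `snyk` is installed and authenticated.
+
+```python
+import subprocess
+
+# Assumed mapping of Snyk CLI exit codes to short labels.
+EXIT_MEANINGS = {
+    0: "clean",
+    1: "issues found",
+    2: "scan failed",
+    3: "no supported projects",
+}
+
+
+def classify_exit(code):
+    """Translate a CLI exit code into a label; anything else is 'unknown'."""
+    return EXIT_MEANINGS.get(code, "unknown")
+
+
+def run_snyk_code(path="."):
+    """Run `snyk code test --json` on path; return (label, raw JSON text)."""
+    proc = subprocess.run(
+        ["snyk", "code", "test", "--json", path],
+        capture_output=True,
+        text=True,
+    )
+    return classify_exit(proc.returncode), proc.stdout
+```
+
+Separating "issues found" from "scan failed" matters in CI: the first is a policy decision, the second is an infrastructure problem that should never be read as a clean result.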

-## 🤔 Decision Criteria
+### CI integration
+```yaml
+- name: Snyk Code
+  uses: snyk/actions/node@master
+  continue-on-error: true  # keep going so the SARIF upload step still runs
+  env:
+    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
+  with:
+    command: code test
+    args: --severity-threshold=high --sarif-file-output=snyk-code.sarif

-**When to choose option A:**
- *(TODO)*
+- name: Upload SARIF
+  uses: github/codeql-action/upload-sarif@v3
+  with: { sarif_file: snyk-code.sarif }
+```

-**When to choose option B:**
- *(TODO)*
+### IDE integration
+```
+- VS Code: Snyk Security extension.
+- IntelliJ / WebStorm: Snyk plugin.
+- Inline findings with click-to-apply fixes.
+```

-**Default:**
-> *(TODO)*
+### DeepCode AI Fix workflow
+```
+1. Vulnerability detected (e.g., SQL injection).
+2. AI Fix retrieves a verified fix pattern.
+3. A diff is proposed as a PR comment.
+4. The developer reviews and merges.
+5. Snyk re-tests to confirm the fix.
+```

-## ❌ Anti-Patterns
+### Multi-tool layered security
+```yaml
+security_pipeline:
+  pre_commit:
+    - gitleaks  # secrets
+  
+  pr:
+    - snyk_code  # SAST (DeepCode)
+    - snyk_open_source  # SCA (CVE)
+    - semgrep  # custom rules
+    - corgea  # LLM-native (optional, parallel)
+  
+  pre_deploy:
+    - snyk_container  # image scan
+    - cosign  # signing
+  
+  runtime:
+    - falco
+```

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+### Custom rule (Snyk + Semgrep complementary)
+```yaml
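+# File 1: .snyk policy. Note: SNYK-CC-* IDs are Snyk IaC rules, so this
+# entry suppresses an IaC finding, not a Snyk Code (SAST) finding.
+ignore:
+  'SNYK-CC-K8S-1':
+    - '*':
+        reason: 'Internal dev cluster, non-prod'
+        expires: '2026-12-31T00:00:00Z'
+
+# File 2 (a separate file): Semgrep rule for org-specific checks
+rules:
+  - id: internal-deprecated-api
+    languages: [javascript]  # assumed; Semgrep requires a languages field
+    pattern: oldClient.deprecatedMethod(...)
+    message: Use newClient instead.
+    severity: WARNING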
+```
+
+### Vulnerability triage
+```python
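+def triage_findings(snyk_findings, sla_for_severity):
+    """Rank findings by priority, highest first.
+
+    The factor field names are illustrative, not Snyk's JSON schema;
+    sla_for_severity is a caller-supplied function: severity -> hours.
+    """
+    triaged = []
+    for f in snyk_findings:
+        priority = (
+            f['severity_score'] *
+            f['exploit_maturity_factor'] *  # range 0.5-2
+            f['reachability_factor']         # range 0.3-1.5
+        )
+        triaged.append({
+            **f,
+            'priority': priority,
+            'sla_hours': sla_for_severity(f['severity']),
+        })
+    return sorted(triaged, key=lambda x: x['priority'], reverse=True)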
+```
+
+### Auto-fix verification
+```python
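+def verify_fix(original_code, ai_proposed_fix, target_vuln,
+               parses_correctly, run_tests, scan):
+    """Gate an AI-proposed fix before merge.
+
+    parses_correctly, run_tests and scan are caller-supplied hooks;
+    scan(code) is assumed to return an iterable of vulnerability IDs.
+    """
+    # 1. Syntax check
+    if not parses_correctly(ai_proposed_fix):
+        return 'invalid syntax'
+
+    # 2. Tests still pass
+    if not run_tests(ai_proposed_fix):
+        return 'tests fail'
+
+    before = set(scan(original_code))
+    after = set(scan(ai_proposed_fix))
+
+    # 3. Targeted vulnerability resolved
+    if target_vuln in after:
+        return 'vuln remains'
+
+    # 4. No new vulnerability introduced
+    new_vulns = after - before
+    if new_vulns:
+        return f'introduces new: {sorted(new_vulns)}'
+
+    return 'verified'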
+```
+
+### SARIF (standard output)
+```python
+import json
+
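+def parse_sarif(sarif_file):
+    """Flatten a SARIF file into a list of finding dicts."""
+    with open(sarif_file, encoding='utf-8') as f:
+        data = json.load(f)
+
+    findings = []
+    for run in data.get('runs', []):
+        for result in run.get('results', []):
+            # locations may be absent or empty; fall back to an empty location
+            locations = result.get('locations') or [{}]
+            physical = locations[0].get('physicalLocation', {})
+            findings.append({
+                'rule': result.get('ruleId'),
+                'severity': result.get('level', 'warning'),  # SARIF default level
+                'message': result['message'].get('text', ''),
+                'file': physical.get('artifactLocation', {}).get('uri'),
+                'line': physical.get('region', {}).get('startLine'),
+            })
+    return findings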
+```
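+
+A parsed findings list can then drive a CI gate. The sketch below assumes finding dicts shaped like the output of `parse_sarif` above (a `severity` key holding a SARIF `level`); `LEVEL_ORDER` and `gate` are hypothetical names, and the default threshold is an illustrative choice.
+
+```python
+from collections import Counter
+
+# SARIF `level` values, ordered from least to most severe.
+LEVEL_ORDER = ["none", "note", "warning", "error"]
+
+
+def gate(findings, threshold="error"):
+    """Return (exit_code, counts); exit_code is 1 if any finding meets the threshold."""
+    counts = Counter(f["severity"] for f in findings)
+    cutoff = LEVEL_ORDER.index(threshold)
+    blocking = sum(
+        n for level, n in counts.items()
+        if level in LEVEL_ORDER and LEVEL_ORDER.index(level) >= cutoff
+    )
+    return (1 if blocking else 0), dict(counts)
+```
+
+Returning the per-level counts alongside the exit code lets the same call feed both the pass/fail decision and a dashboard summary.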
+
+### Suppress false positives
+```js
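+// Illustrative inline ignore; this comment syntax is an assumption, so
+// confirm the supported ignore mechanism in the Snyk docs before relying on it
+function safe_html(input) {
+  // snyk-ignore: javascript/xss (input is sanitized at the boundary)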
+  return `<div>${input}</div>`;
+}
+```
+
+## Decision criteria
+| Situation | Tool |
+|---|---|
+| Mid-to-large org with budget | Snyk Code (DeepCode) |
+| AI-native focus | Corgea |
+| Custom rules + speed | Semgrep |
+| Open-source self-host | Semgrep |
+| GitHub native | GitHub Advanced Security |
+| Enterprise compliance | Veracode / Checkmarx |
+
+**Default**: Snyk and Semgrep used as complements.
+
+## 🔗 Graph
+- Parent: [[SAST]] · [[DevSecOps]]
+- Variants: [[Snyk-Code]] · [[Symbolic-AI]] · [[Hybrid-AI]] · [[Neuro-Symbolic-AI]]
+- Applications: [[Corgea]] · [[Semgrep]] · [[SonarQube]] · [[CI_CD 파이프라인 및 IDE 통합 보안]]
+- Adjacent: [[AI 코드 리뷰 및 보안 취약점 점검(DevSecOps)]] · [[Custom-ESLint-Rules-Development]] · [[CodeScene]] · [[AI 생성 코드 검증(AI Code Assurance)]]
+
+## 🤖 LLM usage
+**When to use**: enterprise SAST; multi-language codebases; verified autofix.
+**When not to use**: tight budgets (use Semgrep instead); air-gapped environments.
+
+## ❌ Anti-patterns
+- **Single tool**: no layered defense.
+- **No triage**: leads to alert fatigue.
+- **Blindly merging AI Fix output**: verification is still required.
+- **No SARIF integration**: the dashboard has no single source of truth.
+
+## 🧪 Verification / duplicates
+- Verified (Snyk docs, DeepCode papers, ETH spinoff history).
+- Trust level B.
+- Related: [[Corgea]] · [[CI_CD 파이프라인 및 IDE 통합 보안]] · [[Custom-ESLint-Rules-Development]] · [[CodeScene]].
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: neuro-symbolic + CI / SARIF / triage / verify code |