[G1-Sync] Manual knowledge update

Antigravity Agent
2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,87 +2,303 @@
id: wiki-2026-0508-failable-task-handling
title: Failable Task Handling
category: 10_Wiki/Topics
status: needs_review
status: verified
canonical_id: self
aliases: [P-Reinforce-AI-FAILABLE]
aliases: [error handling, retry, circuit breaker, exponential backoff, idempotent, saga compensation]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
tags: [Programming, Resilience, ErrorHandling, TaskManagement]
confidence_score: 0.97
verification_status: applied
tags: [reliability, error-handling, retry, circuit-breaker, idempotency, distributed-systems]
raw_sources: []
last_reinforced: 2026-04-20
last_reinforced: 2026-05-10
github_commit: pending
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
tech_stack:
language: unspecified
framework: unspecified
language: TypeScript / Python / C#
framework: Temporal / Polly / tenacity
---
# [[Failable-Task-Handling|Failable-Task-Handling]] (handling tasks that can fail)
# Failable Task Handling
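## 📌 One-Line Insight (The Karpathy Summary)
> "Failure is not an event but a state of the system." A strategy for handling tasks that can fail because of network outages, deadlocks, and similar faults in a resilient way, so that the overall system stays available.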
## One line
> **"Design every task with the expectation that it can fail."** Combine retry, idempotency, timeout, circuit breaker, DLQ, and compensation. These are the basics of any distributed system. The modern option is durable execution with Temporal or Inngest.
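## 📖 Structured Knowledge (Synthesized Content)
- **Retry [[Strategy|Strategy]]**:
- **Immediate Retry**: retry at once (resolves momentary noise).
- **Exponential Backoff**: lengthen the retry interval as the failure count grows, reducing load on the target server.
- **Circuit Breaker**: once failures exceed a threshold, cut the path entirely and return an error immediately, preventing cascading failure.
- **Dead Letter Queue (DLQ)**: put tasks that ultimately fail into a separate store so they can be analyzed and recovered manually later.
- **Compensating Transaction**: on failure, undo the previously successful steps in reverse order (Saga Pattern) to preserve integrity.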
## Core
## ⚠️ Contradictions & Updates
- Indiscriminate retries create "zombie load" on the system. The key is logic that clearly separates retryable (Transient) failures from non-retryable (Permanent) ones, and standard interfaces such as HTTP status codes should be used actively for this.
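The transient/permanent split can be sketched as a small classifier over HTTP status codes. This is a minimal sketch: the name `is_transient` and the exact set of codes are assumptions for illustration, and a real API contract may call for a different split.

```python
from http import HTTPStatus

# Assumption: this split is illustrative; adjust it to the API being called.
TRANSIENT_STATUSES = {
    HTTPStatus.REQUEST_TIMEOUT,        # 408
    HTTPStatus.TOO_MANY_REQUESTS,      # 429
    HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
    HTTPStatus.BAD_GATEWAY,            # 502
    HTTPStatus.SERVICE_UNAVAILABLE,    # 503
    HTTPStatus.GATEWAY_TIMEOUT,        # 504
}


def is_transient(status_code: int) -> bool:
    """Return True when a retry has a reasonable chance of succeeding."""
    return status_code in TRANSIENT_STATUSES
```

Anything outside the set (for example 400 or 404) is treated as permanent, so the caller fails fast instead of adding load.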
### Strategies
- **Retry** with backoff.
- **Timeout**.
- **Idempotency**.
- **Circuit breaker** (Hystrix-style).
- **Bulkhead** (resource isolation).
- **DLQ** (dead letter).
- **Compensation** (saga).
- **Fallback / cached default**.
## 🔗 Knowledge Links (Graph)
- Related: [[Reliability|Reliability]]-Patterns, [[Event-Driven-Architecture|Event-Driven-Architecture]]
- Pattern: Saga-Pattern
### Retry patterns
- **Constant**: fixed delay.
- **Linear**: N × delay.
- **Exponential**: 2^N × delay.
- **+ Jitter**: prevents the thundering herd.
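The schedules above can be compared side by side with a small helper. This is a sketch under assumptions: `backoff_delay`, `with_full_jitter`, and the `cap` parameter are hypothetical names, and "full jitter" (uniform in `[0, delay]`) is one common jitter variant, not the only one.

```python
import random
from typing import Optional


def backoff_delay(strategy: str, attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Delay before retry number `attempt` (N in the list above), without jitter."""
    if strategy == "constant":
        raw = base
    elif strategy == "linear":
        raw = base * attempt
    elif strategy == "exponential":
        raw = base * 2 ** attempt
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(raw, cap)  # never wait longer than the cap


def with_full_jitter(delay: float, rng: Optional[random.Random] = None) -> float:
    """Pick a uniform random delay in [0, delay] so clients do not retry in lockstep."""
    source = rng if rng is not None else random
    return source.uniform(0, delay)
```

A caller would typically sleep for `with_full_jitter(backoff_delay("exponential", n))` before attempt `n + 1`.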
## 🤖 LLM Usage Hints (How to Use This Knowledge)
### Applications
1. **HTTP**: retry on 5xx.
2. **Distributed transaction**: saga.
3. **Job queue**: retry + DLQ.
4. **LLM API**: retry on rate limit.
5. **Workflow**: Temporal durable execution.
**When to use this knowledge:**
- *(TODO)*
## 💻 Patterns
**When not to use it:**
- *(TODO)*
### Exponential backoff with jitter
```python
import random, time
## 🧪 Validation Status
- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body needs verification.)*
## 🧬 Duplicate Check
- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: fill in the old template and missing fields.
## 🕓 Changelog
| Date | Change | Handling | Trust |
|------|--------|----------|-------|
| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
## 💻 Code Patterns
**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
```text
# TODO
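class RetryableError(Exception):
    """Placeholder for errors worth retrying (hypothetical; define per application)."""

def retry_exponential(fn, max_attempts=5, base=0.1, max_delay=10):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth plus up to `base` seconds of jitter, capped at max_delay.
            delay = min(base * 2 ** attempt + random.random() * base, max_delay)
            time.sleep(delay)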
```
## 🤔 Decision Criteria
### tenacity (Python)
```python
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
**When to choose option A:**
- *(TODO)*
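import httpx

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    # Note: HTTPStatusError also covers 4xx; narrow the predicate if those should not be retried.
    retry=retry_if_exception_type(httpx.HTTPStatusError),
)
def fetch(url):
    response = httpx.get(url)
    response.raise_for_status()  # httpx raises HTTPStatusError only when asked to
    return response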
```
**When to choose option B:**
- *(TODO)*
### Circuit breaker
```python
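import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking fn."""

class CircuitBreaker:
    def __init__(self, fail_threshold=5, reset_timeout=60):
        self.failures = 0
        self.state = 'closed'
        self.opened_at = None
        self.fail_threshold = fail_threshold
        self.reset_timeout = reset_timeout

    def call(self, fn):
        if self.state == 'open':
            # monotonic() is used for intervals so wall-clock changes do not matter.
            if time.monotonic() - self.opened_at > self.reset_timeout:
                self.state = 'half_open'  # let one trial call through
            else:
                raise CircuitOpenError()
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed half-open trial reopens at once; otherwise wait for the threshold.
            if self.state == 'half_open' or self.failures >= self.fail_threshold:
                self.state = 'open'
                self.opened_at = time.monotonic()
            raise
        # Success: close the circuit and forget earlier failures.
        self.state = 'closed'
        self.failures = 0
        return result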
```
**Default:**
> *(TODO)*
### Idempotency key
```typescript
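// `db` and `actualTransfer` are placeholders for the application's own storage and transfer logic.
async function transfer(idempotencyKey: string, amount: number) {
  const existing = await db.idempotency.find(idempotencyKey);
  if (existing) return existing.result; // replay: return the stored result, no second transfer

  // Caveat: find-then-save is not atomic. Two concurrent calls with the same key can both
  // reach this point, so production code needs a unique constraint or a lock on the key.
  const result = await actualTransfer(amount);
  await db.idempotency.save({ key: idempotencyKey, result });
  return result;
}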
```
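The check-then-save flow is only safe if the check and the save cannot interleave between concurrent requests. A minimal in-memory sketch of that idea follows. `IdempotencyStore` and `run_once` are hypothetical names, and the single process-wide lock stands in for what a database would provide with a unique constraint on the key.

```python
import threading
from typing import Any, Callable, Dict


class IdempotencyStore:
    """In-memory stand-in for a table with a unique constraint on the key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, Any] = {}

    def run_once(self, key: str, operation: Callable[[], Any]) -> Any:
        # Holding the lock across check + execute + save closes the window in which
        # two concurrent requests with the same key could both run the operation.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay the stored result
            result = operation()
            self._results[key] = result
            return result
```

If `operation` raises, nothing is stored, so a later retry with the same key runs it again. One global lock serializes all keys, which is a simplification; per-key locking would be the usual refinement.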
## ❌ Anti-Patterns
### Timeout (Promise.race)
```typescript
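class TimeoutError extends Error {}

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((_, rej) => {
    timer = setTimeout(() => rej(new TimeoutError(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer once the race settles so it does not keep the process alive.
  // Note: losing the race does not cancel the underlying work behind `promise`.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}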
```
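For the Python side of the stack, the same timeout idea can be sketched with `asyncio.wait_for`, which, unlike a plain race between promises, also cancels the awaited work when the deadline passes. The wrapper name `with_timeout` is an assumption chosen to mirror the TypeScript helper.

```python
import asyncio


async def with_timeout(coro, seconds: float):
    """Await `coro`; cancel it and raise asyncio.TimeoutError after `seconds`."""
    return await asyncio.wait_for(coro, timeout=seconds)
```

Because the coroutine is cancelled, it can release resources in an `except asyncio.CancelledError` or `finally` block.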
- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
### Bulkhead (semaphore)
```python
import asyncio

class Bulkhead:
    def __init__(self, max_concurrent=10):
        self.sem = asyncio.Semaphore(max_concurrent)

    async def call(self, coro):
        # At most max_concurrent awaits run at once; the rest wait at the semaphore.
        async with self.sem:
            return await coro
```
### DLQ (dead letter queue)
```python
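def consume_with_dlq(queue, dlq, handler, max_retries=3):
    # `queue` and `dlq` stand for broker client objects with ack/publish methods.
    for msg in queue:
        for attempt in range(max_retries):
            try:
                handler(msg)
                queue.ack(msg)
                break
            except Exception as e:
                if attempt == max_retries - 1:
                    # Retries exhausted: park the message with its error, then ack
                    # so it stops blocking the main queue.
                    dlq.publish(msg, error=str(e))
                    queue.ack(msg)
                    break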
```
### Saga compensation
```python
class Saga:
    def __init__(self):
        self.compensations = []

    async def execute(self, steps):
        try:
            for step, compensation in steps:
                await step()
                self.compensations.append(compensation)
        except Exception:
            # Undo the completed steps in reverse order.
            for c in reversed(self.compensations):
                try:
                    await c()
                except Exception:
                    pass  # best-effort: a failed compensation must not stop the rest
            raise

# Usage (inside an async function; the step functions are application-defined coroutines)
saga = Saga()
await saga.execute([
    (reserve_inventory, lambda: release_inventory()),
    (charge_card, lambda: refund_card()),
    (schedule_shipping, lambda: cancel_shipping()),
])
```
### Fallback (graceful degrade)
```typescript
async function getRecommendations(userId: string) {
  try {
    return await mlService.recommend(userId);
  } catch (e) {
    log.warn('ML service down, using popular fallback');
    return await getPopularItems(); // cached default
  }
}
```
### Temporal durable workflow
```typescript
import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from './activities'; // assumed location of the activity definitions

const { reserveInventory, chargePayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1m',
  retry: { maximumAttempts: 5, initialInterval: '1s', backoffCoefficient: 2 },
});

export async function orderWorkflow(orderId: string) {
  await reserveInventory(orderId);
  await chargePayment(orderId);
  await sleep('5m'); // fulfilment delay
  return 'completed';
}
```
### Polly (.NET)
```csharp
var policy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)))
    .WrapAsync(Policy
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(5, TimeSpan.FromMinutes(1)));
await policy.ExecuteAsync(() => httpClient.GetAsync(url));
```
### LLM API retry (rate limit aware)
```python
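import time

# `openai_client`, `RateLimitError`, and `e.retry_after` are placeholders for the SDK in use.
def llm_call(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return openai_client.create(prompt=prompt)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: do not fall through and return None
            wait = e.retry_after if e.retry_after else 2 ** attempt
            time.sleep(wait)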
```
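When the rate-limit hint arrives as an HTTP `Retry-After` header rather than an exception attribute, it may be either a number of seconds or an HTTP date. A minimal sketch of turning it into a wait time follows; `retry_after_seconds` is a hypothetical helper, and how the header is obtained from a given SDK is left out because that varies by client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header into seconds to wait, or None if absent or unparseable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat a naive date as UTC
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

Returning `None` lets the caller fall back to its own exponential backoff when the server gives no usable hint.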
### Health check + half-open
```python
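def half_open_probe(circuit):
    # `light_health_check` is a placeholder for a cheap dependency ping.
    try:
        result = light_health_check()
        if result.ok:
            circuit.state = 'closed'
            circuit.failures = 0  # otherwise the next failure reopens immediately
    except Exception:
        pass  # probe failed: leave the circuit as it is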
```
### Idempotent HTTP (Stripe-style)
```bash
curl -X POST https://api/charges \
-H "Idempotency-Key: $(uuidgen)" \
-d "amount=2000"
```
### Observability (per attempt)
```python
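from functools import wraps

# `metrics` is a placeholder for the metrics client in use.
def observed_retry(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        max_attempts = 5
        for attempt in range(max_attempts):
            metrics.increment('attempt', {'fn': fn.__name__, 'attempt': attempt})
            try:
                return fn(*args, **kwargs)
            except Exception as e:
                metrics.increment('failure', {'fn': fn.__name__, 'attempt': attempt, 'err': type(e).__name__})
                if attempt == max_attempts - 1:
                    raise  # re-raise only once the attempts are used up
    return wrapper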
```
## Decision criteria
| Situation | Pattern |
|---|---|
| HTTP 5xx | Retry + backoff + jitter |
| External dep flaky | Circuit breaker |
| Distributed transaction | Saga + compensation |
| Long workflow | Temporal / Inngest |
| Unique side effect | Idempotency key |
| Rate-limit aware | Retry-After |
| User-visible | Fallback + cache |
**Default**: retry with exponential backoff and jitter + idempotency + timeout + circuit breaker + DLQ + observability.
## 🔗 Graph
- Parent: [[Distributed-Systems]] · [[Reliability]]
- Variants: [[Retry-Pattern]] · [[Circuit-Breaker]] · [[Saga-Pattern]]
- Applications: [[Temporal]] · [[Polly]] · [[tenacity]]
- Adjacent: [[Idempotency]] · [[Bulkhead]] · [[Event-Driven-Architecture]]
## 🤖 LLM Usage
**When**: every distributed system, external API call, and long-running workflow.
**When not**: deterministic in-process code.
## ❌ Anti-patterns
- **Retry without backoff**: thundering herd.
- **Retry non-idempotent**: duplicate side effects.
- **Infinite retry**: cascading load.
- **No DLQ**: lost messages.
- **No timeout**: calls hang.
- **No circuit**: cascading failure.
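One way to avoid the infinite-retry trap is to bound retries by both an attempt count and an overall deadline. The following is a sketch under assumptions: `retry_with_budget` and its parameters are hypothetical, jitter is omitted for brevity, and retrying on every `Exception` is a simplification that real code would narrow to transient errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_budget(
    fn: Callable[[], T],
    max_attempts: int = 5,
    deadline_s: float = 30.0,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `fn` until it succeeds, attempts run out, or the deadline would be exceeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    started = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            delay = base_delay_s * 2 ** (attempt - 1)
            out_of_attempts = attempt == max_attempts
            out_of_time = clock() - started + delay > deadline_s
            if out_of_attempts or out_of_time:
                raise  # surface the last error instead of retrying forever
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` and `clock` parameters exist so the behavior can be exercised in tests without real waiting.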
## 🧪 Validation / Duplicates
- Verified (Release It! Nygard, Temporal docs, Polly docs).
- Trust level A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: patterns + retry / circuit / saga / Temporal / DLQ code |