---
id: wiki-2026-0508-failable-task-handling
title: Failable Task Handling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [error handling, retry, circuit breaker, exponential backoff, idempotent, saga compensation]
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags: [reliability, error-handling, retry, circuit-breaker, idempotency, distributed-systems]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript / Python / Go
  framework: Temporal / Polly / tenacity
---

# Failable Task Handling

## In one line

> **"Design every task on the assumption that it will fail."** The toolkit is retry, idempotency, timeouts, circuit breakers, dead-letter queues (DLQs), and compensation. Together they form the baseline of any distributed system. The modern approach is durable execution, as provided by Temporal and Inngest.

## Core concepts

### Strategies

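- **Retry** with backoff: re-run transient failures after a growing delay.
- **Timeout**: bound how long any single call may block.
- **Idempotency**: make repeating an operation safe, so that a retry cannot duplicate its side effect.
- **Circuit breaker** (Hystrix-style): stop calling a dependency that keeps failing, then probe it again after a cool-down.
- **Bulkhead** (resource isolation): give each dependency its own bounded pool of threads, connections, or concurrency slots, so one failing dependency cannot exhaust them for everything else.
- **DLQ** (dead-letter queue): park messages that still fail after the retry limit instead of dropping them or retrying forever.
- **Compensation** (saga): undo already-completed steps when a later step fails.
- **Fallback / cached default**: return a degraded but usable answer when the primary path fails.
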
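### Retry delay patterns

- **Constant**: a fixed delay between attempts.
- **Linear**: the delay grows as N × base.
- **Exponential**: the delay grows as 2^N × base.
- **+ Jitter**: randomize the delay so that many clients do not retry in lockstep (the thundering-herd problem).
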
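The four schedules differ only in how the delay grows with the attempt number. The following is a minimal Python sketch that assumes a 0-based attempt counter and a cap on the delay. It uses the "full jitter" variant, which picks the delay uniformly in [0, delay]. `retry_delay` and its parameters are illustrative names, not a library API.

```python
import random


def retry_delay(strategy: str, attempt: int, base: float = 0.1,
                cap: float = 10.0, jitter: bool = False) -> float:
    """Seconds to wait before retry number `attempt` (0-based); hypothetical helper."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    if strategy == "constant":
        delay = base
    elif strategy == "linear":
        delay = base * (attempt + 1)
    elif strategy == "exponential":
        delay = base * 2 ** attempt
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    delay = min(delay, cap)  # never wait longer than the cap
    # Full jitter: a uniform pick in [0, delay] spreads concurrent clients apart.
    return random.uniform(0, delay) if jitter else delay
```

For example, `[retry_delay("exponential", n, base=0.5) for n in range(5)]` gives `[0.5, 1.0, 2.0, 4.0, 8.0]`. The cap matters once the exponent grows: without it, attempt 10 would already wait over 8 minutes at `base=0.5`.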
### Applications

1. **HTTP**: retry on 5xx responses.
2. **Distributed transactions**: sagas.
3. **Job queues**: retry plus a DLQ.
4. **LLM APIs**: retry on rate-limit errors.
5. **Workflows**: durable execution with Temporal.

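Not every failed HTTP call is worth retrying. A 5xx or a 429 is often transient, while most other 4xx responses will fail the same way again. The method matters too: retrying a POST is only safe when the endpoint is idempotent, for example via an idempotency key. Below is a small classifier as a sketch. The exact status set is an assumption to tune per API, and `should_retry` is a hypothetical helper.

```python
# Assumed transient statuses: request timeout, rate limit, and common 5xx errors.
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})
# Methods defined as idempotent by HTTP semantics (RFC 9110).
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


def should_retry(method: str, status: int, has_idempotency_key: bool = False) -> bool:
    """Return True if a failed HTTP request is a reasonable retry candidate."""
    if status not in RETRYABLE_STATUSES:
        return False
    # A non-idempotent method (e.g. POST) may duplicate its side effect on retry
    # unless the server deduplicates it by key.
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
```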
## 💻 Patterns

### Exponential backoff with jitter

```python
import random
import time


class RetryableError(Exception):
    """Raised by `fn` for failures worth retrying (timeouts, 5xx, ...)."""


def retry_exponential(fn, max_attempts=5, base=0.1, max_delay=10):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth plus up to `base` seconds of random jitter, capped at max_delay.
            delay = min(base * 2 ** attempt + random.random() * base, max_delay)
            time.sleep(delay)
```

### tenacity (Python)

```python
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type


@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    # Status errors (note: includes 4xx) and network-level failures.
    retry=retry_if_exception_type((httpx.HTTPStatusError, httpx.TransportError)),
)
def fetch(url):
    response = httpx.get(url)
    # httpx does not raise on error statuses by itself; this turns 4xx/5xx into HTTPStatusError.
    response.raise_for_status()
    return response
```

### Circuit breaker

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling `fn` while the circuit is open."""


class CircuitBreaker:
    def __init__(self, fail_threshold=5, reset_timeout=60):
        self.failures = 0
        self.state = 'closed'
        self.opened_at = None
        self.fail_threshold = fail_threshold
        self.reset_timeout = reset_timeout

    def call(self, fn):
        if self.state == 'open':
            if time.time() - self.opened_at > self.reset_timeout:
                self.state = 'half_open'  # cool-down over: let a trial call through
            else:
                raise CircuitOpenError()

        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == 'half_open' or self.failures >= self.fail_threshold:
                self.state = 'open'
                self.opened_at = time.time()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = 'closed'
        self.failures = 0
        return result
```

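### Idempotency key

```typescript
// `db` and `actualTransfer` are application-specific placeholders.
async function transfer(idempotencyKey: string, amount: number) {
  // Key seen before: return the stored result instead of moving money twice.
  const existing = await db.idempotency.find(idempotencyKey);
  if (existing) return existing.result;

  // Caveat: find-then-save is not atomic. Concurrent duplicates can both reach this line,
  // so production code should reserve the key first (e.g. an insert under a unique constraint).
  const result = await actualTransfer(amount);
  await db.idempotency.save({ key: idempotencyKey, result });
  return result;
}
```
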
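The lookup-then-save in the TypeScript example is not atomic. Two concurrent requests with the same key can both miss the lookup and both run the side effect. One fix is to reserve the key atomically before doing the work, and to make duplicates wait for the first request's outcome. The following is an in-memory Python sketch. `IdempotencyStore` is a hypothetical helper, and a real service would use a database unique constraint or a conditional write instead of a process-local lock.

```python
import threading
from typing import Any, Callable


class IdempotencyStore:
    """In-memory sketch: each key's operation runs at most once, even under concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict = {}

    def run(self, key: str, operation: Callable[[], Any]) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            owner = entry is None
            if owner:
                # Reserve the key atomically, before the side effect happens.
                entry = {"done": threading.Event(), "result": None, "error": None}
                self._entries[key] = entry
        if owner:
            try:
                entry["result"] = operation()
            except Exception as exc:
                entry["error"] = exc
                with self._lock:
                    del self._entries[key]  # failed: a later request may try again
                raise
            finally:
                entry["done"].set()  # wake up any duplicates that are waiting
            return entry["result"]
        # Duplicate request: wait for the first one and reuse its outcome.
        entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["result"]
```

Releasing the key on failure is a design choice. It lets a client retry after a genuine error. The alternative is to store the error as the final result, which is stricter but forces the client to mint a new key.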
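### Timeout (Promise.race)

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot fire later.
  // The original promise is not cancelled, only no longer awaited.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```
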
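`Promise.race` only stops waiting, and the underlying operation keeps running. In TypeScript, real cancellation requires passing an `AbortSignal` to APIs that accept one, such as `fetch`. For comparison, Python's `asyncio.wait_for` cancels the awaited task when the timeout expires. The following is a small sketch in which `slow_call` is a hypothetical stand-in for slow I/O.

```python
import asyncio


async def slow_call(log: list) -> str:
    try:
        await asyncio.sleep(10)  # stand-in for a slow network call
        return "done"
    except asyncio.CancelledError:
        log.append("cancelled")  # cleanup hook: wait_for cancelled this task
        raise


async def call_with_timeout(timeout: float) -> tuple:
    log: list = []
    try:
        result = await asyncio.wait_for(slow_call(log), timeout=timeout)
    except asyncio.TimeoutError:
        result = None  # timed out; slow_call has already been cancelled
    return result, log
```

Running `asyncio.run(call_with_timeout(0.05))` returns `(None, ["cancelled"])`. The timeout both unblocks the caller and stops the work.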
### Bulkhead (semaphore)

```python
import asyncio


class Bulkhead:
    """Caps how many calls may run concurrently against one dependency."""

    def __init__(self, max_concurrent=10):
        self.sem = asyncio.Semaphore(max_concurrent)

    async def call(self, coro):
        # Callers beyond the limit wait here for a free slot instead of piling onto the dependency.
        async with self.sem:
            return await coro
```

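To check that the semaphore really caps concurrency, the following self-contained run records the peak number of in-flight calls. `fake_request` is a hypothetical stand-in for the protected dependency. The class is repeated so that the snippet runs on its own.

```python
import asyncio


class Bulkhead:
    def __init__(self, max_concurrent: int = 10) -> None:
        self.sem = asyncio.Semaphore(max_concurrent)

    async def call(self, coro):
        async with self.sem:
            return await coro


async def peak_concurrency(limit: int, n_calls: int) -> int:
    """Run n_calls simulated requests through a bulkhead; return the peak in-flight count."""
    bulkhead = Bulkhead(max_concurrent=limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated I/O
        in_flight -= 1

    await asyncio.gather(*(bulkhead.call(fake_request()) for _ in range(n_calls)))
    return peak
```

With 20 calls and a limit of 3, the peak stays at 3, and the remaining calls queue for a slot rather than failing. A stricter bulkhead would reject immediately when full, trading queueing latency for fast failure.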
### DLQ (dead letter queue)

```python
def consume_with_dlq(queue, dlq, handler, max_retries=3):
    # `queue` and `dlq` are placeholders for a broker client (iterable + ack / publish).
    for msg in queue:
        for attempt in range(max_retries):
            try:
                handler(msg)
                queue.ack(msg)
                break
            except Exception as e:
                if attempt == max_retries - 1:
                    # Out of retries: park the message with its error, then ack it
                    # so that the failing message stops blocking the queue.
                    dlq.publish(msg, error=str(e))
                    queue.ack(msg)
```

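The following is a self-contained, in-memory run of the same flow. A `deque` stands in for the broker, a list stands in for the DLQ, and one "poison" message always fails. `InMemoryQueue`, `InMemoryDLQ`, and `handler` are hypothetical test doubles, not a broker API.

```python
from collections import deque


class InMemoryQueue:
    def __init__(self, messages):
        self._pending = deque(messages)
        self.acked = []

    def __iter__(self):
        while self._pending:
            yield self._pending.popleft()

    def ack(self, msg):
        self.acked.append(msg)


class InMemoryDLQ:
    def __init__(self):
        self.messages = []

    def publish(self, msg, error):
        self.messages.append((msg, error))


def consume_with_dlq(queue, dlq, handler, max_retries=3):
    for msg in queue:
        for attempt in range(max_retries):
            try:
                handler(msg)
                queue.ack(msg)
                break
            except Exception as e:
                if attempt == max_retries - 1:
                    dlq.publish(msg, error=str(e))
                    queue.ack(msg)


processed = []


def handler(msg):
    if msg == "poison":
        raise ValueError("cannot parse")  # fails on every attempt
    processed.append(msg)


queue, dlq = InMemoryQueue(["a", "poison", "b"]), InMemoryDLQ()
consume_with_dlq(queue, dlq, handler)
```

After the run, `processed` is `["a", "b"]`, the DLQ holds `("poison", "cannot parse")`, and all three messages are acked. The poison message does not stall "b" behind it.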
### Saga compensation

```python
class Saga:
    async def execute(self, steps):
        # Compensations for the steps completed in *this* run (reset per call).
        compensations = []
        try:
            for step, compensation in steps:
                await step()
                compensations.append(compensation)
        except Exception:
            # Undo completed steps in reverse order. Best-effort: one failing
            # compensation must not stop the others.
            for c in reversed(compensations):
                try:
                    await c()
                except Exception:
                    pass  # in practice: log and alert, the system may now be inconsistent
            raise


# Usage: reserve_inventory, release_inventory, etc. are your own async functions.
async def place_order():
    saga = Saga()
    await saga.execute([
        (reserve_inventory, release_inventory),
        (charge_card, refund_card),
        (schedule_shipping, cancel_shipping),
    ])
```

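The following is a runnable check of the compensation order, using hypothetical in-memory steps where the third step fails. The saga loop is repeated in compact form so that the snippet runs on its own. The resulting log should show completed steps undone newest-first, with no compensation for the step that failed.

```python
import asyncio

log = []


def make_step(name, fail=False):
    """Build a hypothetical (step, compensation) pair that records what it does."""
    async def step():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"do {name}")

    async def undo():
        log.append(f"undo {name}")

    return step, undo


async def run_saga(steps):
    compensations = []
    try:
        for step, compensation in steps:
            await step()
            compensations.append(compensation)
    except Exception:
        for c in reversed(compensations):
            try:
                await c()
            except Exception:
                pass
        raise


async def main():
    try:
        await run_saga([make_step("reserve"), make_step("charge"), make_step("ship", fail=True)])
    except RuntimeError as exc:
        log.append(f"error: {exc}")


asyncio.run(main())
```

The log ends up as `["do reserve", "do charge", "undo charge", "undo reserve", "error: ship failed"]`.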
### Fallback (graceful degradation)

```typescript
// `mlService`, `getPopularItems`, and `log` are application-specific placeholders.
async function getRecommendations(userId: string) {
  try {
    return await mlService.recommend(userId);
  } catch (e) {
    log.warn('ML service down, using popular fallback');
    return await getPopularItems(); // should be cheap and cached, so the fallback itself rarely fails
  }
}
```

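The cached default can also be the last value that succeeded. The following is a sketch of a "last known good" wrapper. `LastKnownGood` is a hypothetical helper, and its static `default` covers the case where the primary has never succeeded yet.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LastKnownGood(Generic[T]):
    def __init__(self, primary: Callable[[], T], default: T) -> None:
        self.primary = primary
        self.cached = default  # static default until the first success
        self.degraded = False

    def get(self) -> T:
        try:
            value = self.primary()
        except Exception:
            self.degraded = True  # expose this in metrics so fallbacks are never silent
            return self.cached
        self.cached = value  # remember the latest good answer
        self.degraded = False
        return value
```

One trade-off is that a stale answer may be served for as long as the primary stays down. If staleness matters, store a timestamp with `cached` and stop serving it after a cut-off.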
### Temporal durable workflow

```typescript
import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from './activities'; // module exporting the activity functions

// Each activity call gets a 1-minute start-to-close timeout and up to 5 attempts,
// with server-driven backoff of 1s, 2s, 4s, 8s between them.
const { reserveInventory, chargePayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1m',
  retry: { maximumAttempts: 5, initialInterval: '1s', backoffCoefficient: 2 },
});

export async function orderWorkflow(orderId: string): Promise<string> {
  await reserveInventory(orderId);
  await chargePayment(orderId);
  await sleep('5m'); // durable timer for the fulfilment delay; survives worker restarts
  return 'completed';
}
```

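### Polly (.NET)

```csharp
// Classic Polly (v7-style) syntax: the retry policy (outer) wraps a circuit breaker (inner).
var policy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))) // waits 2, 4, 8, 16, 32 s
    .WrapAsync(Policy
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(5, TimeSpan.FromMinutes(1))); // opens for 1 min after 5 consecutive failures

// GetAsync throws HttpRequestException on network failures, not on 5xx status codes.
await policy.ExecuteAsync(() => httpClient.GetAsync(url));
```
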
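### LLM API retry (rate limit aware)

```python
import time

def llm_call(prompt, max_retries=5):
    # `openai_client.create` and `e.retry_after` are placeholders for your SDK's call and its Retry-After value.
    for attempt in range(max_retries):
        try:
            return openai_client.create(prompt=prompt)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: do not fall through and return None
            wait = e.retry_after if e.retry_after else 2 ** attempt  # prefer the server's hint
            time.sleep(wait)
```
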
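A `Retry-After` header may carry either a number of seconds or an HTTP-date (RFC 9110), so turning it into a wait time has to handle both forms. The following is a standard-library sketch. `parse_retry_after` is a hypothetical helper, and the `now` parameter exists only to make it testable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to a Retry-After header, or None if absent or invalid."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on garbage
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always in GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a date in the past means "retry now"
```

A `None` result means the caller falls back to its own exponential backoff.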
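### Health check + half-open

```python
import time

def half_open_probe(circuit):
    # `light_health_check` is a placeholder for a cheap dependency check.
    try:
        healthy = light_health_check().ok
    except Exception:
        healthy = False
    if healthy:
        circuit.state, circuit.failures = 'closed', 0
    else:  # probe failed: stay open and restart the cool-down
        circuit.state, circuit.opened_at = 'open', time.time()
```
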
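### Idempotent HTTP (Stripe-style)

```bash
# Generate the key once per logical operation and reuse it on every retry;
# a fresh key per attempt would defeat deduplication.
KEY=$(uuidgen)
curl -X POST https://api/charges \
  -H "Idempotency-Key: $KEY" \
  -d "amount=2000"
```
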
### Observability (per attempt)

```python
from functools import wraps

# `metrics` is a placeholder for your metrics client (StatsD, Prometheus, ...).
def observed_retry(fn, max_attempts=5):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            metrics.increment('attempt', {'fn': fn.__name__, 'attempt': attempt})
            try:
                return fn(*args, **kwargs)
            except Exception as e:
                metrics.increment('failure', {'fn': fn.__name__, 'attempt': attempt, 'err': type(e).__name__})
                if attempt == max_attempts - 1:
                    raise  # must re-raise inside `except`; a bare raise after the loop has no active exception
    return wrapper
```

## Decision guide

| Situation | Pattern |
|---|---|
| HTTP 5xx | Retry + backoff + jitter |
| Flaky external dependency | Circuit breaker |
| Distributed transaction | Saga + compensation |
| Long-running workflow | Temporal / Inngest |
| Side effect that must happen once | Idempotency key |
| Rate limiting | Retry honoring `Retry-After` |
| User-visible path | Fallback + cache |

**Defaults**: exponential backoff with jitter, idempotency, timeouts, circuit breakers, a DLQ, and per-attempt observability.

## 🔗 Graph

- Parent: [[Distributed-Systems]] · [[Reliability]]
- Variant: [[Circuit-Breaker]]
- Application: [[Temporal]]
- Adjacent: [[Idempotency]] · [[Bulkhead]] · [[Event-Driven-Architecture]]

## 🤖 LLM usage

**When**: any distributed system, any call to an external API, any long-running workflow.

**When not**: deterministic, in-process code.

## ❌ Anti-patterns

- **Retry without backoff**: clients hammer the dependency in lockstep (thundering herd).
- **Retrying non-idempotent operations**: duplicate side effects.
- **Infinite retry**: load keeps amplifying and failures cascade.
- **No DLQ**: messages that keep failing are lost.
- **No timeout**: calls hang indefinitely.
- **No circuit breaker**: one failing dependency cascades into others.

## 🧪 Verification / duplicates

- Verified against *Release It!* (Nygard), the Temporal docs, and the Polly docs.
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: patterns plus retry / circuit breaker / saga / Temporal / DLQ code |