2nd/10_Wiki/Topics/AI_and_ML/Failable-Task-Handling.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
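10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections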

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-failable-task-handling
title: Failable Task Handling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases:
  - error handling
  - retry
  - circuit breaker
  - exponential backoff
  - idempotent
  - saga compensation
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags:
  - reliability
  - error-handling
  - retry
  - circuit-breaker
  - idempotency
  - distributed-systems
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: TypeScript / Python / Go
  framework: Temporal / Polly / tenacity

Failable Task Handling

One-line summary

"Design every task on the expectation that it can fail." Combine retry, idempotency, timeout, circuit breaker, DLQ, and compensation. These are the baseline for any distributed system. The modern form is durable execution, as in Temporal and Inngest.

Key points

Strategies

  • Retry with backoff.
  • Timeout.
  • Idempotency.
  • Circuit breaker (Hystrix-style).
  • Bulkhead (resource isolation).
  • DLQ (dead letter).
  • Compensation (saga).
  • Fallback / cached default.

Retry patterns

  • Constant: fixed delay.
  • Linear: N × delay.
  • Exponential: 2^N × delay.
  • + Jitter: prevents a thundering herd.
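
The delay formulas above can be compared side by side with a small helper. This is a minimal sketch, not a library API: the names `backoff_delay` and `with_full_jitter`, the 0-based `attempt` counter, the `cap` parameter, and the choice of "full jitter" (uniform between 0 and the computed delay) are all assumptions made for illustration.

```python
import random
from typing import Optional


def backoff_delay(strategy: str, attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Delay in seconds before the next retry, without jitter.

    `attempt` is 0-based: 0 means the first attempt has just failed.
    """
    if strategy == "constant":
        raw = base                      # fixed delay
    elif strategy == "linear":
        raw = base * (attempt + 1)      # N x delay, with N counted from 1
    elif strategy == "exponential":
        raw = base * 2 ** attempt       # 2^N x delay
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(raw, cap)                # never wait longer than the cap


def with_full_jitter(delay: float, rng: Optional[random.Random] = None) -> float:
    """Pick a uniform random delay in [0, delay] so clients do not retry in lockstep."""
    source = rng if rng is not None else random
    return source.uniform(0.0, delay)


if __name__ == "__main__":
    for n in range(5):
        planned = backoff_delay("exponential", n, base=0.5)
        print(n, planned, round(with_full_jitter(planned), 3))
```

Spreading retries over the whole interval trades a slightly longer average wait for far fewer synchronized retries when many clients fail at the same moment.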

Applications

  1. HTTP: retry on 5xx.
  2. Distributed transaction: saga.
  3. Job queue: retry + DLQ.
  4. LLM API: retry on rate limit.
  5. Workflow: durable execution with Temporal.
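
For the HTTP case, the decision "is this response worth retrying?" can be kept in one small predicate. The sketch below is one conservative policy, not a standard: the function name `should_retry`, the exact status set, and the rule that non-idempotent methods are retried only when the request carries an idempotency key are assumptions for illustration.

```python
# Statuses treated as transient in this sketch (an assumption; tune per service).
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})

# Methods that HTTP semantics define as idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS"})


def should_retry(method: str, status: int, has_idempotency_key: bool = False) -> bool:
    """Return True when repeating the request is both useful and safe."""
    if status not in RETRYABLE_STATUS:
        return False  # e.g. 400 or 404 will not improve on retry
    # Repeating a POST may duplicate its side effect unless the server
    # deduplicates by idempotency key.
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key


if __name__ == "__main__":
    print(should_retry("GET", 503))
    print(should_retry("POST", 503))
    print(should_retry("POST", 503, has_idempotency_key=True))
```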

💻 Patterns

Exponential backoff with jitter

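import random
import time

class RetryableError(Exception):
    """Transient failure worth retrying (placeholder; map it to your client's errors)."""

def retry_exponential(fn, max_attempts=5, base=0.1, max_delay=10):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth plus up to `base` seconds of jitter, capped at max_delay.
            delay = min(base * 2 ** attempt + random.random() * base, max_delay)
            time.sleep(delay)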

tenacity (Python)

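import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type(httpx.HTTPStatusError),
)
def fetch(url):
    response = httpx.get(url)
    # httpx.get does not raise on 4xx/5xx by itself; raise_for_status() raises
    # httpx.HTTPStatusError so that tenacity has something to retry on.
    response.raise_for_status()
    return response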

Circuit breaker

import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, fail_threshold=5, reset_timeout=60):
        self.failures = 0
        self.state = 'closed'
        self.opened_at = None
        self.fail_threshold = fail_threshold
        self.reset_timeout = reset_timeout

    def call(self, fn):
        if self.state == 'open':
            if time.time() - self.opened_at > self.reset_timeout:
                self.state = 'half_open'  # let one trial call through
            else:
                raise CircuitOpenError()

        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call in half-open reopens the circuit immediately.
            if self.state == 'half_open' or self.failures >= self.fail_threshold:
                self.state = 'open'
                self.opened_at = time.time()
            raise
        # Success: close the circuit and reset the failure count.
        self.state = 'closed'
        self.failures = 0
        return result

Idempotency key

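async function transfer(idempotencyKey: string, amount: number) {
  // Replay the stored result if this key has already been processed.
  const existing = await db.idempotency.find(idempotencyKey);
  if (existing) return existing.result;

  // Note: find-then-save is not atomic. Two concurrent requests with the same key
  // can both reach this point; a unique constraint on the key closes that gap.
  const result = await actualTransfer(amount);
  await db.idempotency.save({ key: idempotencyKey, result });
  return result;
}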

Timeout (Promise.race)

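class TimeoutError extends Error {}

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer once either side settles so it cannot keep the process alive.
  // The underlying work is not cancelled; only the wait is abandoned.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}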

Bulkhead (semaphore)

import asyncio
class Bulkhead:
    def __init__(self, max_concurrent=10):
        self.sem = asyncio.Semaphore(max_concurrent)
    
    async def call(self, coro):
        async with self.sem:
            return await coro

DLQ (dead letter queue)

def consume_with_dlq(queue, dlq, handler, max_retries=3):
    for msg in queue:
        for attempt in range(max_retries):
            try:
                handler(msg)
                queue.ack(msg)
                break
            except Exception as e:
                if attempt == max_retries - 1:
                    dlq.publish(msg, error=str(e))
                    queue.ack(msg)
                    break

Saga compensation

class Saga:
    def __init__(self):
        self.compensations = []
    
    async def execute(self, steps):
        try:
            for step, compensation in steps:
                await step()
                self.compensations.append(compensation)
        except Exception:
            for c in reversed(self.compensations):
                try: await c()
                except Exception: pass  # best-effort: keep unwinding even if one compensation fails
            raise

# usage
saga = Saga()
await saga.execute([
    (reserve_inventory, lambda: release_inventory()),
    (charge_card, lambda: refund_card()),
    (schedule_shipping, lambda: cancel_shipping()),
])

Fallback (graceful degrade)

async function getRecommendations(userId: string) {
  try {
    return await mlService.recommend(userId);
  } catch (e) {
    log.warn('ML service down, using popular fallback');
    return await getPopularItems();  // cached
  }
}

Temporal durable workflow

import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from './activities';  // type-only import; the module path is an assumption

const { reserveInventory, chargePayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1m',
  retry: { maximumAttempts: 5, initialInterval: '1s', backoffCoefficient: 2 },
});

export async function orderWorkflow(orderId: string) {
  await reserveInventory(orderId);
  await chargePayment(orderId);
  await sleep('5m');  // fulfilment delay (durable timer)
  return 'completed';
}

Polly (.NET)

var policy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)))
    .WrapAsync(Policy
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(5, TimeSpan.FromMinutes(1)));

await policy.ExecuteAsync(() => httpClient.GetAsync(url));

LLM API retry (rate limit aware)

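import time

def llm_call(prompt, max_retries=5):
    # openai_client and RateLimitError are placeholders for your SDK's client and 429 error type.
    for attempt in range(max_retries):
        try: return openai_client.create(prompt=prompt)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: do not fall through and return None
            # Prefer the server's hint when present, else back off exponentially.
            wait = getattr(e, 'retry_after', None) or 2 ** attempt
            time.sleep(wait)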

Health check + half-open

def half_open_probe(circuit):
    try:
        result = light_health_check()
        if result.ok: circuit.state = 'closed'
    except Exception: pass  # probe failed: leave the circuit state unchanged

Idempotent HTTP (Stripe-style)

curl -X POST https://api/charges \
  -H "Idempotency-Key: $(uuidgen)" \
  -d "amount=2000"
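
On the receiving side, the server has to make "check the key, run the operation, store the result" behave as one step. The following is a minimal in-memory sketch of that idea, assuming a single process; `IdempotencyStore` and `run_once` are hypothetical names, and a real service would typically rely on a database unique constraint rather than a process-wide lock.

```python
import threading
from typing import Any, Callable, Dict


class IdempotencyStore:
    """In-memory stand-in for a table keyed by idempotency key (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, Any] = {}

    def run_once(self, key: str, operation: Callable[[], Any]) -> Any:
        # Holding the lock across the operation serialises callers, so a
        # concurrent duplicate waits and then receives the stored result.
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = operation()          # if this raises, nothing is stored
            self._results[key] = result   # later calls with the same key replay it
            return result


if __name__ == "__main__":
    store = IdempotencyStore()
    charges = []

    def charge() -> dict:
        charges.append(2000)
        return {"charge_id": len(charges), "amount": 2000}

    print(store.run_once("key-1", charge))
    print(store.run_once("key-1", charge))  # replayed, no second charge
    print(len(charges))
```

Because a failed operation stores nothing, a client retry with the same key runs it again, which is the behaviour a retrying client needs.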

Observability (per attempt)

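from functools import wraps

def observed_retry(fn):
    # `metrics` is a placeholder for your metrics client.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(5):
            metrics.increment('attempt', {'fn': fn.__name__, 'attempt': attempt})
            try: return fn(*args, **kwargs)
            except Exception as e:
                last_error = e
                metrics.increment('failure', {'fn': fn.__name__, 'attempt': attempt, 'err': type(e).__name__})
        # A bare `raise` would fail here because no exception is active after the loop.
        raise last_error
    return wrapper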

Decision criteria

| Situation | Pattern |
|---|---|
| HTTP 5xx | Retry + backoff + jitter |
| Flaky external dependency | Circuit breaker |
| Distributed transaction | Saga + compensation |
| Long workflow | Temporal / Inngest |
| Unique side effect | Idempotency key |
| Rate-limit aware | Retry-After |
| User-visible | Fallback + cache |
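
For the rate-limit row, the `Retry-After` response header can carry either a number of seconds or an HTTP date. A minimal sketch of turning it into a wait time follows; the function name `retry_after_seconds` and the choice to return `None` for a missing or unparseable value (so the caller falls back to its own backoff) are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait, or None if unusable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():                      # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):          # error type differs across Python versions
        return None
    if when.tzinfo is None:                  # treat a naive result as UTC
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())  # a past date means "retry now"


if __name__ == "__main__":
    print(retry_after_seconds("120"))
    print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT"))
```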

Default: exponential retry with jitter + idempotency + timeout + circuit breaker + DLQ + observability.

🔗 Graph

🤖 LLM usage

When: any distributed system, external API call, or long-running workflow.
When not: deterministic in-process logic.

Anti-patterns

  • Retry without backoff: thundering herd.
  • Retrying a non-idempotent operation: duplicate side effects.
  • Infinite retry: cascading load.
  • No DLQ: lost messages.
  • No timeout: calls hang.
  • No circuit breaker: cascading failure.
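
Several of these anti-patterns come down to retries with no upper bound. One way to bound them is to cap both the number of attempts and the total time spent. The sketch below assumes that approach; `retry_within_budget`, `RetryBudgetExceeded`, and the injectable `sleep` and `clock` parameters are hypothetical names chosen for illustration and testability.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class RetryBudgetExceeded(Exception):
    """Raised when the attempt count or the overall deadline is used up."""


def retry_within_budget(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    deadline_s: float = 30.0,
    base: float = 0.1,
    cap: float = 5.0,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable as exc:
            last_error = exc
        # Exponential backoff with full jitter, capped at `cap` seconds.
        delay = random.uniform(0.0, min(cap, base * 2 ** attempt))
        out_of_attempts = attempt == max_attempts - 1
        out_of_time = clock() - start + delay > deadline_s
        if out_of_attempts or out_of_time:
            break  # stop instead of retrying forever
        sleep(delay)
    raise RetryBudgetExceeded(f"gave up after {attempt + 1} attempt(s)") from last_error
```

The caller decides what happens after `RetryBudgetExceeded`, for example publishing to a DLQ or returning a fallback.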

🧪 Verification / Duplicates

  • Verified (Release It! by Nygard, Temporal docs, Polly docs).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: patterns + retry / circuit / saga / Temporal / DLQ code |