Antigravity Agent 93ec7e9056 [G1-Sync] Manual knowledge update

2026-05-09 21:08:02 +09:00

5.1 KiB

Dead Letter Queue

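A message that still fails after N retries is sent to a dead letter queue (DLQ). This prevents both infinite retries and message loss. Common implementations: SQS redrive policy, RabbitMQ DLX, Kafka retry topics.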

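📖 Core concepts

Poison message: a message that cannot be processed; if it is retried forever, it blocks the queue.
After max retries, the message is moved to the DLQ.
The DLQ needs monitoring, an alarm, and redrive (reprocessing).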
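
The three points above can be illustrated with a small in-memory model. This is only a sketch: no real broker is involved, and `Message`, `drain`, and the sample payloads are hypothetical names chosen for the illustration. Real brokers differ in exactly when the receive count triggers the move.

```typescript
// Minimal in-memory model of "retry up to N times, then dead-letter".
// All names here are hypothetical; no real broker is involved.
type Message = { id: string; body: string; receiveCount: number };

function drain(
  queue: Message[],
  handle: (m: Message) => void,
  maxReceiveCount: number,
): { processed: string[]; dlq: Message[] } {
  const processed: string[] = [];
  const dlq: Message[] = [];
  while (queue.length > 0) {
    const msg = queue.shift()!;
    msg.receiveCount += 1;
    try {
      handle(msg);
      processed.push(msg.id);
    } catch {
      if (msg.receiveCount >= maxReceiveCount) {
        dlq.push(msg); // give up: park the message for inspection
      } else {
        queue.push(msg); // retry later, behind the other messages
      }
    }
  }
  return { processed, dlq };
}

const result = drain(
  [
    { id: "a", body: "ok", receiveCount: 0 },
    { id: "b", body: "poison", receiveCount: 0 },
    { id: "c", body: "ok", receiveCount: 0 },
  ],
  (m) => {
    if (m.body === "poison") throw new Error("cannot parse");
  },
  5,
);
console.log(result.processed, result.dlq.map((m) => m.id));
```

The poison message ends up in `dlq` after five attempts, while the healthy messages are processed and the loop terminates instead of retrying forever.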

💻 Code patterns

SQS — built-in DLQ

resource "aws_sqs_queue" "main" {
  name                       = "orders"
  visibility_timeout_seconds = 60
  message_retention_seconds  = 4 * 24 * 3600

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 5  # move to the DLQ after 5 failed receives
  })
}

resource "aws_sqs_queue" "dlq" {
  name                      = "orders-dlq"
  message_retention_seconds = 14 * 24 * 3600
}

CloudWatch alarm: fire when ApproximateNumberOfMessagesVisible > 0 on the DLQ.
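
As a rough sketch of that alarm, the function below builds the request parameters for CloudWatch `PutMetricAlarm`. The alarm name, the 60-second period, and the choice of `Maximum` / `notBreaching` are assumptions for illustration, not values from this note; the local `AlarmParams` type only covers the fields used here.

```typescript
// Shape covers only the PutMetricAlarm request fields used in this sketch.
type AlarmParams = {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Maximum";
  Period: number;
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: "GreaterThanThreshold";
  TreatMissingData: "notBreaching";
};

function dlqAlarmParams(queueName: string): AlarmParams {
  return {
    AlarmName: `${queueName}-not-empty`, // hypothetical naming scheme
    Namespace: "AWS/SQS",
    MetricName: "ApproximateNumberOfMessagesVisible",
    Dimensions: [{ Name: "QueueName", Value: queueName }],
    Statistic: "Maximum",
    Period: 60, // seconds; assumed evaluation window
    EvaluationPeriods: 1,
    Threshold: 0,
    ComparisonOperator: "GreaterThanThreshold", // i.e. at least one visible message
    TreatMissingData: "notBreaching", // an idle DLQ should not alarm
  };
}

const alarm = dlqAlarmParams("orders-dlq");
console.log(JSON.stringify(alarm, null, 2));
```

The resulting object could be passed to `PutMetricAlarmCommand` from `@aws-sdk/client-cloudwatch`, typically together with an `AlarmActions` entry pointing at a notification target.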

Redrive (DLQ → main)

# AWS console: "Start DLQ redrive"
# or via the CLI
aws sqs start-message-move-task \
  --source-arn "$DLQ_ARN" \
  --destination-arn "$MAIN_ARN"

RabbitMQ — DLX

await ch.assertExchange('orders.dlx', 'direct', { durable: true });
await ch.assertQueue('orders.dlq', { durable: true });
await ch.bindQueue('orders.dlq', 'orders.dlx', 'orders');

await ch.assertQueue('orders', {
  durable: true,
  arguments: {
    'x-dead-letter-exchange': 'orders.dlx',
    'x-dead-letter-routing-key': 'orders',
    'x-message-ttl': 60_000, // messages left unconsumed for 60 s are also dead-lettered
  },
});

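ch.consume('orders', async (msg) => {
  if (!msg) return; // consumer was cancelled by the broker
  const retries = Number(msg.properties.headers?.['x-retries'] ?? 0);
  if (retries >= 5) {
    // Retry budget exhausted: reject without requeue so the broker routes it to the DLX
    return ch.nack(msg, false, false);
  }
  try {
    await handle(msg);
    ch.ack(msg);
  } catch {
    // Republish to the same queue with x-retries + 1, then ack the original delivery
    ch.publish('', 'orders', msg.content, {
      persistent: true, // keep the retry copy across broker restarts
      headers: { ...msg.properties.headers, 'x-retries': retries + 1 },
    });
    ch.ack(msg);
  }
});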

Kafka — retry topics pattern

orders            (main)
orders.retry.5s
orders.retry.30s
orders.retry.5m
orders.dlq

async function handleWithRetry(msg: KafkaMessage) {
  try {
    await handle(msg);
  } catch (e) {
    // Header values may be Buffers and headers may be absent, so read defensively
    const retry = Number(msg.headers?.['x-retry']?.toString() ?? 0);
    const next = ['orders.retry.5s', 'orders.retry.30s', 'orders.retry.5m'];
    const target = retry < next.length ? next[retry] : 'orders.dlq';
    await producer.send({
      topic: target,
      messages: [{
        key: msg.key, value: msg.value,
        headers: { ...msg.headers, 'x-retry': String(retry + 1), 'x-error': String(e) },
      }],
    });
  }
}

The consumer of each retry topic waits for that topic's delay and then moves the message back to the main topic.
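
A minimal sketch of that delay step, assuming the delay is measured from the timestamp at which the message was written to the retry topic. `RETRY_DELAY_MS`, `remainingDelayMs`, `relayAfterDelay`, and the `republish` callback are hypothetical names, not part of any Kafka client.

```typescript
// Delay per retry topic, matching the topic names above.
const RETRY_DELAY_MS: Record<string, number> = {
  "orders.retry.5s": 5_000,
  "orders.retry.30s": 30_000,
  "orders.retry.5m": 300_000,
};

// How long a retry consumer should still wait before republishing to "orders".
// `timestampMs` is when the message was written to the retry topic.
function remainingDelayMs(topic: string, timestampMs: number, nowMs: number): number {
  const delay = RETRY_DELAY_MS[topic];
  if (delay === undefined) throw new Error(`unknown retry topic: ${topic}`);
  return Math.max(0, timestampMs + delay - nowMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical consumer step: wait out the remaining delay, then hand the
// message back to the main topic through the supplied callback.
async function relayAfterDelay(
  topic: string,
  timestampMs: number,
  republish: () => Promise<void>,
  now: () => number = Date.now,
): Promise<void> {
  await sleep(remainingDelayMs(topic, timestampMs, now()));
  await republish();
}
```

In a real consumer, a long wait inside the handler usually needs extra care (for example pausing the partition) so that the consumer group does not rebalance while the message is held.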

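DLQ inspection + reprocessing

// CLI tool. dlqUrl and mainUrl are assumed to be defined elsewhere.
import {
  SQSClient, ReceiveMessageCommand, SendMessageCommand, DeleteMessageCommand,
} from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

async function inspectDlq() {
  const r = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: dlqUrl,
    MaxNumberOfMessages: 10,
    MessageAttributeNames: ['All'], // attributes are not returned unless requested
  }));
  for (const m of r.Messages ?? []) {
    console.log(m.MessageId, m.Body);
    console.log('Error:', m.MessageAttributes?.ErrorMessage?.StringValue);
  }
}

async function redriveOne(msgId: string, fixedBody: string) {
  // Republish the corrected body from the DLQ to the main queue
  await sqs.send(new SendMessageCommand({ QueueUrl: mainUrl, MessageBody: fixedBody }));
  // Delete from the DLQ; needs the receipt handle returned when msgId was received
  await sqs.send(new DeleteMessageCommand({ QueueUrl: dlqUrl, ReceiptHandle: ... }));
}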

Attaching error metadata

await producer.send({
  topic: 'orders.dlq',
  messages: [{
    key, value,
    headers: {
      ...origHeaders,
      'x-error-type': e.name,
      'x-error-message': e.message,
      'x-error-stack': e.stack?.slice(0, 1000),
      'x-failed-at': new Date().toISOString(),
      'x-original-topic': 'orders',
    },
  }],
});
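
The snippet above assumes `e` is an `Error`. A small helper can build the same headers from anything that was thrown; this is a sketch, and `buildErrorHeaders` and its parameters are hypothetical names introduced for the illustration.

```typescript
// Builds DLQ headers from anything that was thrown, not only Error instances.
function buildErrorHeaders(
  e: unknown,
  originalTopic: string,
  now: Date = new Date(),
): Record<string, string> {
  const err = e instanceof Error ? e : new Error(String(e));
  return {
    "x-error-type": err.name,
    "x-error-message": err.message,
    "x-error-stack": (err.stack ?? "").slice(0, 1000), // cap header size
    "x-failed-at": now.toISOString(),
    "x-original-topic": originalTopic,
  };
}

const headers = buildErrorHeaders(new TypeError("bad payload"), "orders", new Date(0));
console.log(headers["x-error-type"], headers["x-failed-at"]);
```

Because every value is a string, the result can be spread into the `headers` object of the DLQ message without undefined entries.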

Alarm

# Prometheus
- alert: DLQGrowing
  # the queue depth is a gauge, so use delta() rather than rate() (which is for counters)
  expr: delta(sqs_messages_visible{queue="orders-dlq"}[5m]) > 0
  for: 10m
  annotations: { summary: "Orders DLQ growing" }

🤔 Decision criteria

Situation	Recommendation
AWS SQS	Built-in redrive policy
RabbitMQ	DLX + TTL queue
Kafka	Retry topics pattern
Pulsar	Built-in retry/DLQ
Low throughput	Single DLQ + manual inspection
High throughput + automatic recovery	Automatic redrive + alarm

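❌ Anti-patterns

No DLQ: infinite retries → the queue gets blocked.
maxReceiveCount too high (100+): poison messages are dealt with too late.
maxReceiveCount too low (1): even transient errors land in the DLQ.
No error context: impossible to debug.
No alarm: nobody notices the DLQ filling up.
Unbounded automatic redrive: the same error repeats forever. Redrive manually after the fix is deployed.
Short DLQ retention: messages disappear before they can be analyzed. 14 days is recommended.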
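
One way to avoid the unbounded-redrive anti-pattern is to cap how often a message may be redriven. The sketch below assumes the redrive tool maintains a counter in a header; `x-redrive-count`, `nextRedriveHeaders`, and `maxRedrives` are hypothetical names, not part of any broker API.

```typescript
type Headers = Record<string, string | undefined>;

// Decide whether a DLQ message may be redriven again.
// "x-redrive-count" is a hypothetical header maintained by the redrive tool.
// Returns the headers for the redriven copy, or null if it must stay in the DLQ.
function nextRedriveHeaders(headers: Headers, maxRedrives: number): Headers | null {
  const count = Number(headers["x-redrive-count"] ?? 0);
  if (!Number.isInteger(count) || count < 0) return null; // malformed: leave for a human
  if (count >= maxRedrives) return null; // limit reached: stop looping
  return { ...headers, "x-redrive-count": String(count + 1) };
}

const first = nextRedriveHeaders({ "x-original-topic": "orders" }, 2);
const exhausted = nextRedriveHeaders({ "x-redrive-count": "2" }, 2);
console.log(first, exhausted);
```

A message that comes back to the DLQ after reaching the limit stays there until someone inspects it, which matches the "fix first, then redrive manually" advice.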

🤖 Hints for LLM use

maxReceiveCount = 3-5.
Attach error metadata headers.
A DLQ size alarm is mandatory.
