[G1-Sync] Manual knowledge update

Antigravity Agent
2026-05-09 21:08:02 +09:00
parent f0befc887a
commit 93ec7e9056
363 changed files with 68333 additions and 64 deletions
@@ -0,0 +1,331 @@
---
id: cs-protobuf-wire-encoding
title: ProtoBuf Wire - Varint / Field Tag / Small Size
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, encoding, protobuf, vibe-coding]
tech_stack: { language: "Concept", applicable_to: ["Backend"] }
applied_in: []
aliases: [Protocol Buffers, varint, wire format, field tag, JSON vs ProtoBuf, zigzag]
---
# ProtoBuf Wire Encoding
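> Typically 30-70% smaller than JSON and faster to parse, depending on the payload. Built on **varints, field tags, and variable-length encoding**. A standard binary format for gRPC, Kafka, and microservices.
## 📖 Core Concepts
- Varint: small numbers take 1 byte; larger numbers take more.
- Field tag: fields are identified on the wire by a numeric ID, not by name.
- Length-delimited: string / bytes / sub-message, each prefixed with its byte length.
- ZigZag: keeps small negative (signed) numbers compact.
## 💻 Code Patterns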
### Schema
```proto
syntax = "proto3";

// Required for google.protobuf.Timestamp below.
import "google/protobuf/timestamp.proto";

message Order {
  string id = 1;
  string user_id = 2;
  double amount = 3;
  google.protobuf.Timestamp created_at = 4;
  repeated Item items = 5;

  message Item {
    string product_id = 1;
    int32 qty = 2;
    double price = 3;
  }
}
```
### Wire format
```
Field = tag (varint) + value, where tag = (field_number << 3) | wire_type
Wire types:
0: Varint (int32, int64, uint32, uint64, sint32, sint64, bool, enum)
1: 64-bit (double, fixed64, sfixed64)
2: Length-delimited (string, bytes, embedded message, packed repeated)
5: 32-bit (float, fixed32, sfixed32)
Field 1 (string) = (1 << 3 | 2) = 10 = 0x0A (1-byte tag) + length + bytes
Field 2 (string) = (2 << 3 | 2) = 18 = 0x12
```
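The tag arithmetic above can be sketched in a few lines of TypeScript. This is only an illustration; `WireType`, `makeTag`, and `splitTag` are hypothetical names, not part of any protobuf library.

```typescript
// Wire types from the table above (hypothetical constant names).
const WireType = { Varint: 0, Fixed64: 1, Len: 2, Fixed32: 5 } as const;

// Combine a field number and a wire type into the tag value.
function makeTag(fieldNumber: number, wireType: number): number {
  return (fieldNumber << 3) | wireType;
}

// Split a tag value back into its two parts.
function splitTag(tag: number): { fieldNumber: number; wireType: number } {
  return { fieldNumber: tag >>> 3, wireType: tag & 0x07 };
}

console.log(makeTag(1, WireType.Len)); // 10 (0x0A)
console.log(makeTag(2, WireType.Len)); // 18 (0x12)
console.log(makeTag(3, WireType.Fixed64)); // 25 (0x19)
console.log(splitTag(0x19)); // { fieldNumber: 3, wireType: 1 }
```

Because the tag is itself a varint, field numbers 1 through 15 fit in a single tag byte, which is one reason to give the most frequently used fields the lowest numbers.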
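### Varint encoding
```
A number is split into 7-bit groups, least significant group first; the MSB of each byte is the continuation bit.
0 = 0x00 (1 byte)
1 = 0x01 (1 byte)
127 = 0x7F (1 byte)
128 = 0x80 0x01 (2 bytes)
300 = 0xAC 0x02 (2 bytes)
→ When small numbers dominate, the savings are large.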
```
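A minimal sketch of the varint rule above, assuming non-negative values that fit in a JavaScript safe integer. `encodeVarint`, `decodeVarint`, and `hex` are illustrative helpers, not library functions.

```typescript
// Encode a non-negative safe integer as a base-128 varint.
// Arithmetic (not bitwise) operators are used so values above 32 bits still work.
function encodeVarint(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    bytes.push((rest % 0x80) | 0x80); // low 7 bits with the continuation bit set
    rest = Math.floor(rest / 0x80);
  }
  bytes.push(rest); // final byte: continuation bit clear
  return Uint8Array.from(bytes);
}

// Decode a varint starting at `offset`; returns the value and the next offset.
function decodeVarint(buf: Uint8Array, offset = 0): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  for (let i = offset; i < buf.length; i++) {
    const byte = buf[i];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: i + 1 };
    scale *= 0x80;
  }
  throw new Error("truncated varint");
}

const hex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex(encodeVarint(127))); // 7f
console.log(hex(encodeVarint(128))); // 80 01
console.log(hex(encodeVarint(300))); // ac 02
console.log(decodeVarint(encodeVarint(300)).value); // 300
```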
### ZigZag (signed)
```
Plain varint: -1 = 18446744073709551615 (10 bytes)
ZigZag: -1 = 1 (1 byte), 1 = 2, -2 = 3, 2 = 4
n = (n << 1) ^ (n >> 31) # 32-bit, arithmetic right shift
```
→ Use sint32 / sint64 when values are often negative.
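The 32-bit ZigZag mapping can be sketched as follows. `zigzagEncode32` and `zigzagDecode32` are hypothetical helper names, and the sketch assumes inputs within the int32 range.

```typescript
// ZigZag for 32-bit signed integers: maps 0, -1, 1, -2, 2 ... to 0, 1, 2, 3, 4 ...
function zigzagEncode32(n: number): number {
  // `>> 31` is an arithmetic shift: all ones for negatives, zero otherwise.
  // `>>> 0` reinterprets the 32-bit result as unsigned.
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Inverse mapping: the low bit carries the sign, the rest carries the magnitude.
function zigzagDecode32(z: number): number {
  return (z >>> 1) ^ -(z & 1);
}

console.log(zigzagEncode32(-1)); // 1
console.log(zigzagEncode32(1)); // 2
console.log(zigzagEncode32(-2)); // 3
console.log(zigzagEncode32(2)); // 4
console.log(zigzagDecode32(3)); // -2
```

The encoded value is then written as an ordinary varint, so -1 costs one byte instead of ten.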
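### Field numbers are forever
```proto
message User {
  // ❌ Never change an existing number: it breaks wire compatibility
  string email = 1;
  // ✅ Add new fields under new numbers
  string name = 2; // OK (old readers ignore it)
  reserved 3, 4 to 6; // block retired field numbers from being reused
}
```
→ This is the core of schema evolution.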
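To illustrate why an old reader can ignore a newly added field, here is a hedged sketch of a hand-written reader that only knows field 1 and skips anything else by wire type. Real generated code does this for you; `readVarint`, `readEmailOnly`, and `newerMessage` are hypothetical names used only for this example.

```typescript
// Read one base-128 varint; returns the value and the position after it.
function readVarint(buf: Uint8Array, pos: number): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  for (let i = pos; i < buf.length; i++) {
    const byte = buf[i];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: i + 1 };
    scale *= 0x80;
  }
  throw new Error("truncated varint");
}

// An "old" reader that only knows field 1 (email) and steps over the rest.
function readEmailOnly(buf: Uint8Array): string {
  const utf8 = new TextDecoder();
  let email = "";
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    const fieldNumber = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    pos = tag.next;
    if (wireType === 2) {
      const len = readVarint(buf, pos);
      if (fieldNumber === 1) {
        email = utf8.decode(buf.subarray(len.next, len.next + len.value));
      }
      pos = len.next + len.value; // known or unknown, step over the payload
    } else if (wireType === 0) {
      pos = readVarint(buf, pos).next; // skip an unknown varint
    } else if (wireType === 1) {
      pos += 8; // skip an unknown 64-bit value
    } else if (wireType === 5) {
      pos += 4; // skip an unknown 32-bit value
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
  }
  return email;
}

// Bytes a newer writer would produce for { email: "a@b.c", name: "Ann" }.
const newerMessage = Uint8Array.of(
  0x0a, 0x05, 0x61, 0x40, 0x62, 0x2e, 0x63, // field 1: "a@b.c"
  0x12, 0x03, 0x41, 0x6e, 0x6e,             // field 2: "Ann" (unknown to this reader)
);

console.log(readEmailOnly(newerMessage)); // a@b.c
```

The wire type alone tells the reader how many bytes to skip, which is why adding a field is safe while reusing a number for a different type is not.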
### Repeated (packed)
```proto
repeated int32 numbers = 1 [packed = true]; // proto3 default
```
```
Wire: tag (length-delimited) + length + varint varint varint...
→ The tag is not repeated for every element, so the encoding is smaller.
```
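The difference between packed and unpacked layouts can be checked with a small sketch. It assumes non-negative values only; `varint`, `encodePacked`, and `encodeUnpacked` are illustrative helpers.

```typescript
// Base-128 varint for non-negative integers.
function varint(value: number): number[] {
  const out: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    out.push((rest % 0x80) | 0x80);
    rest = Math.floor(rest / 0x80);
  }
  out.push(rest);
  return out;
}

// Packed: one tag (wire type 2), one length, then the varints back to back.
function encodePacked(fieldNumber: number, values: number[]): Uint8Array {
  const payload: number[] = [];
  for (const v of values) payload.push(...varint(v));
  return Uint8Array.from([
    ...varint((fieldNumber << 3) | 2),
    ...varint(payload.length),
    ...payload,
  ]);
}

// Unpacked: every element repeats the tag (wire type 0).
function encodeUnpacked(fieldNumber: number, values: number[]): Uint8Array {
  const out: number[] = [];
  for (const v of values) out.push(...varint((fieldNumber << 3) | 0), ...varint(v));
  return Uint8Array.from(out);
}

const numbers = [1, 2, 3, 300];
console.log(encodePacked(1, numbers).length); // 7
console.log(encodeUnpacked(1, numbers).length); // 9
```

The gap widens with the number of elements, since the unpacked form pays one tag per element.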
### Optional (proto3, 3.15+)
```proto
optional int32 age = 5; // unset and 0 can be told apart
```
→ Without `optional`, a proto3 scalar cannot distinguish 0 from unset. Use `optional` when presence matters.
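A rough sketch of what this means on the wire, assuming the `age = 5` field above: a field with implicit presence is simply not written when it holds the default value, while an `optional` field is written whenever it has been set. The helper names are hypothetical.

```typescript
// Base-128 varint for non-negative integers.
function varint(value: number): number[] {
  const out: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    out.push((rest % 0x80) | 0x80);
    rest = Math.floor(rest / 0x80);
  }
  out.push(rest);
  return out;
}

const AGE_TAG = (5 << 3) | 0; // field 5, wire type 0 (varint) = 0x28

// Implicit presence (plain proto3 scalar): the default value 0 is not written,
// so a reader cannot tell "0" from "never set".
function encodeAgeImplicit(age: number): number[] {
  return age === 0 ? [] : [AGE_TAG, ...varint(age)];
}

// Explicit presence (`optional`): written whenever set, even if the value is 0.
function encodeAgeOptional(age: number | undefined): number[] {
  return age === undefined ? [] : [AGE_TAG, ...varint(age)];
}

console.log(encodeAgeImplicit(0)); // []
console.log(encodeAgeOptional(0)); // [ 40, 0 ]
console.log(encodeAgeOptional(undefined)); // []
```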
### Code generation
```bash
# Protoc
protoc --go_out=. --go-grpc_out=. order.proto
protoc --plugin=protoc-gen-ts_proto=./node_modules/.bin/protoc-gen-ts_proto \
--ts_proto_out=. order.proto
# Buf (modern)
buf generate
```
```ts
// Generated TS (ts-proto output)
import { Order } from './order';

const order = Order.create({
  id: '...',
  userId: '...',
  amount: 99.5,
});
const buf = Order.encode(order).finish(); // compact binary (Uint8Array)
const decoded = Order.decode(buf);
```
### Size comparison (example)
```
JSON:
{"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}
97 bytes
ProtoBuf:
~40 bytes (tag + value, created_at unset).
→ About 60% smaller.
```
```
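+ gzip (illustrative figures, not measured on the message above):
JSON gzipped: ~60 bytes (repeated keys compress well)
ProtoBuf gzipped: ~35 bytes
→ ProtoBuf generally stays smaller because it is already compact.
Note: gzip adds roughly 18 bytes of header and trailer, so a message this small can grow rather than shrink.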
```
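The roughly 40-byte figure for the uncompressed message can be checked by hand-encoding it. The sketch below assumes the `Order` schema from this note with `created_at` left unset, and it builds the bytes manually instead of using generated code; all helper names are hypothetical.

```typescript
const utf8 = new TextEncoder();

// Base-128 varint for non-negative integers.
function varint(value: number): number[] {
  const out: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    out.push((rest % 0x80) | 0x80);
    rest = Math.floor(rest / 0x80);
  }
  out.push(rest);
  return out;
}

const tag = (fieldNumber: number, wireType: number): number[] =>
  varint((fieldNumber << 3) | wireType);

// Length-delimited payload: tag, byte length, then the bytes.
function lenField(fieldNumber: number, body: number[]): number[] {
  return [...tag(fieldNumber, 2), ...varint(body.length), ...body];
}

const stringField = (fieldNumber: number, text: string): number[] =>
  lenField(fieldNumber, Array.from(utf8.encode(text)));

// Doubles are written as 8 little-endian bytes (wire type 1).
function doubleField(fieldNumber: number, value: number): number[] {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value, true);
  return [...tag(fieldNumber, 1), ...Array.from(new Uint8Array(view.buffer))];
}

const int32Field = (fieldNumber: number, value: number): number[] =>
  [...tag(fieldNumber, 0), ...varint(value)];

const item = [...stringField(1, "p1"), ...int32Field(2, 2), ...doubleField(3, 49.75)];
const order = [
  ...stringField(1, "abc-123"),
  ...stringField(2, "u1"),
  ...doubleField(3, 99.5),
  // created_at (field 4) is unset, so nothing is written for it
  ...lenField(5, item),
];

const json =
  '{"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}';

console.log(order.length); // 39
console.log(utf8.encode(json).length); // 97
```

Most of the saving comes from replacing key names with one-byte tags; the two doubles still cost 8 bytes each, so number-heavy payloads with large floats benefit less.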
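### Speed
```
JSON parse: ~1 GB/s
ProtoBuf parse: ~5 GB/s
(rough orders of magnitude; actual numbers vary widely by language, library, and message shape)
→ On hot paths the difference can be large.
```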
### gRPC = HTTP/2 + ProtoBuf
```
Service definitions + efficient binary encoding + multiplexed streams.
```
→ [[Backend_gRPC_Patterns]].
### Kafka + Schema Registry + ProtoBuf
```ts
// Producer
// Assumes `producer` is a connected kafkajs producer, `protoSchema` holds the
// .proto source as a string, and `order` is a plain object matching that schema.
import { SchemaRegistry, SchemaType } from '@kafkajs/confluent-schema-registry';

const registry = new SchemaRegistry({ host: '...' });
const { id } = await registry.register({
  type: SchemaType.PROTOBUF,
  schema: protoSchema,
});
const message = await registry.encode(id, order);
await producer.send({ topic: 'orders', messages: [{ value: message }] });
```
→ [[Data_Eng_Schema_Registry]].
### Buf (modern protoc)
```yaml
# buf.yaml
version: v1
breaking:
  use: [FILE]
lint:
  use: [DEFAULT]
```
```bash
buf lint # style
buf breaking --against '.git#branch=main' # breaking detect
buf generate # codegen
buf push # registry
```
→ Solid schema management: lint, breaking-change detection, and codegen in one tool.
### Connect-RPC (modern, browser-friendly)
```proto
service UserService {
rpc GetUser(GetUserRequest) returns (User);
}
```
```ts
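// `UserService` is the generated service definition; its import path depends on your codegen setup.
// Connect-ES v1 API shown; v2 renames `createPromiseClient` to `createClient`.
import { createPromiseClient } from '@connectrpc/connect';
import { createConnectTransport } from '@connectrpc/connect-web';

const client = createPromiseClient(UserService, createConnectTransport({
  baseUrl: 'https://api.example.com',
}));
const user = await client.getUser({ id: 'u1' });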
```
→ Compatible with HTTP+JSON, HTTP+protobuf, and gRPC.
### Protobuf vs JSON decision
```
ProtoBuf:
+ Small size
+ Fast parsing
+ Enforced schema
+ Multi-language support
- Binary, so harder to debug
- Schema management required
JSON:
+ Human-readable
+ Native everywhere
+ Quick prototyping
- Larger size
- Weak typing
→ High throughput / multi-language / strong typing = ProtoBuf
Public API / low traffic / simple = JSON
```
### gzip + ProtoBuf
```
ProtoBuf is already compact, so gzip yields only a small additional saving.
But if network bandwidth is expensive, enable it anyway.
```
### Reflection / debug
```bash
# grpcurl — protobuf service inspect
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext localhost:50051 describe user.v1.UserService.GetUser
grpcurl -plaintext -d '{"id":"u1"}' localhost:50051 user.v1.UserService.GetUser
```
→ JSON-style debugging (when server reflection is enabled).
### FlatBuffers / Cap'n Proto (alternatives)
```
FlatBuffers: zero-copy parsing, so reads are faster. Common in games / mobile.
Cap'n Proto: similar zero-copy design, with an RPC focus.
→ ProtoBuf is the default; reach for these only in special cases.
```
### Differences from Avro
```
Avro: the reader needs the writer's schema (shipped with the data or fetched from a registry).
Protobuf: tag-based (only field numbers appear on the wire).
Avro = analytics / Hadoop friendly.
Protobuf = service / RPC friendly.
```
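### Compatibility
```
Add field: OK (old readers ignore it).
Remove field: OK, but reserve the number.
Change type: usually not OK.
Rename field: OK on the wire if the number stays the same (generated code and JSON names still change).
Required → optional (proto2): OK only once every reader has been upgraded; old readers reject messages that omit the field.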
```
### Reading the wire format directly (low-level debug)
```bash
# Inspect a protobuf binary without its schema
protoc --decode_raw < message.bin
# Prints field numbers and values
```
## 🤔 Decision Criteria
| Use case | Recommendation |
|---|---|
| Microservice RPC | gRPC + ProtoBuf |
| Browser → server | ConnectRPC / tRPC / REST |
| Kafka heavy | Avro / ProtoBuf + registry |
| Public API | REST + JSON |
| Mobile binary | ProtoBuf or FlatBuffers |
| Internal high-throughput | ProtoBuf |
| Debug-friendly | JSON |
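## ❌ Anti-patterns
- **Reusing field numbers**: breaks production, because old data is decoded as the wrong field.
- **Schema copies that differ from place to place**: drift.
- **Required fields in production (proto2)**: they break compatibility when removed; prefer proto3.
- **Binary logs without --decode_raw**: hard to debug.
- **JSON inside a binary protocol**: defeats the purpose.
- **Reflection enabled in production**: leaks the schema.
- **Not measuring against JSON**: adopting ProtoBuf on assumption alone.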
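## 🤖 LLM Usage Hints
- ProtoBuf + Buf + gRPC / ConnectRPC is the modern stack.
- Field numbers are forever: never change them.
- Declare `optional` explicitly when presence matters (protoc 3.15+).
- Where JSON alone is not good enough, try ProtoBuf.
## 🔗 Related Documents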
- [[Backend_gRPC_Patterns]]
- [[Data_Eng_Schema_Registry]]
- [[CS_Compression_Algorithms]]