id: cs-protobuf-wire-encoding
title: ProtoBuf Wire: Varint / Field Tag / Small Size
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: cs, encoding, protobuf, vibe-coding
tech_stack: language = Concept, applicable_to = Backend
applied_in:
aliases: Protocol Buffers, varint, wire format, field tag, JSON vs ProtoBuf, zigzag

ProtoBuf Wire Encoding

Typically 30-70% smaller than JSON and faster to parse. Built on varints, field tags, and variable-length values. A standard binary format for gRPC, Kafka, and microservices.

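📖 Core Concepts

  • Varint: small numbers take 1 byte; larger numbers take more.
  • Field tag: fields are identified by a numeric ID, not by name.
  • Length-delimited: used for string, bytes, and sub-messages.
  • ZigZag: makes signed integers efficient to encode.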

💻 Code Patterns

Schema

syntax = "proto3";

// Required for google.protobuf.Timestamp below.
import "google/protobuf/timestamp.proto";

message Order {
  string id = 1;
  string user_id = 2;
  double amount = 3;
  google.protobuf.Timestamp created_at = 4;
  repeated Item items = 5;

  message Item {
    string product_id = 1;
    int32 qty = 2;
    double price = 3;
  }
}

Wire format

Field = tag (varint) + value, where tag = (field_number << 3) | wire_type

Wire types:
  0: Varint (int32, int64, uint32, uint64, sint32, sint64, bool, enum)
  1: 64-bit (double, fixed64, sfixed64)
  2: Length-delimited (string, bytes, message, packed repeated)
  5: 32-bit (float, fixed32, sfixed32)
  (3 and 4 are the deprecated group markers)

Field 1 (string) = (1 << 3) | 2 = 10 = 0x0A (1-byte tag) + length + bytes
Field 2 (string) = (2 << 3) | 2 = 18 = 0x12
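
The tag arithmetic above can be sketched in a few lines of TypeScript. The helper names (`makeTag`, `splitTag`, `encodeShortAsciiField`) are made up for this note and are not part of any protobuf library; the string encoder assumes ASCII text shorter than 128 bytes and a field number below 16, so that the tag and the length each fit in one byte.

```typescript
// Wire type code for length-delimited values (see the table above).
const WIRE_LEN = 2;

// Hypothetical helper: combine a field number and a wire type into a tag value.
function makeTag(fieldNumber: number, wireType: number): number {
  return (fieldNumber << 3) | wireType;
}

// Hypothetical helper: split a tag back into its two parts.
function splitTag(tag: number): { fieldNumber: number; wireType: number } {
  return { fieldNumber: tag >>> 3, wireType: tag & 0x07 };
}

// Sketch only: ASCII text, length < 128, field number < 16 (single-byte tag and length).
function encodeShortAsciiField(fieldNumber: number, text: string): number[] {
  const out = [makeTag(fieldNumber, WIRE_LEN), text.length];
  for (let i = 0; i < text.length; i++) {
    out.push(text.charCodeAt(i));
  }
  return out;
}

console.log(makeTag(1, WIRE_LEN)); // 10
console.log(makeTag(2, WIRE_LEN)); // 18
console.log(encodeShortAsciiField(2, "u1").join(" ")); // 18 2 117 49
```

Because the tag is itself a varint, field numbers 1 through 15 cost one byte and higher numbers cost two or more, which is why frequently used fields are usually given the low numbers.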

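Varint encoding

The number is split into 7-bit groups, least-significant group first; the MSB of each byte is the continuation flag.

0 = 0x00 (1 byte)
1 = 0x01 (1 byte)
127 = 0x7F (1 byte)
128 = 0x80 0x01 (2 bytes)
300 = 0xAC 0x02 (2 bytes)

→ When small numbers are frequent, the savings are large.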
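
A minimal sketch of the varint scheme above, assuming non-negative integers within JavaScript's safe integer range. The function names are illustrative, not a library API. Division and modulo are used instead of bit shifts so that values above 32 bits are not truncated.

```typescript
// Encode a non-negative integer as a varint (7 bits per byte, low group first).
function encodeVarint(value: number): number[] {
  if (value < 0 || Math.floor(value) !== value) {
    throw new RangeError("expected a non-negative integer");
  }
  const bytes: number[] = [];
  while (value >= 0x80) {
    bytes.push((value % 0x80) | 0x80); // low 7 bits plus continuation flag
    value = Math.floor(value / 0x80);
  }
  bytes.push(value); // last byte, MSB clear
  return bytes;
}

// Decode a varint starting at offset; returns the value and the next read position.
function decodeVarint(bytes: number[], offset = 0): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  let pos = offset;
  while (pos < bytes.length) {
    const b = bytes[pos++];
    value += (b & 0x7f) * scale;
    if ((b & 0x80) === 0) {
      return { value, next: pos };
    }
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}

function toHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

console.log(toHex(encodeVarint(127))); // 7f
console.log(toHex(encodeVarint(128))); // 80 01
console.log(toHex(encodeVarint(300))); // ac 02
console.log(decodeVarint([0xac, 0x02]).value); // 300
```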

ZigZag (signed)

Plain varint: -1 = 18446744073709551615 (10 bytes)
ZigZag: -1 = 1 (1 byte), 1 = 2, -2 = 3, 2 = 4

n = (n << 1) ^ (n >> 31)   # 32-bit; >> is an arithmetic (sign-extending) shift

→ Use sint32 / sint64 for fields that are often negative.
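
The 32-bit ZigZag formula maps directly onto JavaScript's 32-bit bitwise operators. The sketch below uses illustrative function names and covers only the 32-bit case; sint64 would need a 64-bit integer type.

```typescript
// ZigZag-encode a signed 32-bit integer into an unsigned 32-bit integer.
function zigzagEncode32(n: number): number {
  // n >> 31 is all ones for negative n and zero otherwise; >>> 0 yields the unsigned result.
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Inverse mapping: unsigned 32-bit back to signed 32-bit.
function zigzagDecode32(z: number): number {
  return (z >>> 1) ^ -(z & 1);
}

console.log(zigzagEncode32(-1)); // 1
console.log(zigzagEncode32(1));  // 2
console.log(zigzagEncode32(-2)); // 3
console.log(zigzagEncode32(2));  // 4
console.log(zigzagDecode32(3));  // -2
```

Small magnitudes of either sign map to small unsigned values, so they fit in a one-byte varint instead of the ten bytes a negative int32 would take.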

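Field numbers are forever

message User {
  // ❌ Never change an existing field number: it breaks wire compatibility
  string email = 1;

  // ✅ Adding a new field is fine
  string name = 2;       // OK (old readers ignore it)

  reserved 3, 4 to 6;    // block reuse of retired field numbers
}

→ This is the core of schema evolution.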
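
Why an old reader can ignore a new field: every field carries its wire type, so a reader can step over a field number it does not recognize. The sketch below is a hand-rolled reader that only knows field 1 (`email`); the function names are hypothetical, strings are assumed to be ASCII, and real generated code does considerably more (for example, it usually preserves unknown fields).

```typescript
// Read one varint at pos; return its value and the next read position.
function readVarint(buf: number[], pos: number): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  while (pos < buf.length) {
    const b = buf[pos++];
    value += (b & 0x7f) * scale;
    if ((b & 0x80) === 0) return { value, next: pos };
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}

// Sketch of an "old" reader that only understands field 1 (email).
function readEmail(buf: number[]): string {
  let email = "";
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    const fieldNumber = tag.value >>> 3;
    const wireType = tag.value & 0x07;
    pos = tag.next;
    if (wireType === 2) {
      const len = readVarint(buf, pos);
      const end = len.next + len.value;
      if (fieldNumber === 1) {
        email = "";
        for (let i = len.next; i < end; i++) email += String.fromCharCode(buf[i]); // ASCII only
      }
      pos = end; // known or unknown, step over the payload
    } else if (wireType === 0) {
      pos = readVarint(buf, pos).next; // skip a varint value
    } else if (wireType === 1) {
      pos += 8; // skip a 64-bit value
    } else if (wireType === 5) {
      pos += 4; // skip a 32-bit value
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
  }
  return email;
}

const newMessage = [
  0x0a, 0x05, 0x61, 0x40, 0x62, 0x2e, 0x63, // field 1: "a@b.c"
  0x12, 0x02, 0x41, 0x6c,                   // field 2: "Al" (unknown to this reader)
];
console.log(readEmail(newMessage)); // a@b.c
```

If field 2 were later reassigned to a different type, this skipping logic would still run, but a reader that does know field 2 would misinterpret the bytes, which is why retired numbers are reserved.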

Repeated (packed)

repeated int32 numbers = 1 [packed = true];  // the default in proto3
Wire: tag (length-delimited) + length + varint varint varint...
→ The tag is not repeated for every element, so the encoding is smaller.
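
To make the saving concrete, here is a sketch that encodes the same list both ways and compares sizes. The helpers are illustrative and assume non-negative values; they are not taken from any protobuf runtime.

```typescript
// Varint for non-negative integers.
function varint(value: number): number[] {
  const bytes: number[] = [];
  while (value >= 0x80) {
    bytes.push((value % 0x80) | 0x80);
    value = Math.floor(value / 0x80);
  }
  bytes.push(value);
  return bytes;
}

// Unpacked: every element carries its own tag (wire type 0).
function encodeUnpacked(fieldNumber: number, values: number[]): number[] {
  let out: number[] = [];
  for (let i = 0; i < values.length; i++) {
    out = out.concat(varint((fieldNumber << 3) | 0), varint(values[i]));
  }
  return out;
}

// Packed: one tag (wire type 2), one length, then the varints back to back.
function encodePacked(fieldNumber: number, values: number[]): number[] {
  let payload: number[] = [];
  for (let i = 0; i < values.length; i++) {
    payload = payload.concat(varint(values[i]));
  }
  return varint((fieldNumber << 3) | 2).concat(varint(payload.length), payload);
}

const numbers = [1, 2, 3, 4, 5];
console.log(encodeUnpacked(1, numbers).length); // 10
console.log(encodePacked(1, numbers).length);   // 7
```

The packed form pays a fixed cost of one tag plus one length, so the advantage grows with the number of elements.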

Optional (proto3, 3.15+)

optional int32 age = 5;  // can distinguish unset from 0

→ In older proto3, 0 and unset were indistinguishable. Use optional when the difference matters.
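
At the wire level the difference comes down to whether a zero is written at all. The sketch below models both behaviours for the `age` field; the function names are hypothetical and the value is assumed to be in 0..127 so that it fits in a single varint byte.

```typescript
// Tag for field 5 with wire type 0 (varint): (5 << 3) | 0 = 40.
const AGE_TAG = (5 << 3) | 0;

// Implicit presence (plain proto3 scalar): the default value 0 is not serialized.
function encodeImplicitAge(age: number): number[] {
  return age === 0 ? [] : [AGE_TAG, age];
}

// Explicit presence (optional): undefined means unset, while 0 is written out.
function encodeOptionalAge(age: number | undefined): number[] {
  return age === undefined ? [] : [AGE_TAG, age];
}

// Reader side: a missing field means "unset" only under explicit presence.
function decodeOptionalAge(buf: number[]): number | undefined {
  return buf.length === 0 ? undefined : buf[1];
}

console.log(encodeImplicitAge(0).length);                    // 0 (looks the same as unset)
console.log(encodeOptionalAge(0).length);                    // 2 (tag + value)
console.log(decodeOptionalAge(encodeOptionalAge(0)));         // 0
console.log(decodeOptionalAge(encodeOptionalAge(undefined))); // undefined
```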

Code generation

# Protoc
protoc --go_out=. --go-grpc_out=. order.proto
protoc --plugin=protoc-gen-ts_proto=./node_modules/.bin/protoc-gen-ts_proto \
  --ts_proto_out=. order.proto

# Buf (modern)
buf generate

// Generated TS
import { Order } from './order';

const order = Order.create({
  id: '...',
  userId: '...',
  amount: 99.5,
});

const buf = Order.encode(order).finish();    // compact binary
const decoded = Order.decode(buf);

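Size comparison (example)

JSON:
{"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}
97 bytes

ProtoBuf:
~40 bytes (39 when counted by hand: tags + lengths + values).

→ About 60% smaller.
+ gzip (rough estimates, not measured; gzip adds 18 bytes of fixed framing, so a payload this small may not shrink at all):
JSON gzipped: ~60 bytes (repeated keys compress well)
ProtoBuf gzipped: ~35 bytes

→ ProtoBuf is still smaller (it is already compact).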
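
The byte counts can be checked by encoding the example by hand. This is a sketch under stated assumptions: the helper names are invented for this note, strings are ASCII, every length and varint value is below 128, field numbers are below 16, and `created_at` is left out because the JSON example omits it too.

```typescript
function asciiBytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
  return out;
}

// Length-delimited field (wire type 2); payload shorter than 128 bytes assumed.
function lenField(fieldNumber: number, payload: number[]): number[] {
  return [(fieldNumber << 3) | 2, payload.length].concat(payload);
}

// Varint field (wire type 0); value below 128 assumed.
function varintField(fieldNumber: number, value: number): number[] {
  return [(fieldNumber << 3) | 0, value];
}

// double field (wire type 1): 8 bytes, little-endian IEEE 754.
function doubleField(fieldNumber: number, value: number): number[] {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value, true);
  const out = [(fieldNumber << 3) | 1];
  for (let i = 0; i < 8; i++) out.push(view.getUint8(i));
  return out;
}

// Item { product_id = 1, qty = 2, price = 3 }
const item = lenField(1, asciiBytes("p1")).concat(varintField(2, 2), doubleField(3, 49.75));

// Order { id = 1, user_id = 2, amount = 3, items = 5 }
const orderBytes = lenField(1, asciiBytes("abc-123")).concat(
  lenField(2, asciiBytes("u1")),
  doubleField(3, 99.5),
  lenField(5, item)
);

const json =
  '{"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}';

console.log(orderBytes.length); // 39
console.log(json.length);       // 97
console.log(Math.round((1 - orderBytes.length / json.length) * 100) + "% smaller"); // 60% smaller
```

Most of the saving comes from replacing the quoted key names with one-byte tags; the two doubles still cost 8 bytes each, which is more than their JSON text here.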

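Speed

JSON parse: ~1 GB/s
ProtoBuf parse: ~5 GB/s
(Rough orders of magnitude only; real throughput depends heavily on language, library, and message shape.)

→ On hot paths the difference is large.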

gRPC = HTTP/2 + ProtoBuf

[[Backend_gRPC_Patterns]]

Service definitions + efficient binary encoding + multiplexed streams.

Kafka + Schema Registry + ProtoBuf

// Producer
// Assumes `protoSchema` (the .proto source as a string), `order` (a plain object
// matching that schema), and `producer` (a connected kafkajs producer) exist elsewhere.
import { SchemaRegistry, SchemaType } from '@kafkajs/confluent-schema-registry';

const registry = new SchemaRegistry({ host: '...' });
const { id } = await registry.register({
  type: SchemaType.PROTOBUF,
  schema: protoSchema,
});

const message = await registry.encode(id, order);
await producer.send({ topic: 'orders', messages: [{ value: message }] });

See [[Data_Eng_Schema_Registry]].

Buf (modern protoc)

# buf.yaml
version: v1
breaking:
  use: [FILE]
lint:
  use: [DEFAULT]

buf lint                                           # style checks
buf breaking --against '.git#branch=main'           # breaking-change detection
buf generate                                        # code generation
buf push                                            # push to the registry

→ Solid schema management.

Connect-RPC (modern, browser-friendly)

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
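}

// createPromiseClient is the Connect-ES v1 name (v2 renames it to createClient).
import { createPromiseClient } from '@connectrpc/connect';
import { createConnectTransport } from '@connectrpc/connect-web';
// Assumed path: where the generated service module lives depends on your codegen config.
import { UserService } from './gen/user_connect';

const client = createPromiseClient(UserService, createConnectTransport({
  baseUrl: 'https://api.example.com',
}));

const user = await client.getUser({ id: 'u1' });

→ Compatible with HTTP+JSON, HTTP+protobuf, and gRPC.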

Protobuf vs JSON decision

ProtoBuf:
+ Small size
+ Fast parsing
+ Enforced schema
+ Multi-language support
- Binary, so harder to debug
- Schema management required

JSON:
+ Human-readable
+ Native everywhere
+ Quick prototyping
- Larger size
- Weak typing

→ High throughput / multiple languages / strong typing = ProtoBuf
   Public API / low traffic / simplicity = JSON

gzip + ProtoBuf

ProtoBuf is already compact, so gzip yields only small additional savings.
If network bandwidth is expensive, enable it anyway.

Reflection / debug

# grpcurl: inspect protobuf services
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext localhost:50051 describe user.v1.UserService.GetUser
grpcurl -plaintext -d '{"id":"u1"}' localhost:50051 user.v1.UserService.GetUser

→ JSON-style debugging (when server reflection is enabled).

FlatBuffers / Cap'n Proto (alternatives)

FlatBuffers: zero-copy parsing, so faster. Common in games and mobile.
Cap'n Proto: similar, with an RPC focus.

→ ProtoBuf is the default; reach for these only in special cases.

Difference from Avro

Avro:     schema-on-read (the schema travels with the message or lives in a registry).
Protobuf: tag-based (only field numbers go on the wire).

Avro = suited to analytics / Hadoop.
Protobuf = suited to services / RPC.

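Compatibility

Add field:    OK (old readers ignore it).
Remove field: ✅ but reserve the number.
Change type:  ❌ usually.
Rename field: OK on the wire (if the number stays the same); the JSON mapping uses names, so it breaks there.
Required → optional (proto2): OK on the wire, but old readers that still require the field reject messages without it.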

Inspecting the wire format directly (low-level debugging)

# Analyze a protobuf binary
protoc --decode_raw < message.bin
# Prints field numbers and values
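
For intuition about what a schema-less dump can recover, here is a toy walker over raw bytes. It is a sketch with invented function names, not how protoc is implemented, and its output format is made up for this note rather than matching `--decode_raw`. Without a schema it can only report field numbers, wire types, and raw values.

```typescript
// Read one varint at pos; return its value and the next read position.
function readRawVarint(buf: number[], pos: number): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  while (pos < buf.length) {
    const b = buf[pos++];
    value += (b & 0x7f) * scale;
    if ((b & 0x80) === 0) return { value, next: pos };
    scale *= 0x80;
  }
  throw new RangeError("truncated varint");
}

// Walk the top-level fields and describe each one without any schema.
function dumpRaw(buf: number[]): string[] {
  const lines: string[] = [];
  let pos = 0;
  while (pos < buf.length) {
    const tag = readRawVarint(buf, pos);
    const fieldNumber = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    pos = tag.next;
    if (wireType === 0) {
      const v = readRawVarint(buf, pos);
      lines.push(fieldNumber + ": varint " + v.value);
      pos = v.next;
    } else if (wireType === 2) {
      const len = readRawVarint(buf, pos);
      lines.push(fieldNumber + ": " + len.value + " bytes");
      pos = len.next + len.value;
    } else if (wireType === 1 || wireType === 5) {
      const size = wireType === 1 ? 8 : 4;
      lines.push(fieldNumber + ": fixed" + size * 8);
      pos += size;
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
  }
  return lines;
}

// Field 1 = varint 150, field 2 = the two bytes "hi".
const sample = [0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69];
console.log(dumpRaw(sample).join("\n"));
// 1: varint 150
// 2: 2 bytes
```

Note the ambiguity a raw dump cannot resolve: a length-delimited field may be a string, bytes, a nested message, or a packed list, and a varint may be a plain, ZigZag, bool, or enum value.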

🤔 Decision Criteria

Use case                    Recommendation
Microservice RPC            gRPC + ProtoBuf
Browser → server            ConnectRPC / tRPC / REST
Kafka-heavy                 Avro / ProtoBuf + registry
Public API                  REST + JSON
Mobile binary               ProtoBuf or FlatBuffers
Internal high-throughput    ProtoBuf
Debug-friendly              JSON

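Anti-patterns

  • Reusing field numbers: breaks production.
  • Schema differs from one place to another: drift.
  • Required fields in production (proto2): breaks compatibility; use proto3.
  • Binary logs without --decode_raw: hard to debug.
  • JSON inside a binary protocol: defeats the purpose.
  • Reflection in production: leaks the schema.
  • Not measuring against JSON: adopting ProtoBuf on assumption.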

🤖 LLM Usage Hints

  • ProtoBuf + Buf + gRPC / ConnectRPC is the modern stack.
  • Field numbers are forever: never change them.
  • Declare optional explicitly (protobuf 3.15+).
  • Try ProtoBuf in the cases where JSON alone is not good enough.

🔗 Related Documents