---
id: cs-protobuf-wire-encoding
title: ProtoBuf Wire - Varint / Field Tag / Small Size
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, encoding, protobuf, vibe-coding]
tech_stack: { language: "Concept", applicable_to: ["Backend"] }
applied_in: []
aliases: [Protocol Buffers, varint, wire format, field tag, JSON vs ProtoBuf, zigzag]
---

# ProtoBuf Wire Encoding

> Typically 30-70% smaller than JSON and faster to parse. **Varint, field tag, variable length**. The standard binary format for gRPC / Kafka / microservices.

## 📖 Core Concepts

- Varint: small numbers take 1 byte, larger numbers take more.
- Field tag: a numeric ID instead of a field name.
- Length-delimited: string / bytes / sub-message.
- ZigZag: keeps signed (negative) values compact.

## 💻 Code Patterns

### Schema

```proto
syntax = "proto3";

// Required for google.protobuf.Timestamp
import "google/protobuf/timestamp.proto";

message Order {
  string id = 1;
  string user_id = 2;
  double amount = 3;
  google.protobuf.Timestamp created_at = 4;
  repeated Item items = 5;

  message Item {
    string product_id = 1;
    int32 qty = 2;
    double price = 3;
  }
}
```

### Wire format

```
Field = varint(field_number << 3 | wire_type) + value

Wire types:
  0: Varint (int32, int64, sint32, sint64, bool, enum)
  1: 64-bit (double, fixed64)
  2: Length-delimited (string, bytes, message, packed repeated)
  5: 32-bit (float, fixed32)

Field 1 (string) = (1 << 3 | 2) = 10 (0x0A, 1-byte tag) + length + bytes
Field 2 (string) = (2 << 3 | 2) = 18 (0x12)
```

### Varint encoding

```
Number → 7-bit groups, least significant group first; MSB of each byte = continuation.

0   = 0x00        (1 byte)
1   = 0x01        (1 byte)
127 = 0x7F        (1 byte)
128 = 0x80 0x01   (2 bytes)
300 = 0xAC 0x02   (2 bytes)

→ Small numbers are common, so the savings are large.
```

### ZigZag (signed)

```
Plain varint (int32 / int64): -1 = 18446744073709551615 (10 bytes)
ZigZag: -1 → 1 (1 byte), 1 → 2, -2 → 3, 2 → 4

z = (n << 1) ^ (n >> 31)  # 32-bit, arithmetic right shift
```

→ Use sint32 / sint64 for fields that are often negative.

### Field numbers are forever

```proto
message User {
  // ❌ Never change an existing number: it breaks wire compatibility
  string email = 1;

  // ✅ Adding a new field
  string name = 2;  // OK (old readers ignore it)

  reserved 3, 4 to 6;  // block reuse of retired field numbers
}
```

→ The core of schema evolution.
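The varint, ZigZag, and tag rules from the sections above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code of any protobuf library: the function names (`encodeVarint`, `decodeVarint`, `zigzagEncode32`, `zigzagDecode32`, `makeTag`) are hypothetical, and the varint helpers assume non-negative integers small enough to be exact in a JavaScript `number`.

```typescript
// Sketch of the protobuf wire primitives (hypothetical helper names).

/** Encode a non-negative integer as a base-128 varint, low 7 bits first. */
function encodeVarint(value: number): number[] {
  if (value < 0 || Math.floor(value) !== value) {
    throw new RangeError("encodeVarint expects a non-negative integer");
  }
  const out: number[] = [];
  let rest = value;
  do {
    let byte = rest % 128;          // low 7 bits
    rest = Math.floor(rest / 128);  // drop them
    if (rest > 0) byte |= 0x80;     // MSB set: more bytes follow
    out.push(byte);
  } while (rest > 0);
  return out;
}

/** Decode one varint starting at `offset`; returns the value and the next offset. */
function decodeVarint(bytes: number[], offset = 0): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * scale;
    scale *= 128;
    if ((byte & 0x80) === 0) return { value, next: pos };
  }
}

/** ZigZag for 32-bit signed values: 0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4. */
function zigzagEncode32(n: number): number {
  return ((n << 1) ^ (n >> 31)) >>> 0; // >>> 0 reinterprets the result as unsigned
}

function zigzagDecode32(z: number): number {
  return (z >>> 1) ^ -(z & 1);
}

/** Tag value = (field_number << 3) | wire_type; it is itself written as a varint. */
function makeTag(fieldNumber: number, wireType: number): number {
  return (fieldNumber << 3) | wireType;
}

console.log(encodeVarint(300));                // [ 172, 2 ]  (0xAC 0x02)
console.log(decodeVarint([0xac, 0x02]).value); // 300
console.log(zigzagEncode32(-1));               // 1
console.log(makeTag(1, 2));                    // 10
```

The ZigZag step runs before the varint step: `encodeVarint(zigzagEncode32(-1))` yields a single byte, which is why `sint32` beats `int32` for negative values.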
### Repeated (packed)

```proto
repeated int32 numbers = 1 [packed = true];  // proto3 default
```

```
Wire: tag (length-delimited) + length + varint varint varint...
→ The tag is not repeated per element, so the encoding is smaller.
```

### Optional (proto3, 3.15+)

```proto
optional int32 age = 5;  // can distinguish unset from 0
```

→ Without `optional`, a proto3 scalar cannot distinguish 0 from unset. Use `optional` when that distinction matters.

### Code generation

```bash
# Protoc
protoc --go_out=. --go-grpc_out=. order.proto
protoc --plugin=protoc-gen-ts_proto=./node_modules/.bin/protoc-gen-ts_proto \
  --ts_proto_out=. order.proto

# Buf (modern)
buf generate
```

```ts
// Generated TS
import { Order } from './order';

const order = Order.create({
  id: '...',
  userId: '...',
  amount: 99.5,
});

const buf = Order.encode(order).finish();  // compact binary
const decoded = Order.decode(buf);
```

### Size comparison (example)

```
JSON: {"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}
  97 bytes

ProtoBuf: ~40 bytes (tag + value).
→ About 60% smaller.
```

```
+ gzip (illustrative figures, not measured; gzip's fixed overhead matters on payloads this small):
JSON gzipped: ~60 bytes (repeated keys compress well)
ProtoBuf gzipped: ~35 bytes
→ ProtoBuf is still smaller (it is already compact).
```

### Speed

```
Rough orders of magnitude; actual numbers vary by language and library.
JSON parse: ~1 GB/s
ProtoBuf parse: ~5 GB/s
→ On a hot path the difference is large.
```

### gRPC = HTTP/2 + ProtoBuf

```
[[Backend_gRPC_Patterns]]
Service definition + efficient binary + multiplexed streams.
```

### Kafka + Schema Registry + ProtoBuf

```ts
// Producer (protoSchema, order, and producer are defined elsewhere)
import { SchemaRegistry, SchemaType } from '@kafkajs/confluent-schema-registry';

const registry = new SchemaRegistry({ host: '...' });
const { id } = await registry.register({
  type: SchemaType.PROTOBUF,
  schema: protoSchema,
});

const message = await registry.encode(id, order);
await producer.send({ topic: 'orders', messages: [{ value: message }] });
```

→ [[Data_Eng_Schema_Registry]].

### Buf (modern protoc)

```yaml
# buf.yaml
version: v1
breaking:
  use: [FILE]
lint:
  use: [DEFAULT]
```

```bash
buf lint                                    # style
buf breaking --against '.git#branch=main'   # breaking-change detection
buf generate                                # codegen
buf push                                    # registry
```

→ Good schema management.
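The size comparison above can be checked by hand-encoding the same example message. The sketch below assumes the `Order` schema from this note, leaves `created_at` unset (as the JSON example does), and handles only ASCII strings and non-negative integers. All helper names are hypothetical; a real project would use generated code instead.

```typescript
// Hand-encode the example Order to count wire bytes (illustrative helpers only).

function varint(n: number): number[] {
  const out: number[] = [];
  let rest = n;
  do {
    let byte = rest % 128;
    rest = Math.floor(rest / 128);
    if (rest > 0) byte |= 0x80; // continuation bit
    out.push(byte);
  } while (rest > 0);
  return out;
}

function fieldTag(field: number, wireType: number): number[] {
  return varint((field << 3) | wireType);
}

// Wire type 2: tag + length + payload.
function lengthDelimited(field: number, body: number[]): number[] {
  return fieldTag(field, 2).concat(varint(body.length), body);
}

function stringField(field: number, s: string): number[] {
  const body: number[] = [];
  for (let i = 0; i < s.length; i++) body.push(s.charCodeAt(i)); // assumes ASCII input
  return lengthDelimited(field, body);
}

// Wire type 1: tag + 8 bytes, little-endian IEEE 754.
function doubleField(field: number, x: number): number[] {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x, true);
  const out = fieldTag(field, 1);
  for (let i = 0; i < 8; i++) out.push(view.getUint8(i));
  return out;
}

// Wire type 0: tag + varint (non-negative values only in this sketch).
function int32Field(field: number, n: number): number[] {
  return fieldTag(field, 0).concat(varint(n));
}

const itemBytes = stringField(1, "p1").concat(int32Field(2, 2), doubleField(3, 49.75));
const orderBytes = stringField(1, "abc-123").concat(
  stringField(2, "u1"),
  doubleField(3, 99.5),
  lengthDelimited(5, itemBytes) // repeated message: one length-delimited entry per item
);

const jsonText =
  '{"id":"abc-123","user_id":"u1","amount":99.5,"items":[{"product_id":"p1","qty":2,"price":49.75}]}';

console.log(orderBytes.length, jsonText.length); // 39 97
```

Most of the saving comes from replacing quoted field names with one-byte tags. The two `double` fields cost 9 bytes each either way, so payloads dominated by floating-point values shrink less.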
### Connect-RPC (modern, browser-friendly)

```proto
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
}
```

```ts
import { createPromiseClient } from '@connectrpc/connect';
import { createConnectTransport } from '@connectrpc/connect-web';
// UserService is the generated service descriptor; import it from your codegen output.

const client = createPromiseClient(UserService, createConnectTransport({
  baseUrl: 'https://api.example.com',
}));

const user = await client.getUser({ id: 'u1' });
```

→ Compatible with HTTP+JSON / HTTP+protobuf / gRPC.

### Protobuf vs JSON decision

```
ProtoBuf:
  + Small size
  + Fast parse
  + Enforced schema
  + Multi-language
  - Binary = hard to debug
  - Schema management required

JSON:
  + Human-readable
  + Native everywhere
  + Quick prototyping
  - Large size
  - Weak typing

→ High throughput / multi-language / strong typing = ProtoBuf
  Public API / low traffic / simple = JSON
```

### gzip + ProtoBuf

```
ProtoBuf is already compact, so gzip saves little extra.
But if network is expensive = enable it.
```

### Reflection / debug

```bash
# grpcurl: inspect a protobuf service
grpcurl -plaintext localhost:50051 list
grpcurl -plaintext localhost:50051 describe user.v1.UserService.GetUser
grpcurl -plaintext -d '{"id":"u1"}' localhost:50051 user.v1.UserService.GetUser
```

→ JSON-style debugging (when server reflection is enabled).

### FlatBuffers / Cap'n Proto (alternatives)

```
FlatBuffers: zero-copy parse, so faster. Game / mobile.
Cap'n Proto: similar, with an RPC focus.

→ ProtoBuf is the default. Use the others only for special cases.
```

### Difference from Avro

```
Avro: schema-on-read (schema travels inside the message or lives in a registry).
Protobuf: tag-based (only field numbers are on the wire).

Avro = analytics / Hadoop friendly.
Protobuf = service / RPC friendly.
```

### Compatibility

```
Add field: OK (old readers ignore it).
Remove field: ✅ but reserve the number.
Change type: ❌ usually.
Rename field: OK on the wire (if the number stays the same); generated code and JSON names change.
Required → optional (proto2): OK only after every reader is updated; old readers reject messages missing the field.
```

### Wire format directly (low-level debug)

```bash
# Analyze a protobuf binary without its schema
protoc --decode_raw < message.bin
# Prints field numbers + values
```

## 🤔 Decision Criteria

| Use case | Recommendation |
|---|---|
| Microservice RPC | gRPC + ProtoBuf |
| Browser → server | ConnectRPC / tRPC / REST |
| Kafka heavy | Avro / ProtoBuf + registry |
| Public API | REST + JSON |
| Mobile binary | ProtoBuf or FlatBuffers |
| Internal high-throughput | ProtoBuf |
| Debug-friendly | JSON |

## ❌ Anti-patterns

- **Reusing a field number**: breaks prod.
- **Schema differs from one place to another**: drift.
- **Required fields in prod (proto2)**: breaks compatibility; use proto3.
- **Binary logs without --decode_raw**: hard to debug.
- **JSON in a binary protocol**: defeats the purpose.
- **Reflection in prod**: schema leak.
- **Not measuring against JSON**: adopting ProtoBuf on assumption.

## 🤖 LLM Usage Hints

- ProtoBuf + Buf + gRPC / ConnectRPC is the modern stack.
- Field numbers are forever: never change them.
- Declare `optional` explicitly (proto3, 3.15+).
- Where JSON alone is not good enough, try ProtoBuf.

## 🔗 Related Documents

- [[Backend_gRPC_Patterns]]
- [[Data_Eng_Schema_Registry]]
- [[CS_Compression_Algorithms]]