---
id: wiki-2026-0508-indirect-draw
title: Indirect Draw
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Indirect Drawing, GPU-Driven Rendering, drawIndirect]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [graphics, gpu, webgpu, vulkan, rendering, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: wgsl
  framework: webgpu
---

# Indirect Draw

## One-liner

> **"An indirect draw reads its draw-call arguments from a GPU buffer, so the GPU can issue its own work without a CPU round trip."**

Indirect draw is the foundation of GPU-driven rendering pipelines in 2026 and is supported by Vulkan, D3D12, Metal, and WebGPU. Culling, LOD selection, and instance counts are decided on the GPU, which removes most of the per-draw CPU overhead.

## Core

### Indirect vs. Direct Draw

- **Direct**: `draw(vertexCount, instanceCount, firstVertex, firstInstance)`; the arguments come from the CPU when the command is recorded.
- **Indirect**: `drawIndirect(buffer, offset)`; the arguments are read from a GPU buffer when the draw executes.
- **Multi-draw indirect (MDI)**: thousands of draws from one CPU command (Vulkan, D3D12; not part of core WebGPU).

### Args Layout (WebGPU)

```wgsl
// 16 bytes; read by drawIndirect(buffer, offset)
struct DrawIndirectArgs {
  vertexCount: u32,
  instanceCount: u32,
  firstVertex: u32,
  firstInstance: u32, // nonzero values require the "indirect-first-instance" feature
}

// 20 bytes; read by drawIndexedIndirect(buffer, offset)
struct DrawIndexedIndirectArgs {
  indexCount: u32,
  instanceCount: u32,
  firstIndex: u32,
  baseVertex: i32,    // signed: added to each index before the vertex fetch
  firstInstance: u32, // same "indirect-first-instance" rule as above
}
```

### Pipeline (GPU-driven)

1. Compute shader: per-object frustum/occlusion culling, writing a list of visible objects.
2. Compute shader: write the indirect args buffer (`instanceCount = 0` for culled draws).
3. `drawIndexedIndirect` (or MDI) reads the buffer and renders only the visible objects.

### Applications

1. Massive instanced scenes (foliage, crowds, particles).
2. GPU-driven culling (frustum, and occlusion via Hi-Z).
3. LOD selection on the GPU.
4. Variable-rate / batched rendering (cluster culling, Nanite-style).
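The argument layout and pipeline steps above can be modeled on the CPU, which helps when unit-testing culling logic or checking a dumped args buffer. Below is a minimal TypeScript sketch, assuming one bounding sphere per mesh and frustum planes stored as `[nx, ny, nz, d]` with inward-facing normals; `Mesh`, `sphereInFrustum`, and `buildIndexedArgs` are hypothetical names, not part of WebGPU or any library. It writes one 20-byte `DrawIndexedIndirectArgs` record per mesh (little-endian, matching WGSL's memory layout) and, as in step 2, sets `instanceCount` to 0 for culled meshes instead of removing their records.

```typescript
// CPU reference model of pipeline steps 1-2 (frustum cull, then write args).
// Assumptions (hypothetical, for illustration): each mesh has a bounding sphere
// [cx, cy, cz, r]; planes are [nx, ny, nz, d] with inward normals, so a point p
// is on the inside of a plane when dot(n, p) + d >= 0.
type Vec4 = [number, number, number, number];

interface Mesh {
  indexCount: number;
  firstIndex: number;
  baseVertex: number; // signed, like DrawIndexedIndirectArgs.baseVertex
  instances: number;  // instance count to draw when the mesh is visible
  sphere: Vec4;
}

// Byte size of one DrawIndexedIndirectArgs record: 5 x 32-bit fields.
const DRAW_INDEXED_ARGS_BYTES = 20;

// Step 1: the sphere is culled only if it lies entirely behind some plane.
function sphereInFrustum(s: Vec4, planes: Vec4[]): boolean {
  for (const [nx, ny, nz, d] of planes) {
    const dist = nx * s[0] + ny * s[1] + nz * s[2] + d;
    if (dist < -s[3]) return false;
  }
  return true;
}

// Step 2: one record per mesh, back to back (record k at byte offset k * 20).
// Culled meshes keep their record with instanceCount = 0, so that draw is a no-op.
function buildIndexedArgs(meshes: Mesh[], planes: Vec4[]): ArrayBuffer {
  const buf = new ArrayBuffer(meshes.length * DRAW_INDEXED_ARGS_BYTES);
  const view = new DataView(buf);
  meshes.forEach((m, k) => {
    const o = k * DRAW_INDEXED_ARGS_BYTES;
    const visible = sphereInFrustum(m.sphere, planes);
    view.setUint32(o + 0, m.indexCount, true);              // indexCount
    view.setUint32(o + 4, visible ? m.instances : 0, true); // instanceCount
    view.setUint32(o + 8, m.firstIndex, true);              // firstIndex
    view.setInt32(o + 12, m.baseVertex, true);              // baseVertex (i32)
    view.setUint32(o + 16, 0, true); // firstInstance: 0, assuming no "indirect-first-instance"
  });
  return buf;
}
```

Keeping a fixed record per mesh means record `k` always sits at byte offset `k * 20`, so a GPU buffer with the same contents (created with `GPUBufferUsage.INDIRECT`) could be drawn with `pass.drawIndexedIndirect(buffer, k * 20)` regardless of which meshes were culled.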
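## 💻 Patterns

### WebGPU Indirect Draw Setup

```ts
// Args buffer: filled by the CPU here; in the GPU-driven path a compute pass writes it.
const indirectBuffer = device.createBuffer({
  size: 16, // DrawIndirectArgs = 4 x u32
  usage: GPUBufferUsage.INDIRECT | GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});

// Initialize: 36 vertices, 1000 instances, firstVertex 0, firstInstance 0
device.queue.writeBuffer(indirectBuffer, 0, new Uint32Array([36, 1000, 0, 0]));

// In the render pass
pass.setPipeline(pipeline);
pass.setVertexBuffer(0, vertices);
pass.drawIndirect(indirectBuffer, 0);
```

### Culling Compute Shader (WGSL)

```wgsl
// Assumed layouts (illustrative): one bounding sphere per object (xyz = center, w = radius)
// and six inward-facing frustum planes (xyz = normal, w = distance).
struct Object { bounds: vec4<f32> }
struct Camera { frustum: array<vec4<f32>, 6> }

// Same layout as DrawIndirectArgs; instanceCount is atomic so invocations can append.
struct DrawArgs { vertexCount: u32, instanceCount: atomic<u32>, firstVertex: u32, firstInstance: u32 }

@group(0) @binding(0) var<storage, read> objects: array<Object>;
@group(0) @binding(1) var<storage, read_write> drawArgs: DrawArgs; // the indirect buffer
@group(0) @binding(2) var<storage, read_write> visibleInstances: array<u32>;
@group(0) @binding(3) var<uniform> camera: Camera;

// Visible unless the sphere lies entirely behind one of the planes.
fn frustumTest(s: vec4<f32>) -> bool {
  for (var p = 0u; p < 6u; p++) {
    if (dot(camera.frustum[p].xyz, s.xyz) + camera.frustum[p].w < -s.w) { return false; }
  }
  return true;
}

@compute @workgroup_size(64)
fn cullCS(@builtin(global_invocation_id) gid: vec3<u32>) {
  let i = gid.x;
  if (i >= arrayLength(&objects)) { return; }
  if (frustumTest(objects[i].bounds)) {
    // Reserve a slot, then record which object it holds;
    // the vertex shader reads visibleInstances[instance_index].
    let slot = atomicAdd(&drawArgs.instanceCount, 1u);
    visibleInstances[slot] = i;
  }
}
```

### Reset Pass (clear instanceCount)

```ts
// Each frame, before the culling dispatch, zero instanceCount
// (byte offset 4 = second u32 of DrawIndirectArgs).
device.queue.writeBuffer(indirectBuffer, 4, new Uint32Array([0]));
```

### Multi-Draw Indirect (Vulkan)

```cpp
// Draw N different meshes from one buffer of VkDrawIndexedIndirectCommand records.
// drawCount > 1 requires the multiDrawIndirect device feature.
vkCmdDrawIndexedIndirect(cmd, indirectBuf, 0, /*drawCount*/ N,
                         /*stride*/ sizeof(VkDrawIndexedIndirectCommand));

// Or with a count buffer: the draw count itself is read from the GPU, capped at maxDrawCount
// (Vulkan 1.2 drawIndirectCount feature, or VK_KHR_draw_indirect_count).
vkCmdDrawIndexedIndirectCount(cmd, indirectBuf, 0, countBuf, 0, /*maxDrawCount*/ N,
                              sizeof(VkDrawIndexedIndirectCommand));
```

### Three.js (r175+ has WebGPU)

```js
import { WebGPURenderer, BatchedMesh, MeshStandardMaterial, Matrix4 } from 'three/webgpu';

const renderer = new WebGPURenderer();

// BatchedMesh packs many geometries into one object. How the draws are issued
// (multi-draw, per-range draws, or indirect) depends on the backend and version.
const batched = new BatchedMesh(maxInstances, maxVerts, maxIndices, new MeshStandardMaterial());
const g1 = batched.addGeometry(geom1);
const g2 = batched.addGeometry(geom2);
// Instances are added separately from geometries
batched.addInstance(g1);
const i2 = batched.addInstance(g2);
batched.setMatrixAt(i2, new Matrix4().makeTranslation(2, 0, 0));
scene.add(batched); // one scene object instead of one Mesh per object
```

### Hi-Z Occlusion Culling (sketch)

```wgsl
// Hi-Z: each texel of mip k+1 holds the MAX (farthest) depth of its 2x2 children (0 = near).
// The caller (assumed) supplies the sphere's screen rect in UV (minXY, maxXY) and nearest depth.
@group(0) @binding(4) var hiZ: texture_2d<f32>;
fn occluded(rectUV: vec4<f32>, nearestDepth: f32) -> bool {
  // Mip where the rect spans <= 1 texel per axis, so 4 corner taps cover its footprint.
  let ext = (rectUV.zw - rectUV.xy) * vec2<f32>(textureDimensions(hiZ, 0));
  let mip = clamp(i32(ceil(log2(max(max(ext.x, ext.y), 1.0)))), 0, i32(textureNumLevels(hiZ)) - 1);
  let dims = vec2<i32>(textureDimensions(hiZ, mip));
  let lo = clamp(vec2<i32>(rectUV.xy * vec2<f32>(dims)), vec2<i32>(0), dims - 1);
  let hi = clamp(vec2<i32>(rectUV.zw * vec2<f32>(dims)), vec2<i32>(0), dims - 1);
  let d = max(max(textureLoad(hiZ, lo, mip).r, textureLoad(hiZ, vec2<i32>(hi.x, lo.y), mip).r),
              max(textureLoad(hiZ, vec2<i32>(lo.x, hi.y), mip).r, textureLoad(hiZ, hi, mip).r));
  return nearestDepth > d; // nearest point is behind the farthest occluder in the footprint
}
```

## Decision Criteria

| Situation | Approach |
|---|---|
| <100 unique objects | Direct draw / instancing; indirect overhead is not worth it |
| 1k-1M instances | Indirect draw + GPU culling |
| Many distinct meshes | Multi-draw indirect (Vulkan/D3D12); batching on WebGPU |
| Foliage / crowds | Indirect + GPU LOD selection |
| Mobile / low-end | Direct draw (watch the compute overhead) |

**Default**: a GPU-driven indirect pipeline for large, dynamic scenes; direct draw for small scenes.

## 🔗 Graph

- Parent: [[Graphics Pipeline]]
- Variant: [[GPU-Driven Rendering]]
- Applications: [[Frustum Culling]] · [[Nanite]]
- Adjacent: [[WebGPU]] · [[Vulkan]] · [[Compute Shader]]

## 🤖 LLM Usage

**When**: designing a GPU-driven pipeline, implementing culling, reducing draw-call overhead.

**When not**: simple scenes, where indirect draw is over-engineering and direct draw is fine.

## ❌ Anti-patterns

- **CPU readback of the indirect buffer**: stalls every frame. Keep the loop self-contained on the GPU.
- **Rewriting the whole buffer from the CPU every frame**: defeats the purpose. Update the args with a GPU compute pass.
- **Occlusion culling without Hi-Z**: produces false positives (visible objects culled). Use a Hi-Z pyramid or conservative AABB tests.
- **Indirect draw for tiny scenes**: the compute dispatch overhead exceeds the savings.
- **Assuming a WebGL fallback**: WebGL has no indirect draw; WebGPU is required.

## 🧪 Verification / Duplicates

- Verified against the WebGPU spec, the Vulkan spec, GPU Gems, and Nanite material (Epic Games).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full indirect draw / GPU-driven rendering content |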