---
id: wiki-2026-0508-draw-call
title: Draw Call
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Drawcall, GPU Submit, Render Command]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [graphics, gpu, performance, rendering]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: C++/Rust
  framework: Vulkan/Metal/D3D12/WebGPU
---

# Draw Call

## In One Line

> **"A single command with which the CPU instructs the GPU to draw one batch."** The heavy per-call driver overhead of the 1990s OpenGL `glDrawArrays` era has fallen to the microsecond level with modern explicit APIs (Vulkan/D3D12/Metal/WebGPU), bindless resources, and GPU-driven rendering. As of 2026, `vkCmdDrawIndexedIndirectCount` plus mesh shaders is the norm for GPU-driven renderers.

## Core

### Anatomy

- Set the pipeline (shader, blend, depth state).
- Bind resources (vertex/index buffer, uniform, texture).
- Issue the draw (`drawIndexed`; `dispatch` is the compute analogue).
- Submit to the queue.

### Cost Sources

- **Driver validation**: the main bottleneck in legacy GL.
- **State change**: pipeline / render target / descriptor switches.
- **CPU↔GPU sync**: fence waits, map/unmap.
- **Command recording**: can be spread across threads in modern APIs.

### Applications

1. Fewer draw calls → lower CPU frame time when the frame is submission-bound.
2. Batching (instancing, atlas, indirect).
3. GPU-driven culling (compute → indirect).

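
The state-change cost and the batching idea above can be illustrated with a minimal CPU-side sketch. It assumes a hypothetical `DrawItem` record whose `pipelineId` and `materialId` stand in for real pipeline and descriptor-set handles; `countStateChanges` and `sortByState` are illustrative names, not part of any graphics API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record: the IDs stand in for a pipeline and a descriptor set.
struct DrawItem {
    uint32_t pipelineId;
    uint32_t materialId;
    uint32_t meshId;
};

// Counts the pipeline binds plus material binds a naive submission loop would
// issue if it rebinds only when the ID differs from the previous draw.
inline int countStateChanges(const std::vector<DrawItem>& draws) {
    int changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].pipelineId != draws[i - 1].pipelineId) ++changes;
        if (i == 0 || draws[i].materialId != draws[i - 1].materialId) ++changes;
    }
    return changes;
}

// Groups draws by pipeline first, then by material, keeping the original
// order inside each group (stable sort).
inline void sortByState(std::vector<DrawItem>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return std::tie(a.pipelineId, a.materialId) <
                                std::tie(b.pipelineId, b.materialId);
                     });
}
```

For five draws with (pipeline, material) pairs (0,0), (1,1), (0,0), (1,1), (0,1), the count falls from 9 binds to 4 after sorting. A real renderer would usually pack such keys into one integer and might also sort by depth, which this sketch leaves out.
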
## 💻 Patterns

### Vulkan minimal draw

```cpp
// Bind the graphics pipeline (shaders + fixed-function state).
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);

// Bind geometry.
VkBuffer vbs[] = {vertexBuf};
VkDeviceSize off[] = {0};
vkCmdBindVertexBuffers(cmd, 0, 1, vbs, off);
vkCmdBindIndexBuffer(cmd, indexBuf, 0, VK_INDEX_TYPE_UINT32);

// Bind resources; pipelineLayout is the layout the pipeline was created with.
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout,
                        0, 1, &set, 0, nullptr);

// Issue the draw.
vkCmdDrawIndexed(cmd, indexCount, instanceCount, 0, 0, 0);
```

### Instancing (1 call → N objects)

```glsl
// vertex shader
#version 450
layout(set = 0, binding = 0) uniform Camera { mat4 vp; };
layout(location = 0) in vec3 pos;
layout(location = 4) in mat4 modelMatrix; // per-instance; a mat4 occupies locations 4..7
void main() {
    gl_Position = vp * modelMatrix * vec4(pos, 1.0);
}
```

```cpp
// CPU side: the per-instance buffer uses VK_VERTEX_INPUT_RATE_INSTANCE.
vkCmdDrawIndexed(cmd, idxCount, 10000, 0, 0, 0); // 10k objects, 1 draw
```

### Indirect draw (GPU-driven)

```cpp
// Layout of each command in drawBuf (already declared by the Vulkan headers):
//   struct VkDrawIndexedIndirectCommand {
//       uint32_t indexCount, instanceCount, firstIndex;
//       int32_t  vertexOffset;
//       uint32_t firstInstance;
//   };
// A compute shader culls and writes the commands plus their count to GPU buffers.
// The CPU just calls:
vkCmdDrawIndexedIndirectCount(cmd, drawBuf, 0, countBuf, 0, MAX_DRAWS,
                              sizeof(VkDrawIndexedIndirectCommand));
```

### Bindless (descriptor indexing)

```glsl
#version 450
#extension GL_EXT_nonuniform_qualifier : require
layout(set = 0, binding = 0) uniform sampler2D textures[];
layout(push_constant) uniform PC { uint texIndex; };
layout(location = 0) in vec2 uv;
layout(location = 0) out vec4 color;
void main() {
    // nonuniformEXT matters when the index varies per invocation;
    // a push-constant index is uniform, so it is harmless here.
    color = texture(textures[nonuniformEXT(texIndex)], uv);
}
```

### Mesh shader (DX12 / Vulkan)

```glsl
#version 460
#extension GL_EXT_mesh_shader : require
layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 124) out;
void main() {
    // vertCount / primCount come from the current meshlet's data.
    SetMeshOutputsEXT(vertCount, primCount);
    // Cull per meshlet here (amplification belongs in the task shader);
    // there is no input assembler stage.
}
```

### Multi-thread command recording (Vulkan)

```cpp
// One secondary command buffer per chunk. Each must come from a VkCommandPool
// that only the recording thread touches (command pools are not thread-safe).
// beginInfo carries VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT and the
// inheritance info when recording inside a render pass.
parallel_for(0, N, [&](int i) {
    VkCommandBuffer sec = secondaryCBs[i];
    vkBeginCommandBuffer(sec, &beginInfo);
    record_draws_for_chunk(sec, chunk[i]);
    vkEndCommandBuffer(sec);
});
vkCmdExecuteCommands(primaryCB, N, secondaryCBs.data());
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Thousands of identical meshes | Instancing |
| Diverse meshes, frustum-cullable | GPU-driven indirect + compute culling |
| Many materials | Bindless textures + uber-shader |
| Highly detailed geometry | Mesh shaders + meshlets |
| Legacy GL/GLES | Atlas + state sorting + minimal binds |

**Default**: Modern → indirect + bindless. Legacy → batch by state.

## 🔗 Graph

- Parent: [[GPU Pipeline]] · [[Real-time Rendering]]
- Variants: [[Indirect Draw]]
- Applications: [[Frustum Culling]] · [[Geometry Merging]]
- Adjacent: [[Vulkan]] · [[Metal]] · [[WebGPU]]

## 🤖 LLM Usage

**When to use**: Renderer architecture, performance budget analysis, interpreting profiling results.
**When not to use**: Game design / art direction.

## ❌ Anti-patterns

- **One draw per object**: a legacy pattern; use instancing or indirect draws.
- **Excessive state changes**: swapping shaders/pipelines thousands of times per frame.
- **CPU-side culling only**: move culling to GPU compute.
- **Map/unmap loop**: use a persistently mapped buffer with a ring allocator.
- **Single-threaded recording**: use secondary command buffers + `parallel_for`.

## 🧪 Verification / Duplicates

- Verified (Vulkan/D3D12 spec, Khronos best practices, GPU Zen).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: draw call cost + indirect/bindless/mesh shader |