---
id: wiki-2026-0508-draw-call
title: Draw Call
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Drawcall, GPU Submit, Render Command]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [graphics, gpu, performance, rendering]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: C++/Rust
  framework: Vulkan/Metal/D3D12/WebGPU
---

# Draw Call

## One-liner
> **"A single command with which the CPU tells the GPU to draw one batch."** The heavy per-call CPU overhead of the 1990s OpenGL `glDrawArrays` era has fallen to the microsecond level with modern explicit APIs (Vulkan/D3D12/Metal/WebGPU), bindless resources, and GPU-driven rendering. As of 2026, `vkCmdDrawIndexedIndirectCount` plus mesh shaders is the norm for high-end renderers.

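## Core ideas

### Anatomy
- Set the pipeline (shaders plus blend/depth state).
- Bind resources (vertex/index buffers, uniforms, textures).
- Issue the draw (e.g. `drawIndexed`; `dispatch` is the compute-side counterpart, not a draw).
- Submit the recorded command buffer to a queue (typically once per frame, not once per draw).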

### Cost sources
- **Driver validation**: the main bottleneck in legacy GL, where the driver checks state on every call.
- **State changes**: switching pipelines, render targets, or descriptors.
- **CPU↔GPU sync**: fence waits, buffer map/unmap.
- **Command recording**: can be spread across threads in modern APIs.
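State changes are cheapest when draws that share a pipeline and material sit next to each other. A common way to arrange that is to pack each draw's state into a sort key and sort before recording. The sketch below is a minimal model under assumed names (`DrawItem`, `makeSortKey`, `sortByState`, `countPipelineBinds`, and the 16/16/32-bit key layout are all hypothetical): it sorts by key and counts how many pipeline binds a recorder that skips redundant binds would actually issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record: only the state IDs matter for sorting.
struct DrawItem {
    uint16_t pipelineId;  // shader + blend/depth state object
    uint16_t materialId;  // descriptor set / textures
    uint32_t depth;       // quantized view depth (ascending = front to back)
};

// Pipeline in the high bits so draws sharing a pipeline become contiguous,
// then material, then depth.
inline uint64_t makeSortKey(const DrawItem& d) {
    return (uint64_t(d.pipelineId) << 48) | (uint64_t(d.materialId) << 32) | d.depth;
}

inline void sortByState(std::vector<DrawItem>& draws) {
    std::sort(draws.begin(), draws.end(), [](const DrawItem& a, const DrawItem& b) {
        return makeSortKey(a) < makeSortKey(b);
    });
}

// Pipeline binds issued when a bind is skipped if the pipeline is unchanged
// from the previous draw (redundant-state filtering).
inline int countPipelineBinds(const std::vector<DrawItem>& draws) {
    int binds = 0;
    bool first = true;
    uint16_t current = 0;
    for (const DrawItem& d : draws) {
        if (first || d.pipelineId != current) {
            ++binds;
            current = d.pipelineId;
            first = false;
        }
    }
    return binds;
}
```

Putting the most expensive state in the highest bits is the design choice here: sorting then minimizes pipeline switches first and material switches second.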

### Applications
1. Fewer draw calls → lower CPU frame time, when the frame is CPU-bound on submission.
2. Batching (instancing, texture atlases, indirect draws).
3. GPU-driven culling (compute pass → indirect draw).

## 💻 Patterns

### Vulkan minimal draw
```cpp
// 1. Pipeline: shaders + fixed-function state in one object
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
// 2. Resources: vertex/index buffers and descriptor sets
VkBuffer vbs[] = {vertexBuf}; VkDeviceSize off[] = {0};
vkCmdBindVertexBuffers(cmd, 0, 1, vbs, off);
vkCmdBindIndexBuffer(cmd, indexBuf, 0, VK_INDEX_TYPE_UINT32);
vkCmdBindDescriptorSets(cmd, ..., 0, 1, &set, 0, nullptr);
// 3. Draw: firstIndex, vertexOffset, firstInstance all 0
vkCmdDrawIndexed(cmd, indexCount, instanceCount, 0, 0, 0);
```

### Instancing (1 call → N objects)
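```glsl
#version 450
// vertex shader
layout(location = 0) in vec3 pos;          // per-vertex attribute (binding 0)
layout(location = 4) in mat4 modelMatrix;  // per-instance (VK_VERTEX_INPUT_RATE_INSTANCE binding); uses locations 4-7
layout(push_constant) uniform PC { mat4 vp; };  // view-projection matrix
void main() { gl_Position = vp * modelMatrix * vec4(pos, 1.0); }
```
```cpp
// CPU side: the instance buffer holding 10k model matrices is bound at the per-instance binding
vkCmdDrawIndexed(cmd, idxCount, 10000, 0, 0, 0);  // 10k objects, 1 draw
```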
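Before a single instanced draw like the one above can replace many per-object draws, the renderer has to group objects that share a mesh and upload their transforms contiguously. A minimal sketch, assuming hypothetical `Object`, `InstancedBatch`, and `buildInstancedBatches` names and a flat `std::array<float, 16>` standing in for `mat4`:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Mat4 = std::array<float, 16>;  // stand-in for a mat4

struct Object { uint32_t meshId; Mat4 model; };

// One instanced draw: instances [firstInstance, firstInstance + instanceCount)
// of the shared instance buffer all use mesh `meshId`.
struct InstancedBatch { uint32_t meshId, firstInstance, instanceCount; };

// Groups objects by mesh and writes their transforms contiguously per mesh
// into `instanceData` (the contents of the per-instance vertex buffer).
inline std::vector<InstancedBatch> buildInstancedBatches(
        const std::vector<Object>& objects, std::vector<Mat4>& instanceData) {
    std::map<uint32_t, std::vector<const Mat4*>> byMesh;  // ordered by meshId
    for (const Object& o : objects) byMesh[o.meshId].push_back(&o.model);

    std::vector<InstancedBatch> batches;
    instanceData.clear();
    for (const auto& [meshId, models] : byMesh) {
        uint32_t first = static_cast<uint32_t>(instanceData.size());
        for (const Mat4* m : models) instanceData.push_back(*m);
        batches.push_back({meshId, first, static_cast<uint32_t>(models.size())});
    }
    return batches;
}
```

Each batch then maps to one `vkCmdDrawIndexed` with `instanceCount = b.instanceCount` and `firstInstance = b.firstInstance`; in Vulkan the instance index starts at `firstInstance`, so per-instance attributes are fetched from that batch's slice of the instance buffer.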

### Indirect draw (GPU-driven)
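```cpp
// Field layout of VkDrawIndexedIndirectCommand (already defined in vulkan_core.h):
//   uint32_t indexCount, instanceCount, firstIndex;
//   int32_t  vertexOffset; uint32_t firstInstance;
// A compute shader culls objects and writes the surviving commands plus their
// count into GPU buffers. The CPU records a single call (core since Vulkan 1.2;
// requires the drawIndirectCount feature):
vkCmdDrawIndexedIndirectCount(cmd, drawBuf, 0, countBuf, 0, MAX_DRAWS,
                              sizeof(VkDrawIndexedIndirectCommand));
```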
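The compute pass that feeds `vkCmdDrawIndexedIndirectCount` typically does three things per object: test visibility, append a command, and bump an atomic count. The sketch below runs that logic on the CPU so it can be checked in isolation. `DrawCmd` mirrors the field order and 20-byte size of `VkDrawIndexedIndirectCommand`; `Plane`, `ObjectInfo`, and `cullAndWriteCommands` are hypothetical names, and a bounding-sphere vs. frustum-plane test stands in for whatever culling a real renderer uses. On the GPU, `fetch_add` would be an `atomicAdd` on the count buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order and size as VkDrawIndexedIndirectCommand.
struct DrawCmd {
    uint32_t indexCount, instanceCount, firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};
static_assert(sizeof(DrawCmd) == 20, "must match the Vulkan indirect stride");

struct Plane { float nx, ny, nz, d; };  // point p is inside if dot(n, p) + d >= 0
struct ObjectInfo {
    float cx, cy, cz, radius;           // world-space bounding sphere
    uint32_t indexCount, firstIndex;
    int32_t  vertexOffset;
};

// CPU model of the culling compute shader: one "thread" per object.
// Visible objects append a command; `count` plays the role of countBuf.
inline void cullAndWriteCommands(const std::vector<ObjectInfo>& objects,
                                 const Plane (&frustum)[6],
                                 std::vector<DrawCmd>& drawBuf,
                                 std::atomic<uint32_t>& count) {
    for (uint32_t i = 0; i < objects.size(); ++i) {
        const ObjectInfo& o = objects[i];
        bool visible = true;
        for (const Plane& p : frustum) {
            float dist = p.nx * o.cx + p.ny * o.cy + p.nz * o.cz + p.d;
            if (dist < -o.radius) { visible = false; break; }  // fully outside
        }
        if (!visible) continue;
        uint32_t slot = count.fetch_add(1);  // atomicAdd on the GPU
        if (slot < drawBuf.size())            // drawBuf.size() plays MAX_DRAWS
            drawBuf[slot] = {o.indexCount, 1, o.firstIndex, o.vertexOffset, i};
    }
}
```

Setting `firstInstance = i` lets the vertex shader read per-object data via `gl_InstanceIndex`; a non-zero `firstInstance` in indirect draws requires the `drawIndirectFirstInstance` feature. The draw itself uses the smaller of the count value and `maxDrawCount`.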

### Bindless (descriptor indexing)
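```glsl
#version 460
#extension GL_EXT_nonuniform_qualifier : require
layout(set = 0, binding = 0) uniform sampler2D textures[];  // runtime-sized (descriptor indexing)
layout(push_constant) uniform PC { uint texIndex; };
layout(location = 0) in vec2 uv;
layout(location = 0) out vec4 color;
// nonuniformEXT matters when the index varies within a draw; a push constant does not.
void main() { color = texture(textures[nonuniformEXT(texIndex)], uv); }
```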
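With bindless, much of the CPU-side work is deciding which slot of the large `textures[]` array each texture occupies; that index is what ends up in the push constant or in per-instance data. A minimal sketch of such a slot allocator, assuming hypothetical `BindlessSlotAllocator`, `allocate`, and `release` names: it hands out stable indices from a free list, so releasing one texture never renumbers the slots other materials already reference. Writing the actual image descriptor into the slot (`vkUpdateDescriptorSets`) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hands out indices into a fixed-size descriptor array (the `textures[]`
// binding). Indices are stable: freeing one never renumbers the others.
class BindlessSlotAllocator {
public:
    explicit BindlessSlotAllocator(uint32_t capacity) : capacity_(capacity) {}

    // Returns a free slot, or std::nullopt when the array is full.
    std::optional<uint32_t> allocate() {
        if (!freeList_.empty()) {
            uint32_t slot = freeList_.back();
            freeList_.pop_back();
            return slot;
        }
        if (next_ < capacity_) return next_++;
        return std::nullopt;
    }

    // Caller must ensure no in-flight frame still samples this slot.
    void release(uint32_t slot) {
        assert(slot < next_);
        freeList_.push_back(slot);
    }

private:
    uint32_t capacity_;
    uint32_t next_ = 0;               // first never-used slot
    std::vector<uint32_t> freeList_;  // recycled slots (LIFO)
};
```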

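### Mesh shader (Vulkan GLSL shown; D3D12 exposes the same stage in HLSL)
```glsl
#version 460
#extension GL_EXT_mesh_shader : require
layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 124) out;
void main() {
    // Hypothetical counts: real code reads them from a per-meshlet buffer.
    uint vertCount = 64u, primCount = 124u;
    SetMeshOutputsEXT(vertCount, primCount);
    // Then write gl_MeshVerticesEXT[] and gl_PrimitiveTriangleIndicesEXT[]; there is no
    // input-assembler stage. Per-meshlet culling/amplification usually runs in a task shader.
}
```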
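A mesh shader consumes meshlets that are usually built offline. The simplest construction walks the index buffer in order and starts a new meshlet whenever the next triangle would exceed the vertex or primitive limit (64 and 124 by default here, matching the `max_vertices`/`max_primitives` declared in the shader). This sketch uses hypothetical `Meshlet`/`buildMeshlets` names; production tools such as meshoptimizer also optimize for spatial locality and culling bounds, which this greedy version ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Meshlet {
    std::vector<uint32_t> vertices;   // global vertex indices used by this meshlet
    std::vector<uint8_t>  triangles;  // 3 local indices per triangle, into `vertices`
};

// Greedy, in-order construction: start a new meshlet whenever the next
// triangle would exceed either limit.
inline std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t>& indices,
                                          size_t maxVertices = 64,
                                          size_t maxTriangles = 124) {
    assert(indices.size() % 3 == 0);
    assert(maxVertices >= 3 && maxVertices <= 256 && maxTriangles >= 1);  // local index fits uint8_t
    std::vector<Meshlet> meshlets;
    Meshlet cur;
    std::unordered_map<uint32_t, uint8_t> local;  // global index -> local index in `cur`

    for (size_t t = 0; t < indices.size(); t += 3) {
        // New vertices this triangle would add. A degenerate triangle may be
        // over-counted, which only makes the split slightly early, never late.
        size_t newVerts = 0;
        for (size_t k = 0; k < 3; ++k)
            if (local.count(indices[t + k]) == 0) ++newVerts;

        bool full = cur.vertices.size() + newVerts > maxVertices ||
                    cur.triangles.size() / 3 + 1 > maxTriangles;
        if (full) {
            meshlets.push_back(std::move(cur));
            cur = Meshlet{};
            local.clear();
        }
        for (size_t k = 0; k < 3; ++k) {
            uint32_t v = indices[t + k];
            auto it = local.find(v);
            if (it == local.end()) {
                it = local.emplace(v, static_cast<uint8_t>(cur.vertices.size())).first;
                cur.vertices.push_back(v);
            }
            cur.triangles.push_back(it->second);
        }
    }
    if (!cur.triangles.empty()) meshlets.push_back(std::move(cur));
    return meshlets;
}
```

Storing local indices as `uint8_t` is why the vertex limit is capped at 256: each meshlet addresses only its own small vertex list, which is what the mesh shader writes to `gl_PrimitiveTriangleIndicesEXT`.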

### Multi-thread command recording (Vulkan)
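```cpp
// One secondary CB per chunk. Command pools are externally synchronized, so each
// secondaryCBs[i] comes from a pool no other thread records into concurrently.
// parallel_for and record_draws_for_chunk are placeholders for your job system/renderer.
parallel_for(0, N, [&](int i) {
    VkCommandBuffer sec = secondaryCBs[i];
    vkBeginCommandBuffer(sec, ...);  // begin info must carry VkCommandBufferInheritanceInfo
    record_draws_for_chunk(sec, chunk[i]);
    vkEndCommandBuffer(sec);
});
// primaryCB must be in a render pass begun with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS
// (or dynamic rendering with VK_RENDERING_CONTENTS_SECONDARY_COMMAND_BUFFERS_BIT).
vkCmdExecuteCommands(primaryCB, N, secondaryCBs.data());
```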
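What makes parallel recording safe to merge is that `vkCmdExecuteCommands` replays the secondaries in array order, no matter which thread finished first. That property can be modeled without Vulkan: below, each worker records into its own `std::vector` (standing in for a secondary command buffer with a private pool), and the lists are concatenated in chunk order. `Command`, `CommandList`, `recordChunk`, and `recordParallel` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct Command { uint32_t drawId; };       // stand-in for one recorded draw
using CommandList = std::vector<Command>;  // stand-in for a secondary CB

// Records draws [begin, end) into a list owned by exactly one thread.
inline void recordChunk(CommandList& out, uint32_t begin, uint32_t end) {
    for (uint32_t d = begin; d < end; ++d) out.push_back({d});
}

// Splits `drawCount` draws across `threads` workers (one list per chunk),
// then "executes" the lists in chunk order, like vkCmdExecuteCommands.
inline CommandList recordParallel(uint32_t drawCount, uint32_t threads) {
    assert(threads > 0);
    std::vector<CommandList> secondaries(threads);
    std::vector<std::thread> workers;
    uint32_t chunk = (drawCount + threads - 1) / threads;  // ceiling division
    for (uint32_t t = 0; t < threads; ++t) {
        uint32_t begin = std::min(drawCount, t * chunk);
        uint32_t end = std::min(drawCount, begin + chunk);
        workers.emplace_back(recordChunk, std::ref(secondaries[t]), begin, end);
    }
    for (std::thread& w : workers) w.join();

    CommandList primary;
    for (const CommandList& s : secondaries)
        primary.insert(primary.end(), s.begin(), s.end());
    return primary;
}
```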

## Decision guide
| Situation | Approach |
|---|---|
| Thousands of copies of the same mesh | Instancing |
| Diverse meshes, frustum-cullable | GPU-driven indirect + compute culling |
| Many materials | Bindless textures + uber-shader |
| Highly detailed geometry | Mesh shader + meshlets |
| Legacy GL/GLES | Atlas + state sort + minimize binds |

**Default**: modern APIs → indirect + bindless. Legacy → batch by state.

## 🔗 Graph
- Parent: [[GPU Pipeline]] · [[Real-time Rendering]]
- Variant: [[Indirect Draw]]
- Applications: [[Frustum Culling]] · [[Geometry Merging]]
- Adjacent: [[Vulkan]] · [[Metal]] · [[WebGPU]]

## 🤖 Using LLMs
**When**: renderer architecture, performance-budget analysis, interpreting profiling results.
**When not**: game design / art direction.

## ❌ Anti-patterns
- **One draw per object**: the legacy pattern; use instancing or indirect draws instead.
- **Excessive state changes**: swapping shaders/pipelines thousands of times per frame; sort draws by state.
- **CPU-side culling only**: move culling to the GPU in a compute pass.
- **Map/unmap loop**: use a persistently mapped buffer as a ring.
- **Single-threaded recording**: record secondary command buffers in parallel (`parallel_for`).
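The usual replacement for a map/unmap loop is to map one buffer once and sub-allocate from it every frame. A minimal sketch, assuming a hypothetical `FrameRingAllocator` and a fixed number of frames in flight: the buffer is split into one region per frame, and a region may be reused only after the fence of the frame that last used it has signaled (the fence wait is left to the caller). Offsets are aligned to a power of two such as `minUniformBufferOffsetAlignment`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Splits one persistently mapped buffer into `framesInFlight` equal regions.
// Each frame bump-allocates inside its own region; beginFrame(f) may only be
// called after the fence for the frame that last used region f has signaled.
class FrameRingAllocator {
public:
    FrameRingAllocator(size_t bufferSize, uint32_t framesInFlight, size_t alignment)
        : framesInFlight_(framesInFlight), alignment_(alignment) {
        assert(framesInFlight > 0);
        assert(alignment > 0 && (alignment & (alignment - 1)) == 0);  // power of two
        // Round each region down to the alignment so every region base is aligned.
        regionSize_ = (bufferSize / framesInFlight) & ~(alignment - 1);
    }

    void beginFrame(uint32_t frameIndex) {
        assert(frameIndex < framesInFlight_);
        regionBase_ = size_t(frameIndex) * regionSize_;
        used_ = 0;
    }

    // Byte offset into the mapped buffer, or nullopt if this frame's region is full.
    std::optional<size_t> allocate(size_t bytes) {
        size_t aligned = (used_ + alignment_ - 1) & ~(alignment_ - 1);
        if (aligned + bytes > regionSize_) return std::nullopt;
        used_ = aligned + bytes;
        return regionBase_ + aligned;
    }

private:
    uint32_t framesInFlight_;
    size_t alignment_;
    size_t regionSize_ = 0;
    size_t regionBase_ = 0, used_ = 0;
};
```

The returned offset is written through the pointer obtained from a single `vkMapMemory` call and passed as a dynamic offset or buffer offset, so no per-draw map/unmap remains.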

## 🧪 Verification / duplicates
- Verified (Vulkan/D3D12 spec, Khronos best practices, GPU Zen).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: draw call cost + indirect/bindless/mesh shader |