---
id: wiki-2026-0508-compute-shader
title: Compute Shader (WebGPU)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [compute shader, WebGPU compute, GPGPU, WGSL, GPU-driven rendering, indirect draw]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [webgpu, compute-shader, gpgpu, wgsl, gpu-driven-rendering, three-js, particle-system, simulation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: WGSL / WebGPU
  framework: Three.js / Babylon.js / wgpu-rs
---
# Compute Shader
## In one line
> **"Thousands of GPU cores in parallel."** WebGPU brings GPGPU to the web: particles, fluid simulation, culling, ML inference. Illustrative figure: a CPU update taking 30 ms for 10K particles versus a GPU update taking 2 ms for 100K particles, about 150× the per-particle throughput.
## Core
### Use cases
1. **Particle system**: millions of particles.
2. **Fluid simulation**: SPH or grid-based.
3. **Cloth / soft-body**.
4. **Procedural terrain**.
5. **GPU-driven rendering**: culling, indirect draw.
6. **Compute skinning**: vertex transforms on the GPU.
7. **Image processing**: blur, filters.
8. **GPGPU**: ML inference, numerical computing.
### Compute vs vertex / fragment shader
- **Vertex**: runs once per vertex.
- **Fragment**: runs once per pixel.
- **Compute**: arbitrary computation with read/write access to storage resources.
### Core concepts
#### Workgroup
- A group of threads (invocations), e.g., 8×8×1 = 64 threads.
- Invocations in the same workgroup can share memory.
- Maps onto hardware execution units (warp / wavefront).
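
To make the workgroup arithmetic concrete, here is a minimal CPU-side sketch in JavaScript. The helper names `workgroupCount` and `decompose` are hypothetical (they are not part of WebGPU); they only mirror how a 1D dispatch is split into workgroups and how a global invocation index relates to its workgroup and local indices.

```javascript
// Hypothetical helper: number of workgroups needed to cover `itemCount` items.
function workgroupCount(itemCount, workgroupSize) {
  return Math.ceil(itemCount / workgroupSize);
}

// Hypothetical helper: split a 1D global invocation index into its parts,
// using global = workgroupId * workgroupSize + localId.
function decompose(globalId, workgroupSize) {
  return {
    workgroupId: Math.floor(globalId / workgroupSize),
    localId: globalId % workgroupSize,
  };
}

const groups = workgroupCount(100000, 64);
// Invocations in the last workgroup that fall past the end of the data.
const idle = groups * 64 - 100000;
console.log(groups, idle); // 1563 32
console.log(decompose(130, 64)); // { workgroupId: 2, localId: 2 }
```

Because the last workgroup is usually only partly filled, a shader dispatched this way needs a bounds check on the global index before touching its buffers.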
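#### Storage buffer / texture
- Readable and writable (a sampled texture is read-only).
- Essential for fluid simulation and similar read-modify-write workloads.
#### Workgroup variable (shared memory)
- Shared by all invocations in a workgroup.
- Typically much faster than global (storage) memory; 10-100× is often quoted, but the gain depends on the hardware.
- The basis for reductions and prefix sums.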
#### Indirect draw
- The GPU generates the arguments of the draw command itself.
- Minimizes CPU-GPU synchronization.
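
As a rough illustration of what "the GPU generates the draw command" means, the sketch below builds the CPU-side initial contents of an indirect argument buffer. A non-indexed indirect draw in WebGPU reads four consecutive 32-bit unsigned integers (vertex count, instance count, first vertex, first instance). The helper name `makeDrawArgs` and the vertex count of 36 are assumptions for illustration only.

```javascript
// A non-indexed indirect draw reads four consecutive u32 values.
const DRAW_INDIRECT_U32S = 4;

// Hypothetical helper: initial argument block for one draw.
function makeDrawArgs(vertexCount) {
  const args = new Uint32Array(DRAW_INDIRECT_U32S);
  args[0] = vertexCount; // vertexCount: vertices per instance
  args[1] = 0;           // instanceCount: left at 0, to be filled in on the GPU
  args[2] = 0;           // firstVertex
  args[3] = 0;           // firstInstance
  return args;
}

const args = makeDrawArgs(36);
console.log(args.byteLength); // 16
```

In a real application these bytes would be uploaded to a buffer created with both storage and indirect usage, a compute pass would increment the instance count, and the render pass would consume the buffer through `drawIndirect` without the CPU ever reading the count back.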
### WGSL (WebGPU Shading Language)
- Syntax: Rust-like.
- Strictly typed.
- One language for vertex, fragment, and compute stages.
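### Sync / async
- GPU work is asynchronous by default.
- Dependencies between invocations in a workgroup need explicit barriers (`workgroupBarrier()`, `storageBarrier()`); WebGPU orders dependent passes and dispatches for you.
- Readback to the CPU is expensive (avoid it where possible).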
### Modern applications
- **Three.js WebGPU renderer**: v160+.
- **Babylon.js**.
- **wgpu-rs**: native + web.
- **Hokusai** (Expo 2025 Osaka): 1M-particle fluid.
- **Million-component BIM platform**.
## 💻 Patterns
### Basic compute shader (WGSL)
```wgsl
// Element-wise addition of two arrays
@group(0) @binding(0) var<storage, read> input_a: array<f32>;
@group(0) @binding(1) var<storage, read> input_b: array<f32>;
@group(0) @binding(2) var<storage, read_write> output: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let idx = id.x;
  // The last workgroup may extend past the end of the data
  if (idx >= arrayLength(&input_a)) { return; }
  output[idx] = input_a[idx] + input_b[idx];
}
```
### JavaScript dispatch (WebGPU)
```js
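// Assumes `data` and `dataB` are equal-length Float32Arrays (`dataB` is an
// assumed name for the second input) and `wgslSource` holds the WGSL above.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error('WebGPU is not available');
const device = await adapter.requestDevice();

// Buffers: two inputs uploaded from the CPU, one output written by the shader
const inputA = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(inputA, 0, data);
const inputB = device.createBuffer({
  size: dataB.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(inputB, 0, dataB);
const output = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});

// Pipeline
const module = device.createShaderModule({ code: wgslSource });
const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: inputA } },
    { binding: 1, resource: { buffer: inputB } },
    { binding: 2, resource: { buffer: output } },
  ],
});

// Dispatch: one invocation per element, 64 invocations per workgroup
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(data.length / 64));
pass.end();
device.queue.submit([encoder.finish()]);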
```
### Particle system (Three.js WebGPU)
```js
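import * as THREE from 'three/webgpu';
// TSL helpers. The import path has moved between Three.js releases; check yours.
import { Fn, If, instanceIndex, storage } from 'three/tsl';

// Assumes `renderer` is an initialized THREE.WebGPURenderer and that
// N_PARTICLES, gravity, and dt are plain numbers defined elsewhere.
const pos = storage(new THREE.StorageInstancedBufferAttribute(N_PARTICLES, 3), 'vec3', N_PARTICLES);
const velocity = storage(new THREE.StorageInstancedBufferAttribute(N_PARTICLES, 3), 'vec3', N_PARTICLES);

const particleUpdate = Fn(() => {
  const p = pos.element(instanceIndex);
  const v = velocity.element(instanceIndex);
  // Integrate: gravity changes velocity, velocity changes position
  v.y.subAssign(gravity * dt);
  p.addAssign(v.mul(dt));
  // Boundary: clamp to the floor and bounce with damping
  If(p.y.lessThan(0), () => {
    p.y.assign(0);
    v.y.mulAssign(-0.8);
  });
})().compute(N_PARTICLES);

// computeAsync resolves once the compute work has been submitted
await renderer.computeAsync(particleUpdate);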
```
### Fluid simulation (SPH-style)
```wgsl
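// For each particle: search its neighbors and accumulate the force.
// Struct fields are inferred from the usage below.
struct Particle { pos: vec3<f32>, vel: vec3<f32> }
struct SimParams { dt: f32, smoothing_length: f32 }

// Ping-pong buffers: read the previous state and write the next one, so no
// invocation reads a particle that another invocation is updating.
@group(0) @binding(0) var<storage, read> particles_in: array<Particle>;
@group(0) @binding(1) var<storage, read_write> particles_out: array<Particle>;
@group(0) @binding(2) var<uniform> params: SimParams;

// Placeholder pair force (linear repulsion), not a real SPH kernel;
// substitute pressure and viscosity terms here.
fn sph_force(a: Particle, b: Particle, r: vec3<f32>, d: f32) -> vec3<f32> {
  return -(r / max(d, 1e-6)) * (params.smoothing_length - d);
}

@compute @workgroup_size(64)
fn step(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  let n = arrayLength(&particles_in);
  if (i >= n) { return; }
  let p = particles_in[i];
  var force = vec3<f32>(0.0, -9.8, 0.0);
  // Brute-force neighbor sum (simplified; real SPH uses a spatial grid)
  for (var j = 0u; j < n; j++) {
    if (j == i) { continue; }
    let r = particles_in[j].pos - p.pos;
    let d = length(r);
    if (d < params.smoothing_length) {
      force += sph_force(p, particles_in[j], r, d);
    }
  }
  var next = p;
  next.vel += force * params.dt;
  next.pos += next.vel * params.dt;
  particles_out[i] = next;
}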
```
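
The shader comment above notes that real SPH replaces the brute-force loop with a spatial grid. The following is a minimal CPU-side sketch of that idea in JavaScript, useful as a reference when validating a GPU version. It assumes a uniform grid whose cell size equals the smoothing length, so every neighbor within that radius lies in the particle's own cell or one of the 26 surrounding cells. The function names and the sample positions are hypothetical.

```javascript
// Uniform-grid neighbor search (CPU reference sketch).
// Cell size equals the smoothing length h.
function cellKey(x, y, z, h) {
  return `${Math.floor(x / h)},${Math.floor(y / h)},${Math.floor(z / h)}`;
}

// Bucket particle indices by the grid cell that contains them.
function buildGrid(positions, h) {
  const grid = new Map();
  positions.forEach(([x, y, z], i) => {
    const key = cellKey(x, y, z, h);
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(i);
  });
  return grid;
}

// Indices of all particles closer than h to particle i.
function neighbors(positions, grid, i, h) {
  const [px, py, pz] = positions[i];
  const cx = Math.floor(px / h);
  const cy = Math.floor(py / h);
  const cz = Math.floor(pz / h);
  const result = [];
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      for (let dz = -1; dz <= 1; dz++) {
        const cell = grid.get(`${cx + dx},${cy + dy},${cz + dz}`) || [];
        for (const j of cell) {
          if (j === i) continue;
          const [qx, qy, qz] = positions[j];
          if (Math.hypot(qx - px, qy - py, qz - pz) < h) result.push(j);
        }
      }
    }
  }
  return result;
}

const positions = [[0, 0, 0], [0.5, 0, 0], [0.9, 0.9, 0], [3, 3, 3], [-0.2, 0, 0]];
const grid = buildGrid(positions, 1.0);
const near = neighbors(positions, grid, 0, 1.0).sort((a, b) => a - b);
console.log(near); // [ 1, 4 ]
```

Each query now inspects only the particles in 27 cells instead of the whole array, which is what makes large particle counts tractable. A GPU implementation would typically build the grid with a sort or atomic counters rather than a `Map`.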
### GPU-driven culling (frustum)
```wgsl
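// InstanceData, Camera, and in_frustum() are assumed to be defined elsewhere.
// DrawArgs matches the four u32 arguments of a non-indexed indirect draw.
struct DrawArgs {
  vertex_count: u32,
  instance_count: atomic<u32>,  // reset to 0 before each cull dispatch
  first_vertex: u32,
  first_instance: u32,
}

@group(0) @binding(0) var<storage, read> instances: array<InstanceData>;
@group(0) @binding(1) var<storage, read_write> draw_args: array<DrawArgs>;
@group(0) @binding(2) var<uniform> camera: Camera;
@group(0) @binding(3) var<storage, read_write> visible_indices: array<u32>;

@compute @workgroup_size(64)
fn cull(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&instances)) { return; }
  if (in_frustum(instances[i].bounding_box, camera.frustum)) {
    // Reserve a slot in the compacted list and bump the instance count
    let slot = atomicAdd(&draw_args[0].instance_count, 1u);
    visible_indices[slot] = i;
  }
}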
```
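
The culling shader relies on an `in_frustum` helper that the note does not define. One common way to write such a test is the plane versus axis-aligned box check sketched below in JavaScript as a CPU reference. The data layout is an assumption: each plane is `[nx, ny, nz, d]` with its normal pointing into the frustum, and the box is `{ min, max }`. The six planes in the example simply describe the cube from -1 to 1 on each axis, standing in for a real camera frustum.

```javascript
// Hypothetical CPU counterpart of the shader's in_frustum() helper.
// A point p is on the inner side of a plane when dot(n, p) + d >= 0.
function inFrustum(box, planes) {
  for (const [nx, ny, nz, d] of planes) {
    // Pick the box corner that lies farthest along the plane normal.
    const x = nx >= 0 ? box.max[0] : box.min[0];
    const y = ny >= 0 ? box.max[1] : box.min[1];
    const z = nz >= 0 ? box.max[2] : box.min[2];
    // If even that corner is behind the plane, the whole box is outside.
    if (nx * x + ny * y + nz * z + d < 0) return false;
  }
  return true; // inside or intersecting (conservative)
}

// Six inward-facing planes bounding the cube [-1, 1]^3.
const planes = [
  [1, 0, 0, 1], [-1, 0, 0, 1],
  [0, 1, 0, 1], [0, -1, 0, 1],
  [0, 0, 1, 1], [0, 0, -1, 1],
];

const inside = { min: [-0.5, -0.5, -0.5], max: [0.5, 0.5, 0.5] };
const outside = { min: [2.5, -0.5, -0.5], max: [3.5, 0.5, 0.5] };
const straddling = { min: [0.9, -0.5, -0.5], max: [1.9, 0.5, 0.5] };
console.log(inFrustum(inside, planes), inFrustum(outside, planes), inFrustum(straddling, planes));
// true false true
```

The test is conservative: a box that only touches the volume counts as visible, which is the safe direction for culling because it never drops something that should be drawn.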
### Compute skinning (vertex transform pre-pass)
```wgsl
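// Field names follow the usage below; the exact vertex layout is an assumption.
struct Vertex {
  position: vec3<f32>,
  bone_idx: vec4<u32>,
  bone_weight: vec4<f32>,
}

@group(0) @binding(0) var<storage, read> bone_matrices: array<mat4x4<f32>>;
@group(0) @binding(1) var<storage, read> base_vertices: array<Vertex>;
@group(0) @binding(2) var<storage, read_write> skinned: array<vec4<f32>>;

@compute @workgroup_size(64)
fn skin(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&base_vertices)) { return; }
  let v = base_vertices[i];
  // Linear blend skinning: weighted sum of up to four bone transforms
  var pos = vec4<f32>(0.0);
  for (var b = 0u; b < 4u; b++) {
    pos += bone_matrices[v.bone_idx[b]] * vec4<f32>(v.position, 1.0) * v.bone_weight[b];
  }
  skinned[i] = pos;
}
// The render pass then reads `skinned` as its vertex positions.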
```
### Workgroup shared memory (reduction)
```wgsl
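@group(0) @binding(0) var<storage, read> input: array<f32>;
// One partial sum per workgroup
@group(0) @binding(1) var<storage, read_write> output: array<f32>;

// Workgroup-shared scratch space, one slot per invocation
// (renamed from `shared`, which is a reserved word in WGSL).
var<workgroup> scratch: array<f32, 64>;

@compute @workgroup_size(64)
fn sum_reduce(
  @builtin(local_invocation_id) lid: vec3<u32>,
  @builtin(global_invocation_id) gid: vec3<u32>,
  @builtin(workgroup_id) wid: vec3<u32>,
) {
  // Out-of-range invocations contribute 0. They must not return early,
  // because every invocation has to reach the barriers below.
  var value = 0.0;
  if (gid.x < arrayLength(&input)) { value = input[gid.x]; }
  scratch[lid.x] = value;
  workgroupBarrier();
  // Tree reduction: halve the active range each step
  for (var stride = 32u; stride > 0u; stride >>= 1u) {
    if (lid.x < stride) {
      scratch[lid.x] += scratch[lid.x + stride];
    }
    workgroupBarrier();
  }
  if (lid.x == 0u) {
    output[wid.x] = scratch[0];
  }
}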
```
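
The reduction above leaves one partial sum per workgroup, so a second pass (or a small CPU sum) is still needed to get the total. The JavaScript sketch below emulates the same tree reduction on the CPU, which can serve as a reference when checking GPU results. The function names and the test input (the integers 1 to 130) are hypothetical, and the workgroup size of 64 matches the shader.

```javascript
const WORKGROUP_SIZE = 64;

// Emulate one workgroup: load into scratch, then halve the stride each step.
function reduceWorkgroup(input, workgroupId) {
  const scratch = new Float32Array(WORKGROUP_SIZE);
  for (let lid = 0; lid < WORKGROUP_SIZE; lid++) {
    const gid = workgroupId * WORKGROUP_SIZE + lid;
    scratch[lid] = gid < input.length ? input[gid] : 0; // pad the tail with zeros
  }
  for (let stride = WORKGROUP_SIZE >> 1; stride > 0; stride >>= 1) {
    // Writes go to indices below `stride`, reads come from indices at or
    // above it, so running the lanes sequentially matches the parallel result.
    for (let lid = 0; lid < stride; lid++) {
      scratch[lid] += scratch[lid + stride];
    }
  }
  return scratch[0];
}

// One partial sum per workgroup, as the shader writes to its output buffer.
function partialSums(input) {
  const groups = Math.ceil(input.length / WORKGROUP_SIZE);
  return Array.from({ length: groups }, (_, wid) => reduceWorkgroup(input, wid));
}

const input = Float32Array.from({ length: 130 }, (_, i) => i + 1);
const partial = partialSums(input);
const total = partial.reduce((a, b) => a + b, 0);
console.log(partial, total); // [ 2080, 6176, 259 ] 8515
```

For large inputs the partial sums can be fed through the same kernel again until a single value remains, which keeps the whole reduction on the GPU and avoids an early readback.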
### Async render (Three.js)
```js
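// Render only after the compute pass for this frame has been issued.
// Assumes `renderer`, `scene`, `camera`, and the `particleUpdate` compute node exist.
async function frame() {
  await renderer.computeAsync(particleUpdate);
  await renderer.renderAsync(scene, camera);
}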
```
## 🤔 Decision criteria
| Situation | Approach |
|---|---|
| 100K+ particles | Compute shader |
| Fluid sim | Compute + storage texture |
| Frustum culling | GPU-driven culling |
| ML inference (browser) | WebGPU + WGSL |
| Image processing | Compute + storage texture |
| Skinned meshes (many) | Compute skinning |
| < 10K particles | CPU is fine |
| < 1000 instances | CPU-side instancing |
**Default**: WebGPU + Three.js v160+ for web. wgpu-rs for native.
## 🔗 Graph
- Parent: [[WebGPU]] · [[Computer-Graphics]]
- Variants: [[WGSL]] · [[GPU-Driven-Rendering]] · [[Indirect-Draw]]
- Applications: [[Three-js]] · [[Particle-System]]
- Adjacent: [[CSS Animations]] · [[Web-Performance]] · [[Bottlenecks]] · [[Bioenergetics]] (energy-efficient)
## 🤖 LLM usage
**When**: GPU compute on the web. Large particle systems / simulations. GPU-driven rendering. In-browser ML.
**When not**: Small tasks (the CPU is fine). Projects that need a WebGL-only fallback.
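## ❌ Anti-patterns
- **CPU-GPU readback every frame**: causes a sync stall.
- **Poorly chosen workgroup size** (e.g., 8): underutilizes the GPU, since a workgroup smaller than the hardware's warp / wavefront width leaves lanes idle.
- **No barrier**: race conditions.
- **Using storage textures without WebGPU**: unsupported.
- **Blocking synchronously between compute and render**: stalls the pipeline.
- **No fallback (older browsers)**: the page breaks.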
## 🧪 Verification / duplicates
- Verified (WebGPU spec, Three.js webgpu, Hokusai exhibition).
- Trust level: A.
- Related: [[CSS Animations]] · [[Web-Performance]] · [[Bottlenecks]] · [[Baseline-Project]] · [[20k skinned instances demo]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-19 | Auto-mapped |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: workgroup + WGSL / Three.js / fluid / culling / skinning code |