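koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced wikilinks that differed from their target only in spelling (notation variants) with the target document's canonical title,
reconnecting 1,200 broken links. Only title/filename normalization matches were applied; alias matching
was excluded because of the over-merging risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs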

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00

---
id: wiki-2026-0508-compute-shader
title: Compute Shader (WebGPU)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [compute shader, WebGPU compute, GPGPU, WGSL, GPU-driven rendering, indirect draw]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [webgpu, compute-shader, gpgpu, wgsl, gpu-driven-rendering, three-js, particle-system, simulation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: WGSL / WebGPU
  framework: Three.js / Babylon.js / wgpu-rs
---
# Compute Shader
## One-liner
> **"Thousands of GPU cores working in parallel."** WebGPU brings GPGPU to the web: particles, fluid simulation, culling, ML inference. For example, a CPU update of 10K particles takes 30 ms, while a GPU compute pass handles 100K particles in 2 ms. That is roughly 150× the per-particle throughput.
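
The 150× figure compares time per particle, because the two runs use different particle counts:

$$
\frac{30\ \text{ms} / 10^{4}\ \text{particles}}{2\ \text{ms} / 10^{5}\ \text{particles}} = \frac{3\ \mu\text{s}}{0.02\ \mu\text{s}} = 150
$$

Per frame, the GPU run is 15× faster (30 ms vs. 2 ms) while processing 10× as many particles.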
## Essentials
### Use cases
1. **Particle systems**: millions of particles.
2. **Fluid simulation**: SPH (particle-based) or grid-based.
3. **Cloth / soft bodies**.
4. **Procedural terrain**.
5. **GPU-driven rendering**: culling, indirect draw.
6. **Compute skinning**: vertex transforms on the GPU.
7. **Image processing**: blur, filters.
8. **GPGPU**: ML inference, numerical computing.
### vs. vertex / fragment shaders
- **Vertex**: runs once per vertex.
- **Fragment**: runs once per fragment (pixel).
- **Compute**: arbitrary computation, with read/write access to storage.
### Key concepts
#### Workgroup
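- A group of invocations (threads) launched together, e.g. `@workgroup_size(8, 8, 1)` = 64 invocations.
- Invocations in the same workgroup share workgroup memory.
- Mapped onto hardware execution units (warps / wavefronts).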
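
The workgroup size determines how many workgroups a dispatch needs and how each invocation locates its element. Below is a plain-JavaScript sketch; the function names are illustrative, not part of any API. The constant `65535` is WebGPU's default `maxComputeWorkgroupsPerDimension` limit, and a device may allow more.

```js
// Default WebGPU limit on workgroups per dispatch dimension (devices may allow more).
const MAX_WORKGROUPS_PER_DIMENSION = 65535;

// Workgroups needed so that every element gets one invocation.
function workgroupCount(elements, workgroupSize) {
  const count = Math.ceil(elements / workgroupSize);
  if (count > MAX_WORKGROUPS_PER_DIMENSION) {
    throw new RangeError(`${count} workgroups exceed the per-dimension limit; split the dispatch across y`);
  }
  return count;
}

// 2D case, e.g. an image processed with @workgroup_size(8, 8).
function workgroupCount2D(width, height, wgX = 8, wgY = 8) {
  return [workgroupCount(width, wgX), workgroupCount(height, wgY)];
}

// Per axis, WGSL defines global_invocation_id = workgroup_id * workgroup_size + local_invocation_id.
function globalInvocationId(workgroupId, localId, workgroupSize) {
  return workgroupId.map((w, axis) => w * workgroupSize[axis] + localId[axis]);
}
```

Take 1,000 elements with `@workgroup_size(64)`: `workgroupCount(1000, 64)` is 16, so 24 invocations fall past the end of the data. This is why kernels bounds-check `global_invocation_id`.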
#### Storage buffer / texture
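- Storage buffers are readable and writable from a shader (a sampled texture is read-only). Storage textures are writable; read-write access is limited to a few formats such as `r32float`.
- Essential for fluid simulation and other workloads that update state in place.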
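
Storage buffers follow WGSL's memory layout rules, so data written from JavaScript must match them. A common pitfall: `vec3<f32>` occupies 12 bytes but is 16-byte aligned. The sketch below assumes a hypothetical `struct Particle { pos: vec3<f32>, vel: vec3<f32> }`, whose stride is therefore 32 bytes rather than 24.

```js
// Packs particles for a hypothetical WGSL struct Particle { pos: vec3<f32>, vel: vec3<f32> }.
// vec3<f32> is 16-byte aligned, so each field is padded to 4 floats: stride = 8 floats (32 bytes).
const PARTICLE_STRIDE_FLOATS = 8;
const POS_OFFSET = 0; // float offset of `pos` (byte offset 0)
const VEL_OFFSET = 4; // float offset of `vel` (byte offset 16)

function packParticles(particles) {
  const data = new Float32Array(particles.length * PARTICLE_STRIDE_FLOATS);
  particles.forEach(({ pos, vel }, i) => {
    const base = i * PARTICLE_STRIDE_FLOATS;
    data.set(pos, base + POS_OFFSET);
    data.set(vel, base + VEL_OFFSET);
  });
  return data; // e.g. device.queue.writeBuffer(buffer, 0, data) in the browser
}
```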
#### Workgroup variable (shared memory)
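- Shared by all invocations of one workgroup.
- Much faster than global (storage) memory; 10-100× is a commonly quoted range, depending on hardware.
- The building block for reductions and prefix sums.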
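
A prefix sum built on workgroup memory proceeds in rounds separated by `workgroupBarrier()`. The JavaScript sketch below emulates one workgroup running a Hillis-Steele inclusive scan. Double-buffering stands in for the barrier, so no invocation reads a value that another one overwrites in the same round. It is a CPU model for checking GPU output, not GPU code, and the function name is illustrative.

```js
// Emulates one workgroup computing an inclusive prefix sum (Hillis-Steele).
// Each outer iteration = "every invocation reads, writes, then workgroupBarrier()".
function workgroupInclusiveScan(values) {
  let src = Float64Array.from(values);
  let dst = new Float64Array(src.length);
  for (let offset = 1; offset < src.length; offset *= 2) {
    // Invocation i reads only `src` and writes only `dst[i]`: no intra-round race.
    for (let i = 0; i < src.length; i++) {
      dst[i] = i >= offset ? src[i] + src[i - offset] : src[i];
    }
    [src, dst] = [dst, src]; // barrier, then swap roles
  }
  return Array.from(src);
}
```

Hillis-Steele performs O(n log n) additions. The work-efficient Blelloch scan needs only O(n), at the cost of separate up-sweep and down-sweep phases.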
#### Indirect draw
- The GPU writes the draw-call arguments itself (e.g. the visible instance count after culling).
- Minimizes CPU-GPU synchronization.
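
With indirect draws, the draw parameters live in a GPU buffer that a compute pass can rewrite. `drawIndirect()` reads four 32-bit values and `drawIndexedIndirect()` reads five, with `baseVertex` signed. The sketch below packs these arguments from JavaScript; the helper names are illustrative.

- The buffer needs `GPUBufferUsage.INDIRECT`, plus `STORAGE` if a compute pass writes it.
- A non-zero `firstInstance` requires the `indirect-first-instance` feature.

```js
// drawIndirect(buffer, offset) reads: vertexCount, instanceCount, firstVertex, firstInstance.
function packDrawIndirectArgs({ vertexCount, instanceCount = 0, firstVertex = 0, firstInstance = 0 }) {
  return new Uint32Array([vertexCount, instanceCount, firstVertex, firstInstance]);
}

// drawIndexedIndirect(buffer, offset) reads:
// indexCount, instanceCount, firstIndex, baseVertex (signed i32), firstInstance.
function packDrawIndexedIndirectArgs({ indexCount, instanceCount = 0, firstIndex = 0, baseVertex = 0, firstInstance = 0 }) {
  const words = new Uint32Array(5);
  words[0] = indexCount;
  words[1] = instanceCount;
  words[2] = firstIndex;
  new Int32Array(words.buffer)[3] = baseVertex; // same bytes, written as a signed value
  words[4] = firstInstance;
  return words;
}

// Per frame in the browser, for example:
//   device.queue.writeBuffer(argsBuffer, 0, packDrawIndirectArgs({ vertexCount: 36 })); // instanceCount = 0
//   ...a compute pass atomically increments instanceCount...
//   renderPass.drawIndirect(argsBuffer, 0);
```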
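### WGSL (WebGPU Shading Language)
- Rust-like syntax.
- Strictly typed: no implicit conversions between `i32`, `u32`, and `f32`.
- One language for the vertex, fragment, and compute stages.
### Sync / async
- GPU work is asynchronous by default.
- Dependencies inside a workgroup need explicit barriers (`workgroupBarrier()`); WebGPU orders successive dispatches and passes on the queue.
- Reading results back to the CPU is expensive; avoid it on the hot path.
### Modern applications
- **Three.js WebGPU renderer**: v160+.
- **Babylon.js**.
- **wgpu-rs**: native + web.
- **Hokusai** (Expo 2025 Osaka): 1M-particle fluid simulation.
- **Million-component BIM platform**.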
## 💻 Patterns
### Basic compute shader (WGSL)
```wgsl
// Element-wise sum of two arrays: output[i] = input_a[i] + input_b[i].
@group(0) @binding(0) var<storage, read> input_a: array<f32>;
@group(0) @binding(1) var<storage, read> input_b: array<f32>;
@group(0) @binding(2) var<storage, read_write> output: array<f32>;
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let idx = id.x;
  // The last workgroup can extend past the end of the array; skip those invocations.
  if (idx >= arrayLength(&input_a)) { return; }
  output[idx] = input_a[idx] + input_b[idx];
}
```
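Below is a CPU emulation of the kernel above. It loops over workgroups and local invocation IDs exactly as a dispatch of `Math.ceil(n / 64)` workgroups would. The result serves as a reference for GPU output and shows how many invocations the bounds guard discards. This is a plain-JavaScript sketch; the function name is illustrative.

```js
// Emulates dispatchWorkgroups(ceil(n / workgroupSize)) of the add-arrays kernel on the CPU.
function emulateAddKernel(inputA, inputB, workgroupSize = 64) {
  const n = inputA.length;
  const output = new Float32Array(n);
  const workgroups = Math.ceil(n / workgroupSize);
  let guarded = 0; // invocations that take the `idx >= arrayLength` early return
  for (let wg = 0; wg < workgroups; wg++) {
    for (let local = 0; local < workgroupSize; local++) {
      const idx = wg * workgroupSize + local; // global_invocation_id.x
      if (idx >= n) { guarded++; continue; }
      output[idx] = inputA[idx] + inputB[idx];
    }
  }
  return { output, guarded };
}
```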
### JavaScript dispatch (WebGPU)
```js
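// Assumes `wgslSource` holds the WGSL text of the add-arrays shader above.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) throw new Error('WebGPU is not available');
const device = await adapter.requestDevice();
const a = new Float32Array([1, 2, 3, 4]);
const b = new Float32Array([10, 20, 30, 40]);
// Inputs: bindable as storage, filled from the CPU with queue.writeBuffer.
function createInput(data) {
  const buffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(buffer, 0, data);
  return buffer;
}
const inputA = createInput(a);
const inputB = createInput(b);
// Output: written by the shader, then copied out (COPY_SRC).
const output = device.createBuffer({
  size: a.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});
// Staging buffer: MAP_READ may only be combined with COPY_DST.
const readback = device.createBuffer({
  size: a.byteLength,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});
// Pipeline: layout 'auto' derives the bind group layout from the shader.
const module = device.createShaderModule({ code: wgslSource });
const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: inputA } },
    { binding: 1, resource: { buffer: inputB } },
    { binding: 2, resource: { buffer: output } },
  ],
});
// Dispatch one invocation per element, rounded up to whole 64-wide workgroups.
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(a.length / 64));
pass.end();
encoder.copyBufferToBuffer(output, 0, readback, 0, a.byteLength);
device.queue.submit([encoder.finish()]);
// Readback waits for the GPU: fine for a one-off result, avoid it every frame.
await readback.mapAsync(GPUMapMode.READ);
const result = new Float32Array(readback.getMappedRange().slice(0));
readback.unmap();
console.log(Array.from(result)); // [11, 22, 33, 44]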
```
### Particle system (Three.js WebGPU)
```js
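// Assumes three.js r168+ (WebGPU build + TSL entry points).
import * as THREE from 'three/webgpu';
import { Fn, If, instanceIndex, storage, uniform } from 'three/tsl';

const N_PARTICLES = 100_000;
// GPU-resident storage buffers, readable from compute and render passes.
const positionAttr = new THREE.StorageInstancedBufferAttribute(new Float32Array(N_PARTICLES * 3), 3);
const velocityAttr = new THREE.StorageInstancedBufferAttribute(new Float32Array(N_PARTICLES * 3), 3);
const positions = storage(positionAttr, 'vec3', N_PARTICLES);
const velocities = storage(velocityAttr, 'vec3', N_PARTICLES);
const dt = uniform(1 / 60);
const gravity = uniform(new THREE.Vector3(0, -9.8, 0));

// One invocation per particle: gravity updates velocity, velocity updates position.
const particleUpdate = Fn(() => {
  const pos = positions.element(instanceIndex);
  const vel = velocities.element(instanceIndex);
  vel.addAssign(gravity.mul(dt));
  pos.addAssign(vel.mul(dt));
  // Ground bounce, keeping 80% of the vertical speed.
  If(pos.y.lessThan(0), () => {
    pos.y.assign(0);
    vel.y.assign(vel.y.mul(-0.8));
  });
})().compute(N_PARTICLES);

const renderer = new THREE.WebGPURenderer();
await renderer.init();
await renderer.computeAsync(particleUpdate);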
```
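For debugging, a CPU reference can run one update of the same kind on a small N for comparison with values read back from the GPU. It applies gravity to velocity, integrates position with semi-implicit Euler, and bounces off the ground with restitution 0.8. This sketch works over packed `xyz` arrays; the names and defaults are illustrative.

```js
// One semi-implicit Euler step with a ground bounce, over packed xyz arrays (mutated in place).
function stepParticlesCpu(pos, vel, dt, gravityY = -9.8, restitution = 0.8) {
  for (let i = 0; i < pos.length; i += 3) {
    vel[i + 1] += gravityY * dt; // gravity changes velocity
    for (let k = 0; k < 3; k++) pos[i + k] += vel[i + k] * dt; // velocity changes position
    if (pos[i + 1] < 0) { // ground plane at y = 0
      pos[i + 1] = 0;
      vel[i + 1] *= -restitution;
    }
  }
}
```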
### Fluid simulation (SPH-style)
```wgsl
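// Hypothetical layout: vec3<f32> is 16-byte aligned, so each field occupies 16 bytes.
struct Particle { pos: vec3<f32>, vel: vec3<f32> }
struct SimParams { dt: f32, smoothing_length: f32, stiffness: f32 }
// Ping-pong buffers: reading neighbors from `src` while writing `dst` avoids
// a data race between invocations that would otherwise update the same array.
@group(0) @binding(0) var<storage, read> src: array<Particle>;
@group(0) @binding(1) var<storage, read_write> dst: array<Particle>;
@group(0) @binding(2) var<uniform> params: SimParams;

// Placeholder pair force (linear repulsion inside the smoothing radius). Real SPH
// computes density and pressure in separate passes; this only shows the data flow.
fn pair_force(r: vec3<f32>, d: f32) -> vec3<f32> {
  if (d <= 0.0) { return vec3<f32>(0.0); }
  return -(r / d) * (params.smoothing_length - d) * params.stiffness;
}

@compute @workgroup_size(64)
fn sph_step(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&src)) { return; }
  let p = src[i];
  var force = vec3<f32>(0.0, -9.8, 0.0); // gravity (unit mass)
  // O(n²) neighbor loop for clarity; real SPH uses a spatial grid.
  for (var j = 0u; j < arrayLength(&src); j++) {
    if (j == i) { continue; }
    let r = src[j].pos - p.pos;
    let d = length(r);
    if (d < params.smoothing_length) {
      force += pair_force(r, d);
    }
  }
  let vel = p.vel + force * params.dt;
  dst[i] = Particle(p.pos + vel * params.dt, vel);
}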
```
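The usual replacement for the O(n²) neighbor loop is a uniform grid built with a counting sort, in three phases:

1. Count the particles in each cell.
2. Prefix-sum the counts into cell start offsets.
3. Scatter the particle indices into sorted order.

On the GPU each phase is its own dispatch, with `atomicAdd` for the counts. The JavaScript sketch below is a 2D CPU model of that structure for checking results. It assumes `cellSize >= h`, so every neighbor lies in the surrounding 3×3 cells, and it clamps positions into a `gridSize × gridSize` domain. The names are illustrative.

```js
// Build the grid: sorted[cellStart[c] .. cellStart[c + 1]) lists the particles in cell c.
function buildGrid(positions, cellSize, gridSize) {
  const clampCell = (v) => Math.min(gridSize - 1, Math.max(0, Math.floor(v / cellSize)));
  const cellOf = positions.map(([x, y]) => clampCell(y) * gridSize + clampCell(x));
  const cells = gridSize * gridSize;
  const cellStart = new Uint32Array(cells + 1);
  for (const c of cellOf) cellStart[c + 1]++; // 1) count per cell
  for (let c = 0; c < cells; c++) cellStart[c + 1] += cellStart[c]; // 2) prefix sum
  const next = cellStart.slice(0, cells);
  const sorted = new Uint32Array(positions.length);
  cellOf.forEach((c, i) => { sorted[next[c]++] = i; }); // 3) scatter
  return { cellOf, cellStart, sorted };
}

// Neighbors of particle i within distance h, visiting only the 3×3 block of cells.
function gridNeighbors(i, positions, grid, gridSize, h) {
  const { cellOf, cellStart, sorted } = grid;
  const cx = cellOf[i] % gridSize;
  const cy = Math.floor(cellOf[i] / gridSize);
  const found = [];
  for (let y = Math.max(0, cy - 1); y <= Math.min(gridSize - 1, cy + 1); y++) {
    for (let x = Math.max(0, cx - 1); x <= Math.min(gridSize - 1, cx + 1); x++) {
      const c = y * gridSize + x;
      for (let k = cellStart[c]; k < cellStart[c + 1]; k++) {
        const j = sorted[k];
        if (j === i) continue;
        const dx = positions[j][0] - positions[i][0];
        const dy = positions[j][1] - positions[i][1];
        if (Math.hypot(dx, dy) < h) found.push(j);
      }
    }
  }
  return found;
}
```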
### GPU-driven culling (frustum)
```wgsl
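// Layout read by drawIndirect(): vertexCount, instanceCount, firstVertex, firstInstance.
struct DrawArgs { vertex_count: u32, instance_count: atomic<u32>, first_vertex: u32, first_instance: u32 }
struct InstanceData { bounds_min: vec3<f32>, bounds_max: vec3<f32> } // world-space AABB
struct Camera { planes: array<vec4<f32>, 6> } // xyz = inward normal, w = distance
@group(0) @binding(0) var<storage, read> instances: array<InstanceData>;
@group(0) @binding(1) var<storage, read_write> draw_args: array<DrawArgs>; // instance_count reset to 0 each frame
@group(0) @binding(2) var<uniform> camera: Camera;
@group(0) @binding(3) var<storage, read_write> visible_indices: array<u32>;
// Conservative AABB test: reject only if the corner farthest along a plane normal is behind it.
fn in_frustum(bmin: vec3<f32>, bmax: vec3<f32>) -> bool {
  for (var p = 0u; p < 6u; p++) {
    let plane = camera.planes[p];
    let corner = select(bmin, bmax, plane.xyz >= vec3<f32>(0.0));
    if (dot(plane.xyz, corner) + plane.w < 0.0) { return false; }
  }
  return true;
}
@compute @workgroup_size(64)
fn cull(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&instances)) { return; }
  if (in_frustum(instances[i].bounds_min, instances[i].bounds_max)) {
    // atomicAdd returns the previous count, giving each visible instance a unique slot.
    let slot = atomicAdd(&draw_args[0].instance_count, 1u);
    visible_indices[slot] = i;
  }
}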
```
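A CPU reference for the same idea helps unit-test plane extraction and compaction. It has two parts:

- A conservative AABB-vs-frustum test, the kind of check `in_frustum` performs. A box is rejected only if, for some plane, even its corner farthest along the inward normal lies behind that plane.
- Compaction through a counter that plays the role of `atomicAdd`.

Planes are `[nx, ny, nz, d]`; this representation and the helper names are assumptions of the sketch. On the GPU, the order of the visible indices is not deterministic.

```js
// Conservative AABB-vs-frustum test. Each plane is [nx, ny, nz, d] with the normal
// pointing into the frustum; a point p is inside a plane when dot(n, p) + d >= 0.
function aabbInFrustum(min, max, planes) {
  for (const [nx, ny, nz, d] of planes) {
    // Corner of the box farthest along the normal (the "positive vertex").
    const px = nx >= 0 ? max[0] : min[0];
    const py = ny >= 0 ? max[1] : min[1];
    const pz = nz >= 0 ? max[2] : min[2];
    if (nx * px + ny * py + nz * pz + d < 0) return false; // whole box behind this plane
  }
  return true; // may keep a few boxes near frustum corners that are actually outside
}

// Compaction as in the kernel: a counter (atomicAdd on the GPU) hands out slots.
// drawArgs follows the drawIndirect layout; index 1 is instanceCount.
function cullInstances(instances, planes, vertexCount) {
  const drawArgs = new Uint32Array([vertexCount, 0, 0, 0]);
  const visibleIndices = [];
  instances.forEach(({ min, max }, i) => {
    if (aabbInFrustum(min, max, planes)) {
      const slot = drawArgs[1]++;
      visibleIndices[slot] = i;
    }
  });
  return { drawArgs, visibleIndices };
}
```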
### Compute skinning (vertex transform pre-pass)
```wgsl
struct Vertex { position: vec3<f32>, bone_idx: vec4<u32>, bone_weight: vec4<f32> }
@group(0) @binding(0) var<storage, read> bone_matrices: array<mat4x4<f32>>;
@group(0) @binding(1) var<storage, read> base_vertices: array<Vertex>;
@group(0) @binding(2) var<storage, read_write> skinned: array<vec4<f32>>;
@compute @workgroup_size(64)
fn skin(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&base_vertices)) { return; }
  let v = base_vertices[i];
  // Linear blend skinning: weighted sum of up to 4 bone transforms.
  var pos = vec4<f32>(0.0);
  for (var b = 0u; b < 4u; b++) {
    pos += bone_matrices[v.bone_idx[b]] * vec4<f32>(v.position, 1.0) * v.bone_weight[b];
  }
  skinned[i] = pos;
}
// The subsequent render pass reads `skinned` as its vertex positions
// (buffer usage STORAGE | VERTEX).
```
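A CPU reference of the same linear blend skinning math helps validate bone indices, weights, and matrices before they go to the GPU. The sketch assumes column-major matrices, which is the layout of WGSL `mat4x4<f32>` and of three.js `Matrix4.elements`. The function names are illustrative.

```js
// Column-major 4×4 matrix (16 numbers) times vec4, matching WGSL mat4x4<f32> * vec4<f32>.
function mulMat4Vec4(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) out[row] += m[col * 4 + row] * v[col];
  }
  return out;
}

// Linear blend skinning for one vertex: sum over b of weight[b] * (bones[idx[b]] * vec4(position, 1)).
function skinVertex(position, boneIdx, boneWeight, boneMatrices) {
  const p = [position[0], position[1], position[2], 1];
  const out = [0, 0, 0, 0];
  for (let b = 0; b < 4; b++) {
    const t = mulMat4Vec4(boneMatrices[boneIdx[b]], p);
    for (let k = 0; k < 4; k++) out[k] += t[k] * boneWeight[b];
  }
  return out;
}
```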
### Workgroup shared memory (reduction)
```wgsl
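@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> output: array<f32>; // one partial sum per workgroup
// `shared` is a reserved word in WGSL, so the workgroup array needs another name.
var<workgroup> scratch: array<f32, 64>;
@compute @workgroup_size(64)
fn sum_reduce(
  @builtin(local_invocation_id) lid: vec3<u32>,
  @builtin(global_invocation_id) gid: vec3<u32>,
  @builtin(workgroup_id) wid: vec3<u32>,
) {
  // Load 0 for out-of-range elements instead of returning early, so every
  // invocation still reaches the barriers (they require uniform control flow).
  var value = 0.0;
  if (gid.x < arrayLength(&input)) { value = input[gid.x]; }
  scratch[lid.x] = value;
  workgroupBarrier();
  // Tree reduction: each step halves the number of active invocations.
  for (var stride = 32u; stride > 0u; stride >>= 1u) {
    if (lid.x < stride) {
      scratch[lid.x] += scratch[lid.x + stride];
    }
    workgroupBarrier();
  }
  if (lid.x == 0u) {
    output[wid.x] = scratch[0];
  }
}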
```
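The kernel above writes one partial sum per workgroup, so a full reduction dispatches again on its own output until one value remains. The CPU model below counts those passes; for 10,000 inputs they run 10,000 → 157 → 3 → 1. It assumes a power-of-two workgroup size, as the tree loop requires, and the function names are illustrative.

```js
// CPU model of one dispatch of the reduction kernel: one output value per workgroup.
function reducePass(input, workgroupSize = 64) {
  const groups = Math.ceil(input.length / workgroupSize);
  const output = new Float64Array(groups);
  for (let g = 0; g < groups; g++) {
    const scratch = new Float64Array(workgroupSize); // stands in for var<workgroup>
    for (let l = 0; l < workgroupSize; l++) {
      const gi = g * workgroupSize + l;
      scratch[l] = gi < input.length ? input[gi] : 0; // out-of-range invocations load 0
    }
    for (let stride = workgroupSize / 2; stride > 0; stride >>= 1) {
      // Reads at l + stride never overlap writes at l, like one barrier-separated GPU step.
      for (let l = 0; l < stride; l++) scratch[l] += scratch[l + stride];
    }
    output[g] = scratch[0];
  }
  return output;
}

// Full reduction: feed each pass's partial sums into the next dispatch until one value remains.
function reduceAll(input, workgroupSize = 64) {
  let data = Float64Array.from(input);
  let passes = 0;
  while (data.length > 1) {
    data = reducePass(data, workgroupSize);
    passes++;
  }
  return { sum: data.length ? data[0] : 0, passes };
}
```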
### Async render (Three.js)
```js
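// Assumes `renderer`, `scene`, `camera`, and the `particleUpdate` compute node exist.
// Compute and render are submitted in order to the same GPU queue, so the render
// pass sees this frame's compute writes without any CPU readback.
async function frame() {
  await renderer.computeAsync(particleUpdate);
  await renderer.renderAsync(scene, camera);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);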
```
## 🤔 Decision criteria
| Situation | Approach |
|---|---|
| 100K+ particles | Compute shader |
| Fluid simulation | Compute + storage texture |
| Frustum culling | GPU-driven culling |
| ML inference (browser) | WebGPU + WGSL |
| Image processing | Compute + storage texture |
| Many skinned meshes | Compute skinning |
| < 10K particles | CPU is fine |
| < 1,000 instances | CPU-side instancing |

**Default**: WebGPU + Three.js v160+ on the web; wgpu-rs for native.
## 🔗 Graph
- Parent: [[WebGPU]] · [[Computer-Graphics]]
- Variants: [[WGSL]] · [[GPU-driven Rendering]] · [[Indirect Draw]]
- Applications: [[Three.js]] · [[Particle-System]]
- Adjacent: [[CSS Animations]] · [[Web-Performance]] · [[Bottlenecks]] · [[Bioenergetics]] (energy-efficient)
## 🤖 LLM usage
**When**: GPU compute on the web; large particle counts or simulations; GPU-driven rendering; in-browser ML.
**When not**: small workloads (the CPU is fine); targets that must also run on WebGL only (WebGL has no compute shaders).
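## ❌ Anti-patterns
- **CPU-GPU readback every frame**: the CPU waits on the GPU (sync stall).
- **Wrong workgroup size** (e.g., 8): underutilizes the hardware, since warps/wavefronts are typically 32 or 64 lanes wide.
- **Missing barrier**: race conditions on workgroup memory.
- **Using storage textures without WebGPU**: WebGL does not support them.
- **Synchronous compute + render**: blocking the CPU between the two stalls the pipeline.
- **No fallback for older browsers**: the page breaks where WebGPU is unavailable.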
## 🧪 Verification / duplicates
- Verified (WebGPU spec, Three.js webgpu, Hokusai exhibition).
- Trust level: A.
- Related: [[CSS Animations]] · [[Web-Performance]] · [[Bottlenecks]] · [[Baseline (Web Platform Features)]] · [[20k skinned instances demo]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-19 | Auto-mapped |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: workgroup section plus WGSL / Three.js / fluid / culling / skinning code |