2nd/10_Wiki/Topics/AI_and_ML/Compute Shader.md
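koriweb d8a80f6272 chore(wiki): canonicalize dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed from their target only in spelling (notation variants)
with the target document's canonical title, reconnecting 1,200 broken links. Only normalized
title/filename matches were applied; alias matching was excluded because of the risk of
over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs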

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-compute-shader
title: Compute Shader (WebGPU)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: compute shader, WebGPU compute, GPGPU, WGSL, GPU-driven rendering, indirect draw
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: webgpu, compute-shader, gpgpu, wgsl, gpu-driven-rendering, three-js, particle-system, simulation
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: WGSL / WebGPU
  framework: Three.js / Babylon.js / wgpu-rs

Compute Shader

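In one line

"Thousands of GPU cores working in parallel." WebGPU brings compute shaders, and with them GPGPU, to the web: particles, fluid simulation, culling, ML inference. Example figures: CPU 30 ms for 10K particles vs. GPU 2 ms for 100K particles, i.e. about 150× higher per-particle throughput (15× less time for 10× more particles).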

Key points

Use cases

  1. Particle systems: millions of particles.
  2. Fluid simulation: SPH, grid-based.
  3. Cloth / soft-body.
  4. Procedural terrain.
  5. GPU-driven rendering: culling, indirect draw.
  6. Compute skinning: vertex transforms on the GPU.
  7. Image processing: blur, filters.
  8. GPGPU: ML inference, numerical computing.

Compared with vertex / fragment shaders

  • Vertex: runs once per vertex.
  • Fragment: runs once per pixel (fragment).
  • Compute: arbitrary computation with read/write access to storage resources.

Core concepts

Workgroup

  • A group of threads (invocations), e.g. 8×8×1 = 64 threads.
  • Threads in the same workgroup can share workgroup memory.
  • Mapped onto hardware execution units (warps / wavefronts).
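
To make the workgroup sizing concrete, here is a minimal sketch of the arithmetic behind a dispatch. The helper names are hypothetical and not part of WebGPU; the only assumption is a one-dimensional dispatch with a fixed workgroup size.

```javascript
// Number of workgroups needed so that every item gets one invocation.
function workgroupCount(itemCount, workgroupSize) {
  return Math.ceil(itemCount / workgroupSize);
}

// Invocations launched beyond itemCount; the shader must bounds-check these.
function paddedInvocations(itemCount, workgroupSize) {
  return workgroupCount(itemCount, workgroupSize) * workgroupSize - itemCount;
}

const groups = workgroupCount(100_000, 64);      // 1563
const padding = paddedInvocations(100_000, 64);  // 32
console.log(groups, padding);
```

The padding is why the WGSL patterns in this note start with an `arrayLength` guard: the last workgroup usually contains invocations with no item to process.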

Storage buffer / texture

  • Readable and writable (a sampled texture is read-only).
  • Essential for fluid simulation and similar workloads.

Workgroup variable (shared memory)

  • Shared by all threads in a workgroup.
  • Often cited as 10-100× faster than global memory.
  • The basis for reductions and prefix sums.
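
The reduction that workgroup memory enables can be modelled on the CPU, which is also handy as a reference when validating a shader. This is only a sketch: `reducePerWorkgroup` is a hypothetical name, the workgroup size is assumed to be a power of two, and the sequential inner loop stands in for threads that run in parallel on the GPU.

```javascript
// CPU model of a per-workgroup tree reduction.
function reducePerWorkgroup(input, workgroupSize = 64) {
  const groupCount = Math.ceil(input.length / workgroupSize);
  const partialSums = new Array(groupCount);
  for (let g = 0; g < groupCount; g++) {
    // "Workgroup memory": each thread loads one element (0 past the end).
    const scratch = new Float64Array(workgroupSize);
    for (let t = 0; t < workgroupSize; t++) {
      const idx = g * workgroupSize + t;
      scratch[t] = idx < input.length ? input[idx] : 0;
    }
    // Each pass halves the active range; on the GPU a barrier separates passes.
    for (let stride = workgroupSize >> 1; stride > 0; stride >>= 1) {
      for (let t = 0; t < stride; t++) {
        scratch[t] += scratch[t + stride];
      }
    }
    partialSums[g] = scratch[0];
  }
  return partialSums;
}

const values = Array.from({ length: 128 }, (_, i) => i + 1); // 1..128
console.log(reducePerWorkgroup(values, 64)); // [ 2080, 6176 ]
```

Each workgroup yields one partial sum, so a full sum needs a second pass over the partial sums (or a final add on the CPU).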

Indirect draw

  • The GPU generates the draw command arguments itself.
  • Minimizes CPU-GPU synchronization.
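
As an illustration of what "the GPU generates the draw command" means in practice: a non-indexed indirect draw in WebGPU reads four consecutive 32-bit unsigned integers (vertexCount, instanceCount, firstVertex, firstInstance) from a buffer. The sketch below only builds the initial CPU-side contents; `makeDrawIndirectArgs` is a hypothetical helper, not a WebGPU API.

```javascript
// Initial contents for a drawIndirect() argument buffer:
// [vertexCount, instanceCount, firstVertex, firstInstance], each a u32.
function makeDrawIndirectArgs({ vertexCount, instanceCount = 0, firstVertex = 0, firstInstance = 0 }) {
  return new Uint32Array([vertexCount, instanceCount, firstVertex, firstInstance]);
}

// instanceCount starts at 0; a culling pass would raise it with atomicAdd.
const args = makeDrawIndirectArgs({ vertexCount: 36 });
console.log(args.byteLength); // 16

// In a WebGPU app the array would be uploaded to a buffer created with
// GPUBufferUsage.INDIRECT | GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
// and the render pass would call pass.drawIndirect(buffer, 0).
```

Because the compute pass writes the instance count directly into this buffer, the CPU never has to read the culling result back before issuing the draw.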

WGSL (WebGPU Shading Language)

  • Syntax: Rust-like.
  • Strictly typed.
  • One language for vertex, fragment, and compute stages.

Sync / async

  • GPU work is asynchronous by default.
  • Dependencies between invocations require explicit barriers (e.g. workgroupBarrier()).
  • Readback to the CPU is expensive; avoid it where possible.

Modern applications

  • Three.js WebGPU renderer: v160+.
  • Babylon.js.
  • wgpu-rs: native + web.
  • Hokusai (Expo 2025 Osaka): 1M-particle fluid.
  • Million-component BIM platform.

💻 Patterns

Basic compute shader (WGSL)

// Add two arrays element-wise
@group(0) @binding(0) var<storage, read> input_a: array<f32>;
@group(0) @binding(1) var<storage, read> input_b: array<f32>;
@group(0) @binding(2) var<storage, read_write> output: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let idx = id.x;
  if (idx >= arrayLength(&input_a)) { return; }
  output[idx] = input_a[idx] + input_b[idx];
}

JavaScript dispatch (WebGPU)

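// Assumes `data` and `dataB` are Float32Arrays of equal length and `wgslSource`
// holds the WGSL shader above; none of them is defined in this snippet.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) throw new Error('WebGPU is not available');
const device = await adapter.requestDevice();

// Input buffers: storage buffers filled from the CPU
const inputA = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(inputA, 0, data);
const inputB = device.createBuffer({
  size: dataB.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(inputB, 0, dataB);

// Output buffer: COPY_SRC allows a later copy into a mappable buffer
const output = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});

// Pipeline
const module = device.createShaderModule({ code: wgslSource });
const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: { module, entryPoint: 'main' },
});

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: inputA } },
    { binding: 1, resource: { buffer: inputB } },
    { binding: 2, resource: { buffer: output } },
  ],
});

// Dispatch: one invocation per element, 64 invocations per workgroup
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(data.length / 64));
pass.end();
device.queue.submit([encoder.finish()]);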

Particle system (Three.js WebGPU)

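// Sketch assuming a recent Three.js release that exports TSL from 'three/tsl'
// and provides instancedArray(); older releases used different names.
// `renderer` (a WebGPURenderer), `N_PARTICLES`, `gravity`, and `dt` are defined elsewhere.
import { Fn, If, instancedArray, instanceIndex } from 'three/tsl';

// GPU-resident storage buffers, one vec3 per particle
const positions = instancedArray(N_PARTICLES, 'vec3');
const velocities = instancedArray(N_PARTICLES, 'vec3');

const particleUpdate = Fn(() => {
  const pos = positions.element(instanceIndex);
  const vel = velocities.element(instanceIndex);
  vel.y.assign(vel.y.sub(gravity * dt)); // gravity acts on velocity
  pos.addAssign(vel.mul(dt));            // integrate position
  // Ground boundary: clamp to y = 0 and bounce with damping
  If(pos.y.lessThan(0), () => {
    pos.y.assign(0);
    vel.y.mulAssign(-0.8);
  });
})().compute(N_PARTICLES);

await renderer.computeAsync(particleUpdate);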

Fluid simulation (SPH-style)

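// For each particle: search its neighbors and accumulate forces.
// The struct layouts are illustrative assumptions; sph_force() is assumed to be defined elsewhere.
struct Particle { pos: vec3<f32>, vel: vec3<f32> }
struct SimParams { dt: f32, smoothing_length: f32 }

@group(0) @binding(0) var<storage, read_write> particles: array<Particle>;
@group(0) @binding(1) var<uniform> params: SimParams;

@compute @workgroup_size(64)
fn step(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&particles)) { return; }

  var force = vec3<f32>(0.0, -9.8, 0.0);

  // Neighbor sum over all particles, O(N^2) (simplified; real SPH uses a spatial grid)
  for (var j = 0u; j < arrayLength(&particles); j++) {
    if (j == i) { continue; }
    let r = particles[j].pos - particles[i].pos;
    let d = length(r);
    if (d < params.smoothing_length) {
      force += sph_force(particles[i], particles[j], r, d);
    }
  }

  // Note: updating in place races with other invocations still reading pos;
  // a real solver would write to a second buffer (double buffering).
  particles[i].vel += force * params.dt;
  particles[i].pos += particles[i].vel * params.dt;
}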
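
The shader comment notes that real SPH uses a spatial grid instead of the O(N²) loop. A minimal CPU-side sketch of that idea follows, under stated assumptions: positions are plain `[x, y, z]` arrays, the cell size equals the smoothing length, and all function names are hypothetical. A GPU version would typically use sorted cell indices in storage buffers rather than a `Map`.

```javascript
// Integer cell coordinates of a point in a uniform grid.
function cellCoord(p, cellSize) {
  return [Math.floor(p[0] / cellSize), Math.floor(p[1] / cellSize), Math.floor(p[2] / cellSize)];
}

// Bucket particle indices by the cell that contains them.
function buildGrid(positions, cellSize) {
  const grid = new Map();
  positions.forEach((p, i) => {
    const key = cellCoord(p, cellSize).join(',');
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(i);
  });
  return grid;
}

// Neighbors of particle i within cellSize: only the 27 surrounding cells are scanned.
function neighbors(positions, grid, i, cellSize) {
  const [cx, cy, cz] = cellCoord(positions[i], cellSize);
  const pi = positions[i];
  const result = [];
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      for (let dz = -1; dz <= 1; dz++) {
        const bucket = grid.get(`${cx + dx},${cy + dy},${cz + dz}`);
        if (!bucket) continue;
        for (const j of bucket) {
          if (j === i) continue;
          const pj = positions[j];
          const d = Math.hypot(pj[0] - pi[0], pj[1] - pi[1], pj[2] - pi[2]);
          if (d < cellSize) result.push(j);
        }
      }
    }
  }
  return result;
}

const pts = [[0.1, 0.1, 0.1], [0.3, 0.1, 0.1], [5, 5, 5], [-0.2, 0.1, 0.1]];
const grid = buildGrid(pts, 0.5);
console.log(neighbors(pts, grid, 0, 0.5).sort()); // [ 1, 3 ]
```

With the cell size equal to the smoothing length, any particle within range must lie in one of the 27 neighboring cells, so the per-particle cost depends on local density rather than on the total particle count.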

GPU-driven culling (frustum)

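// InstanceData, Camera, and in_frustum() are assumed to be defined elsewhere.
struct DrawArgs {
  vertex_count: u32,
  instance_count: atomic<u32>,  // must be atomic for atomicAdd; reset to 0 before each dispatch
  first_vertex: u32,
  first_instance: u32,
}

@group(0) @binding(0) var<storage, read> instances: array<InstanceData>;
@group(0) @binding(1) var<storage, read_write> draw_args: array<DrawArgs>;
@group(0) @binding(2) var<uniform> camera: Camera;
// Compacted list of visible instance indices (not declared in the original listing)
@group(0) @binding(3) var<storage, read_write> visible_indices: array<u32>;

@compute @workgroup_size(64)
fn cull(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&instances)) { return; }

  if (in_frustum(instances[i].bounding_box, camera.frustum)) {
    // Reserve the next free slot and record this instance as visible
    let slot = atomicAdd(&draw_args[0].instance_count, 1u);
    visible_indices[slot] = i;
  }
}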
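
The shader above leaves `in_frustum` undefined. One common way to implement such a test is the "positive vertex" check of an axis-aligned box against each frustum plane; the CPU sketch below shows the logic under assumed data layouts (planes as `[a, b, c, d]` with the inside on the non-negative side, boxes as `{ min, max }`). It is not taken from the original note, and `aabbInFrustum` is a hypothetical name.

```javascript
// Planes are [a, b, c, d]; a point is inside when a*x + b*y + c*z + d >= 0.
function aabbInFrustum(box, planes) {
  for (const [a, b, c, d] of planes) {
    // Pick the box corner farthest along the plane normal (the "positive vertex").
    const x = a >= 0 ? box.max[0] : box.min[0];
    const y = b >= 0 ? box.max[1] : box.min[1];
    const z = c >= 0 ? box.max[2] : box.min[2];
    if (a * x + b * y + c * z + d < 0) return false; // fully outside this plane
  }
  return true; // inside or intersecting every plane (conservative)
}

// A box-shaped "frustum" spanning [-1, 1] on every axis, for illustration only.
const planes = [
  [1, 0, 0, 1], [-1, 0, 0, 1],
  [0, 1, 0, 1], [0, -1, 0, 1],
  [0, 0, 1, 1], [0, 0, -1, 1],
];
console.log(aabbInFrustum({ min: [-0.5, -0.5, -0.5], max: [0.5, 0.5, 0.5] }, planes)); // true
console.log(aabbInFrustum({ min: [2, 2, 2], max: [3, 3, 3] }, planes));                // false
```

The test is conservative: a box that straddles a frustum corner can pass even though it is not visible, which costs a few extra instances but never drops a visible one.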

Compute skinning (vertex transform pre-pass)

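// The Vertex layout is an illustrative assumption.
struct Vertex {
  position: vec3<f32>,
  bone_idx: vec4<u32>,
  bone_weight: vec4<f32>,
}

@group(0) @binding(0) var<storage, read> bone_matrices: array<mat4x4<f32>>;
@group(0) @binding(1) var<storage, read> base_vertices: array<Vertex>;
@group(0) @binding(2) var<storage, read_write> skinned: array<vec4<f32>>;

@compute @workgroup_size(64)
fn skin(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&base_vertices)) { return; }
  let v = base_vertices[i];

  // Linear blend skinning: weighted sum of up to four bone transforms
  var pos = vec4<f32>(0.0);
  for (var b = 0u; b < 4u; b++) {
    pos += bone_matrices[v.bone_idx[b]] * vec4<f32>(v.position, 1.0) * v.bone_weight[b];
  }

  skinned[i] = pos;
}

// The render pass then reads `skinned` as its vertex data.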

Workgroup shared memory (reduction)

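@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> output: array<f32>;

// `shared` is on WGSL's reserved-word list, so the scratch array is named partial_sums.
var<workgroup> partial_sums: array<f32, 64>;

@compute @workgroup_size(64)
fn sum_reduce(
  @builtin(local_invocation_id) lid: vec3<u32>,
  @builtin(global_invocation_id) gid: vec3<u32>,
  @builtin(workgroup_id) wid: vec3<u32>,
) {
  // Out-of-range threads contribute 0. There is no early return, so every
  // thread reaches the barriers (they must be in uniform control flow).
  var value = 0.0;
  if (gid.x < arrayLength(&input)) { value = input[gid.x]; }
  partial_sums[lid.x] = value;
  workgroupBarrier();

  // Tree reduction: halve the active range each step
  for (var stride = 32u; stride > 0u; stride >>= 1u) {
    if (lid.x < stride) {
      partial_sums[lid.x] += partial_sums[lid.x + stride];
    }
    workgroupBarrier();
  }

  // Thread 0 writes this workgroup's partial sum
  if (lid.x == 0u) {
    output[wid.x] = partial_sums[0];
  }
}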

Async render (Three.js)

// Render only after the compute pass has finished
async function frame() {
  await renderer.computeAsync(particleUpdate);
  await renderer.renderAsync(scene, camera);
}

🤔 Decision criteria

Situation | Approach
100K+ particles | Compute shader
Fluid sim | Compute + storage texture
Frustum culling | GPU-driven culling
ML inference (browser) | WebGPU + WGSL
Image processing | Compute + storage texture
Skinned mesh (many) | Compute skinning
< 10K particles | CPU is fine
< 1,000 instances | CPU instancing

Default: WebGPU + Three.js v160+ for the web; wgpu-rs for native.

🔗 Graph

🤖 LLM usage

When to use: GPU compute on the web; large particle systems or simulations; GPU-driven rendering; in-browser ML.
When not to use: small tasks (the CPU is fine); projects that require a WebGL-only fallback.

Anti-patterns

  • CPU-GPU readback every frame: sync stall.
  • Wrong workgroup size (e.g., 8): underutilization.
  • No barrier: race condition.
  • Using storage textures without WebGPU: unsupported.
  • Synchronous compute + render: stall.
  • No fallback (older browsers): breaks.
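
For the last anti-pattern, a feature check before choosing a renderer is usually enough to avoid a hard failure. The sketch below is one possible shape, not a prescribed API: `pickBackend` is a hypothetical helper that takes a navigator-like object so it can be exercised outside a browser. The only real API it relies on is the `navigator.gpu` property.

```javascript
// Returns 'webgpu' when the navigator-like object exposes a gpu property,
// otherwise falls back to 'webgl'.
function pickBackend(nav) {
  const hasWebGPU = nav != null && typeof nav === 'object' && nav.gpu != null;
  return hasWebGPU ? 'webgpu' : 'webgl';
}

console.log(pickBackend({ gpu: {} })); // webgpu
console.log(pickBackend({}));          // webgl
// In a browser: pickBackend(navigator)
```

Note that `navigator.gpu` being present does not guarantee a usable device: `requestAdapter()` can still resolve to null, so the fallback path should also cover that case.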

🧪 Verification / duplicates

🕓 Changelog

Date | Change
2026-04-19 | Auto-mapped
2026-05-08 | Phase 1
2026-05-10 | Manual cleanup: workgroup + WGSL / Three.js / fluid / culling / skinning code