id: wiki-2026-0508-dcgan
title: DCGAN (Deep Convolutional GAN)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: DCGAN, GAN, generative adversarial network, StyleGAN, CycleGAN, Pix2Pix
duplicate_of: none
source_trust_level: A
confidence_score: 0.88
verification_status: applied
tags: gan, dcgan, generative-models, deep-learning, image-generation, stylegan, cyclegan, history
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language Python, framework PyTorch

DCGAN

One-line summary

The first stable GAN architecture (Radford et al., 2015): strided convolutions, batch normalization, and specific activation choices. A grandparent of modern generative AI. Today diffusion models have largely superseded GANs for general image generation, but GANs remain relevant for fast inference, GAN-based super-resolution, and image-to-image translation.

Key points

GAN basics (Goodfellow 2014)

  • Generator: maps a noise vector to an image.
  • Discriminator: classifies an image as real or fake.
  • Min-max game: the two networks are trained against each other.
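
For reference, the min-max game is usually written as the value function from Goodfellow et al. (2014). The symbols $p_{\text{data}}$ (data distribution) and $p_z$ (noise prior) are notation introduced here, not taken from this note:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D pushes the value up by scoring real images near 1 and generated images near 0, while G pushes it down by making $D(G(z))$ approach 1.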

DCGAN (2015) contributions

  1. Strided conv (no pooling).
  2. BatchNorm in both G and D (except the G output layer and the D input layer).
  3. No fully-connected hidden layer.
  4. ReLU in G (Tanh output).
  5. LeakyReLU in D.
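
Because resolution changes only through strided (transposed) convolutions, the feature-map sizes follow from simple arithmetic. A minimal sketch, assuming square inputs, no dilation, no output padding, and the kernel 4 / stride 2 / padding 1 setting commonly used for 64 × 64 DCGANs; the function names are made up for this illustration:

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Side length after a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel, stride, padding):
    """Side length after a regular strided convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_sizes(n_upsample=4):
    """Spatial sizes produced by a DCGAN-style generator, starting from a 1 x 1 latent."""
    sizes = [conv_transpose_out(1, kernel=4, stride=1, padding=0)]  # 1 -> 4
    for _ in range(n_upsample):
        # each stride-2 block doubles the side length
        sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, padding=1))
    return sizes


print(generator_sizes())        # [4, 8, 16, 32, 64]
print(conv_out(64, 4, 2, 1))    # 32: one discriminator block halves the size
```

The discriminator runs the same arithmetic in reverse, halving the side length per block with no pooling layer.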

Notable GAN evolution

  • DCGAN (2015): stable convolutional training.
  • WGAN (2017): Wasserstein loss, mitigates mode collapse.
  • Pix2Pix (2017): paired image-to-image translation.
  • CycleGAN (2017): unpaired image-to-image translation.
  • StyleGAN (2018-2021): state-of-the-art face quality.
  • BigGAN (2018): large-scale class-conditional generation.
  • GigaGAN (2023): text-to-image GAN.

Mode collapse

  • The generator produces only a limited variety of outputs.
  • Mitigation: minibatch discrimination, spectral normalization, WGAN.
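
Before applying a mitigation it helps to detect collapse. The sketch below is a crude heuristic and not a method from this note: it compares the average pairwise distance among generated samples with that of real samples. The function names and the `ratio` threshold are illustrative assumptions, and samples are assumed to be flattened into equal-length lists of floats.

```python
import math
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average Euclidean distance over all pairs of flattened samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def looks_collapsed(generated, real, ratio=0.1):
    """Flag collapse when generated samples are far less spread out than real ones.

    `ratio` is an arbitrary illustrative threshold, not a published value.
    """
    return mean_pairwise_distance(generated) < ratio * mean_pairwise_distance(real)


real = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.1]]  # nearly identical outputs
print(looks_collapsed(collapsed, real))  # True
print(looks_collapsed(real, real))       # False
```

In practice the distance would be computed on feature embeddings rather than raw pixels, and visual inspection of a sample grid remains the usual first check.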

Evaluation

  • FID (Fréchet Inception Distance): distance between the Inception-feature distributions of generated and real images; lower is better.
  • IS (Inception Score).
  • Precision / Recall (Kynkäänniemi 2019).
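
For reference, FID fits one Gaussian to the Inception features of the real images and one to those of the generated images, then takes the Fréchet distance between the two. The subscripts $r$ (real) and $g$ (generated) are notation introduced here:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\big(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\big)
```

The first term compares feature means and the second compares covariances, so identical statistics give a score of 0.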

Modern relevance

  • Super-resolution (ESRGAN, Real-ESRGAN).
  • Image-to-image (CycleGAN is still useful).
  • Domain adaptation (sim2real).
  • Implicit EBM.
  • Fast generation (vs slow diffusion).

GAN vs Diffusion

| Aspect | GAN | Diffusion |
| --- | --- | --- |
| Quality | High (StyleGAN) | Highest |
| Diversity | Mode collapse risk | High |
| Training stability | Tricky | Stable |
| Inference speed | Fast (single step) | Slow (multi-step) |
| Conditioning | Hard | Easy (CLIP) |

💻 Patterns

DCGAN (PyTorch)

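import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # input: nz × 1 × 1 latent vector
            nn.ConvTranspose2d(nz, ngf*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf*8),
            nn.ReLU(True),
            # ngf*8 × 4 × 4
            nn.ConvTranspose2d(ngf*8, ngf*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf*4),
            nn.ReLU(True),
            # ngf*4 × 8 × 8 (the next two blocks were elided in the original note;
            # filled in here assuming the usual 64 × 64 DCGAN layout)
            nn.ConvTranspose2d(ngf*4, ngf*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf*2),
            nn.ReLU(True),
            # ngf*2 × 16 × 16
            nn.ConvTranspose2d(ngf*2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # ngf × 32 × 32 → nc × 64 × 64; Tanh maps pixels to [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)

class Discriminator(nn.Module):
    def __init__(self, ndf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # input: nc × 64 × 64 (no BatchNorm on the first layer)
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, True),
            # ndf × 32 × 32
            nn.Conv2d(ndf, ndf*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf*2),
            nn.LeakyReLU(0.2, True),
            # ndf*2 × 16 × 16 (the next two blocks were elided in the original note;
            # filled in here to mirror the generator)
            nn.Conv2d(ndf*2, ndf*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf*4),
            nn.LeakyReLU(0.2, True),
            # ndf*4 × 8 × 8
            nn.Conv2d(ndf*4, ndf*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf*8),
            nn.LeakyReLU(0.2, True),
            # ndf*8 × 4 × 4 → one real/fake probability per image
            nn.Conv2d(ndf*8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # flatten (batch, 1, 1, 1) to (batch,) to match the BCE targets
        return self.main(x).view(-1)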

Training loop

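import torch
import torch.nn as nn

# Assumes the Generator / Discriminator classes above and a DataLoader `loader`
# that yields (image, label) batches with images scaled to [-1, 1].
device = 'cuda' if torch.cuda.is_available() else 'cpu'
G = Generator().to(device)
D = Discriminator().to(device)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
criterion = nn.BCELoss()

for real, _ in loader:
    real = real.to(device)
    bs = real.size(0)

    # D step: push real images toward label 1 and generated images toward label 0
    opt_d.zero_grad()
    d_real = D(real)
    loss_real = criterion(d_real, torch.ones(bs, device=device))

    z = torch.randn(bs, 100, 1, 1, device=device)
    fake = G(z)
    d_fake = D(fake.detach())  # detach: no gradient flows into G during the D step
    loss_fake = criterion(d_fake, torch.zeros(bs, device=device))

    (loss_real + loss_fake).backward()
    opt_d.step()

    # G step: make D label the generated images as real
    opt_g.zero_grad()
    d_fake = D(fake)
    loss_g = criterion(d_fake, torch.ones(bs, device=device))
    loss_g.backward()
    opt_g.step()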

WGAN-GP (modern stable)

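import torch

def gradient_penalty(D, real, fake):
    bs = real.size(0)
    # one interpolation weight per sample, broadcast over C × H × W
    alpha = torch.rand(bs, 1, 1, 1, device=real.device)
    # detach fake so the penalty does not backpropagate into G
    interp = (alpha * real + (1 - alpha) * fake.detach()).requires_grad_(True)

    d_interp = D(interp)
    grads = torch.autograd.grad(d_interp.sum(), interp, create_graph=True)[0]
    grad_norm = grads.reshape(bs, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# D (critic) step: the critic outputs an unbounded score (no Sigmoid), and the
# WGAN-GP paper recommends against BatchNorm in the critic.
fake = fake.detach()
loss_d = D(fake).mean() - D(real).mean() + 10 * gradient_penalty(D, real, fake)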
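
For reference, the critic loss computed above corresponds to the WGAN-GP objective with penalty weight $\lambda = 10$. The symbols $\tilde{x}$ (generated sample) and $\hat{x}$ (interpolated sample) are notation introduced here:

```latex
L_D = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \alpha x + (1 - \alpha)\,\tilde{x}, \quad \alpha \sim U[0, 1]
```

The penalty term pulls the critic's gradient norm toward 1 at points between real and generated samples, a soft way of enforcing the 1-Lipschitz constraint.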

Real-ESRGAN (super-resolution, modern application)

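# Assumes the ai-forever/Real-ESRGAN wrapper package, whose module is named `RealESRGAN`.
# The official xinntao `realesrgan` package exposes a different API (`RealESRGANer`).
import torch
from PIL import Image
from RealESRGAN import RealESRGAN

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = RealESRGAN(device, scale=4)
model.load_weights('weights/RealESRGAN_x4.pth', download=True)

img = Image.open('low_res.jpg').convert('RGB')
sr_img = model.predict(img)  # 4x upscaled PIL image
sr_img.save('high_res.jpg')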

CycleGAN (unpaired image-to-image)

# horse → zebra translation without paired examples
# G_ab: A → B, G_ba: B → A
# cycle loss: G_ba(G_ab(a)) ≈ a
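
The cycle loss can be sketched without a deep learning framework. This is a toy illustration under stated assumptions: images are flat lists of floats, the two "generators" are simple scaling functions standing in for trained networks, and all names are hypothetical. It uses the L1 distance, as in the CycleGAN paper.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(a, b, g_ab, g_ba):
    """Cycle-consistency loss: A -> B -> A and B -> A -> B should return the input."""
    forward_cycle = l1(g_ba(g_ab(a)), a)   # G_ba(G_ab(a)) should equal a
    backward_cycle = l1(g_ab(g_ba(b)), b)  # G_ab(G_ba(b)) should equal b
    return forward_cycle + backward_cycle


# Toy stand-ins for trained generators: g_ba is the exact inverse of g_ab.
def g_ab(x):
    return [2.0 * v for v in x]


def g_ba(x):
    return [0.5 * v for v in x]


a, b = [1.0, 2.0], [4.0, 6.0]
print(cycle_loss(a, b, g_ab, g_ba))         # 0.0: a perfect round trip
print(cycle_loss(a, b, g_ab, lambda x: x))  # 6.5: identity is not an inverse of g_ab
```

In a real CycleGAN this term is added to the two adversarial losses with a weighting factor, and it is what lets training work without paired images.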

StyleGAN inversion (modern)

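import torch

# Find a latent z whose generated image matches `image` (pixel-space MSE).
# `generator` is assumed to be any differentiable callable that maps a
# (1, 512) latent to an image tensor with the same shape as `image`.
def invert(image, generator, n_iters=1000, lr=0.01):
    z = torch.randn(1, 512, device=image.device, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)  # plain gradient descent on z only
    for _ in range(n_iters):
        opt.zero_grad()
        loss = (generator(z) - image).pow(2).mean()
        loss.backward()
        opt.step()
    return z.detach()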

FID evaluation

from pytorch_fid import fid_score

fid = fid_score.calculate_fid_given_paths(
    ['./real_images/', './fake_images/'],
    batch_size=50,
    device='cuda',
    dims=2048,
)
print(f'FID: {fid:.2f}')  # lower is better

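Decision criteria

| Application | Method |
| --- | --- |
| Photoreal generation | Diffusion (SDXL, Flux) |
| Face | StyleGAN3 |
| Super-resolution | Real-ESRGAN |
| Domain adaptation (unpaired) | CycleGAN |
| Sim2Real | CycleGAN / paired methods |
| Fast inference | GAN > Diffusion |
| Quality + control | Diffusion + ControlNet |

Default: for modern image generation, use diffusion. GANs are still the choice for super-resolution and image-to-image translation.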

🔗 Graph

🤖 LLM usage

When to use: GAN history, fast generation, super-resolution, image-to-image. When not to use: highest-quality generation (use diffusion).

Anti-patterns

  • Forcing DCGAN into production: modern diffusion models do better.
  • No mode collapse check: the generator may be emitting a single output.
  • Skipping WGAN-GP: training can become unstable and fail.
  • Trusting FID alone: check other metrics too.

🧪 Verification / Duplicates

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: DCGAN architecture + GAN evolution + PyTorch / WGAN-GP / Real-ESRGAN code |