
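koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)

Replaced [[wikilinks]] that differed only in spelling variants with the canonical title of the target page,
reconnecting 1,200 broken links. Only title/filename matches after normalization were applied; alias matching
was excluded because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs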

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 12:24:15 +09:00

4.3 KiB


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Leaky ReLU and Activations

One-liner

"Activation = nonlinearity." The ReLU family is the baseline; Transformers use GELU/SiLU/SwiGLU.

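Core

ReLU family

ReLU: max(0, x). Fast, but suffers from the dying-ReLU problem.
Leaky ReLU: max(αx, x) with α = 0.01 (this form holds for 0 < α < 1). Negative inputs pass through, scaled down.
PReLU: α is a learnable parameter.
ELU: x if x > 0, otherwise α(eˣ - 1). Mean activation stays closer to 0.
SELU: scaled ELU. Self-normalizing (fully connected layers + lecun_normal init).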
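For reference, the ReLU-family definitions above can be written as scalar functions in plain Python. This is a minimal sketch, not the PyTorch kernels. The function names are illustrative. The SELU constants are the standard values, the same ones documented for PyTorch's `nn.SELU`.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Same as max(alpha * x, x) as long as 0 < alpha < 1
    return x if x > 0 else alpha * x

def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Standard SELU constants (the values documented for PyTorch's nn.SELU)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x: float) -> float:
    # SELU is ELU with a fixed alpha, multiplied by a fixed scale
    return SELU_SCALE * elu(x, SELU_ALPHA)

# PReLU has the same form as leaky_relu, except that alpha is learned
# during training instead of being fixed.

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), round(elu(x), 4), round(selu(x), 4))
```

Note that SELU saturates at -SELU_SCALE * SELU_ALPHA ≈ -1.758 for very negative inputs. ReLU, by contrast, clamps all negative inputs to exactly 0.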

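Smooth family

GELU: x·Φ(x), where Φ is the standard normal CDF. Standard in BERT/GPT. Computed exactly via erf or with a tanh approximation.
SiLU/Swish: x·σ(x). EfficientNet; also the gate inside PaLM's SwiGLU.
Mish: x·tanh(softplus(x)). YOLOv4.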
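The smooth activations above can be checked numerically with a plain-Python sketch. These are not the PyTorch implementations. The sketch evaluates GELU exactly through `math.erf`, using Φ(x) = ½(1 + erf(x/√2)), and again with the tanh approximation. It also defines SiLU and Mish. The evaluation grid and the function names are illustrative choices.

```python
import math

def gelu_exact(x: float) -> float:
    # x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation of the same function
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def silu(x: float) -> float:
    # SiLU / Swish with beta = 1: x * sigmoid(x)
    return x * sigmoid(x)

def softplus(x: float) -> float:
    # log(1 + e^x), rearranged to avoid overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    return x * math.tanh(softplus(x))

grid = [i / 100 for i in range(-500, 501)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
print(f"max |GELU exact - GELU tanh| on [-5, 5]: {max_gap:.1e}")
```

On this grid, the gap between the two GELU variants should stay below 1e-3.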

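Gated family (FFN)

GLU: (xW) ⊗ σ(xV), with ⊗ the elementwise product. Gates which information passes through.
SwiGLU: (xW) ⊗ SiLU(xV). LLaMA and PaLM FFNs. The hidden width is usually scaled by 2/3 to compensate for the extra weight matrix.
GeGLU: the GELU variant, (xW) ⊗ GELU(xV).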
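The 2/3 factor comes from parameter counting. A standard FFN with hidden width 4d has two d × 4d matrices, for 8d² weights. SwiGLU has three matrices of size d × h, for 3dh weights. Setting 3dh = 8d² gives h = 8d/3. The quick check below is in plain Python and ignores biases. The helper names are illustrative, and d = 768 is just an example width.

```python
def ffn_params(d: int, mult: int = 4) -> int:
    # Standard FFN: d -> mult*d -> d, two weight matrices, no biases
    return 2 * d * (mult * d)

def swiglu_params(d: int, d_hidden: int) -> int:
    # SwiGLU FFN: gate and up (d -> d_hidden) plus down (d_hidden -> d), no biases
    return 3 * d * d_hidden

d = 768
print(ffn_params(d))                 # two d x 4d matrices
print(swiglu_params(d, 8 * d // 3))  # hidden scaled by 2/3: 4d * 2/3 = 8d/3
print(swiglu_params(d, 4 * d))       # no scaling: 1.5x the FFN parameters
```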

Output-only

Sigmoid: binary outputs. Saturates for large |x|, so gradients vanish.
Softmax: multi-class probabilities.
Tanh: range [-1, 1]. RNNs, GAN generator outputs.
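The sketch below illustrates two practical issues behind these output functions. It is plain Python, and the function names are illustrative. First, the sigmoid gradient σ(x)(1 - σ(x)) is at most 0.25 and shrinks quickly as |x| grows. Second, a naive softmax overflows on large logits unless the maximum logit is subtracted first. Subtracting the maximum does not change the result.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def softmax(logits: List[float]) -> List[float]:
    # Subtracting the max logit leaves the result unchanged but keeps exp() finite
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

for x in (0.0, 5.0, 10.0):
    print(f"sigmoid'({x}) = {sigmoid_grad(x):.2e}")
print(softmax([1000.0, 1000.0, 999.0]))
```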

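Intuition

ReLU: fast and simple, but neurons can die.
GELU/SiLU: smooth and more curved (nonlinear) around 0; tend to work well in deep Transformers.
SwiGLU: gating adds expressiveness; reported to outperform a plain FFN at the same parameter count.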
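The contrast between "dead neurons" and "smooth" activations shows up directly in the derivatives. The plain-Python sketch below compares the ReLU gradient with the SiLU gradient at a few inputs. It is an illustration only, not a training diagnostic, and the function names are illustrative.

```python
import math

def relu_grad(x: float) -> float:
    # Zero for every x <= 0 (0 at x = 0 by the common convention)
    return 1.0 if x > 0 else 0.0

def silu_grad(x: float) -> float:
    # d/dx [x * s(x)] = s(x) * (1 + x * (1 - s(x))), with s = sigmoid
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 + x * (1.0 - s))

for x in (-4.0, -2.0, -0.5, 0.0, 1.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  silu'={silu_grad(x):+.3f}")
```

For negative inputs, the ReLU gradient is exactly 0. The SiLU gradient there is small but nonzero, and slightly negative in part of that range. As a result, a SiLU unit pushed into the negative region still receives a learning signal.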

💻 Patterns

PyTorch built-ins

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)  # example input
# Module classes
nn.ReLU(); nn.LeakyReLU(0.01); nn.PReLU()
nn.ELU(alpha=1.0); nn.SELU()
nn.GELU(approximate="tanh"); nn.SiLU(); nn.Mish()
# Functional forms
F.relu(x); F.leaky_relu(x, 0.01); F.gelu(x); F.silu(x)

SwiGLU FFN (LLaMA-style)

import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d, d_hidden):
        super().__init__()
        self.w_gate = nn.Linear(d, d_hidden, bias=False)  # branch passed through SiLU
        self.w_up   = nn.Linear(d, d_hidden, bias=False)  # linear branch
        self.w_down = nn.Linear(d_hidden, d, bias=False)  # project back to d
    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
# Typically d_hidden = int(8/3 * d), rounded to a multiple of 64
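The sizing rule in the comment above can be made concrete with a small helper. It is a hypothetical sketch. It takes 2/3 of the usual 4d hidden width using integer arithmetic, so no float rounding is involved. It then rounds up to a multiple, following the rounding pattern commonly seen in LLaMA-style code. The function name and the `multiple_of` parameter are assumptions for illustration.

```python
def swiglu_hidden(d: int, multiple_of: int = 64) -> int:
    # 2/3 of the standard 4d hidden width, i.e. 8d/3, computed in integers
    h = 8 * d // 3
    # Round up to the next multiple of `multiple_of` (hardware-friendly sizes)
    return multiple_of * ((h + multiple_of - 1) // multiple_of)

print(swiglu_helper := swiglu_hidden(768))  # 8*768/3 = 2048 is already a multiple of 64
print(swiglu_hidden(4096))                  # 10922 rounded up to a multiple of 64
print(swiglu_hidden(4096, 256))             # same width rounded to a multiple of 256
```

With d = 4096 and a multiple of 256, the helper returns 11008, which is the FFN width used in LLaMA-7B.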

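GELU by hand (tanh approximation)

import math, torch
def gelu_tanh(x):
    # tanh approximation of x * Phi(x); same formula as F.gelu(x, approximate="tanh")
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2/math.pi) * (x + 0.044715 * x**3)))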

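Pairing with init

import torch.nn as nn
layer = nn.Linear(256, 128)      # example layer; weight shape is (out, in)
fan_in = layer.weight.shape[1]
# ReLU/Leaky → He (Kaiming); for Leaky ReLU, nonlinearity="leaky_relu" with a=slope
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
# SELU → LeCun normal
nn.init.normal_(layer.weight, std=(1.0 / fan_in)**0.5)
# Tanh → Xavier
nn.init.xavier_normal_(layer.weight)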
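The init choices above differ mainly in the target standard deviation of the weights. The plain-Python sketch below writes out those formulas. It follows the formulas documented for PyTorch's `kaiming_normal_` (default fan-in mode) and `xavier_normal_`, with LeCun normal defined as 1/√fan_in. The helper names and the 256 → 128 layer are illustrative.

```python
import math

def he_std(fan_in: int, negative_slope: float = 0.0) -> float:
    # Kaiming normal, fan-in mode: gain^2 = 2 / (1 + a^2) for (leaky) ReLU
    return math.sqrt(2.0 / ((1.0 + negative_slope ** 2) * fan_in))

def lecun_std(fan_in: int) -> float:
    # LeCun normal, paired with SELU
    return math.sqrt(1.0 / fan_in)

def xavier_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    # Xavier/Glorot normal: gain * sqrt(2 / (fan_in + fan_out))
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

fan_in, fan_out = 256, 128  # e.g. nn.Linear(256, 128)
print(f"He:     {he_std(fan_in):.4f}")
print(f"Leaky:  {he_std(fan_in, 0.01):.4f}")
print(f"LeCun:  {lecun_std(fan_in):.4f}")
print(f"Xavier: {xavier_std(fan_in, fan_out):.4f}")
```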

Diagnosing dying ReLU

# Measure the zero ratio of a ReLU output `act` (e.g. captured with a forward hook)
zero_ratio = (act == 0).float().mean().item()
# Persistently > 0.5 → switch to Leaky ReLU/ELU/GELU
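The overall zero ratio mixes two effects. Some units are healthy but off for particular inputs: a ReLU layer with roughly symmetric pre-activations sits near 0.5 anyway. Other units never fire at all. A complementary per-unit check counts only units that are zero for every sample in a batch. The plain-Python sketch below applies both measures to a hypothetical activation batch; the function names and data are illustrative.

```python
from typing import List

def zero_ratio(acts: List[List[float]]) -> float:
    # Overall fraction of exactly-zero entries (same idea as the snippet above)
    flat = [v for row in acts for v in row]
    return sum(v == 0.0 for v in flat) / len(flat)

def dead_unit_fraction(acts: List[List[float]]) -> float:
    # acts[i][j] = activation of unit j for sample i.
    # A unit counts as dead if it is exactly zero for every sample in the batch.
    n_units = len(acts[0])
    dead = sum(1 for j in range(n_units) if all(row[j] == 0.0 for row in acts))
    return dead / n_units

# Hypothetical batch: 3 samples x 4 units; units 1 and 3 never fire
batch = [
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
    [0.3, 0.0, 0.5, 0.0],
]
print(zero_ratio(batch), dead_unit_fraction(batch))
```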

Decision guide

Model	Activation
Classic CNN	ReLU
ResNet/EfficientNet	ReLU / SiLU
Transformer (BERT/GPT)	GELU
LLaMA / PaLM FFN	SwiGLU
GAN generator	Tanh (out), ReLU (hidden)
Self-normalizing FC	SELU + lecun_normal
YOLO variants	Mish
Output (binary)	Sigmoid
Output (multi-class)	Softmax (or raw logits + cross-entropy loss)

Defaults: general DL → ReLU. Transformer → GELU. LLM FFN → SwiGLU.

🔗 Graph

🤖 Using LLMs

When to use: recommending the standard activation for a model family, generating code. When not to: validating a new SoTA activation, which requires experiments.

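❌ Anti-patterns

Slipping a Sigmoid into the hidden layers of a ReLU network with a softmax output
Combining SELU with BatchNorm (breaks self-normalization)
Sigmoid in the hidden layers of a deep network (vanishing gradients)
Using SwiGLU without scaling the hidden dim (1.5× the FFN parameters)
ReLU on the output layer (cannot represent negative targets)
He init for GELU/SiLU as well (works in practice, but its derivation assumes ReLU)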
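The SELU + BatchNorm item above rests on SELU normalizing itself. The plain-Python sketch below gives a rough numerical illustration. It uses LeCun-normal weights (std = 1/√fan_in) and no normalization layer. Under those assumptions, the activations of a fully connected SELU stack should stay near mean 0 and variance 1 from layer to layer. The sketch uses one random input, width 256, and 8 layers, all arbitrary choices. Statistics are computed across units, so they fluctuate somewhat around those targets.

```python
import math
import random

# Standard SELU constants (the values documented for PyTorch's nn.SELU)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def mean_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def selu_stack_stats(width=256, depth=8, seed=0):
    """Push one random input through `depth` SELU layers with LeCun-normal
    weights, no biases and no BatchNorm; return (mean, variance) of the
    activations across units after each layer."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = 1.0 / math.sqrt(width)  # fan_in == width for these square layers
    stats = []
    for _ in range(depth):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        h = [selu(sum(w * a for w, a in zip(row, h))) for row in W]
        stats.append(mean_var(h))
    return stats

for layer, (m, v) in enumerate(selu_stack_stats(), start=1):
    print(f"layer {layer}: mean={m:+.3f} var={v:.3f}")
```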

🧪 Verification / Duplicates

Verified (He 2015, Hendrycks GELU, Ramachandran Swish, Shazeer SwiGLU). Confidence: A.
Duplicates: none.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: SwiGLU/GELU code, init pairing
