Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
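Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed redirects straight at their targets, concept mapping (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections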

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

---
id: wiki-2026-0508-leaky-relu-and-activations
title: Leaky ReLU and Activations
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Activation Functions, ReLU Family, GELU, SiLU, Swish]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [activation, relu, gelu, silu, swiglu, deep-learning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: { language: Python, framework: PyTorch }
---
# Leaky ReLU and Activations
## One-line summary
> **"Activation = nonlinearity."** The ReLU family is the baseline; Transformers use GELU/SiLU/SwiGLU.
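## Core concepts
### ReLU family
- **ReLU**: max(0, x). Fast, but suffers from the dying-ReLU problem.
- **Leaky ReLU**: max(αx, x), with α = 0.01. Lets a small fraction of each negative input through.
- **PReLU**: Leaky ReLU with α as a learnable parameter.
- **ELU**: x if x > 0, otherwise α(eˣ - 1). Keeps mean activations closer to zero.
- **SELU**: scaled ELU. Self-normalizing (fully connected layers + lecun_normal init).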
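The definitions above can be written out as scalar functions. This is a minimal sketch in plain Python for checking values by hand, not a replacement for the PyTorch modules; the function names are illustrative, and the SELU constants are the standard ones also used by `torch.nn.SELU`.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Same as max(alpha * x, x) whenever 0 <= alpha <= 1
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Standard SELU constants (fixed, not learned)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * elu(x, SELU_ALPHA)

for v in (-2.0, -0.5, 0.0, 1.5):
    print(v, relu(v), leaky_relu(v), round(elu(v), 4), round(selu(v), 4))
```

PReLU is the same function as `leaky_relu`, except that `alpha` is a trained parameter rather than a fixed constant.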
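### Smooth family
- **GELU**: x·Φ(x), where Φ is the standard normal CDF. Standard in BERT/GPT. Computed exactly via erf or with a tanh approximation.
- **SiLU/Swish**: x·σ(x). Used in EfficientNet, and as the gate nonlinearity inside SwiGLU (PaLM).
- **Mish**: x·tanh(softplus(x)). Used in YOLOv4.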
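As a rough illustration, the three smooth activations can be evaluated with nothing but the standard library. The helper names (`sigmoid`, `softplus`) are chosen here for the sketch; the formulas are the ones listed above.

```python
import math

def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softplus(x):
    # log(1 + e^x), rearranged to avoid overflow for large x
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def gelu(x):
    # Exact form: x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    return x * sigmoid(x)

def mish(x):
    return x * math.tanh(softplus(x))

for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(v, round(gelu(v), 4), round(silu(v), 4), round(mish(v), 4))
```

All three behave like the identity for large positive inputs and approach zero for large negative inputs, with a small negative dip in between, which is the main visible difference from ReLU.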
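### Gated family (FFN)
- **GLU**: (xW)⊗σ(xV). Gates the information flow.
- **SwiGLU**: (xW)⊗SiLU(xV). Used in LLaMA and PaLM FFNs. The hidden dimension is usually scaled by 2/3 to keep the parameter count unchanged.
- **GeGLU**: the same gate with GELU in place of SiLU.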
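The three gated variants differ only in the nonlinearity applied to the gate branch. A small sketch on plain Python lists, following the (xW)⊗act(xV) notation above; `matvec` and `gated_unit` are hypothetical helper names, and biases are omitted.

```python
import math

def sigmoid(z):
    # Fine for the moderate inputs used here; overflows for very negative z
    return 1.0 / (1.0 + math.exp(-z))

def silu(z):
    return z * sigmoid(z)

def gelu(z):
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def matvec(x, W):
    # Row vector x (length n) times matrix W (n rows, m columns) -> length m
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def gated_unit(x, W, V, act):
    # (xW) ⊗ act(xV): a linear "value" branch scaled elementwise by the gate
    value, gate = matvec(x, W), matvec(x, V)
    return [v * act(g) for v, g in zip(value, gate)]

def glu(x, W, V):
    return gated_unit(x, W, V, sigmoid)

def swiglu(x, W, V):
    return gated_unit(x, W, V, silu)

def geglu(x, W, V):
    return gated_unit(x, W, V, gelu)

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, -1.0]
print(glu(x, identity, identity), swiglu(x, identity, identity), geglu(x, identity, identity))
```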
### Output only
- **Sigmoid**: binary outputs. Saturates, which causes vanishing gradients.
- **Softmax**: multi-class probabilities.
- **Tanh**: range (-1, 1). Used in RNNs and GAN generators.
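Two of the points above are easy to check numerically: softmax yields a probability vector, and the sigmoid gradient collapses once the input saturates. A minimal sketch, assuming plain Python floats; the max-subtraction trick is a common stabilization and is not taken from the original note.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so large logits cannot overflow
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); peaks at 0.25 for x = 0
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(sigmoid_grad(0.0), sigmoid_grad(10.0))
```

The second print shows why sigmoid is a poor hidden activation in deep stacks: the gradient at x = 10 is several orders of magnitude smaller than at x = 0.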
### Intuition
- ReLU: fast and simple, but prone to dead neurons
- GELU/SiLU: smooth, with more nonlinearity near 0, which helps deep Transformers
- SwiGLU: gating increases expressiveness and gives better quality at the same parameter count
## 💻 Patterns
### PyTorch built-ins
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)  # example input

# Module classes
nn.ReLU(); nn.LeakyReLU(0.01); nn.PReLU()
nn.ELU(alpha=1.0); nn.SELU()
nn.GELU(approximate="tanh"); nn.SiLU(); nn.Mish()
# Functional forms
F.relu(x); F.leaky_relu(x, 0.01); F.gelu(x); F.silu(x)
```
### SwiGLU FFN (LLaMA-style)
```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d, d_hidden):
        super().__init__()
        self.w_gate = nn.Linear(d, d_hidden, bias=False)
        self.w_up = nn.Linear(d, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d, bias=False)

    def forward(self, x):
        # SiLU-gated branch times the linear "up" branch, projected back to d
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Typically d_hidden = int(8/3 * d), rounded to a multiple of 64
```
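The 8/3 factor comes from parameter matching: a classic FFN with hidden width 4d has 2·d·4d = 8d² weights, while SwiGLU has three matrices, so 3·d·h = 8d² gives h = 8d/3. A short sketch of that arithmetic; the function names and the round-up rule are assumptions for illustration, and biases are ignored.

```python
def swiglu_hidden(d, multiple_of=64):
    # 2/3 of the usual 4*d FFN width, rounded up to a multiple of `multiple_of`
    h = int(8 * d / 3)
    return multiple_of * ((h + multiple_of - 1) // multiple_of)

def ffn_params(d):
    # Classic two-matrix FFN with hidden width 4*d, no biases
    return 2 * d * (4 * d)

def swiglu_params(d, d_hidden):
    # Gate + up + down projections, no biases
    return 3 * d * d_hidden

d = 4096
h = swiglu_hidden(d, multiple_of=256)
ratio = swiglu_params(d, h) / ffn_params(d)
print(h, ratio)
```

Skipping the correction and keeping `d_hidden = 4 * d` makes the block 1.5× larger than the FFN it replaces.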
### GELU by hand
```python
import math, torch

def gelu_tanh(x):
    # tanh approximation of GELU (same formula as nn.GELU(approximate="tanh"))
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2/math.pi) * (x + 0.044715 * x**3)))
```
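To see how closely the tanh form tracks the exact erf form, the two can be compared on a grid. This sketch uses scalar `math` functions instead of tensors so it runs without PyTorch; the grid range of [-6, 6] is an arbitrary choice.

```python
import math

def gelu_erf(x):
    # Exact GELU: x * Phi(x)
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation, same constants as the tensor version
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

xs = [i / 100.0 for i in range(-600, 601)]
max_abs_err = max(abs(gelu_erf(x) - gelu_tanh(x)) for x in xs)
print(max_abs_err)
```

The gap stays well below 1e-3 over this range, which is why the two forms are usually treated as interchangeable except when reproducing a checkpoint exactly.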
### Pairing with init
```python
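import torch.nn as nn

layer = nn.Linear(512, 512)  # example layer; each call below re-initializes it
fan_in = layer.in_features

# ReLU/Leaky ReLU → He (Kaiming)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
# SELU → LeCun normal: std = sqrt(1 / fan_in)
nn.init.normal_(layer.weight, std=(1.0 / fan_in)**0.5)
# Tanh → Xavier (Glorot)
nn.init.xavier_normal_(layer.weight)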
```
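The three schemes differ only in the standard deviation they assign to the weights. A sketch of the formulas in plain Python, assuming normal (not uniform) variants and fan-in mode for He init; the function names are illustrative.

```python
import math

def he_std(fan_in, negative_slope=0.0):
    # Kaiming normal: gain / sqrt(fan_in), gain = sqrt(2 / (1 + a^2)); a = 0 is plain ReLU
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    return gain / math.sqrt(fan_in)

def lecun_std(fan_in):
    # LeCun normal: variance 1 / fan_in
    return math.sqrt(1.0 / fan_in)

def xavier_std(fan_in, fan_out):
    # Glorot normal with gain 1: variance 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

print(he_std(512), he_std(512, 0.01), lecun_std(512), xavier_std(512, 512))
```

For a square layer, Xavier and LeCun give the same value, and He is larger by a factor of sqrt(2) to compensate for ReLU zeroing roughly half of the pre-activations.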
### Diagnosing dying ReLU
```python
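import torch
act = torch.relu(torch.randn(256, 128))  # stand-in for a post-ReLU output captured by a forward hook
# Measure the fraction of exactly-zero activations
zero_ratio = (act == 0).float().mean().item()
# Persistently > 0.5 → switch to Leaky ReLU/ELU/GELU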
```
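A batch-wide zero ratio does not distinguish units that are sometimes off from units that never fire. One possible refinement, sketched here with a real forward hook, flags units whose output is zero for every sample in a batch. `dead_unit_mask` and the toy model are hypothetical; the model is rigged so that units 0 and 2 are dead by construction.

```python
import torch
import torch.nn as nn

def dead_unit_mask(model, relu, batch):
    """Bool mask of units whose ReLU output is zero for every sample in `batch`."""
    captured = []
    handle = relu.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))
    try:
        with torch.no_grad():
            model(batch)
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
    act = captured[0]
    flat = act.reshape(-1, act.shape[-1])  # fold all leading dims into one sample axis
    return (flat == 0).all(dim=0)

# Toy model: zero weights and a fixed bias, so the output is the same for any input
linear = nn.Linear(4, 3)
relu = nn.ReLU()
with torch.no_grad():
    linear.weight.zero_()
    linear.bias.copy_(torch.tensor([-1.0, 1.0, -1.0]))
model = nn.Sequential(linear, relu)

mask = dead_unit_mask(model, relu, torch.randn(32, 4))
dead_units = mask.nonzero().flatten().tolist()
print(dead_units)
```

In practice the mask would be accumulated over many batches before a unit is declared dead, since a single batch can miss the inputs that activate it.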
## Decision guide
| Model | Activation |
|---|---|
| Classic CNN | ReLU |
| ResNet/EfficientNet | ReLU / SiLU |
| Transformer (BERT/GPT) | GELU |
| LLaMA / PaLM FFN | SwiGLU |
| GAN generator | Tanh (out), ReLU (hidden) |
| Self-normalizing FC | SELU + lecun_normal |
| YOLO variants | Mish |
| Binary output | Sigmoid |
| Multi-class output | Softmax (or raw logits + cross-entropy loss) |

**Defaults**: general DL → ReLU. Transformer → GELU. LLM FFN → SwiGLU.
## 🔗 Graph
- Parents: [[Neural-Networks]], [[Deep-Learning]]
- Variants: [[GELU]]
- Applications: [[Transformer]], [[CNN]]
- Adjacent: [[Loss-Functions-Foundations]]
## 🤖 LLM usage
**When to use**: recommending the standard activation for a model family, code generation.
**When not to use**: validating a new SoTA activation, which requires experiments.
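## ❌ Anti-patterns
- Inserting Sigmoid into the hidden layers of a ReLU network with a softmax output
- Combining SELU with BatchNorm (breaks self-normalization)
- Sigmoid in the hidden layers of a deep network (vanishing gradients)
- Using SwiGLU without the hidden-dim correction (parameter count grows)
- ReLU on the output layer (cannot represent negative targets)
- Applying He init unchanged to GELU/SiLU (works in practice, but the gain is derived for ReLU)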
## 🧪 Verification / duplicates
- Verified (He 2015, Hendrycks GELU, Ramachandran Swish, Shazeer SwiGLU). Trust level A.
- Duplicates: none.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: SwiGLU/GELU code, init pairing |