"Downsamples along the spatial/sequence dimensions, adding invariance and enlarging the receptive field." A staple of the CNN era (max/avg pooling); modern Transformers rarely use it, preferring strided convolutions or attention pooling. Global pooling is still the standard classification head.
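A minimal sketch of the two quantities the opening line refers to: output size after pooling and receptive-field growth. It assumes floor-mode output sizing and the standard recurrence r_out = r_in + (k - 1) * j_in, j_out = j_in * s, where j is the cumulative stride ("jump"). The helper names `pool_output_size` and `receptive_field` are hypothetical, and both conv and pool layers are described only by (kernel, stride).

```python
def pool_output_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one axis for a pooling window (floor mode, hypothetical helper)."""
    return (n + 2 * padding - kernel) // stride + 1


def receptive_field(layers: list[tuple[int, int]]) -> tuple[int, int]:
    """Receptive field r and cumulative stride j after a stack of (kernel, stride) layers.

    Recurrence: r_out = r_in + (k - 1) * j_in, j_out = j_in * s, starting from r = j = 1.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j


if __name__ == "__main__":
    print(pool_output_size(224, 2, 2))                # 112: 2x2/stride-2 pool halves each side
    print(receptive_field([(3, 1), (3, 1)]))          # (5, 1): two 3x3 convs, no pooling
    print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # (8, 2): a 2x2/2 pool in between
```

Inserting a single 2x2, stride-2 pool between two 3x3 convolutions grows the receptive field from 5 to 8 pixels and doubles the cumulative stride, so every later layer sees a wider region for the same kernel size.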
Key points
Types
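Max Pooling: takes the maximum within each window; gives invariance to small (local) shifts and preserves strong activations such as edges.
Average Pooling: takes the window mean; smoother output, and every pixel contributes.
Global Average Pooling (GAP): averages each channel's entire feature map to a single value (C×H×W → C). Used in ResNet/EfficientNet heads.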
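A minimal pure-Python sketch of the three pooling types above, assuming a single-channel map stored as nested lists, no padding, and by default non-overlapping 2x2 windows with stride 2. The function names are illustrative only; in practice you would use a framework layer (e.g. PyTorch's `nn.MaxPool2d`, `nn.AvgPool2d`, `nn.AdaptiveAvgPool2d`).

```python
from typing import Callable

Matrix = list[list[float]]


def pool2d(x: Matrix, kernel: int, stride: int,
           reduce: Callable[[list[float]], float]) -> Matrix:
    """Slide a kernel x kernel window with the given stride (no padding) and reduce each window."""
    out_h = (len(x) - kernel) // stride + 1
    out_w = (len(x[0]) - kernel) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [x[i * stride + di][j * stride + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(reduce(window))
        out.append(row)
    return out


def max_pool2d(x: Matrix, kernel: int = 2, stride: int = 2) -> Matrix:
    return pool2d(x, kernel, stride, max)


def avg_pool2d(x: Matrix, kernel: int = 2, stride: int = 2) -> Matrix:
    return pool2d(x, kernel, stride, lambda v: sum(v) / len(v))


def global_avg_pool(feature_maps: list[Matrix]) -> list[float]:
    """GAP: each channel's H x W map collapses to one number, giving a C-length vector."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]


if __name__ == "__main__":
    x = [[1, 3, 2, 0],
         [4, 2, 1, 1],
         [0, 0, 5, 6],
         [1, 2, 7, 8]]
    print(max_pool2d(x))         # [[4, 2], [2, 8]]
    print(avg_pool2d(x))         # [[2.5, 1.0], [0.75, 6.5]]
    print(global_avg_pool([x]))  # [2.6875]
```

Max pooling keeps only the strongest response per window (the 8 in the bottom-right block), while average pooling blends all four values; GAP discards spatial layout entirely, which is exactly what a classification head wants.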
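When to use: spatial reduction in CNN backbones, GAP in classification heads, and aggregation over sets/graphs (a permutation-invariant readout).
When to avoid: dense prediction (segmentation, detection), where pooling discards spatial detail; combine it with skip connections or consider dilated convolutions instead.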
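A toy sketch of why pooling hurts dense prediction, assuming a 4x4 single-channel map, a 2x2/stride-2 max pool as the encoder, and nearest-neighbour 2x upsampling as the decoder. Two inputs whose bright pixel sits at different positions inside the same window become indistinguishable after the round trip. Adding the input back is a crude element-wise stand-in for a skip connection (U-Net actually concatenates encoder features); all function names are hypothetical.

```python
Matrix = list[list[float]]


def max_pool2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pool (stride 2); assumes even height and width."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample_nearest2x(x: Matrix) -> Matrix:
    """Nearest-neighbour 2x upsampling: each value is copied into a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_skip(decoded: Matrix, skip: Matrix) -> Matrix:
    """Element-wise sum of decoder output and encoder input (simplified skip connection)."""
    return [[d + s for d, s in zip(dr, sr)] for dr, sr in zip(decoded, skip)]


a = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # bright pixel at (0, 0)
b = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # bright pixel at (1, 1)

ra = upsample_nearest2x(max_pool2x2(a))
rb = upsample_nearest2x(max_pool2x2(b))

if __name__ == "__main__":
    print(ra == rb)                            # True: the exact position is gone
    print(add_skip(ra, a) == add_skip(rb, b))  # False: the skip path restores it
```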
❌ Anti-patterns
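Pool then upsample for segmentation without skip connections: fine detail is lost. Use U-Net-style skip connections.
MaxPool everywhere in a modern architecture: strided convolutions are learnable and have largely taken over downsampling.
Flatten without GAP: feeding the flattened map straight into a fully connected classification head means a huge parameter count and a high risk of overfitting.
Pooling over tokens when a [CLS] token is available: attention pooling or a [CLS] readout usually works better.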
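A back-of-the-envelope parameter count for the "Flatten without GAP" anti-pattern, assuming ResNet-50-like shapes (a 2048×7×7 final feature map for a 224×224 input) and a single linear layer with bias to 1000 classes. `linear_params` is a hypothetical helper, not a library call.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights plus optional bias of one fully connected layer (hypothetical helper)."""
    return in_features * out_features + (out_features if bias else 0)


C, H, W = 2048, 7, 7  # assumed final feature map: channels, height, width
num_classes = 1000

flatten_head = linear_params(C * H * W, num_classes)  # one weight per (channel, position, class)
gap_head = linear_params(C, num_classes)              # GAP first: one weight per (channel, class)

if __name__ == "__main__":
    print(f"{flatten_head:,}")  # 100,353,000
    print(f"{gap_head:,}")      # 2,049,000
```

The flattened head carries H·W = 49 times as many weights, and its input size is tied to the training resolution; the GAP head accepts any H×W because the spatial axes are averaged away before the linear layer.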