---
id: wiki-2026-0508-pooling
title: Pooling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Max Pooling, Average Pooling, Global Pooling]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [deep-learning, cnn, pooling, downsampling]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch
---

# Pooling

## In one line

> **"Downsample a spatial/sequence dimension: gain invariance and a larger receptive field."** A staple of the CNN era (max/avg pool). Modern Transformers rarely use it and rely on strided conv or attention pooling instead. Global pooling is still the standard classification head.

## Core

### Types

- **Max Pooling**: maximum within each window; local translation invariance, preserves strong responses such as edges.
- **Average Pooling**: mean of each window; smoother output, every pixel contributes.
- **Global Average Pooling (GAP)**: collapses each entire feature map (per channel) to a single value. Used in ResNet/EfficientNet heads.
- **Adaptive Pooling**: fixes the output size regardless of input size (PyTorch `AdaptiveAvgPool2d`).
- **Attention Pooling**: weighted sum with learned, input-dependent weights; e.g. a ViT [CLS] readout or Perceiver-style cross-attention.
- **L_p Pooling, Stochastic Pooling, Mixed Pooling**: less common; occasionally more robust.

### Why use it

- **Downsampling**: shrinking the spatial size reduces compute and activation memory in later layers (and the parameter count of any flatten → FC head).
- **Invariance**: robust to small translations (within the pooling window).
- **Receptive field growth**: deeper layers see a wider context.
- **Overfitting reduction**: a parameter-free, mildly regularizing operation.

### Modern shift

- Since about 2020 (the Transformer era), pooling layers have largely been replaced by strided conv (stage transitions), patch merging (Swin), or attention pooling.
- ConvNeXt also uses strided conv for downsampling.
- GAP remains the standard choice in classification heads.
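The downsampling, invariance, and receptive-field claims above can be checked numerically. The sketch below is a minimal illustration, assuming PyTorch. `pool_out_size` and `receptive_field` are hypothetical helper names introduced here and are not PyTorch APIs. The invariance check only covers a shift that stays inside one 2x2 pooling window, not translation in general. The last part contrasts parameter-free max pooling with a learnable 2x2, stride-2 conv, which is roughly the kind of downsampling layer the modern shift refers to.

```python
import torch
import torch.nn as nn


def pool_out_size(n, k, s, p=0, d=1):
    # Hypothetical helper: output length along one axis, per the shape formula in the
    # PyTorch MaxPool2d docs (ceil_mode=False): floor((n + 2p - d(k-1) - 1) / s) + 1
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def receptive_field(layers):
    # Hypothetical helper. layers: list of (kernel, stride).
    # Returns how many input pixels (per axis) one output unit of the last layer sees.
    r, j = 1, 1  # r: receptive field size, j: input-pixel distance between adjacent outputs
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r


x = torch.randn(1, 16, 32, 32)

# 1) Downsampling: a 2x2 / stride-2 max pool halves H and W with zero parameters.
maxp = nn.MaxPool2d(kernel_size=2, stride=2)
y = maxp(x)
print(y.shape, pool_out_size(32, k=2, s=2))  # torch.Size([1, 16, 16, 16]) 16

# 2) Local invariance: moving a peak within the same 2x2 window leaves the output unchanged.
img = torch.zeros(1, 1, 4, 4)
img[0, 0, 0, 0] = 1.0
shifted = torch.zeros(1, 1, 4, 4)
shifted[0, 0, 1, 1] = 1.0
print(torch.equal(maxp(img), maxp(shifted)))  # True

# 3) Receptive field: three 3x3 convs vs. the same convs with one 2x2 / stride-2 pool after the first.
print(receptive_field([(3, 1)] * 3))                      # 7
print(receptive_field([(3, 1), (2, 2), (3, 1), (3, 1)]))  # 12

# 4) Learnable alternative: a 2x2 / stride-2 conv gives the same output shape but has weights.
conv = nn.Conv2d(16, 16, kernel_size=2, stride=2)
print(conv(x).shape == y.shape)                   # True
print(sum(p.numel() for p in maxp.parameters()))  # 0
print(sum(p.numel() for p in conv.parameters()))  # 1040 (16*16*2*2 weights + 16 biases)
```

A shift that crosses a window boundary (e.g. moving the peak from column 1 to column 2) would change the pooled output, which is why the invariance is only approximate in practice.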
## 💻 Patterns

### Max / Avg pool basics

```python
import torch.nn as nn

maxp = nn.MaxPool2d(kernel_size=2, stride=2)  # halves H and W
avgp = nn.AvgPool2d(kernel_size=2, stride=2)
```

### Global Average Pooling (classification head)

```python
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, c, n_cls):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Linear(c, n_cls)

    def forward(self, x):  # x: (B, C, H, W)
        x = self.gap(x).flatten(1)  # (B, C)
        return self.fc(x)
```

### Adaptive pool (variable input size)

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((7, 7))  # always a 7x7 output
x = torch.randn(2, 64, 33, 41)       # arbitrary spatial size
y = pool(x)                          # (2, 64, 7, 7)
```

### Attention Pooling (learned query; cf. ViT [CLS] readout)

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    def __init__(self, d, heads=8):  # d must be divisible by heads
        super().__init__()
        self.q = nn.Parameter(torch.randn(1, 1, d))  # single learned query token
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):  # x: (B, N, D)
        B = x.size(0)
        q = self.q.expand(B, -1, -1)  # (B, 1, D)
        out, _ = self.attn(q, x, x)   # query attends over all N tokens
        return out.squeeze(1)         # (B, D)
```

### Patch Merging (Swin Transformer)

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):  # x: (B, H, W, C), H and W must be even
        x0 = x[:, 0::2, 0::2, :]
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, 2C)
```

### 1D pool (sequence / audio)

```python
import torch.nn as nn

pool1d = nn.MaxPool1d(kernel_size=4, stride=4)  # (B, C, T) -> (B, C, T // 4)
gap1d = nn.AdaptiveAvgPool1d(1)                 # (B, C, T) -> (B, C, 1)
```

### Set/Graph pooling (mean/max/sum)

```python
import torch

def set_mean(x, mask):  # x: (B, N, D), mask: (B, N), 1 = valid element
    m = mask.unsqueeze(-1).float()
    return (x * m).sum(1) / m.sum(1).clamp(min=1)  # sum pooling is just the numerator (x * m).sum(1)

def set_max(x, mask):  # padded positions are set to -inf so they never win the max
    return x.masked_fill(~mask.bool().unsqueeze(-1), float("-inf")).max(1).values
```

## Decision criteria

| Situation | Approach |
|---|---|
| Classification, final feature | Global Avg Pooling |
| Variable input image size | AdaptiveAvgPool2d |
| Edge-preserving downsampling (detection) | Max Pool or strided conv |
| Transformer stage transition | Patch merging / strided conv |
| Set/sequence aggregation | Attention pool |
| Audio waveform | 1D max/avg pool or strided conv |

**Default**: final feature map → GAP; intermediate downsampling → strided conv (modern).

## 🔗 Graph

- Parent: [[Deep_Learning]]
- Variants: [[Max_Pooling]] · [[Average_Pooling]]
- Applications: [[Image-Classification-Mastery]] · [[ResNet]] · [[ViT]]

## 🤖 LLM usage

**When**: spatial reduction in CNN backbones, GAP in classification heads, set/graph aggregation.

**When not**: dense prediction (segmentation, detection), where pooling discards spatial information; combine it with skip connections or consider dilated conv.

## ❌ Anti-patterns

- **Pool then upsample for segmentation without skip**: fine detail is lost. Use U-Net-style skip connections.
- **MaxPool everywhere in a modern arch**: strided conv is learnable and has become the dominant choice.
- **Flatten without GAP**: feeding a flattened feature map into a fully connected classification head means huge parameter counts and overfitting.
- **Pool over tokens when [CLS] is available**: an attention pool or the [CLS] readout is often the better choice.

## 🧪 Verification / duplicates

- Verified (PyTorch docs for nn.MaxPool2d and AdaptiveAvgPool2d, Swin Transformer paper, ConvNeXt paper).
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: pooling types + modern shift to strided conv / attention pool |