---
id: wiki-2026-0508-perceptrons-foundations
title: Perceptrons-Foundations
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Perceptron, Rosenblatt Perceptron, MLP]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [perceptron, neural-network, mlp, history, foundations]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch, numpy
---

# Perceptrons-Foundations

## One-line summary

> **"Weighted sum + threshold = the atom of neural networks."** Rosenblatt's 1957 perceptron was the first trainable neuron model. A single-layer perceptron fails on XOR (Minsky & Papert, 1969), which contributed to the first AI winter. MLPs trained with backprop (1986) revived the field. Even modern transformers are, at their core, stacks of perceptron-like units.

## Core concepts

### History

- 1943: McCulloch-Pitts neuron (binary, no learning).
- 1957: Rosenblatt perceptron, with learnable weights, implemented in hardware as the Mark I.
- 1969: Minsky & Papert, "Perceptrons": proves the XOR limit of single-layer perceptrons, contributing to the first AI winter.
- 1986: Rumelhart, Hinton, Williams: backprop revives the MLP.
- 2012: AlexNet: the deep network (CNN) era begins.

### Perceptron math

- `y = step(w·x + b)`, where `step(z) = 1` if `z ≥ 0`, else `0`.
- Update rule (Rosenblatt): `w ← w + η(y_true - y_pred)x`, with the bias updated the same way: `b ← b + η(y_true - y_pred)`.
- Convergence theorem: training converges in a finite number of steps, but only when the data is linearly separable.
- Limit: XOR is not linearly separable, so a single perceptron cannot learn it.

### Multi-layer (MLP)

- Hidden layer + nonlinearity (historically sigmoid, then ReLU, then GELU).
- Universal approximation theorem (Cybenko 1989, Hornik 1991): a single hidden layer with enough units can approximate any continuous function on a compact domain to arbitrary accuracy.
- Training: backprop (gradients computed with the chain rule).

### Modern lens

- Transformer FFN block = 2-layer MLP applied per token.
- MLP-Mixer and similar pure-MLP architectures challenge vision SOTA; ViT keeps an MLP block in every layer alongside attention.
- The perceptron is the atomic unit of every "neural network".

### Applications

1. Pedagogical (NN intro).
2. Linear classifier (single perceptron).
3. Building block (MLP in transformer).
4. Mixture-of-Experts: each expert = MLP.
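
The update rule and the convergence claim above can be checked on two tiny truth tables. The following is a minimal sketch in plain Python (no dependencies); the helper name `train_perceptron`, the learning rate of 1.0, and the cap of 1000 epochs are illustrative assumptions, not part of the original note. It stops as soon as one full pass over the data produces no mistakes, which can only happen when the data is linearly separable.

```python
def step(z):
    # Threshold activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0


def train_perceptron(samples, lr=1.0, max_epochs=1000):
    """Rosenblatt update rule on (inputs, label) pairs.

    Returns (w, b, epochs_used). epochs_used is None when no
    error-free pass was reached within max_epochs.
    """
    n_features = len(samples[0][0])
    w, b = [0.0] * n_features, 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y_true in samples:
            y_pred = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y_true - y_pred
            if err != 0:
                mistakes += 1
                # w <- w + lr * (y_true - y_pred) * x, and the same for b.
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if mistakes == 0:
            return w, b, epoch
    return w, b, None


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and, epochs_and = train_perceptron(AND)   # linearly separable
w_xor, b_xor, epochs_xor = train_perceptron(XOR)   # not linearly separable

print("AND: error-free pass reached at epoch", epochs_and)
print("XOR: error-free pass reached at epoch", epochs_xor)  # None
```

AND reaches an error-free pass after a handful of epochs, while XOR never does, because no single line separates its two classes.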
## 💻 Patterns

### Perceptron from scratch

```python
import numpy as np

class Perceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)  # weights start at zero
        self.b = 0.0                   # bias
        self.lr = lr                   # learning rate (eta)

    def predict(self, x):
        # Step activation on the weighted sum
        return 1 if x @ self.w + self.b >= 0 else 0

    def fit(self, X, y, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.predict(xi)
                err = yi - pred  # 0 when correct, +1 or -1 when wrong
                self.w += self.lr * err * xi
                self.b += self.lr * err
```

### XOR fails for single perceptron

```python
# Reuses np and the Perceptron class defined above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR
p = Perceptron(2)
p.fit(X, y, epochs=1000)
# Will NOT converge: XOR is not linearly separable
```

### MLP solves XOR

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
    nn.Sigmoid(),
)
# Train with BCELoss + Adam; typically fits XOR within ~1000 steps,
# though an unlucky init can stall (dead ReLUs)
```

### Transformer FFN = MLP per token

```python
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.up = nn.Linear(dim, hidden)    # expand
        self.down = nn.Linear(hidden, dim)  # project back

    def forward(self, x):
        # Applied independently to each token's feature vector
        return self.down(F.gelu(self.up(x)))
```

### MLP-Mixer style (pure MLP vision)

```python
import torch.nn as nn

class MixerBlock(nn.Module):
    # Simplified: the LayerNorm used before each mixing step in MLP-Mixer is omitted
    def __init__(self, n_patches, dim):
        super().__init__()
        self.token_mix = nn.Sequential(
            nn.Linear(n_patches, n_patches * 4),
            nn.GELU(),
            nn.Linear(n_patches * 4, n_patches),
        )
        self.channel_mix = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, x):  # x: (B, N, D)
        # Mix across patches: transpose so the Linear layers act on the N axis
        x = x + self.token_mix(x.transpose(1, 2)).transpose(1, 2)
        # Mix across channels within each patch
        x = x + self.channel_mix(x)
        return x
```

## Decision criteria

| Situation | Approach |
|---|---|
| Linearly separable | Single perceptron OK |
| Non-linear pattern | MLP (>=1 hidden layer) |
| Tabular data | Tree models (e.g. XGBoost) usually beat MLPs |
| Image | CNN or ViT (attention + MLP blocks) |
| Sequence | Transformer (MLP + attention) |
| Pedagogical | Start with perceptron history |

**Default**: understand the MLP as the building block of modern models.
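
To make the "Non-linear pattern → MLP (>=1 hidden layer)" row concrete, here is a minimal sketch showing that one hidden layer of two ReLU units is enough to represent XOR. The weights below are hand-picked for illustration (an assumption, not learned values and not from the original note), and the code is plain Python so it runs without dependencies.

```python
def relu(z):
    return max(0.0, z)


# Hidden layer: two ReLU units that both read x1 + x2, with different biases.
#   h1 = relu(x1 + x2)       -> 0, 1, 1, 2 over the four inputs
#   h2 = relu(x1 + x2 - 1)   -> 0, 0, 0, 1 over the four inputs
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]

# Output layer: h1 - 2*h2 gives 0, 1, 1, 0, which is XOR.
W2 = [1.0, -2.0]
B2 = 0.0


def xor_mlp(x):
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, B1)
    ]
    out = sum(w * h for w, h in zip(W2, hidden)) + B2
    # Threshold the linear output at 0.5 to get a hard 0/1 label.
    return 1 if out >= 0.5 else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = {x: xor_mlp(x) for x in inputs}
print(results)
```

The second hidden unit only activates on `(1, 1)` and cancels the first unit's output there, which is the bend in the decision boundary that a single perceptron cannot produce.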
## 🔗 Graph

- Variant: [[MLP]]
- Application: [[MoE]]
- Adjacent: [[데이터 사이언스 및 ML 엔지니어링|Backpropagation]]

## 🤖 LLM usage

**When**: NN fundamentals, debugging gradient flow, designing custom architectures.

**When not**: production tabular tasks (use GBDT instead).

## ❌ Anti-patterns

- **Linear activation only**: stacked linear layers collapse into a single linear map. A nonlinearity is required.
- **Step function in modern NNs**: its gradient is zero almost everywhere and undefined at 0, so backprop gets no learning signal. Use ReLU/GELU.
- **Too wide, too shallow**: universal approximation holds in principle, but deeper networks tend to be more sample-efficient.
- **Forgetting bias**: forcing `b = 0` pins the decision boundary to the origin, so it cannot shift.

## 🧪 Verification / duplicates

- Verified (Rosenblatt 1958, Minsky-Papert 1969, Rumelhart et al. 1986).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: perceptron history, XOR limit, MLP modern lens |