---
id: wiki-2026-0508-perceptrons-foundations
title: Perceptrons-Foundations
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Perceptron, Rosenblatt Perceptron, MLP]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [perceptron, neural-network, mlp, history, foundations]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pytorch, numpy
---

# Perceptrons-Foundations

## One-line summary

> **"Weighted sum + threshold = the atom of neural networks."** Rosenblatt's 1957 perceptron was the first trainable neuron model. A single-layer perceptron cannot learn XOR (Minsky & Papert, 1969), which contributed to the first AI winter. MLPs trained with backprop (1986) brought the idea back. Even a modern transformer is ultimately built from stacked perceptron-like units.

## Core concepts

### History

- 1943: McCulloch-Pitts neuron (binary, no learning).
- 1957: Rosenblatt perceptron, with learnable weights; implemented in hardware as the Mark I Perceptron.
- 1969: Minsky & Papert, "Perceptrons": proved the XOR limit of single-layer perceptrons, which contributed to the first AI winter.
- 1986: Rumelhart, Hinton, Williams: backprop revives the MLP.
- 2012: AlexNet: the deep MLP/CNN era begins.

### Perceptron math

- `y = step(w·x + b)`, where `step(z) = 1` if `z ≥ 0`, else `0`.
- Update rule (Rosenblatt): `w ← w + η(y_true - y_pred)x`, with the bias updated the same way: `b ← b + η(y_true - y_pred)`.
- Convergence theorem: training converges in a finite number of updates, but only when the data is linearly separable.
- Limit: XOR is not linearly separable, so it cannot be learned.
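
The convergence theorem can be made concrete with the classical mistake bound: if some unit-norm separator reaches margin γ on data whose norms are at most R, the perceptron makes at most (R/γ)² updates. The sketch below checks this on the AND gate. It assumes ±1 labels and a constant-1 input that folds the bias into `w` (conventions of this sketch, not of the rule above); `train_perceptron` and `w_star` are illustrative names.

```python
import numpy as np

# AND gate; the trailing 1 in each row plays the role of the bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])  # labels in {-1, +1}


def train_perceptron(X, y, max_epochs=100):
    """Run the perceptron rule and count how many updates (mistakes) it makes."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified (or exactly on the boundary)
                w += yi * xi         # same rule as above, written for +/-1 labels
                mistakes += 1
                errors += 1
        if errors == 0:              # a full clean pass means convergence
            break
    return w, mistakes


w, mistakes = train_perceptron(X, y)

# One known separator for AND (hand-picked for this sketch), normalized to unit length.
w_star = np.array([1.0, 1.0, -1.5])
w_star /= np.linalg.norm(w_star)
gamma = np.min(y * (X @ w_star))           # margin of that separator
R = np.max(np.linalg.norm(X, axis=1))      # largest input norm
bound = (R / gamma) ** 2

print(f"mistakes={mistakes}, bound={bound:.1f}")
```

The bound depends only on the geometry of the data (R and γ), not on the order of the examples, which is why separable data always stops after finitely many updates.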

### Multi-layer (MLP)

- Hidden layer + nonlinearity (historically sigmoid, then ReLU, then GELU).
- Universal approximation theorem (Cybenko 1989, Hornik 1991): a single hidden layer with enough units can approximate any continuous function on a compact domain to arbitrary accuracy.
- Training: backprop (gradients computed with the chain rule).
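
To make the chain-rule step concrete, here is a minimal sketch of backprop written by hand for a one-hidden-layer MLP, checked against finite differences. The sizes, the sigmoid activations, and the squared-error loss are assumptions chosen to keep the derivatives short.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)             # one input example
t = 1.0                            # its target
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(W1, b1, w2, b2):
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(w2 @ h + b2)
    return 0.5 * (y - t) ** 2


# Forward pass, keeping the intermediates that the backward pass needs.
h = sigmoid(W1 @ x + b1)
y = sigmoid(w2 @ h + b2)

# Backward pass: apply the chain rule layer by layer, output to input.
dz2 = (y - t) * y * (1 - y)        # dL/dz2, using sigmoid'(z) = s(1 - s)
dw2 = dz2 * h                      # dL/dw2
dh = dz2 * w2                      # dL/dh
dz1 = dh * h * (1 - h)             # dL/dz1
dW1 = np.outer(dz1, x)             # dL/dW1

# Numerical check: central differences on every entry of W1.
eps = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp, b1, w2, b2) - loss(Wm, b1, w2, b2)) / (2 * eps)

max_diff = np.max(np.abs(numeric - dW1))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

Autograd frameworks such as PyTorch automate exactly this bookkeeping; the manual version is mainly useful for understanding and for debugging gradient flow.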

### Modern lens

- Transformer FFN block = 2-layer MLP applied per token.
- MLP-Mixer and similar pure-MLP models challenge vision SOTA; ViT also keeps an MLP block in every layer alongside attention.
- The perceptron is the atomic unit of every "neural network".

### Applications

1. Pedagogical (NN intro).
2. Linear classifier (single perceptron).
3. Building block (MLP in transformer).
4. Mixture-of-Experts: each expert = MLP.
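
The last item can be sketched as a forward pass in which a gate picks one small MLP per token. This is a simplified illustration only: top-1 routing, a softmax gate, and all names and sizes (`make_expert`, `moe_forward`, `W_gate`) are assumptions of the sketch, and production MoE layers typically add top-k routing and load balancing.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_experts, n_tokens = 8, 16, 4, 5   # illustrative sizes


def make_expert():
    """Each expert is an ordinary 2-layer MLP: dim -> hidden -> dim."""
    return {
        "W1": rng.normal(scale=0.1, size=(dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, dim)), "b2": np.zeros(dim),
    }


def expert_forward(p, x):
    return np.maximum(x @ p["W1"] + p["b1"], 0.0) @ p["W2"] + p["b2"]  # ReLU MLP


experts = [make_expert() for _ in range(n_experts)]
W_gate = rng.normal(scale=0.1, size=(dim, n_experts))


def moe_forward(x):
    logits = x @ W_gate                                    # (n_tokens, n_experts)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    choice = probs.argmax(axis=1)                          # top-1 expert per token
    out = np.zeros_like(x)
    for e, params in enumerate(experts):
        mask = choice == e
        if mask.any():
            # Only the chosen expert runs; its output is scaled by the gate probability.
            out[mask] = probs[mask, e][:, None] * expert_forward(params, x[mask])
    return out, choice


x = rng.normal(size=(n_tokens, dim))
out, choice = moe_forward(x)
print(out.shape, choice)
```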

## 💻 Patterns

### Perceptron from scratch

```python
import numpy as np


class Perceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)  # weights start at zero
        self.b = 0.0                   # bias shifts the boundary off the origin
        self.lr = lr

    def predict(self, x):
        # Step activation: 1 if the weighted sum is non-negative, else 0
        return 1 if x @ self.w + self.b >= 0 else 0

    def fit(self, X, y, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.predict(xi)
                err = yi - pred              # 0 when correct, +1 or -1 when wrong
                self.w += self.lr * err * xi
                self.b += self.lr * err
```

### XOR fails for single perceptron

```python
# Reuses `np` and the `Perceptron` class from the block above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR
p = Perceptron(2)
p.fit(X, y, epochs=1000)
# Will NOT converge: XOR is not linearly separable
```
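
The reason is algebraic: XOR needs `b < 0`, `w1 + b ≥ 0`, `w2 + b ≥ 0`, and `w1 + w2 + b < 0`, but adding the two middle inequalities gives `w1 + w2 + b ≥ -b > 0`, a contradiction. As an illustration (not a proof), the sketch below searches a small grid of weights for a single threshold unit; the grid range and the name `find_linear_unit` are assumptions of the sketch.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

grid = np.arange(-2.0, 2.5, 0.5)   # candidate values for w1, w2 and b


def find_linear_unit(y):
    """Return the first (w1, w2, b) on the grid whose step output matches y, else None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)
        if np.array_equal(pred, y):
            return (w1, w2, b)
    return None


solutions = {name: find_linear_unit(y) for name, y in targets.items()}
for name, sol in solutions.items():
    print(name, "->", sol)
```

AND has a solution on the grid, while the search for XOR comes back empty, as the inequalities above predict for any choice of weights.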

### MLP solves XOR

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(2, 4), nn.ReLU(),
    nn.Linear(4, 1), nn.Sigmoid(),
)
# Train with BCELoss + Adam; this usually converges in <1000 steps
```
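
A minimal training loop for this model might look as follows. The seed, the learning rate, and the step count are assumptions of the sketch, not tuned values, and a small ReLU network can occasionally get stuck depending on its initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, only for repeatability

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR targets, shape (4, 1)

mlp = nn.Sequential(
    nn.Linear(2, 4), nn.ReLU(),
    nn.Linear(4, 1), nn.Sigmoid(),
)
loss_fn = nn.BCELoss()
opt = torch.optim.Adam(mlp.parameters(), lr=0.05)  # lr is an assumed value

initial_loss = loss_fn(mlp(X), y).item()
for step in range(1000):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(mlp(X), y)
    loss.backward()              # backprop through both layers
    opt.step()

with torch.no_grad():
    final_loss = loss_fn(mlp(X), y).item()
    preds = (mlp(X) > 0.5).int().flatten().tolist()

print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, predictions {preds}")
```

If the predictions do not match `[0, 1, 1, 0]`, trying another seed or a wider hidden layer is usually enough.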

### Transformer FFN = MLP per token

```python
import torch.nn as nn


class FFN(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.up = nn.Linear(dim, hidden)    # expand: dim -> hidden
        self.down = nn.Linear(hidden, dim)  # project back: hidden -> dim

    def forward(self, x):
        # nn.Linear acts on the last axis, so each token is transformed independently
        return self.down(nn.functional.gelu(self.up(x)))
```

### MLP-Mixer style (pure MLP vision)

```python
import torch.nn as nn


class MixerBlock(nn.Module):
    # Simplified sketch: the normalization layers of the full architecture are omitted
    def __init__(self, n_patches, dim):
        super().__init__()
        # Token mixing: MLP across the patch axis
        self.token_mix = nn.Sequential(nn.Linear(n_patches, n_patches * 4),
                                       nn.GELU(), nn.Linear(n_patches * 4, n_patches))
        # Channel mixing: MLP across the feature axis
        self.channel_mix = nn.Sequential(nn.Linear(dim, dim * 4),
                                         nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):  # (B, N, D)
        # Transpose to (B, D, N) so the Linear layers mix information across patches
        x = x + self.token_mix(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mix(x)
        return x
```

## Decision criteria

| Situation | Approach |
|---|---|
| Linearly separable | Single perceptron OK |
| Non-linear pattern | MLP (>=1 hidden layer) |
| Tabular data | Tree models (XGBoost) usually beat MLP |
| Image | CNN or ViT (ViT blocks still include MLPs) |
| Sequence | Transformer (MLP + attention) |
| Pedagogical | Start with perceptron history |

**Default**: understand the MLP as the building block of modern models.

## 🔗 Graph

- Variant: [[MLP]]
- Application: [[MoE]]
- Adjacent: [[데이터_사이언스_및_ML_엔지니어링|Backpropagation]]

## 🤖 LLM usage

**When to use**: NN fundamentals, debugging gradient flow, designing custom architectures.
**When not to use**: production tabular tasks (use GBDT instead).

## ❌ Anti-patterns

- **Linear activation only**: stacked linear layers collapse into a single linear layer. A nonlinearity is required.
- **Step function in modern NN**: its gradient is zero almost everywhere (and undefined at the threshold), so backprop gets no learning signal. Use ReLU/GELU.
- **Too wide, too shallow**: universal approximation makes it possible in principle, but deeper networks are usually more sample-efficient.
- **Forgetting bias**: forcing b=0 means the decision boundary cannot shift off the origin.
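
The first anti-pattern is easy to verify numerically: two stacked linear layers equal one linear layer with `W = W2 @ W1` and `b = W2 @ b1 + b2`. A minimal sketch with arbitrary, assumed shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)

# Two linear layers with no activation in between...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...are exactly one linear layer with merged parameters.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
collapsed = np.allclose(two_layer, one_layer)

# With a ReLU in between, the merged layer generally no longer reproduces the output.
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

print("collapses without nonlinearity:", collapsed)
```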

## 🧪 Verification / Duplicates

- Verified (Rosenblatt 1958, Minsky-Papert 1969, Rumelhart et al. 1986).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: perceptron history, XOR limit, MLP modern lens |