
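koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)

Replaced [[wiki links]] whose names differed only by spelling variants with the canonical title of the target document,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was excluded
because of the risk of over-merging (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs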

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 12:24:15 +09:00

5.2 KiB



Perceptrons-Foundations

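One-liner

"Weighted sum + threshold = the atom of neural networks." Rosenblatt's 1957 perceptron was the first trainable neuron model. A single-layer perceptron cannot learn XOR (Minsky & Papert, 1969), which contributed to the first AI winter. Multi-layer perceptrons (MLPs) trained with backpropagation (1986) revived the field. Even a modern transformer is, at bottom, stacked layers of perceptron-like units: weighted sums followed by differentiable nonlinearities.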

Key points

History

1943: McCulloch-Pitts neuron (binary, no learning).
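1957: Rosenblatt perceptron: learnable weights; later built in hardware as the Mark I Perceptron.
1969: Minsky & Papert, "Perceptrons": proves the limits of single-layer perceptrons (e.g. XOR); widely credited with contributing to the first AI winter.
1986: Rumelhart, Hinton, Williams: backpropagation revives MLPs.
2012: AlexNet (a deep CNN) kicks off the deep learning era.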

Perceptron math

y = step(w·x + b) where step(z) = 1 if z ≥ 0 else 0.
Update rule (Rosenblatt): w ← w + η(y_true - y_pred)x, and likewise b ← b + η(y_true - y_pred).
Convergence theorem: if the data are linearly separable, the rule converges after a finite number of updates.
Limit: XOR is not linearly separable, so a single perceptron cannot learn it.
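The sketch below shows the update rule and the convergence theorem in action. It assumes NumPy and trains the perceptron on AND, which is linearly separable. It records the number of mistakes in each epoch until an epoch finishes with none. The function name `train_perceptron`, the learning rate of 1.0, and the early stop on a mistake-free epoch are illustrative choices, not part of the original rule.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Rosenblatt rule: w <- w + lr*(y_true - y_pred)*x, b <- b + lr*(y_true - y_pred).

    Returns (w, b, mistakes_per_epoch) and stops after the first mistake-free epoch.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b >= 0 else 0  # step activation
            err = yi - pred                     # 0 if correct, +1 or -1 if wrong
            if err != 0:
                w += lr * err * xi
                b += lr * err
                mistakes += 1
        history.append(mistakes)
        if mistakes == 0:  # every point classified correctly: converged
            break
    return w, b, history

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
w, b, history = train_perceptron(X, y_and)
print(w, b, history)
```

The mistake count drops to zero after a handful of epochs. Running the same loop on XOR labels never produces a mistake-free epoch.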

Multi-layer perceptron (MLP)

Hidden layer + nonlinearity (sigmoid → ReLU → GELU).
Universal approximation theorem (Cybenko 1989, Hornik 1991): a network with a single hidden layer and enough units can approximate any continuous function on a compact domain to arbitrary accuracy.
Training: backpropagation (gradients computed with the chain rule).
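To make "backprop = chain rule" concrete, here is a hand-written backward pass for a tiny 2-layer network, checked against central finite differences. It is a sketch that rests on assumptions not in the text:

- a tanh hidden layer, which is smooth, so the numerical check is clean;
- a squared-error loss;
- arbitrary layer sizes.

`forward`, `loss_and_grads`, and `numerical_grads` are hypothetical helper names.

```python
import numpy as np

def forward(params, x):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1       # hidden pre-activation
    h = np.tanh(z1)        # hidden nonlinearity
    y_hat = W2 @ h + b2    # output, shape (1,)
    return z1, h, y_hat

def loss_and_grads(params, x, y):
    W1, b1, W2, b2 = params
    _, h, y_hat = forward(params, x)
    diff = y_hat - y
    loss = 0.5 * float(diff @ diff)
    # Backward pass: apply the chain rule from the output back to the input.
    d_yhat = diff                  # dL/dy_hat
    dW2 = np.outer(d_yhat, h)      # dL/dW2
    db2 = d_yhat                   # dL/db2
    d_h = W2.T @ d_yhat            # dL/dh
    d_z1 = d_h * (1.0 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(d_z1, x)        # dL/dW1
    db1 = d_z1                     # dL/db1
    return loss, [dW1, db1, dW2, db2]

def numerical_grads(params, x, y, eps=1e-6):
    """Central differences, perturbing one parameter entry at a time (in place)."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus, _ = loss_and_grads(params, x, y)
            p[idx] = old - eps
            loss_minus, _ = loss_and_grads(params, x, y)
            p[idx] = old
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 2)), rng.normal(size=3),
          rng.normal(size=(1, 3)), rng.normal(size=1)]
x = np.array([0.5, -1.0])
y = np.array([1.0])

_, analytic = loss_and_grads(params, x, y)
numeric = numerical_grads(params, x, y)
max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"max |analytic - numeric| = {max_err:.2e}")
```

If the analytic gradients are right, the discrepancy is on the order of floating-point error. A large value points to a wrong step in the chain rule.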

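Modern lens

Transformer FFN block = 2-layer MLP per token.
MLP-Mixer: a pure-MLP vision model (no convolution or attention) that is competitive with CNNs and ViT on image classification.
The atomic unit of every "neural network" is the perceptron-style neuron: a weighted sum followed by a nonlinearity.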
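The "per token" part of the FFN can be checked directly. Applying the FFN to a whole (tokens, dim) matrix gives the same result as applying it to each token separately, because no weight mixes rows. The NumPy sketch below assumes the tanh approximation of GELU and small random sizes chosen for illustration. `ffn` and `gelu_tanh` are hypothetical names.

```python
import numpy as np

def gelu_tanh(x):
    # tanh approximation of GELU; the exact form uses the standard normal CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def ffn(x, W_up, b_up, W_down, b_down):
    """Position-wise FFN: x has shape (T, D); the same weights act on every row (token)."""
    return gelu_tanh(x @ W_up + b_up) @ W_down + b_down

rng = np.random.default_rng(0)
T, D, H = 5, 8, 32  # tokens, model dim, hidden dim (illustrative sizes)
W_up, b_up = rng.normal(size=(D, H)), rng.normal(size=H)
W_down, b_down = rng.normal(size=(H, D)), rng.normal(size=D)
x = rng.normal(size=(T, D))

batched = ffn(x, W_up, b_up, W_down, b_down)
per_token = np.stack([ffn(x[t:t + 1], W_up, b_up, W_down, b_down)[0] for t in range(T)])
print(np.allclose(batched, per_token))  # True: the FFN never mixes information across tokens
```

Mixing across tokens is left entirely to attention, which is why the two sublayers alternate.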

Applications

Pedagogical (NN intro).
Linear classifier (single perceptron).
Building block (MLP in transformer).
Mixture-of-Experts: each expert is typically an MLP (FFN).
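The sketch below shows how a Mixture-of-Experts layer uses MLPs as experts, assuming top-1 routing. A linear gate scores each token, the token is sent to its highest-scoring expert, and that expert's output is scaled by the gate probability. Real MoE layers add details omitted here, such as top-k routing, load-balancing losses, and capacity limits. The ReLU experts and all names below are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def make_expert(rng, d, h):
    # Each expert is a small 2-layer MLP, shaped like a transformer FFN (biases omitted).
    return rng.normal(size=(d, h)), rng.normal(size=(h, d))

def expert_forward(expert, x):
    W_up, W_down = expert
    return relu(x @ W_up) @ W_down

def moe_top1(x, W_gate, experts):
    """x: (T, D). Route each token to the single expert with the highest gate score."""
    logits = x @ W_gate                                     # (T, E)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)               # softmax over experts
    choice = probs.argmax(axis=1)                           # (T,) chosen expert per token
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            # Only the routed tokens are computed by this expert, scaled by their gate probability.
            out[mask] = probs[mask, e:e + 1] * expert_forward(expert, x[mask])
    return out, choice

rng = np.random.default_rng(0)
T, D, H, E = 6, 4, 16, 3  # tokens, model dim, expert hidden dim, number of experts
experts = [make_expert(rng, D, H) for _ in range(E)]
W_gate = rng.normal(size=(D, E))
x = rng.normal(size=(T, D))
out, choice = moe_top1(x, W_gate, experts)
print(out.shape, choice)
```

Each token pays for one expert's MLP, while total parameters grow with the number of experts.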

💻 Patterns

Perceptron from scratch

import numpy as np

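class Perceptron:
    """Single-layer perceptron trained with the Rosenblatt update rule."""
    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        # Step activation: 1 if w·x + b >= 0, else 0
        return 1 if x @ self.w + self.b >= 0 else 0

    def fit(self, X, y, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.predict(xi)
                err = yi - pred  # 0 when correct, +1 or -1 when wrong
                # Weights and bias only move on misclassified samples
                self.w += self.lr * err * xi
                self.b += self.lr * err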

XOR fails for single perceptron

X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0, 1, 1, 0])  # XOR
p = Perceptron(2)
p.fit(X, y, epochs=1000)
# fit() runs all 1000 epochs, but the weights never settle: XOR is not linearly separable
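The variant below tracks accuracy instead of just running the loop, and is self-contained. No line separates the XOR classes, so the best accuracy seen over all 1,000 epochs stays at or below 3/4, regardless of learning rate or initialization. NumPy is assumed, as in the class above. The tracking variable `best_acc` is a hypothetical addition.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR

w, b, lr = np.zeros(2), 0.0, 0.1
best_acc = 0.0
for epoch in range(1000):
    for xi, yi in zip(X, y):
        err = yi - (1 if xi @ w + b >= 0 else 0)  # Rosenblatt update, as in Perceptron.fit
        w += lr * err * xi
        b += lr * err
    preds = (X @ w + b >= 0).astype(int)
    best_acc = max(best_acc, float((preds == y).mean()))

# Never 1.0: at most 3 of the 4 XOR points can lie on the correct side of any line.
print(best_acc)
```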

MLP solves XOR

import torch.nn as nn
mlp = nn.Sequential(
    nn.Linear(2, 4), nn.ReLU(),
    nn.Linear(4, 1), nn.Sigmoid(),
)
# Train with nn.BCELoss + torch.optim.Adam; usually converges in <1000 steps, but results depend on initialization (ReLU units can die)
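Training aside, a hidden layer is what makes XOR representable at all. The sketch below uses hand-picked weights. They are an illustrative construction, not learned, and the network is smaller than the 4-unit one above. One ReLU unit counts how many inputs are on, a second fires only when both are on, and the output subtracts twice the second from the first. NumPy is assumed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights for a 2-2-1 ReLU network (illustrative, not learned):
#   h1 = relu(x1 + x2),  h2 = relu(x1 + x2 - 1),  y = h1 - 2*h2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = relu(X @ W1.T + b1)  # (4, 2) hidden activations
out = hidden @ w2             # (4,) network output
print(out)                    # [0. 1. 1. 0.]
```

Without the ReLU, the two hidden units would collapse into one linear function, and no choice of weights could produce this output.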

Transformer FFN = MLP per token

class FFN(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.up = nn.Linear(dim, hidden)
        self.down = nn.Linear(hidden, dim)
    def forward(self, x):
        return self.down(nn.functional.gelu(self.up(x)))
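The FFN's parameter count follows directly from its two `nn.Linear` layers. Each layer holds an (in × out) weight matrix plus a bias of size out. The helper below is hypothetical and written in plain Python so it runs without PyTorch. The sizes 768 and 3072 are illustrative; a hidden width of 4 × dim is a common choice.

```python
def ffn_param_count(dim: int, hidden: int) -> int:
    """Parameters of FFN(dim, hidden) as defined above: two nn.Linear layers with bias."""
    up = dim * hidden + hidden    # weight (hidden x dim) + bias (hidden)
    down = hidden * dim + dim     # weight (dim x hidden) + bias (dim)
    return up + down

print(ffn_param_count(768, 3072))  # 4722432
```

The count is dominated by the 2·dim·hidden weight term, so it grows linearly with the hidden width.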

MLP-Mixer style (pure MLP vision)

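class MixerBlock(nn.Module):
    def __init__(self, n_patches, dim):
        super().__init__()
        # Pre-norm LayerNorms as in MLP-Mixer; the 4x hidden widths are a simplification
        # (the paper sets the token- and channel-MLP widths separately)
        self.norm1 = nn.LayerNorm(dim)
        self.token_mix = nn.Sequential(nn.Linear(n_patches, n_patches*4),
                                        nn.GELU(), nn.Linear(n_patches*4, n_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mix = nn.Sequential(nn.Linear(dim, dim*4),
                                          nn.GELU(), nn.Linear(dim*4, dim))
    def forward(self, x):  # (B, N, D)
        # Token mixing: MLP across patches, shared by all channels
        x = x + self.token_mix(self.norm1(x).transpose(1,2)).transpose(1,2)
        # Channel mixing: MLP across channels, shared by all patches
        x = x + self.channel_mix(self.norm2(x))
        return x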
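The two `transpose(1,2)` calls in `MixerBlock` are what turn an ordinary MLP into a token-mixing one. After transposing, the MLP acts along the patch axis, with the same weights for every channel. The NumPy sketch below checks this against an explicit per-channel loop. It assumes a single image of shape (patches, channels), tanh-approximated GELU, and no biases; all names are illustrative.

```python
import numpy as np

def gelu_tanh(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def mlp(x, W1, W2):
    # Acts on the last axis of x
    return gelu_tanh(x @ W1) @ W2

rng = np.random.default_rng(0)
N, D = 6, 4  # patches, channels (illustrative)
W1, W2 = rng.normal(size=(N, 4 * N)), rng.normal(size=(4 * N, N))
x = rng.normal(size=(N, D))  # one image: (patches, channels)

# Token mixing as in MixerBlock: move patches to the last axis, apply the MLP, move back.
mixed = mlp(x.T, W1, W2).T  # (N, D)

# The same computation, one channel column at a time, with one shared MLP.
per_channel = np.stack([mlp(x[:, c], W1, W2) for c in range(D)], axis=1)
print(mixed.shape, np.allclose(mixed, per_channel))  # (6, 4) True
```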

Decision criteria

Situation	Approach
Linearly separable	Single perceptron OK
Non-linear pattern	MLP (>=1 hidden layer)
Tabular data	Tree models (XGBoost) often beat MLPs
Image	CNN or ViT (attention + MLP blocks)
Sequence	Transformer (MLP + attention)
Pedagogical	Start with perceptron history

Default: understand the MLP as the building block of modern models.

🔗 Graph

🤖 LLM usage

When: NN fundamentals, debugging gradient flow, designing custom architectures. When not: production tabular tasks (use GBDTs instead).

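❌ Anti-patterns

Linear activations only: a stack of linear layers equals a single linear layer (the layers collapse), so a nonlinearity is required.
Step function in a modern NN: its derivative is 0 everywhere except at z = 0, where it is undefined, so backprop receives no gradient signal. Use ReLU/GELU instead.
Too wide, too shallow: universal approximation holds, but deeper networks can often represent the same function with far fewer units.
Forgetting the bias: forcing b = 0 makes the decision boundary pass through the origin.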
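Two of these anti-patterns can be checked numerically with NumPy, using random shapes chosen for illustration.

- Two stacked linear layers equal a single linear layer with W = W2·W1 and b = W2·b1 + b2.
- Without a bias, a step-activated perceptron always outputs 1 at the origin, since step(0) = 1 here. It therefore can never fit AND, whose target at (0, 0) is 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Two linear layers with no activation collapse into one linear layer.
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2
W, b = W2 @ W1, W2 @ b1 + b2  # the equivalent single layer
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))  # True

# 2) With b = 0, w·x is 0 at the origin for every w, so step() always returns 1 there.
def step(z):
    return 1 if z >= 0 else 0

origin_preds = {step(w @ np.zeros(2)) for w in rng.normal(size=(1000, 2))}
print(origin_preds)  # {1}
```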

🧪 Verification / duplicates

Verified (Rosenblatt 1958, Minsky-Papert 1969, Rumelhart et al. 1986).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — perceptron history, XOR limit, MLP modern lens
