"매 weighted sum + threshold = NN의 atom". Rosenblatt 1957 perceptron — 매 first trainable neuron model. Single-layer 의 XOR fail (Minsky 1969) → AI winter. MLP + backprop (1986) 의 revival. 매 modern transformer 도 결국 stacked perceptron.
매 핵심
매 history
1943: McCulloch-Pitts neuron (binary, no learning).
1957: Rosenblatt perceptron — 매 hardware Mark I, learnable weights.
1969: Minsky & Papert "Perceptrons" — 매 XOR limit proven → first AI winter.
1986: Rumelhart, Hinton, Williams — 매 backprop revives MLP.
2012: AlexNet — 매 deep MLP/CNN era 시작.
매 perceptron 수학
y = step(w·x + b) where step(z) = 1 if z ≥ 0 else 0.
Update rule (Rosenblatt): w ← w + η(y_true - y_pred)x.
Convergence theorem: 매 linearly separable data 에 한해 finite steps 수렴.
X=np.array([[0,0],[0,1],[1,0],[1,1]])y=np.array([0,1,1,0])# XORp=Perceptron(2)p.fit(X,y,epochs=1000)# Will NOT converge — XOR is not linearly separable
MLP solves XOR
importtorch.nnasnnmlp=nn.Sequential(nn.Linear(2,4),nn.ReLU(),nn.Linear(4,1),nn.Sigmoid(),)# Train with BCELoss + Adam — converges in <1000 steps
classMixerBlock(nn.Module):def__init__(self,n_patches,dim):super().__init__()self.token_mix=nn.Sequential(nn.Linear(n_patches,n_patches*4),nn.GELU(),nn.Linear(n_patches*4,n_patches))self.channel_mix=nn.Sequential(nn.Linear(dim,dim*4),nn.GELU(),nn.Linear(dim*4,dim))defforward(self,x):# (B, N, D)x=x+self.token_mix(x.transpose(1,2)).transpose(1,2)x=x+self.channel_mix(x)returnx