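# Action, dist, away_from and toward are helpers assumed to be defined elsewhere.
def stutter_step(unit, enemy):
    # Fire whenever the weapon is off cooldown.
    if unit.weapon_ready:
        return Action.attack(enemy)
    # On cooldown: kite backwards only when we are inside our own range
    # and outrange the enemy (retreating from a longer-ranged enemy gains nothing).
    if dist(unit, enemy) < unit.range and enemy.range < unit.range:
        return Action.move(away_from(enemy))
    return Action.move(toward(enemy))  # close gap
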
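import gymnasium as gym
from stable_baselines3 import PPO

# Simple micro env. Gymnasium does not ship this ID; an SC2 wrapper
# must register it before gym.make() is called.
env = gym.make("SC2MoveToBeacon-v0")
model = PPO(
    "MultiInputPolicy",  # policy class for Dict observation spaces
    env,
    n_steps=2048,
    batch_size=64,
    learning_rate=3e-4,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=2_000_000)
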
League training concept (AlphaStar)
# Main agents + Main exploiters + League exploiters; PFSP matchmaking
# Prioritized Fictitious Self-Play (PFSP): the weaker the opponent, the less often it is matched
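
The PFSP matchmaking rule can be sketched in a few lines. This is a minimal illustration, not AlphaStar's actual implementation: it assumes a weighting of the form (1 - win_rate) ** power, and the names `pfsp_weights`, `sample_opponent`, `power`, and the league entries are hypothetical.

```python
import random


def pfsp_weights(win_rates, power=2.0):
    """Turn the learner's win rate against each opponent into a sampling probability.

    Opponents the learner already beats often (high win rate) get a low weight.
    The (1 - p) ** power form and the default power are assumptions of this sketch.
    """
    raw = [(1.0 - p) ** power for p in win_rates]
    total = sum(raw)
    if total == 0.0:
        # The learner beats everyone: fall back to uniform sampling.
        return [1.0 / len(win_rates)] * len(win_rates)
    return [w / total for w in raw]


def sample_opponent(opponents, win_rates, rng=random, power=2.0):
    """Draw one opponent according to the PFSP probabilities."""
    probs = pfsp_weights(win_rates, power)
    return rng.choices(opponents, weights=probs, k=1)[0]


# Hypothetical league snapshot: the learner's win rate against each member.
league = ["main_v1", "main_exploiter_v3", "league_exploiter_v2"]
win_rates = [0.9, 0.5, 0.2]
probs = pfsp_weights(win_rates)
opponent = sample_opponent(league, win_rates, rng=random.Random(0))
```

Here the opponent that the learner beats 90% of the time receives the smallest probability, so most games are spent against opponents that still pose a problem.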
Replay-based behavior cloning
# 1) replay parse → (state, action) pairs
# 2) supervised cross-entropy on actions
# 3) RL fine-tune from BC checkpoint (jumpstart RL exploration)
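
Steps 1 and 2 above can be illustrated with a toy example. The sketch below fits a linear softmax policy to hand-written (state, action) pairs using plain Python; a real pipeline would parse actual replays and train a neural network. The feature layout, the `train_bc` and `act` names, and the hyperparameters are all assumptions made for illustration, and step 3 (RL fine-tuning) is not covered.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_bc(pairs, n_features, n_actions, epochs=50, lr=0.5, seed=0):
    """Fit a linear softmax policy to (state, action) pairs with cross-entropy SGD."""
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_actions)]
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for state, action in data:
            probs = softmax([sum(w * x for w, x in zip(row, state)) for row in W])
            for a in range(n_actions):
                # Cross-entropy gradient w.r.t. logit a is p_a - 1[a == action].
                grad = probs[a] - (1.0 if a == action else 0.0)
                for j in range(n_features):
                    W[a][j] -= lr * grad * state[j]
    return W


def act(W, state):
    """Greedy action under the cloned policy."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in W]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy stand-in for parsed replays: state = (weapon_ready, bias),
# action 0 = attack, action 1 = move.
demo = [((1.0, 1.0), 0), ((0.0, 1.0), 1)] * 20
W = train_bc(demo, n_features=2, n_actions=2)
```

After training, `act(W, (1.0, 1.0))` picks the attack action and `act(W, (0.0, 1.0))` picks the move action, mirroring the demonstrations. The resulting weights play the role of the BC checkpoint that an RL fine-tuning stage would start from.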