---
id: wiki-2026-0508-micro-management
title: Micro-management (RTS)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Micro, RTS Micro, Unit Micro, StarCraft Micro]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [game-design, rts, ai, alphastar, reinforcement-learning, starcraft, esports]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: { language: python, framework: pysc2|sc2|gymnasium }
---

# Micro-management (RTS)

> In one line: precise control of individual units (movement, attacks, abilities) in an RTS to get the most combat value out of them, combining APM, reaction speed, and situational awareness. It is also an RL benchmark on which AlphaStar reached the top 0.2% of human players.

## Key points

- **Macro vs micro**: macro covers economy, production, and expansion; micro covers unit control. Professional players need both.
- **APM**: actions per minute. Human professionals sustain 200-400; AlphaStar was held to a human-comparable rate, with an unofficial cap of about 280.
- **Core techniques**: focus fire, kiting (stutter step), spreading (to avoid splash damage), flanking, target priority, and use of sight range.
- **AI benchmarks**: SC2 (DeepMind PySC2/AlphaStar), Dota 2 (OpenAI Five), and Honor of Kings (JueWu) jointly test long-horizon planning, partial observation, and multi-agent play.
- **Modern RL**: self-play plus league training (AlphaStar), PPO/IMPALA, and large-scale distributed training.
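
Of the core techniques listed above, spreading is the easiest to show in isolation. The following is a minimal sketch, assuming units are plain `(x, y)` tuples and that `min_gap` stands in for the enemy's splash radius; `spread_step` and its parameters are illustrative names, not part of PySC2 or any other library.

```python
import math

def spread_step(positions, min_gap, step=1.0):
    """Return new positions after one spreading step.

    positions: list of (x, y) tuples for friendly units.
    min_gap:   desired minimum spacing, e.g. the enemy splash radius.
    step:      distance a crowded unit moves in this step.
    """
    moved = []
    for i, (x, y) in enumerate(positions):
        push_x = push_y = 0.0
        for j, (ox, oy) in enumerate(positions):
            if i == j:
                continue
            dx, dy = x - ox, y - oy
            d = math.hypot(dx, dy)
            if d >= min_gap:
                continue  # far enough apart, no push from this neighbour
            if d == 0.0:
                # Stacked units: derive a deterministic direction from the index.
                angle = 2 * math.pi * i / len(positions)
                ux, uy, weight = math.cos(angle), math.sin(angle), 1.0
            else:
                # Closer neighbours push harder.
                ux, uy, weight = dx / d, dy / d, (min_gap - d) / min_gap
            push_x += ux * weight
            push_y += uy * weight
        norm = math.hypot(push_x, push_y)
        if norm > 0.0:
            # Move a fixed distance along the combined push direction.
            x, y = x + push_x / norm * step, y + push_y / norm * step
        moved.append((x, y))
    return moved

# Two marines one cell apart, facing a splash radius of 3.
print(spread_step([(0.0, 0.0), (1.0, 0.0)], min_gap=3.0))
```

All units are updated from the same snapshot of positions, so the result does not depend on iteration order. In a real bot the returned positions would become move targets issued between attack commands.
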
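## Decision criteria

| Goal | Approach |
|---|---|
| RTS AI training environment | PySC2 (SC2), microRTS (lightweight), Stratagus |
| Simple micro bot | Script + finite state machine |
| Reinforcement learning | PPO + LSTM/transformer policy |
| Multi-unit cooperation | MARL (QMIX, MAPPO) |
| Imitation learning starting point | Replay data → behavior cloning |
| Large-scale training | League / population-based training |

## 💻 Patterns

### PySC2 minimal agent

```python
from pysc2.agents import base_agent
from pysc2.lib import actions, features, units


def pick_lowest_hp_enemy(obs):
    """Return the visible enemy unit with the least health, or None."""
    enemies = [u for u in obs.observation.feature_units
               if u.alliance == features.PlayerRelative.ENEMY]
    return min(enemies, key=lambda u: u.health, default=None)


class MicroAgent(base_agent.BaseAgent):
    """Focus-fires on the weakest enemy; needs use_feature_units=True."""

    def step(self, obs):
        super().step(obs)
        marines = [u for u in obs.observation.feature_units
                   if u.unit_type == units.Terran.Marine
                   and u.alliance == features.PlayerRelative.SELF]
        if not marines:
            return actions.FUNCTIONS.no_op()
        # Attack_screen only becomes available once units are selected.
        if actions.FUNCTIONS.Attack_screen.id not in obs.observation.available_actions:
            return actions.FUNCTIONS.select_army("select")
        target = pick_lowest_hp_enemy(obs)
        if target is not None:
            return actions.FUNCTIONS.Attack_screen("now", [target.x, target.y])
        return actions.FUNCTIONS.no_op()
```

### Kiting (stutter step) pseudocode

```python
# Pseudocode: Action, dist, away_from, and toward are placeholder helpers.
def stutter_step(unit, enemy):
    if unit.weapon_ready:
        return Action.attack(enemy)  # fire as soon as the cooldown ends
    # During the cooldown, back off only when we outrange the enemy;
    # retreating from a longer-ranged enemy just gives it free hits.
    if dist(unit, enemy) < unit.range and enemy.range < unit.range:
        return Action.move(away_from(enemy))
    return Action.move(toward(enemy))  # otherwise close the gap
```

### Focus fire (target priority)

```python
def focus_target(my_units, enemies, eps=1e-6):
    # threat: the danger an enemy poses, e.g. damage * attack speed.
    # Lowest hp-per-threat first, so fragile high-damage units die soonest.
    # my_units is unused in this sketch; eps guards against zero threat.
    return min(enemies, key=lambda e: e.hp / max(e.threat, eps))
```

### PPO with PySC2 (sketch)

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Assumption: a Gymnasium-compatible PySC2 wrapper has registered this ID.
# Neither Gymnasium nor PySC2 ships it, so gym.make fails without one.
env = gym.make("SC2MoveToBeacon-v0")  # simple micro environment

# MultiInputPolicy expects a Dict observation space; use MlpPolicy or
# CnnPolicy instead if the wrapper exposes a Box.
model = PPO("MultiInputPolicy", env, n_steps=2048, batch_size=64,
            learning_rate=3e-4, gamma=0.99, verbose=1)
model.learn(total_timesteps=2_000_000)
```

### League training concept (AlphaStar)

```python
# Main agents + main exploiters + league exploiters; PFSP matchmaking.
# Prioritized Fictitious Self-Play: opponents are sampled in proportion to
# how often they beat the agent, so already-beaten opponents are matched rarely.
```

### Replay-based behavior cloning

```python
# 1) Parse replays into (state, action) pairs.
# 2) Train with supervised cross-entropy on the actions.
# 3) Fine-tune with RL from the BC checkpoint (jump-starts RL exploration).
```

## 🔗 Graph

- Parent: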
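[[Reinforcement-Learning]]
- Related: [[PPO]]

## 🤖 LLM usage

- Game design discussion (LLM brainstorming): designing micro skills for new units, counter analysis.
- Code assistance: PySC2 boilerplate, replay parsers.
- Playing an RTS with an LLM alone is unrealistic, because LLMs are poorly suited to real-time, continuous action; combine them with RL.

## ❌ Anti-patterns

- **Chasing APM alone**: meaningless clicks (spam). What matters is EPM (effective APM).
- **Attack-moving every unit**: this gives up on micro. Combine army splitting, focus fire, and kiting.
- **Using only sparse rewards in an RL environment**: learning stalls. Add reward shaping (kills, HP difference) and a curriculum.
- **Training multi-unit RTS as a naive single agent**: the policy tends to collapse. Use a suitable abstraction (squad level) or MARL.
- **Trying to reproduce AlphaStar on a small GPU**: it takes on the order of tens of thousands of TPU-days. Start with reduced environments such as microRTS or SMAC.

## 🧪 Verification / duplicates

- Duplicate candidates: none.
- Verification: standard tasks, namely the PySC2 mini-games MoveToBeacon, CollectMineralShards, and DefeatRoaches, plus win rate on SMAC scenarios.

## 🕓 Changelog

- 2026-05-08 | Phase 1: automatic seed.
- 2026-05-10 | Manual cleanup: organized the PySC2/PPO/league patterns, kiting pseudocode, and RL anti-patterns.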
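
The reward-shaping advice in the anti-patterns list (kills and HP difference on top of the sparse win/loss signal) could be sketched roughly as follows. `BattleState`, `shaped_reward`, and the coefficient values are hypothetical names and numbers chosen for illustration; they are not taken from PySC2, SMAC, or AlphaStar.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BattleState:
    """Per-step combat summary; field names are illustrative."""
    own_hp: float                 # total remaining HP of friendly units
    enemy_hp: float               # total remaining HP of enemy units
    enemy_alive: int              # number of enemy units still alive
    won: Optional[bool] = None    # None while the episode is still running

def shaped_reward(prev, cur, kill_bonus=1.0, hp_scale=0.01, win_bonus=10.0):
    """Dense reward: HP trade plus kills, with the sparse outcome on top."""
    # Positive when we dealt more damage than we took during this step.
    hp_delta = (prev.enemy_hp - cur.enemy_hp) - (prev.own_hp - cur.own_hp)
    kills = prev.enemy_alive - cur.enemy_alive
    reward = hp_scale * hp_delta + kill_bonus * kills
    # The terminal win/loss signal is kept so shaping does not replace the goal.
    if cur.won is True:
        reward += win_bonus
    elif cur.won is False:
        reward -= win_bonus
    return reward

prev = BattleState(own_hp=100.0, enemy_hp=100.0, enemy_alive=2)
cur = BattleState(own_hp=90.0, enemy_hp=60.0, enemy_alive=1)
print(shaped_reward(prev, cur))
```

Keeping the shaping terms small relative to `win_bonus` is one way to reduce the risk that the agent farms damage instead of winning; the right ratio is environment-specific and would need tuning.
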