2nd/10_Wiki/Topics/AI_and_ML/Micro-management.md

---
id: wiki-2026-0508-micro-management
title: Micro-management (RTS)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Micro, RTS Micro, Unit Micro, StarCraft Micro]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: applied
tags: [game-design, rts, ai, alphastar, reinforcement-learning, starcraft, esports]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: { language: python, framework: pysc2|sc2|gymnasium }
---

# Micro-management (RTS)

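> One-liner: in RTS games, micro-management is the precise control of individual units (movement, attacks, abilities) to maximize combat effectiveness; it combines APM, reaction speed, and situational awareness. It is also an RL benchmark: AlphaStar reached Grandmaster level in StarCraft II, ranking above 99.8% of ranked human players (the top 0.2%).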

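## Key points
- **Macro vs Micro**: macro = economy, production, and expansion; micro = unit control. Professional players need both.
- **APM**: Actions Per Minute. Human professionals typically sustain roughly 200-400. AlphaStar was rate-limited to roughly human levels; the often-quoted 280 is its reported average APM in the 2019 exhibition matches rather than a hard cap.
- **Core techniques**: focus fire, kiting (stutter step), spreading (to reduce splash damage), flanking, target priority, and exploiting sight range.
- **AI benchmarks**: SC2 (DeepMind PySC2/AlphaStar), Dota 2 (OpenAI Five), Honor of Kings (JueWu). Together they test long-horizon planning, partial observability, and multi-agent coordination.
- **Modern RL**: self-play + league training (AlphaStar), PPO/IMPALA-style actor-critic methods, large-scale distributed training.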
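
To make the idea of an APM limit concrete, here is a minimal sketch of a sliding-window action budget that an agent wrapper could consult before issuing a command. The class name `ActionRateLimiter` and the parameters (22 actions per 5 seconds, about 264 APM) are illustrative assumptions, not values taken from this note or from any library.

```python
from collections import deque


class ActionRateLimiter:
    """Allow at most `max_actions` within any sliding `window_s`-second window."""

    def __init__(self, max_actions: int, window_s: float):
        self.max_actions = max_actions
        self.window_s = window_s
        self._stamps = deque()  # timestamps of recently issued actions

    def try_act(self, now_s: float) -> bool:
        # Forget actions that have slid out of the window.
        while self._stamps and now_s - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return False  # over budget: the agent should issue a no-op instead
        self._stamps.append(now_s)
        return True


limiter = ActionRateLimiter(max_actions=22, window_s=5.0)
# An agent that tries to act every 0.1 s for 5 s is throttled by the budget.
issued = sum(limiter.try_act(i * 0.1) for i in range(50))
```

A limiter like this caps burst rate as well as the per-minute average, which matters because short bursts of very fast, precise actions are where a bot gains the most over a human.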

## Decision criteria
| Goal | Approach |
|---|---|
| RTS AI training environment | PySC2 (SC2), microRTS (lightweight), Stratagus |
| Simple micro bot | Scripts + finite state machine |
| Reinforcement learning | PPO + LSTM/transformer policy |
| Multi-unit cooperation | MARL (QMIX, MAPPO) |
| Bootstrapping with imitation learning | Replay data → behavior cloning |
| Large-scale training | League / population-based training |
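
The scripted finite-state-machine option in the table can be sketched roughly as follows. The states, thresholds, and order names (`attack_move`, `focus_fire`, `move_home`) are hypothetical placeholders for real game commands; the point is only the transition structure, including hysteresis so a wounded unit does not flip between retreating and re-engaging.

```python
from enum import Enum, auto


class State(Enum):
    ADVANCE = auto()
    ENGAGE = auto()
    RETREAT = auto()


def next_state(state: State, hp_frac: float, enemy_in_range: bool,
               retreat_below: float = 0.3, recover_above: float = 0.6) -> State:
    """Transition rules for one unit; the thresholds are illustrative."""
    if state is State.RETREAT:
        # Hysteresis: stay out of the fight until HP passes the upper threshold.
        return State.ADVANCE if hp_frac >= recover_above else State.RETREAT
    if hp_frac < retreat_below:
        return State.RETREAT
    return State.ENGAGE if enemy_in_range else State.ADVANCE


# State -> scripted order name (placeholders, not a real API).
ORDERS = {
    State.ADVANCE: "attack_move",
    State.ENGAGE: "focus_fire",
    State.RETREAT: "move_home",
}

state = State.ADVANCE
trace = []
for hp, in_range in [(1.0, False), (0.9, True), (0.25, True), (0.4, False), (0.7, False)]:
    state = next_state(state, hp, in_range)
    trace.append(ORDERS[state])
```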

## 💻 Patterns

### PySC2 minimal agent
```python
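from pysc2.agents import base_agent
from pysc2.lib import actions, features, units

FUNCTIONS = actions.FUNCTIONS
SELF, ENEMY = features.PlayerRelative.SELF, features.PlayerRelative.ENEMY

class MicroAgent(base_agent.BaseAgent):
    """Focus-fires the weakest visible enemy. Needs use_feature_units=True."""

    def step(self, obs):
        super().step(obs)
        fu = obs.observation.feature_units
        # unit_type identifies the unit kind; alliance identifies the owner.
        marines = [u for u in fu
                   if u.unit_type == units.Terran.Marine and u.alliance == SELF]
        enemies = [u for u in fu if u.alliance == ENEMY]
        if not marines or not enemies:
            return FUNCTIONS.no_op()
        # Attack_screen only becomes available once army units are selected.
        if FUNCTIONS.Attack_screen.id not in obs.observation.available_actions:
            return FUNCTIONS.select_army("select")
        target = min(enemies, key=lambda u: u.health)  # lowest-HP enemy
        return FUNCTIONS.Attack_screen("now", (target.x, target.y))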
```

### Kiting (stutter step) pseudocode
```python
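def stutter_step(unit, enemy):
    # Fire as soon as the weapon is off cooldown.
    if unit.weapon_ready:
        return Action.attack(enemy)
    # While reloading, back off only if we outrange the enemy and it is inside our range.
    if dist(unit, enemy) < unit.range and unit.range > enemy.range:
        return Action.move(away_from(enemy))
    return Action.move(toward(enemy))   # otherwise close the gap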
```
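
For a version that can actually be run, the same decision rule can be exercised in a toy 1-D simulation. Everything here (the `Unit` fields, tick-based cooldowns, the ranges and speeds) is an illustrative assumption and not tied to any game API; it only shows the attack / retreat / attack rhythm that kiting produces against a shorter-ranged chaser.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    pos: float      # position on a 1-D line
    range: float    # weapon range
    speed: float    # distance moved per tick
    cooldown: int   # ticks between shots
    timer: int = 0  # ticks until the weapon is ready again


def kite_order(unit: Unit, enemy: Unit) -> str:
    """Return 'attack', 'retreat', or 'advance' for this tick."""
    gap = abs(unit.pos - enemy.pos)
    if unit.timer == 0 and gap <= unit.range:
        return "attack"
    if unit.range > enemy.range and gap < unit.range:
        return "retreat"  # reloading and we outrange them: open the gap
    return "advance"      # out of range or no range advantage: close the gap


def simulate(ticks: int = 12) -> list[str]:
    ranged = Unit(pos=0.0, range=5.0, speed=1.0, cooldown=2)
    melee = Unit(pos=8.0, range=0.5, speed=1.0, cooldown=2)
    log = []
    for _ in range(ticks):
        order = kite_order(ranged, melee)
        toward = ranged.speed if melee.pos > ranged.pos else -ranged.speed
        if order == "attack":
            ranged.timer = ranged.cooldown  # firing costs the whole tick
        else:
            ranged.pos += toward if order == "advance" else -toward
            ranged.timer = max(0, ranged.timer - 1)
        # The melee enemy always walks toward the kiting unit.
        melee.pos += -melee.speed if melee.pos > ranged.pos else melee.speed
        log.append(order)
    return log


orders = simulate()
```

With equal speeds the chaser still gains one step per shot, which is why kiting in practice depends on a speed or range advantage and not on the input pattern alone.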

### Focus fire (target priority)
```python
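def focus_target(my_units, enemies):
    # threat: per-enemy danger score, e.g. damage * attack speed (DPS), assumed precomputed.
    # Minimizing hp / threat picks the cheapest kill per unit of threat removed.
    # my_units is unused in this sketch.
    return min(enemies, key=lambda e: e.hp / max(e.threat, 1e-6))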
```
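
As a worked example of this priority rule, the sketch below ranks a few made-up enemies by remaining HP divided by threat. Treating threat as damage per second is an assumption made for illustration, and the `Enemy` class and its numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    name: str
    hp: float        # remaining hit points
    damage: float    # damage per hit
    cooldown: float  # seconds between hits

    @property
    def threat(self) -> float:
        return self.damage / self.cooldown  # damage per second


def focus_order(enemies, eps=1e-6):
    """Sort enemies so the cheapest kill per unit of threat comes first."""
    return sorted(enemies, key=lambda e: e.hp / max(e.threat, eps))


enemies = [
    Enemy("tank", hp=160, damage=15, cooldown=1.0),
    Enemy("glass_cannon", hp=40, damage=20, cooldown=1.0),
    Enemy("worker", hp=45, damage=5, cooldown=1.5),
]
priority = [e.name for e in focus_order(enemies)]
```

The low-HP, high-damage unit comes first, and the worker ranks last even though it has less HP than the tank, because killing it removes very little incoming damage.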

### PPO with PySC2 (sketch)
```python
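import gymnasium as gym
from stable_baselines3 import PPO

# Assumption: "SC2MoveToBeacon-v0" is registered by a third-party PySC2 wrapper;
# neither gymnasium nor PySC2 ships this env id.
env = gym.make("SC2MoveToBeacon-v0")  # simple micro env
# "MultiInputPolicy" requires a Dict observation space; use "MlpPolicy" or
# "CnnPolicy" if the wrapper exposes a Box.
model = PPO("MultiInputPolicy", env, n_steps=2048, batch_size=64,
            learning_rate=3e-4, gamma=0.99, verbose=1)
model.learn(total_timesteps=2_000_000)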
```

### League training concept (AlphaStar)
```python
# Main agents + main exploiters + league exploiters, matched via PFSP
# Prioritized Fictitious Self-Play: opponents the learner already beats are
# sampled less often, so training time concentrates on hard opponents
```
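
The PFSP matchmaking idea can be sketched as a weighting over opponents by the learner's win rate against each. The weighting `(1 - win_rate) ** p` follows the "hard opponents first" form described for AlphaStar as far as I understand it; the exponent `p=2`, the league member names, and the win rates below are illustrative assumptions.

```python
import random


def pfsp_weights(win_rates, p=2.0):
    """Map win probabilities against each opponent to sampling probabilities."""
    raw = [(1.0 - w) ** p for w in win_rates]  # hard opponents get large weight
    total = sum(raw)
    if total == 0:  # the learner beats everyone: fall back to uniform sampling
        return [1.0 / len(win_rates)] * len(win_rates)
    return [r / total for r in raw]


def sample_opponent(opponents, win_rates, rng=random, p=2.0):
    """Draw one opponent according to the PFSP weights."""
    return rng.choices(opponents, weights=pfsp_weights(win_rates, p), k=1)[0]


league = ["main_v1", "main_exploiter_v3", "league_exploiter_v2"]
win_rates = [0.9, 0.5, 0.2]  # learner's estimated win rate against each member
probs = pfsp_weights(win_rates)
opponent = sample_opponent(league, win_rates, rng=random.Random(0))
```

Raising `p` concentrates sampling on the hardest opponents, while `p=0` recovers uniform sampling over the league.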

### Replay-based behavior cloning
```python
# 1) replay parse → (state, action) pairs
# 2) supervised cross-entropy on actions
# 3) RL fine-tune from BC checkpoint (jumpstart RL exploration)
```
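
Steps 1 and 2 can be illustrated without a neural network: for discrete states, the policy that minimizes cross-entropy on the demonstrations is simply the smoothed empirical action frequency per state. This tabular version is a stand-in chosen for brevity, not how a replay-scale model would be trained; the state labels and action indices are made up.

```python
import math
from collections import Counter, defaultdict


def fit_bc_policy(pairs, n_actions, alpha=1.0):
    """Maximum-likelihood tabular policy with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for state, action in pairs:
        counts[state][action] += 1
    policy = {}
    for state, c in counts.items():
        total = sum(c.values()) + alpha * n_actions
        policy[state] = [(c[a] + alpha) / total for a in range(n_actions)]
    return policy


def cross_entropy(policy, pairs):
    """Mean negative log-likelihood of the demonstrated actions."""
    return -sum(math.log(policy[s][a]) for s, a in pairs) / len(pairs)


# Toy demonstrations: (discretized state, action index).
# Actions: 0 = attack, 1 = retreat, 2 = hold
demos = [("enemy_close", 1)] * 8 + [("enemy_close", 0)] * 2 + [("enemy_far", 0)] * 10
policy = fit_bc_policy(demos, n_actions=3)
loss = cross_entropy(policy, demos)
greedy = {s: max(range(3), key=p.__getitem__) for s, p in policy.items()}
```

The smoothing keeps every action at non-zero probability, which leaves room for exploration when the cloned policy is later fine-tuned with RL.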

## 🔗 Graph
- Parent: [[Reinforcement-Learning]]
- Related: [[PPO]]

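## 🤖 LLM usage
- Game design discussion (LLM brainstorming): designing micro skills for new units, counter analysis.
- Code assistance: PySC2 boilerplate, replay parsers.
- Having an LLM play an RTS on its own is impractical (a poor fit for real-time, continuous action streams); combine it with RL.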

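## ❌ Anti-patterns
- **Chasing raw APM**: meaningless clicks (spam). What matters is EPM (effective actions per minute).
- **Attack-moving every unit**: this gives up on micro. Combine army splitting, focus fire, and kiting.
- **Only sparse rewards in the RL environment**: learning from scratch tends to stall. Add shaping (kills, HP difference) and a curriculum.
- **Training a multi-unit RTS as a single flat agent**: the policy tends to collapse. Use a suitable abstraction (squad-level control) or MARL.
- **Trying to reproduce AlphaStar on a small GPU**: the original run needed on the order of tens of thousands of TPU-days. Start with reduced environments such as microRTS or SMAC.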
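
The reward-shaping suggestion (kills, HP difference) could look roughly like the sketch below. The `Snapshot` fields and the weights are hypothetical and would need tuning per scenario; the terminal win/loss term is kept larger than any single shaping term so that the shaped signal does not override the actual objective.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    own_hp: float    # summed HP of friendly units
    enemy_hp: float  # summed HP of visible enemy units
    kills: int       # cumulative enemy units killed
    deaths: int      # cumulative friendly units lost


def shaped_reward(prev: Snapshot, cur: Snapshot, outcome: int = 0,
                  w_hp: float = 0.01, w_kill: float = 0.2, w_win: float = 1.0) -> float:
    """Dense per-step reward; `outcome` is +1/-1 on the final step, else 0."""
    # Damage dealt minus damage taken since the previous step.
    hp_term = (prev.enemy_hp - cur.enemy_hp) - (prev.own_hp - cur.own_hp)
    # Units killed minus units lost since the previous step.
    kill_term = (cur.kills - prev.kills) - (cur.deaths - prev.deaths)
    return w_hp * hp_term + w_kill * kill_term + w_win * outcome


before = Snapshot(own_hp=450, enemy_hp=600, kills=0, deaths=0)
after = Snapshot(own_hp=430, enemy_hp=500, kills=1, deaths=0)
r = shaped_reward(before, after)
```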

## 🧪 Verification / Duplicates
- No duplicate candidates.
- Verification: standard tasks such as MoveToBeacon, CollectMineralShards, and DefeatRoaches (PySC2 mini-games), plus win rate on SMAC scenarios.
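
When reporting win rate over a limited number of evaluation episodes, it may help to attach an uncertainty range. The sketch below uses the Wilson score interval, which is one common choice and an addition of this example rather than something the benchmarks prescribe; the 70-out-of-100 figures are made up.

```python
import math


def win_rate_interval(wins: int, games: int, z: float = 1.96):
    """Return (win_rate, low, high) using the Wilson score interval."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return p, center - half, center + half


rate, low, high = win_rate_interval(wins=70, games=100)
```

With 100 episodes the interval is still wide, so small win-rate differences between two agents should be read cautiously.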

## 🕓 Changelog
- 2026-05-08 | Phase 1: automatic seed.
- 2026-05-10 | Manual cleanup: tidied PySC2/PPO/league patterns, kiting pseudocode, and RL anti-patterns.