---
id: wiki-2026-0508-physical-intelligence
title: Physical Intelligence
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PI, π0, pi-zero, embodied-foundation-model]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [robotics, foundation-model, embodied-ai, vla]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: JAX/PyTorch
---

# Physical Intelligence

## One-liner

> **"A universal foundation model for robots: the ChatGPT moment for embodied AI"**. Physical Intelligence (PI, founded 2024) is the startup that released π0, a vision-language-action (VLA) foundation model. A single set of weights drives a variety of robots through tasks such as dishwashing, laundry folding, and table bussing.

## Key points

### Company + models

- **Company**: Physical Intelligence (Karol Hausman, Sergey Levine, Chelsea Finn, and others; Google Brain/Stanford alumni). Founded in 2024, $400M+ raised, $2.4B valuation.
- **π0 (pi-zero, 2024-10)**: PI's first VLA foundation model. PaliGemma (3B VLM) backbone plus an action expert (300M params, flow matching for continuous actions). Control at up to 50 Hz.
- **π0.5 (2025)**: open-world generalization, hierarchical planning, longer-horizon tasks.
- **π0-FAST**: tokenized action representation (FAST: Frequency-space Action Sequence Tokenization), reported as up to 5× faster to train.

### Architecture keys

- **VLA = VLM + action head**: vision (ViT) + language (LLM) + action decoder.
- **Flow matching action expert**: continuous robot actions are not discretized into tokens; the action expert generates them with flow matching.
- **Cross-embodiment**: a single model covers 7+ robot platforms (ALOHA, UR5e, Franka, mobile manipulators).
- **Internet pretraining + robot fine-tune**: start from PaliGemma weights, then train on 10K+ hours of robot demonstrations.

### Applications

1. Household chore robots (laundry folding, dishwashing).
2. Warehouse manipulation (Covariant + PI partnership).
3. Humanoid foundation model (exploring compatibility with Figure 02 and 1X NEO).

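To make the flow matching idea from the architecture notes concrete, here is a minimal, dependency-free sketch of the straight-line path and Euler sampling. It assumes one common convention (noise at `t=0`, action at `t=1`); π0's exact parameterization may differ. The helper names and the `oracle_velocity` function are hypothetical: the oracle stands in for a trained network and simply returns the ideal regression target.

```python
import random


def interpolate(noise, action, t):
    """Point on the straight-line path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise, action):
    """Velocity of that path; this is what the network is trained to regress."""
    return [a - n for n, a in zip(noise, action)]


def euler_sample(velocity_fn, start, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(start)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
action = [0.5, -0.2, 0.1]                       # toy 3-DoF target action
noise = [random.gauss(0.0, 1.0) for _ in action]


def oracle_velocity(x, t):
    # Hypothetical stand-in for a trained model: returns the exact target.
    return target_velocity(noise, action)


sampled = euler_sample(oracle_velocity, noise, num_steps=10)
```

With the oracle velocity the Euler integration lands on `action` up to floating-point error. A learned velocity field only approximates this target, which is why the number of integration steps becomes a trade-off between quality and latency.
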
## 💻 Patterns

### π0 inference (lerobot integration)

```python
# pip install lerobot transformers
# Import path matches lerobot releases that ship `lerobot.common`;
# other releases may place the module elsewhere.
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy
import torch

policy = PI0Policy.from_pretrained("lerobot/pi0")
policy.eval().to("cuda")

obs = {
    "observation.images.top": torch.zeros(1, 3, 224, 224).cuda(),
    "observation.state": torch.zeros(1, 14).cuda(),  # joint positions
    "task": ["fold the towel"],
}

with torch.no_grad():
    # select_action returns one action per call, shape (1, action_dim);
    # the predicted chunk is cached inside the policy and consumed step by step.
    action = policy.select_action(obs)
```

### Action chunking with receding-horizon re-planning

```python
import collections

# π0 outputs 50-step action chunks at 50Hz.
# Execute the first k steps, then predict again. (Temporal ensembling would
# additionally average overlapping chunks; this loop does not do that.)
ACTION_HORIZON = 50  # steps predicted per chunk
EXECUTE_STEPS = 8    # steps executed before re-planning

# `policy`, `robot`, `obs`, and `MAX_STEPS` are assumed to be defined elsewhere;
# `policy(obs)` stands in for a call returning a (1, ACTION_HORIZON, action_dim) chunk.
action_buffer = collections.deque()
for t in range(MAX_STEPS):
    if t % EXECUTE_STEPS == 0:
        chunk = policy(obs)  # predict 50-step chunk
        action_buffer.clear()
        action_buffer.extend(chunk[0][:EXECUTE_STEPS])
    action = action_buffer.popleft()
    obs = robot.step(action)
```

### Flow matching action head (simplified)

```python
import torch
import torch.nn as nn

class FlowMatchingActionHead(nn.Module):
    def __init__(self, dim=1024, action_dim=14, horizon=50):
        super().__init__()
        self.action_dim, self.horizon = action_dim, horizon
        self.net = nn.Sequential(
            nn.Linear(dim + action_dim + 1, 1024),
            nn.SiLU(),
            nn.Linear(1024, action_dim),
        )

    def forward(self, vlm_features, noisy_action, t):
        # vlm_features: (B, dim); noisy_action: (B, horizon, action_dim); t: (B,)
        feats = vlm_features[:, None, :].expand(-1, self.horizon, -1)
        t_emb = t[:, None, None].expand(-1, self.horizon, 1)
        x = torch.cat([feats, noisy_action, t_emb], dim=-1)
        return self.net(x)  # velocity field, (B, horizon, action_dim)

    @torch.no_grad()
    def sample(self, vlm_features, num_steps=10):
        B, device = vlm_features.shape[0], vlm_features.device
        a = torch.randn(B, self.horizon, self.action_dim, device=device)
        for i in range(num_steps):
            t = torch.full((B,), i / num_steps, device=device)
            a = a + self.forward(vlm_features, a, t) / num_steps  # Euler step
        return a
```

### LeRobot dataset format

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("lerobot/aloha_static_fork_pick_up")
sample = ds[0]
# Keys look like 'observation.images.<camera>', 'observation.state', 'action',
# plus task metadata; exact key names depend on the dataset.
```

### Cross-embodiment fine-tune

```python
# Fine-tune π0 on a new robot (e.g. custom 6-DoF arm).
# Sketch only: state/action dimensions for the new arm come from the dataset
# features in the policy config and must be set for your lerobot version.
# Assumes `PI0Policy` is imported as in the inference example.
policy = PI0Policy.from_pretrained("lerobot/pi0")  # PaliGemma backbone + action expert

for name, param in policy.named_parameters():
    # Assumption: VLM backbone parameters contain ".paligemma." in their names.
    # Inspect `policy.named_parameters()` to confirm before relying on this filter.
    param.requires_grad = ".paligemma." not in name

# Train the remaining (action head) parameters on a small (~1000 episode) custom dataset.
```

### Language-conditioned task switch

```python
# Same weights, different language prompts → different behaviors.
# `robot` is the same assumed environment handle as in the chunking example.
for task in ["fold the shirt", "pick up the cup", "wipe the table"]:
    policy.reset()  # drop actions cached for the previous prompt
    obs["task"] = [task]
    action = policy.select_action(obs)
    obs = robot.step(action)
    obs["task"] = [task]
```

## Decision criteria

| Situation | Approach |
|---|---|
| Single-task robot, abundant data | Task-specific BC/RL |
| Multi-task, language-conditioned | π0 fine-tune |
| Zero-shot new task | π0.5 (open-world) |
| Humanoid full-body | π0 + whole-body controller |
| High-frequency control (>100Hz) | Distill π0 into a smaller policy |

**Default**: fine-tune π0 (via lerobot) for cross-embodiment manipulation.

## 🔗 Graph

- Parent: [[Embodied-AI]] · [[Foundation-Models]]

## 🤖 LLM usage

**When**: multi-task robot manipulation, language-conditioned policies, cross-embodiment transfer.

**When not**: simple pick-and-place (overkill), control rates above 50 Hz (inference latency), contact-rich precision tasks (still ongoing research).

## ❌ Anti-patterns

- **Deploying raw pretrained π0**: without fine-tuning, the robot/scene mismatch causes failures.
- **Ignoring action chunking**: single-step prediction produces jittery motion.
- **Mismatched camera intrinsics**: a deployment camera that differs from the training camera leads to OOD failures.

## 🧪 Verification / duplicates

- Verified (Physical Intelligence official, π0 paper 2024-10, lerobot integration).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: π0/π0.5 VLA foundation model + lerobot patterns |