---
id: wiki-2026-0508-physical-intelligence
title: Physical Intelligence
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PI, π0, pi-zero, embodied-foundation-model]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [robotics, foundation-model, embodied-ai, vla]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: JAX/PyTorch
---
# Physical Intelligence
## One-line summary
> **"A universal foundation model for robots: the ChatGPT moment for embodied AI."** Physical Intelligence (PI, launched 2024) is the startup that released π0, a vision-language-action (VLA) foundation model. A single set of weights drives a variety of robots through tasks such as dishwashing, laundry folding, and table bussing.
## Key points
### Company + models
- **Company**: Physical Intelligence, founded in 2024 by Karol Hausman, Sergey Levine, Chelsea Finn, and others (researchers from Google, Stanford, and UC Berkeley). Raised $400M+ at a $2.4B valuation.
- **π0 (pi-zero, 2024-10)**: PI's first generalist VLA foundation model. PaliGemma (3B VLM) backbone + an action expert (300M params, flow matching for continuous actions). Control at up to 50 Hz.
- **π0.5 (2025)**: open-world generalization, hierarchical planning, longer-horizon tasks.
- **π0-FAST**: tokenized action representation (FAST: Frequency-space Action Sequence Tokenization), reported to train up to 5× faster.
### Architecture keys
- **VLA = VLM + action head**: vision (ViT) + language (LLM) + action decoder.
- **Flow matching action expert**: continuous robot actions are not discretized into tokens; the action distribution is learned with flow matching.
- **Cross-embodiment**: a single model covers 7+ robot platforms (ALOHA, UR5e, Franka, mobile manipulators).
- **Internet pretraining + robot fine-tuning**: start from PaliGemma weights → train on 10K+ hours of robot demonstrations.
### Applications
1. Household chore robots (laundry folding, dishwashing).
2. Warehouse manipulation (Covariant + PI partnership, unconfirmed).
3. Humanoid foundation model (exploring compatibility with Figure 02 and 1X NEO).
## 💻 Patterns
### π0 inference (lerobot integration)
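```python
# pip install lerobot transformers
# Import path varies across lerobot releases; adjust to your installed version.
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy
import torch

policy = PI0Policy.from_pretrained("lerobot/pi0")
policy.eval().to("cuda")

# Dummy inputs; key names and shapes must match the checkpoint's configured features.
obs = {
    "observation.images.top": torch.zeros(1, 3, 224, 224).cuda(),
    "observation.state": torch.zeros(1, 14).cuda(),  # joint positions
    "task": ["fold the towel"],
}
with torch.no_grad():
    # Returns the next action, shape (1, action_dim); the predicted chunk is queued internally.
    action = policy.select_action(obs)
```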
### Action chunking + temporal ensembling
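```python
import collections

# π0 predicts 50-step action chunks for 50 Hz control.
# Receding-horizon execution: run the first k steps, then predict again.
# (True temporal ensembling would also average overlapping chunks.)
ACTION_HORIZON = 50
EXECUTE_STEPS = 8
action_buffer = collections.deque(maxlen=ACTION_HORIZON)
# `policy`, `robot`, `obs`, and MAX_STEPS are assumed to be defined by the caller.
for t in range(MAX_STEPS):
    if t % EXECUTE_STEPS == 0:
        chunk = policy(obs)  # predict a fresh chunk, shape (1, 50, action_dim)
        action_buffer.clear()  # discard stale actions from the previous chunk
        action_buffer.extend(chunk[0][:EXECUTE_STEPS])
    action = action_buffer.popleft()
    obs = robot.step(action)
```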
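The loop above re-plans every few steps but does not average overlapping predictions. A minimal sketch of temporal ensembling in the style popularized by ACT follows, assuming exponential weights `exp(-m * i)` with the oldest prediction at `i = 0`. The function name, the value of `m`, and the plain-list action format are illustrative assumptions, not part of π0 or lerobot.

```python
import math

def temporal_ensemble(chunks, t, m=0.1):
    """Blend every prediction made for timestep t.

    chunks: dict mapping the step at which a chunk was predicted to that chunk
            (a list of per-step actions, each a list of floats).
    """
    # Collect predictions for step t, oldest chunk first.
    preds = [chunk[t - start] for start, chunk in sorted(chunks.items())
             if 0 <= t - start < len(chunk)]
    if not preds:
        raise ValueError(f"no chunk covers step {t}")
    weights = [math.exp(-m * i) for i in range(len(preds))]  # i = 0 is the oldest
    total = sum(weights)
    dim = len(preds[0])
    return [sum(w * p[d] for w, p in zip(weights, preds)) / total for d in range(dim)]


# Two overlapping 3-step chunks for a 1-D action, predicted at steps 0 and 1.
chunks = {0: [[0.0], [1.0], [2.0]], 1: [[1.2], [2.2], [3.2]]}
blended = temporal_ensemble(chunks, t=1)
```

With a small `m` the older prediction keeps slightly more weight, which smooths the trajectory at the cost of reacting more slowly to new observations.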
### Flow matching action head (simplified)
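```python
import torch
import torch.nn as nn

class FlowMatchingActionHead(nn.Module):
    """Toy velocity-field MLP; the real π0 action expert is a transformer."""
    def __init__(self, dim=1024, action_dim=14, horizon=50):
        super().__init__()
        self.action_dim, self.horizon = action_dim, horizon
        self.net = nn.Sequential(
            nn.Linear(dim + action_dim + 1, 1024), nn.SiLU(),
            nn.Linear(1024, action_dim),
        )

    def forward(self, vlm_features, noisy_action, t):
        # vlm_features: (B, dim), noisy_action: (B, H, A), t: (B,) in [0, 1]
        B, H, _ = noisy_action.shape
        feats = vlm_features[:, None, :].expand(B, H, -1)
        time = t[:, None, None].expand(B, H, 1)
        x = torch.cat([feats, noisy_action, time], dim=-1)
        return self.net(x)  # predicted velocity field, (B, H, A)

    @torch.no_grad()
    def sample(self, vlm_features, num_steps=10):
        B, device = vlm_features.shape[0], vlm_features.device
        a = torch.randn(B, self.horizon, self.action_dim, device=device)
        for i in range(num_steps):  # Euler steps from noise (t=0) to actions (t=1)
            t = torch.full((B,), i / num_steps, device=device)
            a = a + self.forward(vlm_features, a, t) / num_steps
        return a
```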
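The head above only shows sampling. For training, a common flow matching setup regresses the predicted velocity onto the constant velocity of a straight line from noise to the clean action. The sketch below builds one such training pair in plain Python. It assumes the linear path with noise at `t = 0` and the action at `t = 1`, matching the Euler loop above; sign and time conventions differ between implementations, and the helper names are hypothetical.

```python
import random

def flow_matching_pair(action, t, rng=random):
    """Build one training example for a linear-path flow matching objective.

    action: clean action vector (list of floats); t: interpolation time in [0, 1].
    Returns (noisy_action, target_velocity, noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    # Point on the straight line from noise (t = 0) to the clean action (t = 1).
    noisy_action = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    # The line's constant velocity; the network regresses onto this target.
    target_velocity = [a - n for n, a in zip(noise, action)]
    return noisy_action, target_velocity, noise

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

action = [0.3, -0.1, 0.8]
noisy, target, noise = flow_matching_pair(action, t=0.25, rng=random.Random(0))
loss_if_perfect = mse(target, target)  # a perfect velocity prediction gives zero loss
```

Following the target velocity for the remaining time `1 - t` from `noisy` lands exactly on `action`, which is why a handful of Euler steps can suffice at inference.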
### LeRobot dataset format
```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("lerobot/aloha_static_fork_pick_up")
sample = ds[0]
# A dict of tensors plus metadata; camera key names depend on the dataset, e.g.
# {'observation.images.<camera>': tensor, 'observation.state': tensor,
#  'action': tensor, 'task': str, ...}
```
### Cross-embodiment fine-tune
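```python
# Fine-tune π0 on a new robot (e.g. a custom 6-DoF arm).
# Sketch only: which parameters to freeze, and how, depends on the lerobot release.
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy

policy = PI0Policy.from_pretrained("lerobot/pi0")  # pretrained PaliGemma + action expert
policy.train()
# State/action vectors smaller than the model's maximum dims (here 6) are zero-padded.
# Freeze the VLM backbone and train the action expert on a small (~1000-episode)
# custom dataset, e.g. via lerobot's training script with --policy.path=lerobot/pi0.
```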
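Cross-embodiment training needs every robot's state and action vectors to share one width. The π0 paper describes zero-padding lower-dimensional robots up to the largest configuration. A minimal sketch of that padding follows; `MAX_ACTION_DIM = 32` and both helper names are illustrative assumptions, not verified model constants or library functions.

```python
MAX_ACTION_DIM = 32  # illustrative shared width, not a verified model constant

def pad_action(action, max_dim=MAX_ACTION_DIM):
    """Zero-pad a robot-specific action vector to the shared model width."""
    if len(action) > max_dim:
        raise ValueError(f"action has {len(action)} dims, model supports {max_dim}")
    return list(action) + [0.0] * (max_dim - len(action))

def unpad_action(padded, action_dim):
    """Keep only the dimensions the target robot actually uses."""
    return list(padded[:action_dim])

arm_action = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]  # 6-DoF arm
padded = pad_action(arm_action)
restored = unpad_action(padded, action_dim=6)
```

Padding on the way in and slicing on the way out keeps the shared weights unchanged while letting a 6-DoF arm and a 14-DoF bimanual setup use the same model.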
### Language-conditioned task switch
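```python
# Same weights, different language prompts → different behaviors.
# `policy` and `obs` come from the inference example; `execute` is a hypothetical robot call.
for task in ["fold the shirt", "pick up the cup", "wipe the table"]:
    policy.reset()  # drop actions queued for the previous task
    obs["task"] = [task]
    action = policy.select_action(obs)
    execute(action)
```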
## Decision criteria
| Situation | Approach |
|---|---|
| Single-task robot, abundant data | Task-specific BC/RL |
| Multi-task, language-conditioned | Fine-tune π0 |
| Zero-shot new task | π0.5 (open-world) |
| Humanoid full-body | π0 + whole-body controller |
| High-frequency control (>100 Hz) | Distill π0 → smaller policy |
**Default**: fine-tune π0 (using lerobot) for cross-embodiment manipulation.
## 🔗 Graph
- Parent: [[Embodied-AI]] · [[Foundation-Models]]
## 🤖 LLM usage
**When to use**: multi-task robot manipulation, language-conditioned policies, cross-embodiment transfer.
**When not to use**: simple pick-and-place (overkill), control rates above 50 Hz (inference latency), contact-rich precision tasks (still an open research area).
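The latency constraint can be made concrete with a little arithmetic: with action chunking, one inference call has to finish within the time the robot spends executing the steps between re-plans. The helper below is a back-of-the-envelope sketch that assumes inference runs in the background during execution; the 73 ms latency is a hypothetical placeholder, not a measured π0 figure.

```python
import math

def min_execute_steps(inference_latency_s, control_hz):
    """Smallest number of steps to execute per chunk so inference keeps up.

    Assumes inference runs in the background while the robot executes
    already-predicted actions.
    """
    step_period_s = 1.0 / control_hz
    return max(1, math.ceil(inference_latency_s / step_period_s))

steps_at_50hz = min_execute_steps(0.073, control_hz=50)    # 0.073 s / 0.020 s per step
steps_at_200hz = min_execute_steps(0.073, control_hz=200)  # 0.073 s / 0.005 s per step
```

Under these assumptions a 73 ms call needs at least 4 executed steps per chunk at 50 Hz and 15 at 200 Hz, so higher control rates force longer open-loop stretches or a smaller, faster policy.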
## ❌ Anti-patterns
- **Deploying raw pretrained π0**: without fine-tuning, robot/scene mismatch causes failures.
- **Ignoring action chunking**: single-step prediction → jittery motion.
- **Mismatched camera intrinsics**: training and deployment cameras differ → OOD failure.
## 🧪 Verification / duplicates
- Verified (Physical Intelligence official, π0 paper 2024-10, lerobot integration).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — π0/π0.5 VLA foundation model + lerobot patterns |