---
id: wiki-2026-0508-embodied-ai
title: Embodied AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [embodied AI, robot AI, VLA, vision-language-action, sim2real, robot foundation model]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ai, robotics, embodied-ai, vla, foundation-model, sim2real, manipulation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: ROS 2 / PyTorch / Isaac / MuJoCo
---

# Embodied AI

## One-line summary

> **"Perceive, act, and learn through a physical body."** Unlike a disembodied LLM, an embodied agent drives manipulators, locomotion, and navigation. Modern systems such as RT-2, OpenVLA, and π0 pair a VLM with action outputs. Related techniques: sim2real transfer and diffusion policies.

## Key points

### Tasks
- **Navigation**: ObjectNav, PointNav.
- **Manipulation**: pick-and-place, insertion.
- **Locomotion**: quadrupeds, humanoids.
- **Mobile manipulation**: fetch tasks.
- **Long-horizon**: cooking, cleaning.

### Modern methods
- **Diffusion Policy** (Chi 2023): generates actions from visual observations by iterative denoising (diffusion).
- **VLA** (RT-2, OpenVLA): a VLM that emits actions as tokens.
- **π0** (Physical Intelligence): a generalist robot foundation model.
- **ACT** (Aloha): a transformer that predicts chunks of actions.

### Sim2real
- **Domain randomization**: randomize lighting, textures, and dynamics.
- **Real2sim2real**: real data plus refinement in simulation.
- **Co-training**: train on a mix of sim and real data.

### Platforms
- **NVIDIA Isaac Sim / Lab**.
- **MuJoCo / DeepMind Control**.
- **PyBullet**.
- **Habitat** (navigation).
- **RoboCasa** (kitchen).

### Applications
1. **Industrial**: assembly.
2. **Logistics**: pick-and-pack.
3. **Service**: cleaning.
4. **Surgery**: da Vinci.
5. **Domestic**: humanoids (1X, Figure, Optimus).
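
Returning to the VLA entry under modern methods: the "action token" idea rests on discretizing each continuous action dimension into a fixed number of bins (256 in RT-2 and OpenVLA) so that a language model can emit actions like ordinary tokens. The following is a minimal sketch of that round trip, assuming uniform bins over a known range. The names `action_to_bins`, `bins_to_action`, `low`, and `high` are illustrative and do not come from either codebase, and real systems choose the bounds from dataset statistics.

```python
import numpy as np

N_BINS = 256  # bins per action dimension, as used by RT-2 / OpenVLA


def action_to_bins(action, low, high, n_bins=N_BINS):
    """Map continuous actions in [low, high] to integer bins in [0, n_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=np.float64), low, high)
    scaled = (action - low) / (high - low)  # normalize to [0, 1]
    # The upper bound itself would land in bin n_bins, so cap it at n_bins - 1.
    return np.minimum((scaled * n_bins).astype(np.int64), n_bins - 1)


def bins_to_action(bins, low, high, n_bins=N_BINS):
    """Map bin indices back to continuous values at the centre of each bin."""
    centres = (np.asarray(bins, dtype=np.float64) + 0.5) / n_bins
    return low + centres * (high - low)


if __name__ == "__main__":
    # Hypothetical 7-DoF action (6 pose deltas + gripper), normalized to [-1, 1].
    a = np.array([0.10, -0.25, 0.00, 0.50, -1.00, 1.00, 0.33])
    tokens = action_to_bins(a, -1.0, 1.0)
    recovered = bins_to_action(tokens, -1.0, 1.0)
    print(tokens)
    print(np.abs(recovered - a).max())  # bounded by half a bin width
```

With uniform bins the reconstruction error per dimension is at most half a bin width, `(high - low) / (2 * n_bins)`, which is the precision cost a VLA pays for treating actions as tokens.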
## 💻 Patterns

### Diffusion Policy (Chi 2023)
```python
import torch
from torch import nn


class DiffusionPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, horizon=8, n_steps=100):
        super().__init__()
        self.action_dim = action_dim
        self.horizon = horizon
        self.n_steps = n_steps
        self.cond_encoder = nn.Linear(obs_dim, 256)
        # Input: flattened noisy action sequence + observation embedding + timestep
        self.noise_pred = nn.Sequential(
            nn.Linear(action_dim * horizon + 256 + 1, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim * horizon),
        )

    @torch.no_grad()
    def predict(self, obs):
        """Denoise a random action sequence, conditioned on one unbatched observation."""
        cond = self.cond_encoder(obs)
        x = torch.randn(self.horizon * self.action_dim)  # start from pure noise
        for t in reversed(range(self.n_steps)):
            t_emb = torch.tensor([t / self.n_steps])
            noise = self.noise_pred(torch.cat([x, cond, t_emb]))
            # Simplified fixed-step update; a real sampler follows a DDPM/DDIM schedule
            x = x - 0.01 * noise
        return x.reshape(self.horizon, self.action_dim)
```

### VLA (RT-2 / OpenVLA style)
```python
import torch
from torch import nn


class VLA(nn.Module):
    """Simplified sketch: a projection head over the VLM's last hidden state.
    RT-2 and OpenVLA instead decode action tokens from the LM vocabulary."""

    def __init__(self, vlm, action_dim=7, n_bins=256):
        super().__init__()
        self.vlm = vlm  # VLM backbone (e.g. PaLI-X in RT-2, a Llama-based VLM in OpenVLA)
        self.action_dim = action_dim
        self.n_bins = n_bins
        # `vlm.hidden_dim` is an assumed attribute of the wrapped backbone
        self.action_proj = nn.Linear(vlm.hidden_dim, n_bins * action_dim)

    def bin_to_action(self, action_bins):
        # Assumes actions are normalized to [-1, 1]; maps bin index to that range
        return action_bins.float() / (self.n_bins - 1) * 2 - 1

    def forward(self, image, instruction):
        # Assumed interface: the backbone returns an object with `last_hidden_state`
        feat = self.vlm(image, instruction).last_hidden_state[:, -1]
        logits = self.action_proj(feat).reshape(-1, self.action_dim, self.n_bins)
        action_bins = logits.argmax(-1)  # most likely bin per action dimension
        return self.bin_to_action(action_bins)
```

### Behavior cloning (basic IL)
```python
import torch
import torch.nn.functional as F


def behavior_cloning(demos, model):
    """Supervised learning on (obs, action) pairs."""
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(100):
        for obs, action in demos:
            pred = model(obs)
            loss = F.mse_loss(pred, action)  # regress onto the demonstrated action
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```

### Sim2Real (domain randomization)
```python
import random

import numpy as np


def randomize_env(env, textures):
    """Resample physics and visuals per episode.
    The attribute names on `env` are illustrative, not a specific simulator's API."""
    env.gravity = np.random.uniform(9.5, 10.1)
    env.friction = np.random.uniform(0.5, 1.5)
    env.light_intensity = np.random.uniform(0.5, 1.5)
    env.texture = random.choice(textures)
    env.payload_mass = np.random.uniform(0, 0.5)
    return env
```

### Habitat navigation
```python
import habitat

# The config path depends on the installed habitat-lab version
config = habitat.get_config('benchmark/nav/objectnav_hm3d_v1.yaml')
env = habitat.Env(config)
obs = env.reset()
while not env.episode_over:
    action = policy(obs)  # `policy` is a user-supplied callable
    obs = env.step(action)
```

### MuJoCo manipulation
```python
import mujoco

model = mujoco.MjModel.from_xml_path('panda.xml')
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # advance the simulation by one timestep
# The site name must match a <site> defined in the XML model
ee_pos = data.site('end_effector').xpos
```

### Reward shaping (manipulation)
```python
import numpy as np


def grasp_reward(state):
    # Dense term: pull the gripper toward the object
    distance = np.linalg.norm(state.gripper_pos - state.object_pos)
    in_grasp = state.gripper_holding_object
    # Sparse bonus once the object is lifted more than 10 cm
    lifted = state.object_pos[2] - state.object_init_z > 0.1
    return -distance + (5 if in_grasp else 0) + (10 if lifted else 0)
```

### Curriculum learning
```python
def curriculum(success_rate, level):
    if success_rate > 0.8:
        return level + 1  # promote once the task is mostly solved
    if success_rate < 0.3:
        return max(0, level - 1)  # demote, but never below level 0
    return level

# level 0: easy (objects close, no obstacles)
# level 1: clutter
# level 2: distractors + dynamic
```

### Real2Sim2Real (RoboCasa-style)
```python
# match_initial_state, simulate, train_on, and evaluate_real are placeholder helpers

def real2sim(real_traj):
    # Recreate the real initial state in simulation, then replay the real actions
    sim_init = match_initial_state(real_traj[0])
    sim_traj = simulate(sim_init, real_traj.actions)
    return sim_traj

def sim_train_real_eval(sim_data, real_data):
    # Co-train on sim + real data, then evaluate on held-out real episodes
    model = train_on(sim_data + real_data)
    return evaluate_real(model, real_data.eval)
```

### Action chunking (ACT)
```python
from torch import nn


class ACT(nn.Module):
    """Aloha bimanual policy (structure only).
    TransformerEncoder / TransformerDecoder are placeholders for the real modules."""

    def __init__(self, chunk=100):
        super().__init__()
        self.chunk = chunk
        self.encoder = TransformerEncoder()
        self.decoder = TransformerDecoder()

    def forward(self, obs):
        feat = self.encoder(obs)
        actions = self.decoder(feat)  # [chunk, action_dim]
        return actions

    def execute(self, obs):
        chunk = self.forward(obs)
        # Executes only the first action of the chunk. ACT's temporal ensembling
        # would instead average overlapping predictions from earlier chunks.
        return chunk[0]
```

### Safety filter
```python
# MAX_FORCE, STOP_ACTION, and the helper functions are placeholders

def safe_action(proposed, state):
    if proposed.force > MAX_FORCE:
        proposed.force = MAX_FORCE  # cap the commanded force
    if collision_imminent(proposed, state):
        return STOP_ACTION
    if outside_workspace(proposed, state):
        return clamp_to_workspace(proposed)
    return proposed
```

## Decision criteria

| Situation | Approach |
|---|---|
| Visual policy | Diffusion Policy |
| Language-conditioned | VLA (OpenVLA / π0) |
| Multi-task | Foundation model |
| Long-horizon | Hierarchical + chunking |
| Sim-only | Domain randomization |
| Few demos | BC + augmentation |
| Generalist | π0 / RT-X |

**Default**: a modern stack is a fine-tuned VLA (OpenVLA) plus a diffusion policy, with sim2real domain randomization and a safety filter.

## 🔗 Graph

- Parents: [[AI]] · [[Robotics]] · [[Embodied Cognition]]
- Variants: [[VLA]]
- Adjacent: [[Foundation-Model]] · [[CLIP]] · [[π0]]

## 🤖 LLM usage

**When**: robots, manipulation, navigation, multimodal physical tasks.
**When not**: pure simulation games, disembodied chat.

## ❌ Anti-patterns

- **No safety filter**: risks hardware damage.
- **Sim-only without DR**: leaves the sim2real gap open.
- **BC overfit to demos**: fails out of distribution (OOD).
- **Expecting a tiny VLM to be a generalist**: insufficient capacity.
- **No chunking**: jitter / instability.

## 🧪 Verification / duplicates

- Verified (RT-2, OpenVLA, Diffusion Policy 2023, π0 2024).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | EMBODIED-AI auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: diffusion / VLA / BC / sim2real / curriculum / ACT code |