id: wiki-2026-0508-embodied-ai
title: Embodied AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: embodied AI, robot AI, VLA, vision-language-action, sim2real, robot foundation model
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: ai, robotics, embodied-ai, vla, foundation-model, sim2real, manipulation
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: ROS 2 / PyTorch / Isaac / MuJoCo

Embodied AI

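One-line summary

"AI that perceives, acts, and learns through a physical body." Unlike a disembodied LLM, an embodied agent drives manipulators, locomotion, and navigation. Modern systems such as RT-2, OpenVLA, and π0 pair a vision-language model (VLM) with action output, typically alongside sim2real transfer and diffusion policies.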

Key concepts

Tasks

  • Navigation: ObjectNav, PointNav.
  • Manipulation: pick-and-place, insertion.
  • Locomotion: quadrupeds, humanoids.
  • Mobile manipulation: fetching objects.
  • Long-horizon: cooking, cleaning.

Modern methods

  • Diffusion Policy (Chi et al., 2023): generates actions from visual observations by iterative denoising.
  • VLA (RT-2, OpenVLA): a VLM that emits actions as discrete tokens.
  • π0 (Physical Intelligence): a generalist robot foundation model.
  • ACT (ALOHA): a transformer that predicts chunks of future actions.
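
The "action token" idea in the VLA bullet can be illustrated with uniform binning: each continuous action dimension is mapped to one of 256 integer bins, which the language model can then emit like ordinary tokens. This is a minimal sketch; the function names and the [-1, 1] action range are assumptions made for illustration, not details from RT-2 or OpenVLA.

```python
def action_to_bins(action, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    bins = []
    for a in action:
        a = min(max(a, low), high)        # clip to the assumed action range
        frac = (a - low) / (high - low)   # normalize to [0, 1]
        bins.append(min(int(frac * n_bins), n_bins - 1))
    return bins


def bins_to_action(bins, low=-1.0, high=1.0, n_bins=256):
    """Invert the mapping by returning the center of each bin."""
    width = (high - low) / n_bins
    return [low + (b + 0.5) * width for b in bins]


action = [0.0, -1.0, 1.0, 0.25]
tokens = action_to_bins(action)
recovered = bins_to_action(tokens)
```

The round trip loses at most half a bin width per dimension (1/256 for this range), which is the resolution cost of treating control as token prediction.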

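Sim2real

  • Domain randomization: vary lighting, textures, and dynamics during simulated training.
  • Real2sim2real: use real data to build or calibrate the simulation, then refine the policy in it.
  • Co-training: train on a mix of simulated and real data.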
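
Co-training can be sketched as a batch sampler that fixes the share of real samples in every batch, so scarce real data is not drowned out by abundant simulated data. The function name, the 25% real fraction, and the tagged toy data are illustrative assumptions.

```python
import random


def cotrain_batches(sim_data, real_data, batch_size=8, real_fraction=0.25,
                    n_batches=4, seed=0):
    """Yield shuffled batches containing a fixed number of real samples each."""
    rng = random.Random(seed)
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    for _ in range(n_batches):
        # Sample with replacement so a small real set can still fill its quota.
        batch = rng.choices(real_data, k=n_real) + rng.choices(sim_data, k=n_sim)
        rng.shuffle(batch)
        yield batch


sim_data = [("sim", i) for i in range(1000)]
real_data = [("real", i) for i in range(20)]
batches = list(cotrain_batches(sim_data, real_data))
```

The real fraction is a tuning knob: too low and the policy ignores real dynamics, too high and it overfits the small real set.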

Platforms

  • NVIDIA Isaac Sim / Lab.
  • MuJoCo / DeepMind Control.
  • PyBullet.
  • Habitat (navigation).
  • RoboCasa (kitchen).

Applications

  1. Industrial: assembly.
  2. Logistics: pick-and-pack.
  3. Service: cleaning.
  4. Surgery: da Vinci.
  5. Domestic: humanoids (1X, Figure, Optimus).

💻 Patterns

Diffusion Policy (Chi 2023)

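import torch
from torch import nn

class DiffusionPolicy(nn.Module):
    """Toy diffusion policy: denoises an action sequence conditioned on one observation."""
    def __init__(self, obs_dim, action_dim, horizon=8, n_steps=100):
        super().__init__()
        self.action_dim = action_dim
        self.horizon = horizon
        self.n_steps = n_steps
        self.cond_encoder = nn.Linear(obs_dim, 256)
        # Input: flattened noisy actions + observation embedding + scalar timestep.
        self.noise_pred = nn.Sequential(
            nn.Linear(action_dim * horizon + 256 + 1, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim * horizon),
        )

    @torch.no_grad()
    def predict(self, obs):
        # obs: 1-D tensor of shape [obs_dim] (single, unbatched observation).
        cond = self.cond_encoder(obs)
        x = torch.randn(self.horizon * self.action_dim)  # start from pure noise
        for t in reversed(range(self.n_steps)):
            t_emb = torch.tensor([t / self.n_steps])
            noise = self.noise_pred(torch.cat([x, cond, t_emb]))
            # Simplified fixed-size denoising step; a full DDPM/DDIM sampler
            # would scale this update according to the noise schedule.
            x = x - 0.01 * noise
        return x.reshape(self.horizon, self.action_dim)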
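
The block above covers sampling only. For reference, a network like `noise_pred` is usually trained with the standard DDPM noise-prediction objective, written out below. The notation is introduced here rather than taken from the note: $a_0$ is the clean action sequence, $o$ the observation, $\bar\alpha_t$ the cumulative noise schedule, and $\epsilon_\theta$ the noise-prediction network.

```latex
\begin{aligned}
a_t &= \sqrt{\bar\alpha_t}\, a_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathcal{L}(\theta) &= \mathbb{E}_{a_0,\, o,\, t,\, \epsilon}
  \left[ \left\lVert \epsilon - \epsilon_\theta(a_t, o, t) \right\rVert^2 \right].
\end{aligned}
```

In words: corrupt a demonstrated action sequence with a known amount of noise, then regress the network output onto that noise with an MSE loss.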

VLA (RT-2 / OpenVLA style)

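from torch import nn

class VLA(nn.Module):
    def __init__(self, vlm, action_dim=7, n_bins=256):
        super().__init__()
        # VLM backbone, e.g. PaLI-X (RT-2) or a Llama-based VLM (OpenVLA).
        # Assumed to expose `hidden_dim` and return `.last_hidden_state`.
        self.vlm = vlm
        self.action_dim = action_dim
        self.n_bins = n_bins
        # Simplification: a linear head stands in for autoregressive action-token decoding.
        self.action_proj = nn.Linear(vlm.hidden_dim, n_bins * action_dim)

    def bin_to_action(self, action_bins):
        # Assumed de-tokenization: map bin indices uniformly onto [-1, 1].
        return action_bins.float() / (self.n_bins - 1) * 2.0 - 1.0

    def forward(self, image, instruction):
        feat = self.vlm(image, instruction).last_hidden_state[:, -1]
        logits = self.action_proj(feat).reshape(-1, self.action_dim, self.n_bins)
        action_bins = logits.argmax(-1)  # greedy choice per action dimension
        return self.bin_to_action(action_bins)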

Behavior cloning (basic IL)

import torch
import torch.nn.functional as F

def behavior_cloning(demos, model):
    """Supervised learning on (obs, action) pairs from demonstrations."""
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(100):
        for obs, action in demos:
            pred = model(obs)
            loss = F.mse_loss(pred, action)  # regress onto the expert action
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
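
With only a few demonstrations, plain behavior cloning tends to overfit. One common mitigation is to augment the demo set by perturbing observations while keeping the expert action as the label. The sketch below assumes observations are flat lists of floats; the function name, noise scale, and copy count are illustrative.

```python
import random


def augment_demos(demos, noise_std=0.01, copies=4, seed=0):
    """Return the original demos plus noisy-observation copies with unchanged actions."""
    rng = random.Random(seed)
    augmented = list(demos)
    for obs, action in demos:
        for _ in range(copies):
            noisy_obs = [o + rng.gauss(0.0, noise_std) for o in obs]
            augmented.append((noisy_obs, action))
    return augmented


demos = [([0.1, 0.2, 0.3], [1.0, 0.0]), ([0.4, 0.5, 0.6], [0.0, 1.0])]
augmented = augment_demos(demos)
```

For image observations the same idea is usually applied as random crops or color jitter instead of additive noise.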

Sim2Real (domain randomization)

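import random

import numpy as np

def randomize_env(env, textures):
    """Resample physics and visual parameters; intended to be called once per episode."""
    env.gravity = np.random.uniform(9.5, 10.1)  # around the nominal 9.81 m/s^2
    env.friction = np.random.uniform(0.5, 1.5)
    env.light_intensity = np.random.uniform(0.5, 1.5)
    env.texture = random.choice(textures)  # `textures`: list of available texture assets
    env.payload_mass = np.random.uniform(0, 0.5)
    return env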

Habitat navigation

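import habitat

# Config path follows the habitat-lab 0.2.x layout; check it against your installed version.
config = habitat.get_config('benchmark/nav/objectnav/objectnav_hm3d.yaml')
env = habitat.Env(config=config)
obs = env.reset()
while not env.episode_over:
    action = policy(obs)  # `policy` is a user-supplied callable: observations -> action
    obs = env.step(action)
env.close()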

MuJoCo manipulation

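import mujoco

# 'panda.xml' and the 'end_effector' site name are placeholders for your own model file.
model = mujoco.MjModel.from_xml_path('panda.xml')
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # advance the simulation by one timestep
ee_pos = data.site('end_effector').xpos  # world-frame position of the site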

Reward shaping (manipulation)

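import numpy as np

def grasp_reward(state):
    # Dense term: negative gripper-to-object distance pulls the gripper toward the object.
    distance = np.linalg.norm(state.gripper_pos - state.object_pos)
    in_grasp = state.gripper_holding_object
    # Sparse term: object raised more than 0.1 above its initial height.
    lifted = state.object_pos[2] - state.object_init_z > 0.1

    return -distance + (5 if in_grasp else 0) + (10 if lifted else 0)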

Curriculum learning

def curriculum(success_rate, level):
    """Promote above 80% success, demote below 30%, otherwise hold the level."""
    if success_rate > 0.8:
        return level + 1
    if success_rate < 0.3:
        return max(0, level - 1)
    return level

# Level 0: easy (objects close, no obstacles)
# Level 1: clutter
# Level 2: distractors and dynamic elements
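
The thresholds above need a success rate to act on. One way to supply it is a sliding window over recent episodes, resetting the window whenever the level changes so old results do not leak across difficulty levels. The class name, window size, and level cap are illustrative assumptions.

```python
from collections import deque


class CurriculumTracker:
    """Tracks recent episode outcomes and adjusts the difficulty level."""

    def __init__(self, window=10, up=0.8, down=0.3, max_level=2):
        self.results = deque(maxlen=window)
        self.up, self.down, self.max_level = up, down, max_level
        self.level = 0

    def record(self, success):
        """Record one episode outcome and return the (possibly updated) level."""
        self.results.append(bool(success))
        if len(self.results) == self.results.maxlen:  # wait for a full window
            rate = sum(self.results) / len(self.results)
            if rate > self.up and self.level < self.max_level:
                self.level += 1
                self.results.clear()
            elif rate < self.down and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level


tracker = CurriculumTracker(window=5)
after_wins = [tracker.record(True) for _ in range(5)][-1]
after_losses = [tracker.record(False) for _ in range(5)][-1]
```

Waiting for a full window avoids promoting on a lucky first episode.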

Real2Sim2Real (RoboCasa-style)

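# match_initial_state, simulate, train_on, and evaluate_real are placeholders
# for project-specific code.

def real2sim(real_traj):
    # Recreate the first recorded real state in simulation, then replay the real actions.
    sim_init = match_initial_state(real_traj[0])
    sim_traj = simulate(sim_init, real_traj.actions)
    return sim_traj

def sim_train_real_eval(sim_data, real_data):
    # Train on the combined sim and real data, then evaluate on held-out real episodes.
    model = train_on(sim_data + real_data)
    return evaluate_real(model, real_data.eval)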
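
The "real2sim" half often comes down to system identification: choosing simulator parameters so that replayed trajectories match the real ones. The toy sketch below fits a single friction coefficient for a sliding block by grid search. The 1-D simulator, the function names, and the stand-in "real" trajectory are all assumptions for illustration.

```python
def simulate_slide(v0, friction, dt=0.01, steps=100, g=9.81):
    """1-D block decelerating under Coulomb friction; returns positions over time."""
    x, v, xs = 0.0, v0, []
    for _ in range(steps):
        v = max(0.0, v - friction * g * dt)  # friction never reverses the motion
        x += v * dt
        xs.append(x)
    return xs


def identify_friction(real_xs, v0, candidates):
    """Pick the candidate friction whose simulated trajectory best matches the real one."""
    def squared_error(mu):
        sim_xs = simulate_slide(v0, mu, steps=len(real_xs))
        return sum((s - r) ** 2 for s, r in zip(sim_xs, real_xs))
    return min(candidates, key=squared_error)


# Stand-in for a logged real trajectory; in practice this comes from the robot.
real_xs = simulate_slide(v0=1.0, friction=0.3)
candidates = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
best = identify_friction(real_xs, v0=1.0, candidates=candidates)
```

Real systems have many coupled parameters, so gradient-based or Bayesian optimization usually replaces the grid search.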

Action chunking (ACT)

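import torch
from torch import nn

class ACT(nn.Module):
    """Action-chunking transformer sketch (ALOHA bimanual setup); the CVAE part of ACT is omitted."""
    def __init__(self, obs_dim, action_dim, chunk=100, d_model=256):
        super().__init__()
        self.chunk = chunk
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        # One learned query per future timestep in the chunk.
        self.queries = nn.Parameter(torch.randn(chunk, d_model) * 0.02)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, obs):
        # obs: [batch, n_tokens, obs_dim] -> actions: [batch, chunk, action_dim]
        memory = self.encoder(self.obs_proj(obs))
        queries = self.queries.unsqueeze(0).expand(obs.shape[0], -1, -1)
        return self.action_head(self.decoder(queries, memory))

    def execute(self, obs):
        chunk = self.forward(obs)
        # Returns only the first action of the chunk; temporal ensembling
        # (averaging overlapping chunks) is not implemented here.
        return chunk[:, 0]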
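
Temporal ensembling, mentioned in `execute` above, averages every prediction that earlier chunks made for the current timestep, using exponential weights w_i = exp(-m * i) with i = 0 for the oldest prediction. The sketch below assumes the policy is queried at every step and that chunks are stored as nested lists; the function names and the value of `m` are illustrative.

```python
import math


def temporal_ensemble(predictions, m=0.1):
    """Weighted average of action vectors for one timestep, oldest prediction first."""
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [sum(w * p[d] for w, p in zip(weights, predictions)) / total
            for d in range(dim)]


def ensembled_action(chunks, t, m=0.1):
    """chunks[s] is the chunk predicted at step s; chunks[s][j] targets step s + j."""
    preds = [chunks[s][t - s] for s in range(t + 1) if t - s < len(chunks[s])]
    return temporal_ensemble(preds, m)


# Three overlapping chunks of length 3 with 1-D actions.
chunks = [[[0.0], [1.0], [2.0]],
          [[1.0], [2.0], [3.0]],
          [[4.0], [5.0], [6.0]]]
plain_mean = ensembled_action(chunks, t=2, m=0.0)
weighted = ensembled_action(chunks, t=2, m=0.5)
```

A larger `m` leans harder on older predictions, which smooths motion at the cost of reacting more slowly to new observations.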

Safety filter

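# MAX_FORCE, STOP_ACTION, collision_imminent, outside_workspace, and
# clamp_to_workspace are placeholders for robot-specific definitions.

def safe_action(proposed, state):
    if proposed.force > MAX_FORCE:
        proposed.force = MAX_FORCE  # cap the commanded force
    if collision_imminent(proposed, state):
        return STOP_ACTION
    if outside_workspace(proposed, state):
        return clamp_to_workspace(proposed)
    return proposed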
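
A concrete version of the same checks, as a minimal runnable sketch: the `Command` type, the workspace box, the force limit, and the clearance threshold are all assumed values for illustration, and a point-distance test stands in for real collision checking.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Command:
    position: tuple  # target end-effector (x, y, z)
    force: float     # commanded force


WORKSPACE = ((-0.5, 0.5), (-0.5, 0.5), (0.0, 0.8))  # assumed per-axis limits
MAX_FORCE = 20.0                                     # assumed force limit


def clamp(value, low, high):
    return max(low, min(high, value))


def safe_command(cmd, obstacles=(), min_clearance=0.05):
    """Return a filtered command, or None to signal that the robot should stop."""
    pos = tuple(clamp(p, lo, hi) for p, (lo, hi) in zip(cmd.position, WORKSPACE))
    for obstacle in obstacles:
        if math.dist(pos, obstacle) < min_clearance:
            return None  # too close to an obstacle: stop instead of moving
    return replace(cmd, position=pos, force=clamp(cmd.force, -MAX_FORCE, MAX_FORCE))


filtered = safe_command(Command(position=(0.9, 0.0, 0.4), force=50.0))
stopped = safe_command(Command(position=(0.9, 0.0, 0.4), force=50.0),
                       obstacles=[(0.5, 0.0, 0.42)])
```

Checking clearance on the clamped position, rather than the raw one, ensures the command that is actually sent is the one that was validated.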

Decision criteria

| Situation | Approach |
|---|---|
| Visual policy | Diffusion Policy |
| Language-conditioned | VLA (OpenVLA / π0) |
| Multi-task | Foundation model |
| Long-horizon | Hierarchical + chunking |
| Sim-only | Domain randomization |
| Few demos | BC + augmentation |
| Generalist | π0 / RT-X |

Default: a modern stack combines a fine-tuned VLA (OpenVLA), a diffusion policy, sim2real domain randomization, and a safety filter.

🔗 Graph

🤖 LLM usage

When to use: robots, manipulation, navigation, multimodal physical tasks. When not to use: pure simulation games, disembodied chat.

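Anti-patterns

  • No safety filter: risks hardware damage.
  • Sim-only training without domain randomization: leaves the sim2real gap open.
  • BC overfit to demos: fails on out-of-distribution states.
  • Expecting generalist behavior from a tiny VLM: insufficient capacity.
  • No action chunking: jittery, unstable motion.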

🧪 Verification / duplicates

  • Verified against RT-2, OpenVLA, Diffusion Policy (2023), and π0 (2024).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | EMBODIED-AI auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: diffusion / VLA / BC / sim2real / curriculum / ACT code |