Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

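Major cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Normalized link names: about 2,400 broken-link fixes, direct redirect links, and concept mappings
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections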

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00

Embodied AI

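In one line

"Perceiving, acting, and learning through a physical body." Unlike a disembodied LLM, an embodied agent must handle manipulation, locomotion, and navigation. Modern systems such as RT-2, OpenVLA, and π0 pair a VLM with action outputs; sim2real transfer and diffusion policies are the other common building blocks.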

Core concepts

Tasks

Navigation: ObjectNav, PointNav.
Manipulation: pick-and-place, insertion.
Locomotion: quadruped, humanoid.
Mobile manipulation: fetch tasks.
Long-horizon: cooking, cleaning.

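Modern methods

Diffusion Policy (Chi et al., 2023): generates actions from visual observations by iterative denoising.
VLA (RT-2, OpenVLA): a VLM that emits action tokens.
π0 (Physical Intelligence): a generalist robot foundation model.
ACT (ALOHA): Action Chunking with Transformers; predicts a chunk of future actions at once.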

Sim2real

Domain randomization: vary lighting, textures, and dynamics.
Real2sim2real: start from real data, then refine in simulation.
Co-training: train on a mix of sim and real data.
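
Co-training can be sketched as a batch sampler that fills a fixed share of every batch with real demonstrations and the rest with simulated ones. The function name `mixed_batch`, the `real_fraction` knob, and the toy datasets below are assumptions made for illustration; they do not come from any particular library, and the right mixing ratio is task-dependent.

```python
import random

def mixed_batch(sim_data, real_data, batch_size, real_fraction, rng):
    """Draw one training batch with a fixed share of real samples.

    real_fraction is a hypothetical knob, not a recommended value.
    """
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    # Sample with replacement so a small real dataset can still fill its share.
    batch = rng.choices(real_data, k=n_real) + rng.choices(sim_data, k=n_sim)
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
sim_data = [("sim", i) for i in range(1000)]   # toy stand-ins for transitions
real_data = [("real", i) for i in range(20)]
batch = mixed_batch(sim_data, real_data, batch_size=32, real_fraction=0.25, rng=rng)
n_real = sum(1 for source, _ in batch if source == "real")
```

Sampling the real data with replacement is a deliberate choice here: real demonstrations are usually far scarcer than simulated ones, so they are oversampled rather than allowed to vanish from the batch.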

Platforms

NVIDIA Isaac Sim / Lab.
MuJoCo / DeepMind Control.
PyBullet.
Habitat (navigation).
RoboCasa (kitchen).

Applications

Industrial: assembly.
Logistics: pick-and-pack.
Service: cleaning.
Surgery: da Vinci.
Domestic: humanoids (1X, Figure, Optimus).

💻 Patterns

Diffusion Policy (Chi 2023)

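import torch
from torch import nn

class DiffusionPolicy(nn.Module):
    """Toy diffusion policy: denoises a flat action sequence conditioned on obs."""
    def __init__(self, obs_dim, action_dim, horizon=8, n_steps=100):
        super().__init__()
        self.action_dim = action_dim
        self.horizon = horizon
        self.n_steps = n_steps
        self.cond_encoder = nn.Linear(obs_dim, 256)
        # Input: noisy actions + observation embedding + scalar timestep.
        self.noise_pred = nn.Sequential(
            nn.Linear(action_dim * horizon + 256 + 1, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim * horizon),
        )

    @torch.no_grad()
    def predict(self, obs):
        # obs: 1-D tensor of shape [obs_dim]
        cond = self.cond_encoder(obs)
        # Start from Gaussian noise over the whole action sequence.
        x = torch.randn(self.horizon * self.action_dim)
        for t in reversed(range(self.n_steps)):
            t_emb = torch.tensor([t / self.n_steps])
            noise = self.noise_pred(torch.cat([x, cond, t_emb]))
            # Simplified fixed-step update; a full DDPM sampler follows a noise schedule.
            x = x - 0.01 * noise
        return x.reshape(self.horizon, self.action_dim)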
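
The sampler above uses a fixed step size for brevity. A DDPM-style sampler instead follows a noise schedule. The sketch below uses plain Python on a single scalar action to show the closed-form forward noising and how a noise prediction is converted back into a clean-action estimate. The linear beta schedule, its endpoints, and the function names are assumptions, and the true noise stands in for a trained network's prediction.

```python
import math
import random

def linear_alpha_bars(n_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def estimate_x0(x_t, eps_pred, alpha_bar):
    """Invert the forward process given a predicted noise value."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps_pred) / math.sqrt(alpha_bar)

alpha_bars = linear_alpha_bars(100)
rng = random.Random(0)
x0 = 0.3                     # a clean scalar action
eps = rng.gauss(0.0, 1.0)    # the noise actually added
x_t = add_noise(x0, eps, alpha_bars[50])
# Oracle noise in place of a network: the estimate recovers x0 up to rounding.
x0_hat = estimate_x0(x_t, eps, alpha_bars[50])
```

With a learned noise predictor the estimate is only approximate, which is why sampling proceeds over many small steps rather than one jump.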

VLA (RT-2 / OpenVLA style)

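class VLA(nn.Module):
    # Simplification: RT-2 and OpenVLA emit action tokens through the
    # language-model vocabulary; a separate projection head is used here for brevity.
    def __init__(self, vlm, action_dim=7, n_bins=256):
        super().__init__()
        self.vlm = vlm  # e.g. PaLI-X (RT-2) or a Llama-based VLM (OpenVLA)
        self.action_dim = action_dim
        self.n_bins = n_bins
        # Assumes the wrapped VLM exposes `hidden_dim`.
        self.action_proj = nn.Linear(vlm.hidden_dim, n_bins * action_dim)

    def bin_to_action(self, bins):
        # Map bin indices to bin centers in [-1, 1] (assumes normalized actions).
        return (bins.float() + 0.5) / self.n_bins * 2.0 - 1.0

    def forward(self, image, instruction):
        feat = self.vlm(image, instruction).last_hidden_state[:, -1]
        logits = self.action_proj(feat).reshape(-1, self.action_dim, self.n_bins)
        action_bins = logits.argmax(-1)
        return self.bin_to_action(action_bins)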
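
The `bin_to_action` step above is the inverse of the discretization used to build training targets. A minimal sketch of the round trip follows, assuming actions normalized to [-1, 1] and uniform bins. The function names and the normalization range are assumptions, and real systems may place bin edges differently.

```python
def action_to_bins(action, n_bins=256, low=-1.0, high=1.0):
    """Map each continuous value to an integer bin index in [0, n_bins - 1]."""
    bins = []
    for a in action:
        a = min(max(a, low), high)                # clip to the valid range
        idx = int((a - low) / (high - low) * n_bins)
        bins.append(min(idx, n_bins - 1))         # a == high falls in the last bin
    return bins

def bins_to_action(bins, n_bins=256, low=-1.0, high=1.0):
    """Map bin indices back to bin-center values."""
    width = (high - low) / n_bins
    return [low + (b + 0.5) * width for b in bins]

action = [0.0, -1.0, 1.0, 0.5]
tokens = action_to_bins(action)      # [128, 0, 255, 192]
decoded = bins_to_action(tokens)
```

Decoding to bin centers bounds the round-trip error by half a bin width, here 1/256 per dimension.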

Behavior cloning (basic IL)

import torch
import torch.nn.functional as F

def behavior_cloning(demos, model, epochs=100, lr=1e-4):
    """Supervised learning on (obs, action) pairs."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for obs, action in demos:
            pred = model(obs)
            loss = F.mse_loss(pred, action)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model

Sim2Real (domain randomization)

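import random
import numpy as np

def randomize_env(env, textures):
    # Resample physics and visual parameters, typically once per episode.
    # `env` is assumed to expose these attributes; real simulators set them
    # through their own APIs.
    env.gravity = np.random.uniform(9.5, 10.1)  # m/s^2
    env.friction = np.random.uniform(0.5, 1.5)
    env.light_intensity = np.random.uniform(0.5, 1.5)
    env.texture = random.choice(textures)
    env.payload_mass = np.random.uniform(0, 0.5)
    return env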

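Habitat (ObjectNav)

import habitat

# The config path depends on the installed habitat-lab version.
config = habitat.get_config('benchmark/nav/objectnav_hm3d_v1.yaml')
env = habitat.Env(config)
obs = env.reset()
while not env.episode_over:
    action = policy(obs)  # `policy` is user-supplied: observations -> action
    obs = env.step(action)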

MuJoCo manipulation

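import mujoco

model = mujoco.MjModel.from_xml_path('panda.xml')
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # advance the simulation by one timestep
# World position of the site named 'end_effector' (the name must exist in the XML).
ee_pos = data.site('end_effector').xpos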

Reward shaping (manipulation)

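import numpy as np

def grasp_reward(state):
    # Dense term: negative gripper-to-object distance.
    distance = np.linalg.norm(state.gripper_pos - state.object_pos)
    in_grasp = state.gripper_holding_object
    # Lifted: the object is more than 0.1 above its initial height.
    lifted = state.object_pos[2] - state.object_init_z > 0.1

    return -distance + (5 if in_grasp else 0) + (10 if lifted else 0)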
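
To see how the shaped terms stack, the same reward can be evaluated on three hand-built states. This is a standard-library restatement for illustration: the `State` dataclass and the example coordinates are assumptions, not part of any simulator API.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    gripper_pos: tuple
    object_pos: tuple
    object_init_z: float
    gripper_holding_object: bool

def grasp_reward(state):
    distance = math.dist(state.gripper_pos, state.object_pos)
    lifted = state.object_pos[2] - state.object_init_z > 0.1
    return -distance + (5 if state.gripper_holding_object else 0) + (10 if lifted else 0)

# Gripper 0.5 above the object, nothing grasped yet.
s_approach = State((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), 0.0, False)
# Gripper at the object and holding it, still on the table.
s_grasp = State((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, True)
# Object held and raised 0.2 above its initial height.
s_lift = State((0.0, 0.0, 0.2), (0.0, 0.0, 0.2), 0.0, True)

rewards = [grasp_reward(s) for s in (s_approach, s_grasp, s_lift)]
# rewards == [-0.5, 5.0, 15.0]
```

The distance term gives a gradient before any contact happens, while the two bonuses mark the milestones the policy is ultimately rewarded for.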

Curriculum learning

def curriculum(success_rate, level):
    if success_rate > 0.8:
        return level + 1
    if success_rate < 0.3:
        return max(0, level - 1)
    return level

# level 0: easy (objects close, no obstacles)
# level 1: clutter
# level 2: distractors + dynamic
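
A short run shows how the level moves as the measured success rate changes. The `max_level` cap is an assumption added here so the level cannot exceed the three levels listed above, and the success rates are made-up inputs.

```python
def curriculum(success_rate, level, max_level=2):
    """Promote above 80% success, demote below 30%, otherwise hold."""
    if success_rate > 0.8:
        return min(max_level, level + 1)
    if success_rate < 0.3:
        return max(0, level - 1)
    return level

level = 0
history = []
for success_rate in [0.5, 0.9, 0.85, 0.95, 0.2, 0.6]:
    level = curriculum(success_rate, level)
    history.append(level)
# history == [0, 1, 2, 2, 1, 1]
```

The gap between the promote and demote thresholds acts as hysteresis, so a success rate hovering around one value does not flip the level back and forth.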

Real2Sim2Real (RoboCasa-style)

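def real2sim(real_traj):
    # Recreate the real initial state in simulation, then replay the recorded actions.
    # match_initial_state and simulate are placeholder helpers.
    sim_init = match_initial_state(real_traj[0])
    sim_traj = simulate(sim_init, real_traj.actions)
    return sim_traj

def sim_train_real_eval(sim_data, real_data):
    # Train on the mixed dataset, then evaluate on held-out real episodes.
    # train_on and evaluate_real are placeholder helpers.
    model = train_on(sim_data + real_data)
    return evaluate_real(model, real_data.eval)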

Action chunking (ACT)

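class ACT(nn.Module):
    """Simplified ACT (ALOHA bimanual); the CVAE-style encoder is omitted."""
    def __init__(self, obs_dim, action_dim, chunk=100, d_model=256):
        super().__init__()
        self.chunk = chunk
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        # One learned query per future timestep in the chunk.
        self.queries = nn.Parameter(torch.randn(chunk, d_model) * 0.02)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, obs):
        # obs: [batch, n_tokens, obs_dim] -> actions: [batch, chunk, action_dim]
        memory = self.encoder(self.obs_proj(obs))
        queries = self.queries.unsqueeze(0).expand(obs.shape[0], -1, -1)
        return self.head(self.decoder(queries, memory))

    def execute(self, obs):
        chunk = self.forward(obs)
        # Temporal ensembling (averaging overlapping chunks) is omitted here;
        # this returns only the first action of the newest chunk.
        return chunk[:, 0]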
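
The `execute` method above returns only the first action of the latest chunk. Temporal ensembling instead averages every prediction that earlier chunks made for the current timestep, using exponential weights w_i = exp(-m * i) with i = 0 for the oldest prediction. A minimal sketch in plain Python follows; the function name, the default `m`, and the example predictions are assumptions.

```python
import math

def temporal_ensemble(predictions, m=0.1):
    """Weighted average of all predictions made for the current timestep.

    predictions: list of action vectors, ordered oldest first.
    Weights follow w_i = exp(-m * i), so older predictions weigh more.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]

# Three overlapping chunks each predicted an action for the current step.
preds = [[1.0, 0.0], [1.0, 0.2], [1.0, 0.4]]
action = temporal_ensemble(preds)
uniform = temporal_ensemble(preds, m=0.0)  # m = 0 reduces to a plain mean
```

A smaller `m` flattens the weights, so new observations are incorporated faster; a larger `m` leans harder on older predictions and yields smoother motion.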

Safety filter

def safe_action(proposed, state):
    # MAX_FORCE, STOP_ACTION, and the helper functions are robot-specific placeholders.
    if proposed.force > MAX_FORCE:
        proposed.force = MAX_FORCE
    if collision_imminent(proposed, state):
        return STOP_ACTION
    if outside_workspace(proposed, state):
        return clamp_to_workspace(proposed)
    return proposed
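
The force and workspace checks can be made concrete with a small runnable sketch. The `Action` dataclass, the limit values, and the axis-aligned workspace box are all hypothetical; real limits come from the robot and cell specification, and collision checking is left out because it depends on the scene model.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Action:
    target: tuple   # commanded end-effector position (x, y, z)
    force: float    # commanded force

# Hypothetical limits for illustration only.
MAX_FORCE = 20.0
WORKSPACE_MIN = (-0.5, -0.5, 0.0)
WORKSPACE_MAX = (0.5, 0.5, 0.8)

def clamp(value, low, high):
    return min(max(value, low), high)

def safe_action(proposed):
    """Return a copy of the action with force and target clamped to the limits."""
    target = tuple(
        clamp(v, lo, hi)
        for v, lo, hi in zip(proposed.target, WORKSPACE_MIN, WORKSPACE_MAX)
    )
    return replace(proposed, target=target, force=min(proposed.force, MAX_FORCE))

risky = Action(target=(0.9, 0.0, -0.1), force=35.0)
safe = safe_action(risky)
# safe.target == (0.5, 0.0, 0.0), safe.force == 20.0
```

Returning a new frozen object instead of mutating the proposal keeps the policy's original output available for logging and comparison.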

Decision criteria

Situation	Approach
Visual policy	Diffusion Policy
Language-conditioned	VLA (OpenVLA / π0)
Multi-task	Foundation model
Long-horizon	Hierarchical + chunking
Sim-only	Domain randomization
Few demos	BC + augmentation
Generalist	π0 / RT-X

Default: a modern stack is a fine-tuned VLA (OpenVLA) plus a diffusion policy, sim2real domain randomization, and a safety filter.

🔗 Graph

Parent: AI · Robotics · Embodied Cognition
Variants: VLA
Adjacent: Foundation-Model · CLIP · π0

🤖 LLM usage

When to use: robots, manipulation, navigation, multimodal physical tasks. When not to use: pure simulation games, disembodied chat.

❌ Anti-patterns

No safety filter: risks hardware damage.
Sim-only training without domain randomization: leaves a sim2real gap.
BC overfit to demos: fails out of distribution.
Expecting generalist behavior from a tiny VLM: insufficient capacity.
No action chunking: jittery, unstable motion.

🧪 Verification / duplicates

Verified (RT-2, OpenVLA, Diffusion Policy 2023, π0 2024).
Confidence: A.

🕓 Changelog

Date	Change
2026-04-26	EMBODIED-AI auto
2026-05-08	Phase 1
2026-05-10	Manual cleanup — diffusion / VLA / BC / sim2real / curriculum / ACT code
