Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
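Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed redirects straight at their targets, mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections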

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

id: wiki-2026-0508-excessive-agency
title: Excessive Agency (LLM)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: excessive agency, OWASP LLM06, agent over-permission, tool abuse, autonomous risk
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: llm-security, owasp, excessive-agency, agent-safety, tool-use, permission
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python / TypeScript; framework: LangChain / LlamaIndex / Custom Agent

Excessive Agency

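One-liner

An LLM agent holds more functionality, permissions, or autonomy than its task needs. It is listed as LLM06 in the OWASP Top 10 for LLM Applications. If the agent is manipulated (for example by prompt injection), that surplus turns into destructive actions. Mitigation: least privilege + human-in-the-loop (HITL) + sandboxed tools.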

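Key points

Sub-types

  • Excessive functionality: the agent has more tools than the task requires.
  • Excessive permission: tools run with broader access than the task requires.
  • Excessive autonomy: high-impact actions proceed without HITL approval.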
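The three sub-types can be checked mechanically before an agent ships. This is a minimal sketch under several assumptions. Each deployment is described by the tools it is granted, each with an access level. It also records the access each tool actually needs, which tools are destructive, and which are gated by HITL. AgentConfig, audit_agency, and the read/write/admin levels are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

# Assumed ordering of access levels, narrowest first
SCOPE_RANK = {"read": 0, "write": 1, "admin": 2}


@dataclass
class AgentConfig:
    granted_scopes: dict[str, str]  # tool name -> access level the agent holds
    needed_scopes: dict[str, str]   # tool name -> access level the task requires
    destructive_tools: set[str] = field(default_factory=set)
    hitl_tools: set[str] = field(default_factory=set)  # tools gated by human approval


def audit_agency(cfg: AgentConfig) -> dict[str, list[str]]:
    """Return the tools that exhibit each Excessive Agency sub-type."""
    granted = set(cfg.granted_scopes)
    # Excessive functionality: tools granted but never needed by the task
    functionality = sorted(granted - set(cfg.needed_scopes))
    # Excessive permission: needed tools granted a broader scope than required
    permission = sorted(
        tool for tool, need in cfg.needed_scopes.items()
        if tool in cfg.granted_scopes
        and SCOPE_RANK[cfg.granted_scopes[tool]] > SCOPE_RANK[need]
    )
    # Excessive autonomy: granted destructive tools with no HITL gate
    autonomy = sorted((cfg.destructive_tools & granted) - cfg.hitl_tools)
    return {"functionality": functionality, "permission": permission, "autonomy": autonomy}


cfg = AgentConfig(
    granted_scopes={"search": "read", "db": "admin", "send_email": "write"},
    needed_scopes={"search": "read", "db": "read"},
    destructive_tools={"db", "send_email"},
)
print(audit_agency(cfg))
# {'functionality': ['send_email'], 'permission': ['db'], 'autonomy': ['db', 'send_email']}
```

In the example, one config trips all three sub-types:
- The agent has an email tool it never needs.
- Its DB tool has admin access where read would do.
- Both destructive tools run without approval.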

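Attack scenarios

  • Prompt injection → the attacker steers the agent into abusing its tools.
  • Indirect injection → instructions hidden in fetched content (e.g., a web page) hijack the agent.
  • Cross-tool chaining: read sensitive data from a DB, then exfiltrate it via email.
  • Agent escalation: the agent grants itself broader permissions.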
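Cross-tool chaining is hard to spot because each call is harmless on its own; the risk is in the sequence. Below is a minimal sketch of a session-level policy. It assumes tools are labeled up front as sensitive sources or external sinks; ChainPolicy and the tool names are hypothetical. Once sensitive data has entered the session, any outbound tool needs approval.

```python
class ChainPolicy:
    """Session-level guard against read-sensitive -> send-external chains."""

    def __init__(self, sensitive_sources: set[str], external_sinks: set[str]):
        self.sensitive_sources = sensitive_sources  # e.g. tools that read customer data
        self.external_sinks = external_sinks        # e.g. tools that email or POST data out
        self.tainted = False                        # has sensitive data entered the session?

    def decide(self, tool: str) -> str:
        """Return 'allow' or 'needs_approval'; assumes allowed calls are then executed."""
        if tool in self.external_sinks and self.tainted:
            return "needs_approval"
        if tool in self.sensitive_sources:
            self.tainted = True  # every later outbound call is now suspect
        return "allow"


policy = ChainPolicy({"read_db"}, {"send_email"})
print(policy.decide("send_email"))  # allow (nothing sensitive read yet)
print(policy.decide("read_db"))     # allow
print(policy.decide("send_email"))  # needs_approval
```

Treating one sensitive read as tainting the whole session is deliberately coarse. The policy cannot tell whether the outgoing message actually contains the data. In exchange, it cannot miss a labeled source-to-sink chain.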

Risks by application

  1. Email agent: sends mail to arbitrary recipients.
  2. DB agent: runs DROP TABLE.
  3. Browser agent: visits sensitive sites.
  4. Code agent: runs git push --force.
  5. Multi-agent: a compromised manager agent prompt-injects a worker.
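Several of these risks live in the arguments rather than the tool choice: the same DB tool can run SELECT or DROP TABLE. Below is a minimal sketch of per-tool argument checks. The tool names, the recipient domain allowlist, and the regular expression are illustrative assumptions. Pattern matching like this is a coarse backstop, not a substitute for least-privilege credentials.

```python
import re
from typing import Optional

ALLOWED_EMAIL_DOMAINS = {"example.com"}  # hypothetical internal-domain allowlist
DESTRUCTIVE_SQL = re.compile(r"\b(drop|truncate|delete|alter)\b", re.IGNORECASE)


def check_tool_call(tool: str, args: dict) -> Optional[str]:
    """Return a reason to block the call, or None if no rule fires."""
    if tool == "send_email":
        domain = args.get("to", "").rsplit("@", 1)[-1].lower()
        if domain not in ALLOWED_EMAIL_DOMAINS:
            return f"recipient domain {domain!r} not allowed"
    elif tool == "run_sql":
        if DESTRUCTIVE_SQL.search(args.get("query", "")):
            return "destructive SQL statement"
    elif tool == "shell":
        argv = args.get("argv", [])  # e.g. ["git", "push", "--force"]
        if argv[:2] == ["git", "push"] and ("--force" in argv or "-f" in argv):
            return "force push"
    return None


print(check_tool_call("run_sql", {"query": "DROP TABLE users"}))      # destructive SQL statement
print(check_tool_call("send_email", {"to": "bob@example.com"}))       # None
print(check_tool_call("shell", {"argv": ["git", "push", "--force"]}))  # force push
```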

Mitigations

  • Least privilege.
  • Read-only by default.
  • HITL approval for destructive actions.
  • Sandboxing / capability tokens.
  • Rate limits.
  • Audit logs.

💻 Patterns

Tool whitelist (least privilege)

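def safe_agent_tools(user_role):
    """Return only the tools the role needs (least privilege); roles are cumulative."""
    tools = ['search', 'read_doc']            # read-only baseline for everyone
    if user_role in ('admin', 'super_admin'):
        tools += ['write_doc', 'send_email']
    if user_role == 'super_admin':
        tools += ['delete_doc']               # destructive: highest role only
    return tools

# create_agent / current_user are placeholders for your agent framework and auth layer
agent = create_agent(tools=safe_agent_tools(current_user.role))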

HITL approval

def execute_tool(tool, args):
    # request_human_approval is a placeholder for your approval channel (CLI prompt, chat, ticket)
    if tool.is_destructive:
        approved = request_human_approval(f"Approve {tool.name} with {args}?")
        if not approved:
            return {'status': 'denied'}
    return tool.run(**args)
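The approval step above depends on a request_human_approval helper. Below is a self-contained sketch in which the approver is passed in as a callable. In production that callable might post to a chat channel or ticket queue and wait for a reply. The Tool dataclass is a hypothetical stand-in for a framework's tool object.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    is_destructive: bool = False


def execute_tool(tool: Tool, args: dict, approve: Callable[[str], bool]) -> dict:
    """Run the tool; destructive tools first need an explicit yes from approve()."""
    if tool.is_destructive:
        if not approve(f"Approve {tool.name} with {args}?"):
            return {"status": "denied"}
    return {"status": "ok", "result": tool.fn(**args)}


delete = Tool("delete_doc", lambda doc_id: f"deleted {doc_id}", is_destructive=True)
print(execute_tool(delete, {"doc_id": 7}, approve=lambda prompt: False))  # {'status': 'denied'}
print(execute_tool(delete, {"doc_id": 7}, approve=lambda prompt: True))   # {'status': 'ok', 'result': 'deleted 7'}
```

If approve() raises (for example, on a timeout), the exception propagates before the tool runs. The call therefore fails closed rather than open.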

Sandboxed file write

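import os

# Resolve symlinks up front (e.g. /tmp -> /private/tmp on macOS) so both sides of the check match
SANDBOX = os.path.realpath('/tmp/agent-sandbox')
os.makedirs(SANDBOX, exist_ok=True)

def safe_write(filename, content):
    full = os.path.realpath(os.path.join(SANDBOX, filename))
    # commonpath avoids the prefix bug where '/tmp/agent-sandbox-evil' would pass startswith()
    if os.path.commonpath([full, SANDBOX]) != SANDBOX:
        raise PermissionError(f'Outside sandbox: {filename}')
    with open(full, 'w') as f:
        f.write(content)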

Read-only DB role

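CREATE ROLE agent_readonly;  -- PostgreSQL; grant this role to the agent's login user
GRANT USAGE ON SCHEMA public TO agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_readonly;  -- existing tables only
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO agent_readonly;  -- tables the current role creates later
-- No INSERT / UPDATE / DELETE / DDL privileges are granted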
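The same read-only-by-default idea applies when the agent's database is SQLite, which has no roles. There, the connection itself can be opened read-only through a URI filename with mode=ro. Below is a minimal sketch using the standard-library sqlite3 module. open_readonly and agent_query are illustrative names, not part of any framework.

```python
import sqlite3
from pathlib import Path


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that every write statement fails."""
    # mode=ro is a SQLite URI parameter; uri=True makes sqlite3 parse the URI
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def agent_query(conn: sqlite3.Connection, sql: str) -> list:
    """Hypothetical tool entry point: the agent only ever sees this connection."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:  # e.g. "attempt to write a readonly database"
        return [("error", str(exc))]
```

Any write the agent attempts comes back as an error row instead of modifying data, whatever SQL the model produces.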

Capability token

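import time
import jwt  # PyJWT

KEY = 'replace-with-a-long-random-secret'  # placeholder; load from a secret store

class Capability:
    def __init__(self, action, resource, expires_in=300):
        payload = {'action': action, 'resource': resource, 'exp': int(time.time()) + expires_in}
        self.token = jwt.encode(payload, KEY, algorithm='HS256')

    def use(self, action, resource):
        # decode() verifies the signature and raises jwt.ExpiredSignatureError after 'exp'
        claims = jwt.decode(self.token, KEY, algorithms=['HS256'])
        # explicit check instead of assert, which is stripped under python -O
        if claims['action'] != action or claims['resource'] != resource:
            raise PermissionError('Capability does not cover this action/resource')
        return True

# The agent can only use the capability it was handed; it cannot widen it to a broader scope.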
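If a JWT dependency is not wanted, the same capability idea can be sketched with the standard library's hmac module. Here the token is a signed (action, resource, expiry) triple, and any change to it invalidates the signature. The pipe-separated token format and the key handling are assumptions for illustration, not a standard. The optional now parameter only exists to make expiry testable.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-long-random-key"  # assumption: loaded from a secret store


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_capability(action: str, resource: str, expires_in: int = 300,
                     now: Optional[float] = None) -> str:
    """Mint a token authorizing exactly one action on one resource until it expires."""
    if "|" in action or "|" in resource:
        raise ValueError("'|' is reserved as the field separator")
    exp = int((time.time() if now is None else now) + expires_in)
    payload = f"{action}|{resource}|{exp}"
    return f"{payload}|{_sign(payload)}"


def check_capability(token: str, action: str, resource: str,
                     now: Optional[float] = None) -> bool:
    """True only if the token is authentic, unexpired, and matches the request exactly."""
    payload, _, sig = token.rpartition("|")
    # constant-time comparison so the signature can't be probed byte by byte
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return False  # forged or tampered token
    tok_action, tok_resource, exp = payload.split("|")
    current = time.time() if now is None else now
    return tok_action == action and tok_resource == resource and current < int(exp)
```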

Rate limit + budget

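class LimitError(RuntimeError):
    """Raised when the agent exceeds its call or cost budget."""

class AgentBudget:
    def __init__(self, max_calls=50, max_cost_usd=1.0):
        self.calls = 0
        self.cost = 0.0
        self.max_calls = max_calls
        self.max_cost = max_cost_usd

    def check(self, estimated_cost):
        """Call before each tool call; raises if the call would exceed the budget."""
        if self.calls >= self.max_calls:
            raise LimitError('Call limit')
        if self.cost + estimated_cost > self.max_cost:
            raise LimitError('Cost limit')

    def record(self, cost):
        """Call after the tool call with its actual cost."""
        self.calls += 1
        self.cost += cost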

Tool taint analysis

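# Placeholder: strings from untrusted sources seen this session; require_approval is your HITL hook
TAINT_MARKERS = set()

def detect_tainted_input(tool, tool_args):
    """If untrusted input flows into a high-impact tool's arguments, escalate for approval."""
    if any(t in str(tool_args) for t in TAINT_MARKERS):
        return require_approval(tool, tool_args)
    return False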
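The taint check above assumes TAINT_MARKERS is populated somewhere. Below is a minimal sketch of that bookkeeping. It assumes every piece of untrusted text (fetched pages, inbound email, third-party tool output) is registered through a single mark() call. TaintTracker and HIGH_IMPACT_TOOLS are hypothetical names. Substring matching only catches verbatim reuse, so a paraphrased payload slips through; treat it as one signal, not a guarantee.

```python
HIGH_IMPACT_TOOLS = {"send_email", "run_sql", "write_file"}  # hypothetical labels


class TaintTracker:
    """Record untrusted text; flag high-impact calls whose arguments reuse it."""

    def __init__(self, min_len: int = 12):
        self.min_len = min_len  # ignore short fragments to limit false positives
        self.untrusted: list[str] = []

    def mark(self, text: str) -> str:
        """Register text from an untrusted source and pass it through unchanged."""
        # store non-trivial lines separately so partial reuse is still caught
        self.untrusted += [ln.strip() for ln in text.splitlines()
                           if len(ln.strip()) >= self.min_len]
        return text

    def needs_approval(self, tool: str, args: dict) -> bool:
        if tool not in HIGH_IMPACT_TOOLS:
            return False
        flat = " ".join(str(v) for v in args.values())
        return any(snippet in flat for snippet in self.untrusted)
```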

Indirect injection check

import re
from bs4 import BeautifulSoup  # requires beautifulsoup4 and lxml

def sanitize_external_content(html):
    """Strip active content from fetched HTML and flag likely injected instructions."""
    soup = BeautifulSoup(html, 'lxml')
    for tag in soup.find_all(['script', 'iframe']):
        tag.decompose()
    text = soup.get_text()
    # Heuristic only: catches naive prompt-injection phrases, not paraphrased ones
    if re.search(r'(ignore (previous|all) instructions|new task|system:)', text, re.I):
        return text + "\n\n[NOTE: Suspicious content detected]"
    return text
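The function above needs bs4 and lxml. Below is a dependency-free sketch using the standard library's html.parser. It follows the same two steps:
1. Drop the text inside script, style, and iframe elements.
2. Flag the remaining text if it matches known injection phrases.

The phrase list is carried over as an illustrative assumption and only catches naive injections. Fetched text should be treated as data, not instructions, whether or not it is flagged.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "iframe"}
INJECTION_RE = re.compile(r"ignore (previous|all) instructions|new task|system:", re.IGNORECASE)


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside an element whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def sanitize_external_content(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    if INJECTION_RE.search(text):
        return text + "\n\n[NOTE: Suspicious content detected]"
    return text
```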

Multi-agent isolation

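class IsolatedAgent:
    def __init__(self, role, tools):
        self.role = role
        self.tools = tools  # only the tools this role needs

    def receive(self, msg, sender):
        # Do not act on tool requests from other agents; they may have been prompt-injected.
        # msg.requests_tool_use and self.handle are hypothetical placeholders.
        if sender != 'human' and msg.requests_tool_use:
            return 'Agent-to-agent tool requests not allowed'
        return self.handle(msg)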

Audit log

def audit_tool_use(tool, args, result, user):
    # log, now, current_session and hash_sensitive are placeholders for your logging stack
    log({
        'timestamp': now(),
        'user': user,
        'agent_session': current_session.id,
        'tool': tool.name,
        'args': hash_sensitive(args),
        'result_status': result.status,
        'cost_usd': result.cost,
    })
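The audit function above leans on placeholder log and hash_sensitive helpers. One way to fill them in with the standard library:
- Write one JSON object per line (JSON Lines) to an append-only file.
- Replace values of sensitive keys with a keyed HMAC-SHA256 digest, so records can still be correlated without storing the raw value.

The key list, the audit key, and the file path are assumptions.

```python
import hashlib
import hmac
import json
import time

SENSITIVE_KEYS = {"password", "token", "api_key", "email", "body"}  # assumption
AUDIT_KEY = b"per-deployment-secret"  # assumption: loaded from a secret store


def hash_sensitive(args: dict) -> dict:
    """Replace sensitive values with a keyed digest; equal values still correlate."""
    out = {}
    for key, value in args.items():
        if key.lower() in SENSITIVE_KEYS:
            digest = hmac.new(AUDIT_KEY, str(value).encode(), hashlib.sha256).hexdigest()
            out[key] = "hmac-sha256:" + digest[:16]
        else:
            out[key] = value
    return out


def audit_tool_use(path: str, user: str, session_id: str, tool_name: str,
                   args: dict, status: str) -> dict:
    """Append one JSON Lines record per tool call and return it."""
    record = {
        "timestamp": time.time(),
        "user": user,
        "agent_session": session_id,
        "tool": tool_name,
        "args": hash_sensitive(args),
        "result_status": status,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, default=str) + "\n")
    return record
```

A keyed digest is used instead of a plain hash. Low-entropy values such as email addresses could otherwise be recovered by hashing guesses.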

Dry-run mode

def dry_run_tool(tool, args):
    """Describe what a destructive action would do, without executing it."""
    plan = tool.plan(**args)  # hypothetical: each tool exposes a side-effect-free plan()
    return f"DRY RUN: would {plan.summary}"
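Below is a concrete instance of dry-run, built around a hypothetical file-cleanup tool. The function computes the same plan either way and only deletes when dry_run=False. The agent or a reviewer can therefore inspect exactly what would happen first. Defaulting to dry_run=True makes the safe mode the one an agent gets by omission.

```python
from pathlib import Path


def delete_matching(directory: str, pattern: str, dry_run: bool = True) -> list[str]:
    """Delete files in directory matching a glob pattern; the default is a dry run."""
    # The pattern is not confined to directory: pair this with a sandbox path check
    # when the pattern comes from the agent.
    targets = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    if not dry_run:
        for p in targets:
            p.unlink()
    # Same return value either way: the names of the affected files
    return [p.name for p in targets]
```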

Reversibility check

def assess_reversibility(tool, args):
    if tool.action == 'delete' and not args.get('soft'): return 'irreversible'
    if tool.action == 'send_message': return 'visible_to_others'
    if tool.action == 'transfer_money': return 'irreversible'
    return 'safe'

# 'irreversible' → require HITL approval before executing

Decision criteria

Situation            Mitigation
Read-only research   Whitelist + audit
Write actions        HITL approval
Destructive actions  Reversibility check + HITL
Multi-agent          Isolation
Public-facing        Sandbox + budget
Sensitive data       Capability token

Default: least-privilege tools + HITL for destructive actions + sandbox + audit log + budget cap.
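The default combination can be wired into a single call path. The compact sketch below assumes three things:
- Callers mark irreversible calls with reversible=False.
- The human approver is injected as a callable.
- The audit log is an in-memory list.

GuardedExecutor is a hypothetical name. Each stage mirrors one pattern on this page: whitelist, budget, HITL on irreversible actions, and audit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class GuardedExecutor:
    allowed: set[str]                     # least-privilege whitelist
    approve: Callable[[str, dict], bool]  # HITL hook for irreversible actions
    max_calls: int = 20                   # budget cap on executed calls
    calls: int = 0
    audit: list = field(default_factory=list)

    def run(self, name: str, fn: Callable[..., Any], args: dict,
            reversible: bool = True) -> dict:
        result = None
        if name not in self.allowed:
            status = "blocked: not whitelisted"
        elif self.calls >= self.max_calls:
            status = "blocked: budget exhausted"
        elif not reversible and not self.approve(name, args):
            status = "denied by human"
        else:
            self.calls += 1
            result = fn(**args)
            status = "ok"
        self.audit.append({"tool": name, "status": status})  # every attempt is logged
        return {"status": status, "result": result}


ex = GuardedExecutor(allowed={"search", "delete_doc"}, approve=lambda name, args: False)
print(ex.run("search", lambda q: f"results for {q}", {"q": "owasp"}))
# {'status': 'ok', 'result': 'results for owasp'}
print(ex.run("delete_doc", lambda doc_id: None, {"doc_id": 1}, reversible=False))
# {'status': 'denied by human', 'result': None}
```

Blocked and denied attempts are audited too, since repeated refusals are often the first visible sign of an injection attempt.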

🔗 Graph

🤖 LLM usage

When to apply: every agent, i.e., any tool-using LLM, and above all production deployments. When it can be relaxed: development that runs only inside a sandbox.

Anti-patterns

  • All-tools agent: a broad attack surface.
  • No HITL on destructive actions: a single mistake or injection causes irreversible damage.
  • Implicit trust of fetched content: opens the door to indirect injection.
  • No budget: runaway cost.
  • No audit log: no forensics after an incident.

🧪 Verification / Duplicates

  • Verified (OWASP Top 10 for LLM Applications, 2025 edition, where Excessive Agency is LLM06; Anthropic agent safety guidance).
  • Confidence: A.

🕓 Changelog

Date        Change
2026-05-08  Phase 1
2026-05-10  Manual cleanup: agency types + whitelist / HITL / sandbox / budget code