[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -0,0 +1,373 @@
+---
+id: ai-browser-agent-patterns
+title: Browser Agent — Playwright / Puppeteer / browser-use
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [ai, agent, browser, vibe-coding]
+tech_stack: { language: "TS / Python", applicable_to: ["AI"] }
+applied_in: []
+aliases: [browser agent, web agent, Playwright agent, browser-use, Computer Use, accessibility tree]
+---
+
+# Browser Agent
+
+> An LLM drives the browser: click, type, scroll. **Anthropic Computer Use, browser-use, Playwright + LLM**. The modern approach to web automation.
+
+## 📖 Core concepts
+- The input is a screenshot or the accessibility tree.
+- The LLM decides the next action (click x,y / type / scroll).
+- Loop until the task is done.
+- Trade-offs between reliability, cost, and speed.
+
+## 💻 Code patterns
+
+### Playwright + LLM (simple)
+```ts
+import { chromium } from 'playwright';
+
+const browser = await chromium.launch();
+const page = await browser.newPage();
+await page.goto('https://example.com');
+
+// Screenshot → LLM (`llm` is a placeholder for your own client wrapper, not a real package)
+const screenshot = await page.screenshot(); // PNG Buffer
+const action = await llm.complete({
+  system: 'You are a browser agent. Output JSON: {action: click|type|scroll, ...}',
+  messages: [
+    { role: 'user', content: [
+      { type: 'image', source: { type: 'base64', media_type: 'image/png', data: screenshot.toString('base64') } },
+      { type: 'text', text: 'Search for "hello world"' },
+    ]},
+  ],
+});
+
+// Execute the action the model chose
+if (action.action === 'click') await page.mouse.click(action.x, action.y);
+if (action.action === 'type') await page.keyboard.type(action.text);
+
+await browser.close();
+```
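
The executor above trusts whatever JSON the model returns. A minimal validation sketch follows, written in Python for brevity. The `action` field and its three values come from the prompt above; the function name `parse_action`, the `dy` scroll field, and the required-field table are assumptions made for illustration.

```python
import json

# Required fields per action. Everything beyond "action" is an assumed schema.
REQUIRED_FIELDS = {
    "click": ("x", "y"),
    "type": ("text",),
    "scroll": ("dy",),
}


def parse_action(raw: str) -> dict:
    """Parse the model's reply and reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    name = data.get("action")
    if not isinstance(name, str) or name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {name!r}")
    missing = [f for f in REQUIRED_FIELDS[name] if f not in data]
    if missing:
        raise ValueError(f"action {name!r} is missing fields: {missing}")
    return data
```

Rejecting a malformed reply before it reaches `page.mouse.click` lets the loop ask the model again instead of failing inside the browser driver.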
+
+### Anthropic Computer Use
+```ts
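+import Anthropic from '@anthropic-ai/sdk';
+
+const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
+
+// computer_20241022 is a beta tool, so the request goes through client.beta
+const r = await client.beta.messages.create({
+  model: 'claude-opus-4-7',
+  max_tokens: 1024,
+  betas: ['computer-use-2024-10-22'],
+  tools: [{
+    type: 'computer_20241022',
+    name: 'computer',
+    display_width_px: 1024,
+    display_height_px: 768,
+  }],
+  messages: [{
+    role: 'user',
+    content: [
+      { type: 'image', source: { ... } },
+      { type: 'text', text: 'Find the login button and click it' },
+    ],
+  }],
+});
+
+// r.content contains tool_use blocks → execute them (`page` is a Playwright page)
+for (const c of r.content) {
+  if (c.type === 'tool_use' && c.name === 'computer') {
+    const { action, coordinate } = c.input as { action: string; coordinate?: [number, number] };
+    // Depending on the tool version, the coordinate may arrive on a separate mouse_move action
+    if (action === 'left_click' && coordinate) await page.mouse.click(...coordinate);
+    // ...
+  }
+}
+```
+
+→ Claude exposes computer control as a native tool; your code still executes each returned action.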
+
+### browser-use (Python framework)
+```python
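+import asyncio
+
+from browser_use import Agent
+from langchain_openai import ChatOpenAI  # import path may differ across browser-use versions
+
+async def main():
+    agent = Agent(
+        task='Find the cheapest flight from Seoul to Tokyo on Jun 1',
+        llm=ChatOpenAI(model='gpt-4o'),
+    )
+    result = await agent.run()  # run() is a coroutine, so it needs an event loop
+    print(result)
+
+asyncio.run(main())
+```
+
+→ The library handles the loop, accessibility extraction, and stability.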
+
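+### Accessibility tree (DOM-based)
+```ts
+// Note: page.accessibility is deprecated in recent Playwright releases;
+// locator.ariaSnapshot() is the newer alternative.
+const snapshot = await page.accessibility.snapshot();
+// { role: 'WebArea', name: 'Page', children: [{ role: 'button', name: 'Login' }, ...] }
+
+// Pass the accessibility tree to the LLM
+const action = await llm.complete({
+  prompt: `Tree: ${JSON.stringify(snapshot)}\nTask: ...`,
+});
+```
+
+→ More precise than a screenshot, and no vision model is needed.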
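
Sending the raw JSON snapshot spends tokens on braces and quotes. One possible way to compact it is sketched below in Python. It assumes the nested `{role, name, children}` shape shown in the comment above; the function name `flatten_ax_tree` and the indented text format are illustrative choices, not part of Playwright.

```python
def flatten_ax_tree(node: dict, depth: int = 0) -> list[str]:
    """Turn a nested {role, name, children} snapshot into indented text lines."""
    lines = []
    role = node.get("role", "")
    name = node.get("name", "")
    # Nodes with neither role nor name add tokens but no signal, so skip them.
    parts = [p for p in (role, f'"{name}"' if name else "") if p]
    if parts:
        lines.append("  " * depth + " ".join(parts))
    for child in node.get("children", []):
        lines.extend(flatten_ax_tree(child, depth + 1))
    return lines


# Hand-written sample in the same shape as the snapshot comment above.
snapshot = {
    "role": "WebArea",
    "name": "Page",
    "children": [
        {"role": "button", "name": "Login"},
        {"role": "textbox", "name": "Email"},
    ],
}
prompt_tree = "\n".join(flatten_ax_tree(snapshot))
```

The resulting `prompt_tree` string can replace `JSON.stringify(snapshot)` in the prompt when token cost matters more than exact structure.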
+
+### Element ID assignment
+```ts
+// Tag every interactive element with an ID → the LLM clicks by ID.
+await page.evaluate(() => {
+  document.querySelectorAll('button, a, input').forEach((el, i) => {
+    el.setAttribute('data-agent-id', String(i)); // setAttribute expects a string
+  });
+});
+
+// Take the screenshot with the ID labels visible (drawing the overlay is not shown here)
+// LLM: "click element with id 5"
+await page.click('[data-agent-id="5"]');
+```
+
+→ Coordinates are brittle (they break on resize). IDs are stable.
+
+### Selector strategy
+```ts
+// The LLM generates a CSS selector
+const action = await llm.complete({
+  prompt: `Click the "Subscribe" button. Output: {selector}`,
+});
+
+await page.click(action.selector);
+// ❌ "button:nth-child(3)" — brittle
+// ✅ "button:has-text('Subscribe')" — semantic
+```
+
+→ Playwright's semantic selectors are more robust.
+
+### Loop until task done
+```ts
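+for (let i = 0; i < 50; i++) {
+  const screenshot = await page.screenshot();
+  const action = await llm.complete({ ... });
+
+  // Same schema as the first example: the verb lives in action.action
+  if (action.action === 'done') break;
+
+  await execute(action);
+  // 'networkidle' can stall on pages that poll continuously; 'load' is a safer default there
+  await page.waitForLoadState('networkidle');
+}
+```
+
+→ Cap the number of iterations to prevent an infinite loop.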
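
An iteration cap stops runaway loops, but an agent can also burn its whole budget repeating one action. The sketch below, in Python, adds a simple repeat check on top of the cap. The names `run_agent_loop`, `decide`, `execute`, and `max_repeats` are hypothetical, and the two callables stand in for the LLM call and the browser executor.

```python
from typing import Callable


def run_agent_loop(
    decide: Callable[[int], dict],
    execute: Callable[[dict], None],
    max_steps: int = 50,
    max_repeats: int = 3,
) -> str:
    """Return 'done', 'stuck', or 'max_steps' depending on how the loop ended."""
    last_action = None
    repeats = 0
    for step in range(max_steps):
        action = decide(step)
        if action.get("action") == "done":
            return "done"
        # The same action several times in a row usually means no progress.
        repeats = repeats + 1 if action == last_action else 1
        if repeats >= max_repeats:
            return "stuck"
        last_action = action
        execute(action)
    return "max_steps"
```

Returning a reason string rather than just breaking makes it possible to log why a run ended and to treat `stuck` differently from a normal completion.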
+
+### Form filling
+```ts
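+// Collect the form fields and tag each one so the LLM can refer to it
+const fields = await page.evaluate(() =>
+  [...document.querySelectorAll('input, select, textarea')].map((el, i) => {
+    el.setAttribute('data-agent-id', String(i));
+    return {
+      selector: `[data-agent-id="${i}"]`, // a usable selector; outerHTML is not one
+      name: el.getAttribute('name'),
+      type: el.getAttribute('type'),
+    };
+  })
+);
+
+// Expected reply shape: [{ selector, value }, ...]
+const fills = await llm.complete({
+  prompt: `Fill form for "Alice, alice@x.com": ${JSON.stringify(fields)}`,
+});
+
+for (const fill of fills) {
+  await page.fill(fill.selector, fill.value); // for <select>, use page.selectOption instead
+}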
+```
+
+### Multi-step task
+```
+"Order pizza":
+1. Open URL
+2. Click "Sign in"
+3. Type email + password
+4. Navigate to menu
+5. Add pizza to cart
+6. Checkout
+7. Confirm
+
+→ Each step is an LLM call.
+```
+
+### Error handling
+```ts
+try {
+  await page.click(selector, { timeout: 5000 });
+} catch (e) {
+  // The element is missing or not visible
+  const screenshot = await page.screenshot();
+  const action = await llm.complete({
+    prompt: `Click failed: ${(e as Error).message}. Current screen: [image]. What to do?`,
+  });
+  // → Retry / scroll / different selector
+}
+```
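
Before paying for another LLM call, it can be cheaper to try a short list of alternative selectors. A sketch of that fallback, in Python: `click_with_fallbacks` is a hypothetical helper, and `click` stands in for whatever driver call performs the click and raises on failure.

```python
from typing import Callable, Sequence


def click_with_fallbacks(
    click: Callable[[str], None],
    selectors: Sequence[str],
) -> str:
    """Try each selector in order and return the one that worked."""
    errors = []
    for selector in selectors:
        try:
            click(selector)
            return selector
        except Exception as exc:  # in practice, narrow this to the driver's timeout error
            errors.append(f"{selector}: {exc}")
    # Every candidate failed: surface all errors so they can be sent back to the LLM.
    raise RuntimeError("all selectors failed: " + "; ".join(errors))
```

The collected error text is a natural payload for the "Click failed" prompt above once the cheap retries are exhausted.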
+
+### Vision (multimodal)
+```ts
+// GPT-4V / Claude / Gemini read the screenshot directly.
+const ss = await page.screenshot();
+const r = await llm.complete({
+  messages: [
+    { role: 'user', content: [
+      { type: 'image', source: { type: 'base64', data: ss.toString('base64') } },
+      { type: 'text', text: 'Find the login button. Output coordinate.' },
+    ]},
+  ],
+});
+```
+
+→ Vision input raises cost significantly.
+
+### Cost
+```
+1 task ≈ 10-100 LLM calls.
+Each call = $0.01 - $0.10 (more with vision).
+
+Task = $0.10 - $10.
+
+→ Viable for e-commerce automation, but every click costs money.
+```
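
The per-task range above is just the product of the two ranges. A small Python sketch of that arithmetic, using only the figures quoted above; the function name `estimate_task_cost` is illustrative.

```python
def estimate_task_cost(calls: int, cost_per_call: float) -> float:
    """Cost of one task in dollars: number of LLM calls times price per call."""
    return calls * cost_per_call


low = estimate_task_cost(10, 0.01)    # fewest calls at the cheapest price
high = estimate_task_cost(100, 0.10)  # most calls at the highest price
print(f"${low:.2f} to ${high:.2f} per task")  # prints "$0.10 to $10.00 per task"
```

Plugging in measured call counts and real prices for a given model turns the same function into a budget check for a task suite.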
+
+### Speed
+```
+LLM call: 1-5 sec.
+1 task = 30 sec - 5 min.
+
+→ Not faster than a human. The advantage is 24/7 operation and parallelism.
+```
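
Since a single task is slow, throughput comes from running many tasks at once. The Python sketch below shows bounded parallelism with `asyncio`. `run_parallel` and `run_one` are hypothetical names; `run_one` stands in for a coroutine that runs one agent task, and the default limit of 4 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable


async def run_parallel(
    tasks: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> list[str]:
    """Run all tasks with at most max_concurrency in flight; results keep input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(task: str) -> str:
        # The semaphore caps how many browsers and LLM calls are active at once.
        async with semaphore:
            return await run_one(task)

    return await asyncio.gather(*(guarded(t) for t in tasks))
```

A concurrency cap matters here because each task holds a browser instance and consumes LLM rate limit, so unbounded fan-out tends to fail in bursts.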
+
+### Use case
+```
+- Next-generation web scraping (auth + dynamic UI)
+- Writing E2E tests (the LLM generates the tests)
+- QA bot ("Is feature X broken?")
+- Form submission automation
+- Personal assistant (booking tickets)
+- Research agent (visit 5 sites, summarize)
+```
+
+### The idea behind browser-use
+```
+- The DOM tree is the input
+- Elements are numbered
+- LLM: "click 5"
+- Browser: which element has id 5? → execute
+
+→ Solves coordinate brittleness.
+```
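
The numbering idea can be sketched in a few lines of Python. This is a simplified illustration of the concept, not browser-use's actual implementation; the helper names, the `tag`/`text` element shape, and the `data-agent-id` selector format (borrowed from the element ID example earlier) are assumptions.

```python
def build_element_index(elements: list[dict]) -> tuple[str, dict[int, str]]:
    """Number the elements; return the prompt text and an index -> selector map."""
    lines = []
    selectors = {}
    for i, el in enumerate(elements):
        lines.append(f"[{i}] <{el['tag']}> {el['text']}")
        selectors[i] = f'[data-agent-id="{i}"]'
    return "\n".join(lines), selectors


def resolve_click(reply: str, selectors: dict[int, str]) -> str:
    """Map a reply such as 'click 5' to a selector, rejecting unknown indices."""
    verb, _, index = reply.strip().partition(" ")
    if verb != "click" or not index.isdigit() or int(index) not in selectors:
        raise ValueError(f"cannot resolve reply: {reply!r}")
    return selectors[int(index)]
```

Because the model only ever names an index, a hallucinated target is caught by the lookup instead of becoming a click at a wrong coordinate.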
+
+### Sandbox
+```ts
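+// Untrusted user input → isolate the browser.
+// Caution: '--no-sandbox' and '--disable-setuid-sandbox' switch Chromium's sandbox OFF;
+// use them only inside a container/VM that already isolates the process.
+const browser = await chromium.launch({
+  chromiumSandbox: true, // keep Chromium's own sandbox enabled
+});
+```
+
+→ Run the browser inside a container or VM for real isolation.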
+
+### Persistence
+```ts
+const context = await browser.newContext({
+  storageState: 'auth.json',  // reuse previously saved cookies
+});
+const page = await context.newPage();
+// → stays logged in
+```
+
+→ No need to log in for every task.
+
+### Captcha pitfalls
+```
+- Automation triggers bot detection.
+- The LLM cannot solve captchas.
+- Possible ToS violation (scraping).
+
+→ Offer an option for the user to intervene manually,
+or use a captcha-solving service ($).
+```
+
+### Anti-detection
+```
+- Random delays
+- Real user-agent
+- Fingerprint randomization
+- Residential proxies
+
+→ These lean toward ToS violations. Use only for legitimate use cases.
+```
+
+### Eval
+```python
+# Task suite (WebArena, VisualWebArena)
+# load_dataset, agent and check are placeholders for your own evaluation harness
+tasks = load_dataset('webarena')
+success = 0
+for t in tasks:
+    result = agent.run(t)
+    if check(result, t.expected):
+        success += 1
+print(f'Success rate: {success / len(tasks):.1%}')
+```
+
+→ 2026 SoTA: 60-80% success on standard tasks.
+
+### Limitations
+```
+- Captcha
+- Highly dynamic SPAs (state)
+- Long tasks (10+ steps)
+- Privacy / login
+- Cost (many LLM calls)
+- Inaccuracy (hallucination)
+```
+
+### Observability
+```ts
+// Action log
+log({
+  step: i,
+  action: action,
+  screenshot: ss,
+  url: page.url(),
+});
+
+// Replay later
+```
+
+→ Debug-friendly.
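
For the "replay later" part, one simple option is an append-only JSON Lines file. The Python sketch below assumes the record fields from the log call above and stores the screenshot by file name rather than inline; `log_step`, `load_steps`, and the file layout are illustrative choices.

```python
import json
from pathlib import Path


def log_step(path: Path, step: int, action: dict, url: str, screenshot_file: str) -> None:
    """Append one step as a JSON line; the screenshot is referenced by file name."""
    record = {"step": step, "action": action, "url": url, "screenshot": screenshot_file}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_steps(path: Path) -> list[dict]:
    """Read the log back in the order it was written, for replay or debugging."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per step means a crashed run still leaves a readable log up to the last completed action.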
+
+### Real production
+- **Devin** (Cognition): a coding agent that also uses a browser.
+- **Anthropic Computer Use**: native API.
+- **OpenAI Operator** (2025): browser agent product.
+- **Adept ACT-1**: web action.
+
+## 🤔 Decision criteria
+| Task | Recommendation |
+|---|---|
+| Simple scrape | Playwright (no LLM) |
+| Auth + dynamic | Browser agent |
+| QA / E2E | Generate tests + run |
+| Research | Browser-use library |
+| Production | Computer Use API |
+| Cost-sensitive | Selector + tool (no vision) |
+| High difficulty | Vision + multi-step |
+
+## ❌ Anti-patterns
+- **Coordinates only (no element ID)**: brittle.
+- **No max iteration**: infinite loop.
+- **Fresh login every time**: cost / detection.
+- **Assuming there is no captcha**: breaks in production.
+- **No log**: impossible to debug.
+- **Ignoring ToS**: legal risk.
+- **Vision for every task**: cost.
+
+## 🤖 LLM usage hints
+- Anthropic Computer Use is the native option.
+- Browser-use is a production framework.
+- Element ID > coordinate.
+- Accessibility tree > screenshot (cost).
+
+## 🔗 Related documents
+- [[AI_Multi_Agent_Coordination]]
+- [[AI_Tool_Composition_Deep]]
+- [[Testing_Playwright_Advanced]]