id: wiki-2026-0508-ai-connect-llm-tool
title: AI Connect LLM Tool (ConnectAI)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: ConnectAI, Connect-AI-Lab, EZERAI, local AI coding agent, VS Code AI extension
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: vscode-extension, local-llm, ollama, lm-studio, ai-agent, privacy, second-brain, internal-tool
raw_sources: Datacollector_Export_Connect-AI-Lab
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack: language: TypeScript; framework: VS Code Extension API / Ollama / LM Studio
applied_in: Antigravity, ConnectAI

AI Connect LLM Tool (ConnectAI)

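📌 One-Line Insight (The Karpathy Summary)

A 100% local, offline-capable AI coding agent for VS Code. It runs directly on the user's own hardware through Ollama or LM Studio, with no external server. It combines file editing, terminal access, and a Second Brain (knowledge base). An internal tool suited to corporate security and privacy requirements.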

📖 Structured Knowledge (Synthesized Content)

Core Values

  • 100% local: every LLM call runs on the user's machine. No cloud API.
  • Privacy-first: code and prompts never leave the machine. An answer for corporate, medical, and legal use cases.
  • Hardware-aware: picks the best-fit model for each user's GPU / RAM.
  • VS Code native: deep integration with the extension API.
  • Second Brain: RAG over each codebase, wiki, and set of personal notes.

Comparison (with cloud-based tools)

| Aspect | ConnectAI | Cursor / Claude Code |
|---|---|---|
| Privacy | 100% local | Cloud API |
| Cost | Hardware only | $20-50 / month |
| Latency | Depends on the local GPU | Depends on the network |
| Quality | Limited by local models (Llama 8B-70B) | Frontier (Opus, GPT-4) |
| Offline | Yes | No |
| Setup | Ollama / LM Studio + GPU | Pay + log in |
| Updates | Manual update | Server-side (automatic) |

→ If privacy, cost, or offline use is critical: ConnectAI. If quality or fast setup matters most: Cursor / Claude Code.

Architecture

  1. VS Code Extension (TS): UI + sidebar + commands.
  2. Local LLM Engine: Ollama or LM Studio.
  3. Tool Registry: file_read / file_write / shell / search.
  4. Second Brain: a local vector DB over each wiki / note.
  5. Agent Loop: ReAct-style (think → act → observe).
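
The think → act → observe cycle in step 5 can be sketched as a small loop over the tool registry. This is a minimal illustration, not the ConnectAI implementation: the `AgentStep`, `Tool`, and `Planner` types, the `runAgent` name, and the `maxSteps` cap are all assumptions made for the example, and the planner stands in for whatever wraps the local LLM call.

```typescript
// Hypothetical types: the real ConnectAI interfaces are not shown in this note.
type ToolCall = { tool: string; args: Record<string, string> };
type AgentStep =
  | { kind: 'act'; thought: string; call: ToolCall }
  | { kind: 'finish'; thought: string; answer: string };

type Tool = (args: Record<string, string>) => Promise<string>;
// The planner wraps the local LLM: it reads the transcript and decides the next step.
type Planner = (transcript: string[]) => Promise<AgentStep>;

async function runAgent(
  task: string,
  plan: Planner,
  tools: Map<string, Tool>, // the tool registry
  maxSteps = 8,             // hard cap so a confused model cannot loop forever
): Promise<string> {
  const transcript: string[] = [`task: ${task}`];

  for (let step = 0; step < maxSteps; step++) {
    const next = await plan(transcript);            // think
    transcript.push(`thought: ${next.thought}`);
    if (next.kind === 'finish') return next.answer;

    const tool = tools.get(next.call.tool);
    const observation = tool
      ? await tool(next.call.args)                  // act
      : `error: unknown tool "${next.call.tool}"`;
    transcript.push(`observation: ${observation}`); // observe
  }
  return 'stopped: step limit reached';
}
```

Unknown tool names are fed back to the model as an observation instead of throwing, so the model gets a chance to correct itself within the step budget.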

Local LLM Options

  • Ollama: small and simple. CLI-friendly. Strong on Apple M-series Macs.
  • LM Studio: GUI. Shows quantization options and estimated VRAM use per model.
  • vLLM (advanced): production use. Large models + batching.
  • llama.cpp: the most minimal option. Suits mobile / embedded.

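Model Selection (by hardware)

| RAM / VRAM | Recommended model |
|---|---|
| 8 GB | Llama 3.2 3B (Q4) |
| 16 GB | Llama 3.1 8B (Q4) / Mistral 7B |
| 24 GB | Llama 3.1 8B (FP16) / Qwen 14B |
| 32 GB | DeepSeek Coder 33B (Q4) |
| 48 GB | Llama 3 70B (Q4) |
| 96 GB+ | Llama 3 70B (Q8). FP16 needs roughly 140 GB for the weights alone, and DeepSeek V3 (671B parameters) needs far more than 96 GB even when quantized. |

→ A Mac M3 Max with 96 GB is the sweet spot (a quantized Llama 70B fits).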
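
The sizing above follows from simple arithmetic on weight size. A rough sketch, assuming memory is dominated by the weights (parameters × bits per weight ÷ 8) plus a hypothetical 20% headroom for KV cache and runtime overhead. `weightGB`, `fits`, and the headroom factor are illustrative names and values, and real quantized formats use somewhat more than their nominal bit width.

```typescript
// Approximate size of the model weights in GB.
// paramsBillions: parameter count in billions; bitsPerWeight: 16 for FP16, 8 for Q8, 4 for Q4.
function weightGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

// Rough fit check. The 1.2 headroom factor is an assumption, not a measured value:
// context length, batch size, and the runtime all change the real overhead.
function fits(
  paramsBillions: number,
  bitsPerWeight: number,
  memoryGB: number,
  headroom = 1.2,
): boolean {
  return weightGB(paramsBillions, bitsPerWeight) * headroom <= memoryGB;
}

console.log(weightGB(70, 4));  // 4-bit 70B weights, in GB
console.log(weightGB(70, 16)); // FP16 70B weights, in GB
console.log(fits(8, 4, 16));   // does a 4-bit 8B model fit in 16 GB?
```

By this estimate a 70B model needs about 35 GB of weights at 4-bit and about 140 GB at 16-bit, which is why quantization decides whether it fits on a single workstation.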

Sidebar Chat UI

  • Streaming response (token by token).
  • File reference (@file).
  • Multi-turn conversation.
  • Apply / insert actions for code blocks.
  • Settings (model, temperature, system prompt).
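
The @file reference can be resolved by scanning the prompt before it is sent to the model. A minimal sketch, assuming a mention is an `@` at the start of the prompt or after whitespace, followed by a path with no spaces; the `extractFileRefs` name and this exact syntax are assumptions, since the note does not specify how ConnectAI parses mentions.

```typescript
// Collect the distinct file paths mentioned as @path in a chat prompt, in order of appearance.
function extractFileRefs(prompt: string): string[] {
  const refs: string[] = [];
  // "@" must start the prompt or follow whitespace, so e-mail addresses are not matched.
  const pattern = /(?:^|\s)@([^\s@]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    // Drop sentence punctuation that trails the path, e.g. "@README.md,".
    const path = (match[1] ?? '').replace(/[.,;:!?)]+$/, '');
    if (path !== '' && !refs.includes(path)) refs.push(path);
  }
  return refs;
}

console.log(extractFileRefs('Explain @src/app.ts and @README.md, then @src/app.ts again'));
```

Each returned path would then be read and attached to the request as context, subject to the same workspace restrictions as the read_file tool.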

Tool List

  • read_file(path): file content.
  • write_file(path, content): write / create.
  • edit_file(path, oldText, newText): precise diff.
  • run_command(cmd): terminal, with user confirmation.
  • search_codebase(query): ripgrep / regex.
  • query_brain(question): vector DB.
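
For edit_file, "precise" is easiest to guarantee when the old text must match exactly once. A minimal sketch of that rule as a pure string function, independent of the VS Code API; the `applyExactEdit` name and the choice to reject ambiguous matches are assumptions for illustration, not documented ConnectAI behavior.

```typescript
// Replace exactly one occurrence of oldText with newText, or fail loudly.
function applyExactEdit(source: string, oldText: string, newText: string): string {
  if (oldText.length === 0) throw new Error('oldText must not be empty');

  const first = source.indexOf(oldText);
  if (first === -1) throw new Error('oldText not found');

  // A second hit means the model's anchor is ambiguous; refuse instead of guessing.
  if (source.indexOf(oldText, first + 1) !== -1) {
    throw new Error('oldText is ambiguous: it matches more than once');
  }

  return source.slice(0, first) + newText + source.slice(first + oldText.length);
}

console.log(applyExactEdit('let a = 1;\nlet b = 2;\n', 'b = 2', 'b = 3'));
```

Rejecting ambiguous matches pushes the model to quote more surrounding context, which is cheaper than silently editing the wrong occurrence.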

LM Studio Integration (lifecycle)

  • Select a model → load (warm the GPU).
  • Idle for 5 min → unload (reclaim VRAM).
  • On the next chat → automatic reload.

→ Avoids GPU contention with the user's other work (e.g. games).

Second Brain (RAG)

  • Embed wiki pages / notes as vectors (local model).
  • Retrieve the top-K matches for each query.
  • Inject them into the LLM context.
  • Privacy: everything stays local.
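
The retrieve-then-inject steps above can be sketched without any vector database. This is an in-memory illustration under stated assumptions: embeddings are already computed by a local model, similarity is cosine, and the `Chunk` type plus the `cosine`, `topK`, and `buildPrompt` names and the prompt wording are hypothetical.

```typescript
// A stored note fragment with its precomputed embedding.
type Chunk = { text: string; embedding: number[] };

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank all chunks against the query embedding and keep the k best.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Inject the retrieved notes into the prompt ahead of the question.
function buildPrompt(question: string, context: Chunk[]): string {
  const notes = context.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return `Answer using only the notes below.\n\n${notes}\n\nQuestion: ${question}`;
}
```

A linear scan like this is fine for a few thousand notes; a vector database becomes worthwhile when the collection no longer fits comfortably in memory or needs persistence and filtering.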

💻 Code Patterns

Extension activation

// src/extension.ts
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  const provider = new SidebarChatProvider(context);
  
  context.subscriptions.push(
    vscode.window.registerWebviewViewProvider('connectai.sidebar', provider),
    vscode.commands.registerCommand('connectai.chat', () => provider.show()),
  );
}

LLM call (Ollama)

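// Async generator: yields response tokens as Ollama streams them.
async function* chat(prompt: string, model: string): AsyncGenerator<string> {
  const r = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!r.ok || !r.body) throw new Error(`Ollama request failed: ${r.status}`);

  const reader = r.body.getReader();
  // One decoder for the whole stream, so multi-byte characters split across chunks decode correctly.
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // The response is newline-delimited JSON: one object per line.
    let idx;
    while ((idx = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (!line.trim()) continue;

      const chunk = JSON.parse(line);
      if (chunk.message?.content) yield chunk.message.content;
    }
  }
}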

LM Studio Integration (lifecycle manager)

import * as vscode from 'vscode';
import { LMStudioClient } from '@lmstudio/sdk';

class ModelLifecycleManager {
  // The SDK connects over WebSocket, so the base URL uses the ws:// scheme.
  private client = new LMStudioClient({ baseUrl: 'ws://localhost:1234' });
  private currentModel?: string;
  private idleTimer?: NodeJS.Timeout;
  
  async onModelSelected(modelKey: string) {
    if (this.idleTimer) clearTimeout(this.idleTimer);

    // Swap models only when the selection actually changed.
    if (this.currentModel !== modelKey) {
      if (this.currentModel) await this.client.llm.unload(this.currentModel);
      await this.client.llm.load(modelKey);
      this.currentModel = modelKey;
    }

    // Always restart the idle countdown, including when the same model is re-selected.
    this.scheduleIdleUnload();
  }
  
  onActivity() {
    if (this.idleTimer) {
      clearTimeout(this.idleTimer);
      this.scheduleIdleUnload();
    }
  }
  
  private scheduleIdleUnload() {
    // Read the same key the settings file defines: connectai.lmStudio.idleTimeoutMs.
    const timeout = vscode.workspace.getConfiguration('connectai.lmStudio').get<number>('idleTimeoutMs', 300_000);
    if (timeout <= 0) return;
    
    this.idleTimer = setTimeout(async () => {
      if (this.currentModel) {
        await this.client.llm.unload(this.currentModel);
        this.currentModel = undefined;
      }
    }, timeout);
  }
}

Tool execution (file edit)

async function editFile(path: string, oldText: string, newText: string) {
  const uri = vscode.Uri.file(path);
  const doc = await vscode.workspace.openTextDocument(uri);
  
  const text = doc.getText();
  const idx = text.indexOf(oldText);
  if (idx === -1) throw new Error('oldText not found');
  
  const edit = new vscode.WorkspaceEdit();
  const start = doc.positionAt(idx);
  const end = doc.positionAt(idx + oldText.length);
  edit.replace(uri, new vscode.Range(start, end), newText);
  
  await vscode.workspace.applyEdit(edit);
}

Run command (with user confirmation)

async function runCommand(cmd: string): Promise<string> {
  // Always ask user first
  const ok = await vscode.window.showWarningMessage(
    `Run command: ${cmd}?`,
    { modal: true },
    'Yes', 'No'
  );
  
  if (ok !== 'Yes') return 'cancelled';
  
  const term = vscode.window.createTerminal('ConnectAI');
  term.show();
  term.sendText(cmd);
  // Wait + capture output (separate logic).
  return await waitForOutput(term);
}

Second Brain (RAG)

import { ChromaClient } from 'chromadb';
const chroma = new ChromaClient({ path: 'http://localhost:8000' });

async function queryBrain(question: string): Promise<string[]> {
  const collection = await chroma.getCollection({ name: 'wiki' });
  const emb = await embedLocal(question);   // Ollama embedding model
  
  const results = await collection.query({
    queryEmbeddings: [emb],
    nResults: 5,
  });
  
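  // documents holds one array per query, and entries can be null; keep only real strings.
  return (results.documents[0] ?? []).filter((d): d is string => d !== null);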
}

Configuration

// .vscode/settings.json
{
  "connectai.engine": "lmstudio",  // "ollama" | "lmstudio"
  "connectai.ollamaUrl": "http://localhost:11434",
  "connectai.lmStudioUrl": "http://localhost:1234",
  "connectai.defaultModel": "llama-3.1-8b-instruct",
  "connectai.lmStudio.idleTimeoutMs": 300000,
  "connectai.lmStudio.autoLoadOnSelect": true
}

🤔 Decision Criteria

| Situation | ConnectAI | Cursor / Claude Code |
|---|---|---|
| Sensitive code (healthcare, finance, government) | ✓ ConnectAI | |
| Quality first (frontier model) | | ✓ |
| Offline work | ✓ | |
| Lower monthly cost | ✓ | ($20+/month) |
| Fast setup | (model download required) | ✓ |
| Multi-file refactor | Limited by small models | ✓ |
| Air-gapped | ✓ | |

Default: if privacy / offline is a hard requirement → ConnectAI. If productivity / quality comes first → Cursor / Claude Code.

⚠️ Contradictions & Updates

  • Quality gap: a local 70B model is weaker than cloud Opus. Reality-check it per task.
  • Hardware cost: M3 Max + 96 GB = $4000+. Compare the ROI against monthly cloud subscriptions; the 1-2 year breakeven holds only at cloud spend well above a single $20-50/month seat.
  • Architecture: currently monolithic (a heavy extension.ts); modularizing is recommended. Splitting out the lmstudio module is the best practice.
  • run_command security: per-command user confirmation is critical. Automatic execution puts the system at risk.
  • Model lifecycle: old = load on every chat (slow). Modern = persistent + idle eject (LM Studio integration).
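
The hardware-cost trade-off can be checked with one division. A back-of-the-envelope sketch using the $4000 and $20-50/month figures from this note; `breakevenMonths` and its `seats` parameter are hypothetical, and electricity, resale value, and API usage beyond a flat subscription are ignored.

```typescript
// Months until a one-time hardware purchase equals the cloud subscriptions it replaces.
function breakevenMonths(hardwareCost: number, monthlyCloudCost: number, seats = 1): number {
  return hardwareCost / (monthlyCloudCost * seats);
}

console.log(breakevenMonths(4000, 50));    // one seat at $50/month
console.log(breakevenMonths(4000, 20));    // one seat at $20/month
console.log(breakevenMonths(4000, 50, 5)); // five seats at $50/month
```

For a single seat the purchase takes 80 to 200 months to pay back on subscription cost alone, so the case for the hardware usually rests on privacy and offline requirements, or on one machine standing in for several seats (16 months at five $50 seats).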

🔗 Knowledge Links (Graph)

🤖 LLM Usage Hints (How to Use This Knowledge)

When to use this knowledge:

  • Designing an internal AI tool for a company.
  • AI assistance on privacy-sensitive code.
  • New features or refactors in ConnectAI.
  • Hardware sizing for local LLMs.
  • LM Studio / Ollama integration.
  • Second Brain / RAG architecture.

When not to use it:

  • The developer is fine with cloud and with the cost: Cursor is the better choice.
  • Mass refactor of a very large codebase: a large model (Opus / GPT-4) gives better quality.
  • Quick prototype: the setup overhead is not worth it.
  • The user's hardware is insufficient: every model runs slowly.

Anti-Patterns

  • Auto-executing run_command: one LLM hallucination away from rm -rf.
  • Monolithic extension.ts: maintainability drops with every added feature. Modularize.
  • No idle eject: VRAM stays occupied permanently, causing GPU contention with other work.
  • Cloud model fallback as the default: defeats the privacy value.
  • Cloud embeddings (OpenAI): privacy violation.
  • No tool whitelisting: unrestricted shell + file access leads to accidents.
  • Expecting cloud-comparable quality: every user should be aware of the gap.
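
Tool whitelisting for shell commands can start as a small pre-check that runs before the confirmation dialog. A minimal sketch, assuming a fixed allowlist of binaries and a blanket ban on shell metacharacters; the `checkCommand` name, the `Verdict` type, and the example allowlist are hypothetical, and the check complements user confirmation without replacing it.

```typescript
// Hypothetical allowlist: adjust per project.
const ALLOWED_BINARIES: readonly string[] = ['git', 'npm', 'node', 'ls', 'cat', 'rg'];

// Characters that enable chaining, substitution, or redirection in common shells.
const SHELL_METACHARS = /[;&|`$<>\n\\]/;

type Verdict = { allowed: boolean; reason: string };

function checkCommand(cmd: string, allow: readonly string[] = ALLOWED_BINARIES): Verdict {
  const trimmed = cmd.trim();
  if (trimmed === '') return { allowed: false, reason: 'empty command' };

  // Reject anything that could smuggle a second command past the binary check.
  if (SHELL_METACHARS.test(trimmed)) {
    return { allowed: false, reason: 'shell metacharacters are not allowed' };
  }

  const binary = trimmed.split(/\s+/)[0] ?? '';
  if (!allow.includes(binary)) {
    return { allowed: false, reason: `"${binary}" is not on the allowlist` };
  }
  return { allowed: true, reason: 'ok' };
}

console.log(checkCommand('git status'));
console.log(checkCommand('git status; rm -rf /'));
```

An allowlist fails closed: a command the model invents is rejected by default, whereas a denylist has to anticipate every dangerous spelling in advance.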

🧪 Validation

  • Information status: verified (applied; under active development in the Antigravity project).
  • Source trust level: A (project's primary tool).
  • Review reason: manual cleanup. Each architecture implementation detail is in sync with the ConnectAI repo.

🧬 Duplicate Check

🕓 Changelog

| Date | Change | Action | Trust |
|---|---|---|---|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: code patterns + lifecycle integration + decision criteria + anti-patterns added | UPDATE | A |