---
id: wiki-2026-0508-ai-connect-llm-tool
title: AI Connect LLM Tool (ConnectAI)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [ConnectAI, Connect-AI-Lab, EZERAI, local AI coding agent, VS Code AI extension]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [vscode-extension, local-llm, ollama, lm-studio, ai-agent, privacy, second-brain, internal-tool]
raw_sources: [Datacollector_Export_Connect-AI-Lab]
last_reinforced: 2026-05-09
github_commit: pending
inferred_by: Claude Opus 4.7 (manual cleanup 2026-05-09)
tech_stack:
  language: TypeScript
  framework: VS Code Extension API / Ollama / LM Studio
applied_in: [Antigravity, ConnectAI]
---

# AI Connect LLM Tool (ConnectAI)

## 📌 One-Line Insight (The Karpathy Summary)

> **100% local, offline VS Code AI coding agent.** It runs directly on the user's hardware through Ollama or LM Studio, with no external server. It integrates file editing, a terminal, and a Second Brain (knowledge base). An internal tool suited to enterprise security and privacy requirements.

## 📖 Structured Knowledge (Synthesized Content)

### Core values

- **100% local**: every LLM call runs on the user's machine. No cloud API.
- **Privacy-first**: code and prompts never leave the machine. This answers enterprise, medical, and legal use cases.
- **Hardware-aware**: selects the best-fit model for each user's GPU / RAM.
- **VS Code native**: deep integration with the extension API.
- **Second Brain**: RAG over the codebase, wiki, and personal notes.

### Comparison (with cloud-based tools)

| Aspect | ConnectAI | Cursor / Claude Code |
|---|---|---|
| Privacy | 100% local | Cloud API |
| Cost | Hardware only | $20-50 / month |
| Latency | Depends on local GPU | Network |
| Quality | Limited by local models (Llama 8B-70B) | Frontier (Opus, GPT-4) |
| Offline | Yes | No |
| Setup | Ollama / LM Studio + GPU | Pay + login |
| Updates | Manual update | Server-side (automatic) |

→ If privacy, cost, or offline use is critical, choose ConnectAI. If quality or fast setup matters most, choose Cursor / Claude Code.

### Architecture

1. **VS Code Extension** (TS): UI + sidebar + commands.
2. **Local LLM Engine**: Ollama or LM Studio.
3. **Tool Registry**: file_read / file_write / shell / search.
4. **Second Brain**: local vector DB over the wiki and notes.
5. **Agent Loop**: ReAct style (think → act → observe).

### Local LLM options

- **Ollama**: small and simple. CLI-friendly. Strong on Mac M-series.
- **LM Studio**: GUI. Shows quantization options and VRAM estimates per model.
- **vLLM (advanced)**: production use. Large models + batching.
- **llama.cpp**: the simplest option. Mobile / embedded.

### Model selection (by hardware)

| RAM / VRAM | Recommended model |
|---|---|
| 8 GB | Llama 3.2 3B (Q4) |
| 16 GB | Llama 3.1 8B (Q4) / Mistral 7B |
| 24 GB | Llama 3.1 8B (FP16) / Qwen 14B |
| 32 GB | DeepSeek Coder 33B (Q4) |
| 48 GB | Llama 3 70B (Q4) |
| 96 GB+ | Llama 3 70B (Q8). FP16 weights for 70B take about 140 GB, and DeepSeek V3 (671B parameters) needs far more, so both call for server-class memory. |

→ A Mac M3 Max with 96 GB is the sweet spot (a quantized Llama 70B fits).

### Sidebar Chat UI

- Streaming response (token by token).
- File reference (@file).
- Multi-turn conversation.
- Apply / insert for code blocks.
- Settings (model, temperature, system prompt).

### Tool list

- `read_file(path)`: returns file content.
- `write_file(path, content)`: writes or creates a file.
- `edit_file(path, oldText, newText)`: precise, diff-style replacement.
- `run_command(cmd)`: runs in the terminal, after user confirmation.
- `search_codebase(query)`: ripgrep / regex search.
- `query_brain(question)`: queries the vector DB.

### LM Studio integration (lifecycle)

- Model selected → load (warm GPU).
- Idle for 5 min → unload (reclaim VRAM).
- On each chat → automatic reload.

→ Prevents GPU conflicts with the user's other work (for example games).

### Second Brain (RAG)

- Wiki and notes are embedded as vectors (local model).
- Each query retrieves the top-K matches.
- Retrieved text is injected into the LLM context.
- Privacy: everything stays local.
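The retrieval steps above (embed, take the top-K, inject into the context) can be sketched without any vector DB. This is a minimal in-memory illustration, not ConnectAI's actual implementation: `BrainChunk`, `cosineSimilarity`, `topK`, and `buildPrompt` are hypothetical names, and the embeddings are assumed to come from a local embedding model.

```typescript
// Hypothetical shape of one embedded wiki/note fragment.
interface BrainChunk {
  id: string;
  text: string;
  embedding: number[]; // assumed to be produced by a local embedding model
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all chunks against the query embedding and keep the k best.
function topK(query: number[], chunks: BrainChunk[], k: number): BrainChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Inject the retrieved text into the prompt sent to the local LLM.
function buildPrompt(question: string, context: BrainChunk[]): string {
  const notes = context.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return `Use the notes below to answer.\n\n${notes}\n\nQuestion: ${question}`;
}
```

A linear scan like this is only reasonable for a small personal wiki. A vector DB replaces `topK` with an indexed search once the note count grows.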
## 💻 Code Patterns

### Extension activation

```ts
// src/extension.ts
import * as vscode from 'vscode';
// SidebarChatProvider is the extension's own webview provider, defined elsewhere.

export function activate(context: vscode.ExtensionContext) {
  const provider = new SidebarChatProvider(context);
  context.subscriptions.push(
    vscode.window.registerWebviewViewProvider('connectai.sidebar', provider),
    vscode.commands.registerCommand('connectai.chat', () => provider.show()),
  );
}
```

### LLM call (Ollama)

```ts
// Streams assistant tokens from Ollama's /api/chat endpoint (newline-delimited JSON).
// Must be an async generator (function*) because it yields tokens.
async function* chat(prompt: string, model: string): AsyncGenerator<string> {
  const r = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!r.ok || !r.body) throw new Error(`Ollama request failed: ${r.status}`);

  const reader = r.body.getReader();
  // One decoder with { stream: true } so multi-byte characters split across chunks decode correctly.
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let idx: number;
    while ((idx = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      if (chunk.message?.content) yield chunk.message.content;
    }
  }
}
```

### LM Studio integration (lifecycle manager)

```ts
import * as vscode from 'vscode';
import { LMStudioClient } from '@lmstudio/sdk';

// llm.load / llm.unload follow the original note; verify them against the installed SDK version.
class ModelLifecycleManager {
  // The SDK connects over WebSocket, so the base URL uses ws:// rather than http://.
  private client = new LMStudioClient({ baseUrl: 'ws://localhost:1234' });
  private currentModel?: string;
  private idleTimer?: NodeJS.Timeout;

  async onModelSelected(modelKey: string) {
    if (this.idleTimer) clearTimeout(this.idleTimer);
    if (this.currentModel === modelKey) {
      // Same model: restart the idle countdown instead of dropping it.
      this.scheduleIdleUnload();
      return;
    }
    if (this.currentModel) await this.client.llm.unload(this.currentModel);
    await this.client.llm.load(modelKey);
    this.currentModel = modelKey;
    this.scheduleIdleUnload();
  }

  // Call on every chat request to push the idle deadline back.
  onActivity() {
    if (this.idleTimer) {
      clearTimeout(this.idleTimer);
      this.scheduleIdleUnload();
    }
  }

  private scheduleIdleUnload() {
    // Key matches "connectai.lmStudio.idleTimeoutMs" in the configuration example.
    const timeout = vscode.workspace
      .getConfiguration('connectai')
      .get<number>('lmStudio.idleTimeoutMs', 300_000);
    if (timeout <= 0) return; // 0 or negative disables idle unload
    this.idleTimer = setTimeout(async () => {
      this.idleTimer = undefined;
      if (this.currentModel) {
        await this.client.llm.unload(this.currentModel);
        this.currentModel = undefined;
      }
    }, timeout);
  }
}
```

### Tool execution (file edit)

```ts
import * as vscode from 'vscode';

// Replaces the first occurrence of oldText in the file with newText.
async function editFile(path: string, oldText: string, newText: string) {
  const uri = vscode.Uri.file(path);
  const doc = await vscode.workspace.openTextDocument(uri);
  const text = doc.getText();
  const idx = text.indexOf(oldText);
  if (idx === -1) throw new Error('oldText not found');

  const edit = new vscode.WorkspaceEdit();
  const start = doc.positionAt(idx);
  const end = doc.positionAt(idx + oldText.length);
  edit.replace(uri, new vscode.Range(start, end), newText);
  const applied = await vscode.workspace.applyEdit(edit);
  if (!applied) throw new Error('edit was not applied');
}
```

### Run command (with user confirmation)

```ts
async function runCommand(cmd: string): Promise<string> {
  // Always ask the user first.
  const ok = await vscode.window.showWarningMessage(
    `Run command: ${cmd}?`,
    { modal: true },
    'Yes',
    'No',
  );
  if (ok !== 'Yes') return 'cancelled';

  const term = vscode.window.createTerminal('ConnectAI');
  term.show();
  term.sendText(cmd);
  // Wait + capture output (separate logic; waitForOutput is defined elsewhere).
  return await waitForOutput(term);
}
```

### Second Brain (RAG)

```ts
import { ChromaClient } from 'chromadb';

const chroma = new ChromaClient({ path: 'http://localhost:8000' });

async function queryBrain(question: string): Promise<string[]> {
  const collection = await chroma.getCollection({ name: 'wiki' });
  // embedLocal is defined elsewhere and calls an Ollama embedding model.
  const emb = await embedLocal(question);
  const results = await collection.query({
    queryEmbeddings: [emb],
    nResults: 5,
  });
  // documents[0] holds the matches for the first (only) query; drop empty entries.
  return (results.documents[0] ?? []).filter((d): d is string => d !== null);
}
```

### Configuration

```json
// .vscode/settings.json
{
  "connectai.engine": "lmstudio", // "ollama" | "lmstudio"
  "connectai.ollamaUrl": "http://localhost:11434",
  "connectai.lmStudioUrl": "http://localhost:1234",
  "connectai.defaultModel": "llama-3.1-8b-instruct",
  "connectai.lmStudio.idleTimeoutMs": 300000,
  "connectai.lmStudio.autoLoadOnSelect": true
}
```

## 🤔 Decision Criteria

| Situation | ConnectAI | Cursor / Claude Code |
|---|---|---|
| Sensitive code (medical, finance, government) | ✅ ConnectAI | ❌ |
| Quality first (frontier model) | ❌ | ✅ |
| Offline work | ✅ | ❌ |
| Lower monthly cost | ✅ | ❌ ($20+/month) |
| Fast setup | ❌ (model download) | ✅ |
| Multi-file refactor | Limited by small models | ✅ |
| Air-gapped | ✅ | ❌ |

**Default**: if privacy or offline use is a hard requirement, choose ConnectAI. If productivity or quality comes first, choose Cursor / Claude Code.

## ⚠️ Contradictions & Updates

- **Quality gap**: a local 70B model is weaker than cloud Opus. Reality-check it on each task.
- **Hardware cost**: M3 Max + 96 GB = $4000+. Compare the ROI against the monthly cloud subscription. The stated 1-2 year breakeven implies replacing roughly $170-330 of cloud spend per month; against a single $20-50 / month subscription the payback takes far longer.
- **Architecture**: currently monolithic (heavy extension.ts); a modular layout is recommended. Separating the lmstudio module is best practice.
- **`run_command` security**: user confirmation on every call is critical. Automatic execution puts the system at risk.
- **Model lifecycle**: the old approach loaded the model on every chat (slow). The modern approach keeps it persistent with idle eject (LM Studio integration).
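The hardware-cost trade-off above comes down to simple arithmetic. The sketch below uses the figures quoted in this note ($4000 of hardware, $20-50 per month per subscription) plus one hypothetical heavier-usage figure ($250 per month, an assumption for illustration). `breakevenMonths` is a made-up helper, and it ignores electricity, resale value, and quality differences.

```typescript
// Months until a one-time hardware purchase is paid back by avoided cloud spend.
function breakevenMonths(hardwareCostUsd: number, monthlyCloudUsd: number): number {
  if (hardwareCostUsd < 0 || monthlyCloudUsd <= 0) {
    throw new RangeError('costs must be positive');
  }
  return Math.ceil(hardwareCostUsd / monthlyCloudUsd);
}

// Figures quoted in this note: one subscription at $20-50 per month.
const singleSeatLow = breakevenMonths(4000, 20); // 200 months
const singleSeatHigh = breakevenMonths(4000, 50); // 80 months

// Hypothetical heavier spend, for example several seats or API usage.
const heavyUse = breakevenMonths(4000, 250); // 16 months
```

With these inputs, a payback of 12 to 24 months corresponds to a replaced cloud spend of roughly $170-330 per month, so the hardware case is strongest for shared or heavy use.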
## 🔗 Knowledge Links (Graph)

- Related tools: [[Ollama]] · [[LM-Studio]] · [[LLM_Optimization_and_Deployment_Strategies|vLLM]] · [[llama.cpp]]
- Cloud alternative: [[Claude-Code]]
- Applied in: [[Connect-AI-Lab]]

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**

- Designing an enterprise internal AI tool.
- AI assistance for privacy-sensitive code.
- New features or refactors in ConnectAI.
- Hardware sizing for local LLMs.
- LM Studio / Ollama integration.
- Second Brain / RAG architecture.

**When not to use it:**

- Developers are fine with the cloud and its cost: Cursor is the better choice.
- Mass refactoring of a very large codebase: a large model (Opus / GPT-4) gives better quality.
- Quick prototypes: the setup overhead is too high.
- The user's hardware is insufficient: every model runs slowly.

## ❌ Anti-Patterns

- **Automatic `run_command` execution**: an LLM hallucination can turn into `rm -rf`.
- **Monolithic extension.ts**: maintainability drops with every added feature. Split it into modules.
- **No idle eject**: VRAM stays occupied permanently, causing GPU contention with other work.
- **Cloud model fallback as the default**: defeats the privacy value.
- **Cloud embeddings (OpenAI)**: a privacy violation.
- **No tool whitelisting**: unrestricted shell and file access invites accidents.
- **Expecting cloud-level quality**: every user should be aware of the gap.

## 🧪 Validation

- **Information status:** verified (applied; under active development in the Antigravity project).
- **Source trust level:** A (project's primary tool).
- **Review reason:** Manual cleanup. Implementation details of the architecture are in sync with the ConnectAI repo.

## 🧬 Duplicate Check

- **Existing similar documents:** [[Local-LLM-Inference]] (concept), [[VS-Code-Extension-Patterns]] (technical), [[Code Agent — Devin / Cursor / Claude Code]] (general).
- **Handling:** KEEP (documentation of a specific tool).
- **Reason:** ConnectAI's own design and architecture are distinct.
## 🕓 Changelog

| Date | Change | Handling | Trust level |
|------|--------|----------|-------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization | UPDATE | A |
| 2026-05-09 | Manual cleanup: added code patterns, lifecycle integration, decision criteria, and anti-patterns | UPDATE | A |