---
id: VOICE-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, voice-assistant, stt, tts, nlp, audio-processing]
last_reinforced: 2026-04-26
---

# Voice Assistant Architecture

## 📌 One-Line Insight (The Karpathy Summary)
> "Extract intent from sound, then shape intelligence back into sound." A voice assistant is an intelligent pipeline that converts the speech signal to text (STT), interprets the meaning and generates a response (NLU/LLM), and synthesizes that response back into speech (TTS).

## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** A multi-stage signal-processing pattern that processes the key information in an audio stream sequentially to deliver a natural conversational experience.
- **Core components:**
  - **Wake Word Detection:** Always-on, low-power monitoring for a specific phrase such as "Hey Genie".
  - **Automatic Speech Recognition (ASR/STT):** Converts the audio waveform into a sequence of text tokens.
  - **Natural Language Understanding (NLU):** Extracts the intent and entities. In modern systems, an LLM takes over this role as part of a single integrated step.
  - **Dialog Management:** Maintains the conversational context and decides the next action.
  - **Text-to-Speech (TTS):** Synthesizes the generated text into natural speech that carries emotion and tone.

## ⚠️ Contradictions & RL Update
- **Conflict with past data:** The earlier approach chained each stage together as an independent model (cascaded). Recent systems are moving toward end-to-end audio models that listen, understand, and respond directly.
- **Policy change:** For future voice interface support, the Antigravity project is considering a hybrid architecture that combines on-device STT with a server-class LLM to reduce latency.

## 🔗 Knowledge Links (Graph)
- [[Natural-Language-Processing]], [[LLM]], Signal-Processing, Agentic-Workflow
- **Raw Source:** 10_Wiki/Topics/AI/Voice-Assistant-Architecture.md
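
The cascaded pipeline listed under the core components (wake word, STT, NLU, dialog management, TTS) can be sketched roughly as below. This is a minimal illustration, not an implementation from the source: every function and class name is hypothetical, the "audio" is a plain string standing in for a waveform, and each stage is a stub where a real system would call a keyword spotter, an ASR model, an NLU model or LLM, and a speech synthesizer.

```python
"""Toy cascaded voice-assistant pipeline: wake word -> STT -> NLU -> dialog -> TTS.

All names are hypothetical; each stage is a stub standing in for a real model.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

WAKE_WORD = "hey genie"  # assumed wake phrase, used only for this sketch


@dataclass
class NluResult:
    intent: str
    entities: Dict[str, str]


@dataclass
class DialogState:
    turns: List[str] = field(default_factory=list)  # conversation context


def detect_wake_word(audio: str) -> bool:
    # Stand-in for an always-on, low-power keyword spotter.
    return audio.lower().startswith(WAKE_WORD)


def speech_to_text(audio: str) -> str:
    # Stand-in for ASR: drop the wake word and return the "transcript".
    return audio[len(WAKE_WORD):].strip(" ,.").lower()


def understand(text: str) -> NluResult:
    # Stand-in for NLU (or an LLM): keyword rules for intent and entities.
    if "weather" in text:
        entities = {"city": text.split(" in ", 1)[1]} if " in " in text else {}
        return NluResult("get_weather", entities)
    return NluResult("unknown", {})


def decide(state: DialogState, nlu: NluResult) -> str:
    # Dialog management: record the turn, then choose the next action.
    state.turns.append(nlu.intent)
    if nlu.intent == "get_weather":
        city = nlu.entities.get("city")
        return f"Checking the weather in {city}." if city else "Which city?"
    return "Sorry, I did not catch that."


def text_to_speech(text: str) -> bytes:
    # Stand-in for TTS: return encoded bytes instead of synthesized audio.
    return text.encode("utf-8")


def handle_utterance(audio: str, state: DialogState) -> Optional[bytes]:
    if not detect_wake_word(audio):
        return None  # stay idle until the wake word is heard
    text = speech_to_text(audio)
    nlu = understand(text)
    reply = decide(state, nlu)
    return text_to_speech(reply)
```

Because each stage only depends on the previous stage's output, any one of them could in principle be swapped out independently, for example running `speech_to_text` on the device while `understand` and `decide` call a server-side model, which is the kind of hybrid split the policy note describes. An end-to-end audio model would instead collapse these functions into a single call.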