# Project Chronicle Guard: Search Engine Roadmap ## ๐ŸŽฏ Current Status: v2.74.0 - [x] **Phase 1: Linguistic Foundation Stabilization** (Completed) - [x] **Phase 2: Conflict Scoring Refinement** (Completed) - [ ] **Phase 3: Performance Scaling & Caching** (In Progress) - [x] **Phase 4: Excerpt Precision Tuning** (Completed) - [ ] **Phase 5: Downstream Integration API** (Planned) --- ## ๐Ÿ”ฌ Phase Details ### Phase 1: Linguistic Foundation (v2.72.0 - v2.74.0) - **Goal**: Perfect tokenization for mixed KO/EN/Special characters. - **Achievement**: - Bilingual boundary split (e.g., 'Astra์˜' -> 'Astra', '์˜'). - Hangeul monosyllable preservation (e.g., 'ํ•œ', '๊ธ€'). - Zero-width character cleaning. ### Phase 2: Conflict Scoring (v2.73.0 - v2.74.0) - **Goal**: Quantitative risk assessment for information conflicts. - **Achievement**: - Tiered severity logic (NONE, LOW, MEDIUM, HIGH). - Substring-based detection to overcome particle interference. - Configurable thresholds via `SCORING_CONFIG`. ### Phase 3: Performance Scaling (v2.75.0+) - **Goal**: Sub-10ms response for 10k+ documents. - **Action**: - Global module-level caching for IDF and tokens. - Potential worker thread offloading for heavy scoring. ### Phase 4: Excerpt Precision (Planned) - **Goal**: Maximize context signal-to-noise ratio. - **Action**: - Density-based window starting point restriction. - Multi-stage filtering for optimal text chunking. ### Phase 5: Integration (Planned) - **Goal**: Seamless RAG pipeline integration. - **Action**: - Strict IO schema definition for downstream AI agents.
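
For the Phase 3 item on module-level caching of IDF and tokens, one possible shape is sketched below in Python. The names `cached_tokens`, `idf_table`, and `_IDF_CACHE`, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, not the project's implementation.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_tokens(text: str) -> tuple[str, ...]:
    """Tokenize once per distinct text (whitespace split as a stand-in)."""
    return tuple(text.lower().split())


# Module-level cache: one IDF table per corpus, keyed by the corpus itself.
_IDF_CACHE: dict[tuple[str, ...], dict[str, float]] = {}


def idf_table(docs: tuple[str, ...]) -> dict[str, float]:
    """Return a smoothed IDF table for docs, computing it only once."""
    if docs not in _IDF_CACHE:
        doc_freq: dict[str, int] = {}
        for doc in docs:
            # Count each term at most once per document.
            for term in set(cached_tokens(doc)):
                doc_freq[term] = doc_freq.get(term, 0) + 1
        n = len(docs)
        # Assumed formula: log((1 + N) / (1 + df)) + 1.
        _IDF_CACHE[docs] = {
            term: math.log((1 + n) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
    return _IDF_CACHE[docs]


corpus = ("astra guard", "astra chronicle")
table = idf_table(corpus)
print(table["astra"])               # 1.0
print(idf_table(corpus) is table)   # True
```

Hashing the whole corpus as the cache key is itself linear in corpus size, so at the 10k+ document scale a short corpus version identifier would likely be a cheaper key.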