Project Chronicle Guard: Search Engine Roadmap
🎯 Current Status: v2.74.0
- Phase 1: Linguistic Foundation Stabilization (Completed)
- Phase 2: Conflict Scoring Refinement (Completed)
- Phase 3: Performance Scaling & Caching (In Progress)
- Phase 4: Excerpt Precision Tuning (Planned)
- Phase 5: Downstream Integration API (Planned)
🔬 Phase Details
Phase 1: Linguistic Foundation (v2.72.0 - v2.74.0)
- Goal: Accurate tokenization of mixed Korean, English, and special-character text.
- Achievements:
- Bilingual boundary split (e.g., 'Astra의' -> 'Astra', '의').
- Hangeul monosyllable preservation (e.g., '한', '글').
- Zero-width character cleaning.
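The three behaviors above can be illustrated with a minimal Python sketch. The roadmap does not state the project's implementation language or its real tokenizer, so the `tokenize` function, the regular expressions, and the set of zero-width characters here are all assumptions made for illustration.

```python
import re

# Assumed set of zero-width characters: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

# A token is a run of ASCII letters/digits or a run of Hangul syllables.
# Alternating between the two classes splits at the script boundary.
TOKEN = re.compile(r"[A-Za-z0-9]+|[\uac00-\ud7a3]+")


def tokenize(text: str) -> list[str]:
    """Strip zero-width characters, then split on script boundaries."""
    cleaned = ZERO_WIDTH.sub("", text)
    # No minimum-length filter is applied, so single Hangul syllables survive.
    return TOKEN.findall(cleaned)


print(tokenize("Astra의"))
print(tokenize("한 글"))
```

Removing zero-width characters before matching keeps an invisible character from splitting a word such as 'Astra' into two tokens.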
Phase 2: Conflict Scoring (v2.73.0 - v2.74.0)
- Goal: Quantitative risk assessment for information conflicts.
- Achievements:
- Tiered severity logic (NONE, LOW, MEDIUM, HIGH).
- Substring-based detection to overcome particle interference.
- Configurable thresholds via SCORING_CONFIG.
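As a rough illustration of how the tiers, substring detection, and configurable thresholds might fit together, here is a minimal Python sketch. Only the name SCORING_CONFIG and the four severity labels come from this roadmap. The keys and threshold values, and the helpers `count_conflict_hits` and `severity`, are hypothetical.

```python
# Hypothetical thresholds: minimum number of conflicting terms per tier.
SCORING_CONFIG = {"low": 1, "medium": 3, "high": 5}


def count_conflict_hits(terms: list[str], text: str) -> int:
    """Count terms found anywhere in the text.

    Substring matching means 'Astra' is still detected inside 'Astra의',
    where an attached particle would defeat exact token comparison.
    """
    return sum(1 for term in terms if term and term in text)


def severity(hits: int, config: dict = SCORING_CONFIG) -> str:
    """Map a hit count to one of the four severity tiers."""
    if hits >= config["high"]:
        return "HIGH"
    if hits >= config["medium"]:
        return "MEDIUM"
    if hits >= config["low"]:
        return "LOW"
    return "NONE"


hits = count_conflict_hits(["Astra", "Nova"], "Astra의 기록")
print(hits, severity(hits))
```

Passing `config` as a parameter lets callers override the thresholds without editing the module-level default.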
Phase 3: Performance Scaling (v2.75.0+)
- Goal: Sub-10 ms response time on corpora of 10k+ documents.
- Action:
- Global module-level caching for IDF and tokens.
- Potential worker thread offloading for heavy scoring.
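The caching item could look roughly like the Python sketch below. It is an assumption-laden illustration and not the project's code. The cache names, the `corpus_key` used for invalidation, the whitespace tokenizer, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are all chosen for the example.

```python
import math
from collections import Counter

# Module-level caches: filled on first use, shared by every later call.
_TOKEN_CACHE: dict[str, list[str]] = {}
_IDF_CACHE: dict[str, dict[str, float]] = {}


def cached_tokens(doc_id: str, text: str) -> list[str]:
    """Tokenize a document once and reuse the result afterwards."""
    if doc_id not in _TOKEN_CACHE:
        _TOKEN_CACHE[doc_id] = text.lower().split()  # placeholder tokenizer
    return _TOKEN_CACHE[doc_id]


def cached_idf(corpus_key: str, docs: dict[str, str]) -> dict[str, float]:
    """Compute smoothed IDF once per corpus_key (for example, a corpus version)."""
    if corpus_key not in _IDF_CACHE:
        doc_freq: Counter = Counter()
        for doc_id, text in docs.items():
            doc_freq.update(set(cached_tokens(doc_id, text)))
        n = len(docs)
        _IDF_CACHE[corpus_key] = {
            term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()
        }
    return _IDF_CACHE[corpus_key]


docs = {"a": "alpha beta", "b": "alpha gamma"}
idf = cached_idf("v1", docs)
print(idf["alpha"], idf["beta"] > idf["alpha"])
```

Because the caches never expire on their own, the key has to change whenever the corpus changes; otherwise stale IDF values would be served.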
Phase 4: Excerpt Precision (Planned)
- Goal: Maximize the signal-to-noise ratio of the context in returned excerpts.
- Action:
- Density-based window starting point restriction.
- Multi-stage filtering for optimal text chunking.
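One possible reading of the density-based restriction is sketched below in Python. It assumes that candidate excerpt windows may only start at a position holding a query term, and that the window containing the most query terms wins. The function name `best_excerpt`, the fixed window size, and the fallback behavior are hypothetical.

```python
def best_excerpt(tokens: list[str], query_terms: list[str], window: int = 8) -> list[str]:
    """Return the window of tokens with the highest density of query terms."""
    terms = set(query_terms)

    # Restriction: a window may only start on a token that is a query term.
    starts = [i for i, tok in enumerate(tokens) if tok in terms]
    if not starts:
        return tokens[:window]  # assumed fallback: leading text

    def density(start: int) -> float:
        chunk = tokens[start:start + window]
        # Divide by the fixed window size so short windows at the end
        # of the document are not favored.
        return sum(tok in terms for tok in chunk) / window

    best = max(starts, key=density)  # ties resolve to the earliest start
    return tokens[best:best + window]


tokens = "a b x c d x y x e f".split()
print(best_excerpt(tokens, ["x", "y"], window=3))
```

Restricting the start positions reduces the number of windows scored and guarantees that every excerpt opens on a relevant token.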
Phase 5: Integration (Planned)
- Goal: Seamless RAG pipeline integration.
- Action:
- Strict IO schema definition for downstream AI agents.
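A strict schema of this kind could be sketched in Python with frozen dataclasses that validate on construction. The roadmap does not define the schema, so the `SearchRequest` and `SearchHit` types and all of their fields are hypothetical. Only the four severity labels are taken from Phase 2.

```python
from dataclasses import asdict, dataclass

SEVERITIES = ("NONE", "LOW", "MEDIUM", "HIGH")


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical input contract for a downstream agent."""
    query: str
    top_k: int = 5

    def __post_init__(self) -> None:
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if not isinstance(self.top_k, int) or self.top_k < 1:
            raise ValueError("top_k must be a positive integer")


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical output contract: one scored excerpt."""
    doc_id: str
    excerpt: str
    severity: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")


hit = SearchHit(doc_id="d1", excerpt="Astra의 기록", severity="LOW")
print(asdict(hit))
```

Rejecting malformed values at construction time means an agent can trust any object it receives without validating it again.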