---
id: mlops-feature-store
title: Feature Store - Feast / Tecton / online & offline
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [mlops, feature-store, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI", "Backend"] }
applied_in: []
aliases: [feature store, Feast, Tecton, online store, offline store, feature reuse]
---

# Feature Store

> A central registry for ML features. **Train / serve consistency, low-latency online, time-correct offline**. Feast (open) / Tecton (managed).

## 📖 Core Concepts

- Online store: fast lookups (Redis / DynamoDB).
- Offline store: training data (Parquet / Snowflake).
- Time-travel: feature values as of a past point in time.
- Reuse: define once, use in many models.

## 💻 Code Patterns

### Feast definition

```python
# features.py (Field/schema API, Feast 0.20+)
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int32

# Offline source; the path and timestamp column are placeholders.
parquet_source = FileSource(
    path='data/user_features.parquet',
    timestamp_field='event_timestamp',
)

user = Entity(name='user', join_keys=['user_id'])

user_features = FeatureView(
    name='user_features',
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name='age', dtype=Int32),
        Field(name='total_spent', dtype=Float32),
        Field(name='days_active', dtype=Int32),
    ],
    source=parquet_source,
)
```

### Registration

```bash
feast apply
# → Creates the online + offline schema
```

### Materialize (offline → online)

```bash
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# → Loads the latest feature values into the online store (Redis)
```

→ Run this daily from cron / Airflow.

### Online get (serving)

```python
from feast import FeatureStore

store = FeatureStore(repo_path='.')

features = store.get_online_features(
    features=['user_features:age', 'user_features:total_spent'],
    entity_rows=[{'user_id': 123}],
).to_dict()
# e.g. {'user_id': [123], 'age': [25], 'total_spent': [100.5]}
```

→ With Redis as the backend, lookups take milliseconds.

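
To make the online / offline split concrete, here is a minimal in-memory sketch. It is not the Feast API: `ToyFeatureStore`, its method names, and the sample values are all hypothetical, chosen only to show how a timestamped offline log supports point-in-time reads, how materialization copies the newest row into a key-value online store, and how a TTL hides stale rows.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


class ToyFeatureStore:
    """In-memory stand-in for a feature store (hypothetical, not the Feast API)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.offline = {}  # entity_id -> list of (timestamp, features), sorted by timestamp
        self.online = {}   # entity_id -> (timestamp, features), newest row only

    def ingest(self, entity_id, ts, features):
        """Append a timestamped row to the offline log."""
        rows = self.offline.setdefault(entity_id, [])
        rows.append((ts, features))
        rows.sort(key=lambda row: row[0])

    def get_historical(self, entity_id, as_of):
        """Point-in-time lookup: newest row with timestamp <= as_of."""
        rows = self.offline.get(entity_id, [])
        idx = bisect_right([ts for ts, _ in rows], as_of)
        return rows[idx - 1][1] if idx else None

    def materialize(self, end):
        """Copy each entity's newest row up to `end` into the online store."""
        for entity_id, rows in self.offline.items():
            idx = bisect_right([ts for ts, _ in rows], end)
            if idx:
                self.online[entity_id] = rows[idx - 1]

    def get_online(self, entity_id, now):
        """Key lookup; rows older than the TTL are treated as missing."""
        row = self.online.get(entity_id)
        if row is None or now - row[0] > self.ttl:
            return None
        return row[1]


def day(n):
    """Midnight UTC on 2026-05-n (sample dates only)."""
    return datetime(2026, 5, n, tzinfo=timezone.utc)


store = ToyFeatureStore(ttl=timedelta(days=1))
store.ingest(123, day(1), {'total_spent': 100.5})
store.ingest(123, day(3), {'total_spent': 180.0})

as_of_day_2 = store.get_historical(123, as_of=day(2))  # sees only the day-1 row
store.materialize(end=day(3))
fresh = store.get_online(123, now=day(3) + timedelta(hours=1))
stale = store.get_online(123, now=day(5))  # older than the 1-day TTL
```

The historical read never sees the day-3 row when asked about day 2, which is the property that prevents leakage in training data. The online read is a plain key lookup, which is why it can be fast.
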
### Historical get (training)

```python
import pandas as pd

# Example event times; each row is joined to the feature values valid at that moment.
t1, t2, t3 = pd.to_datetime(['2026-05-01', '2026-05-02', '2026-05-03'], utc=True)

entity_df = pd.DataFrame({
    'user_id': [123, 456, 789],
    'event_timestamp': [t1, t2, t3],
})

train_df = store.get_historical_features(
    entity_df=entity_df,
    features=['user_features:age', 'user_features:total_spent'],
).to_df()
```

→ Time-correct: user 123's features as of t1.

### Train / serve consistency

```python
FEATURE_NAMES = ['age', 'total_spent']

# Train (offline)
df = store.get_historical_features(...).to_df()
model.fit(df[FEATURE_NAMES], df['label'])  # assumes entity_df carried a 'label' column

# Serve (online)
online = store.get_online_features(...).to_dict()
row = [online[name][0] for name in FEATURE_NAMES]
pred = model.predict([row])
# → Same transformation, same schema = consistent.
```

→ This is the biggest value of a feature store.

### Time-travel join

```
Feature: user_total_spent (changes over time)
Event: 2026-05-01 user 123 click
→ get historical = "user 123's spend as of 2026-05-01" (later changes are not included)
```

→ Prevents data leakage.

### Tecton (managed)

```python
from datetime import timedelta

from tecton import Aggregation, stream_feature_view

# kafka_source and user are a stream source and an entity defined elsewhere.
@stream_feature_view(
    source=kafka_source,
    entities=[user],
    mode='spark_sql',
    aggregations=[
        Aggregation(column='amount', function='sum', time_window=timedelta(days=1)),
    ],
)
def user_daily_spend(events):
    return f"SELECT user_id, amount, ts FROM {events}"
```

→ Streaming and windowed aggregations are supported.

### Real-time aggregation

```python
# Streaming feature: the same column aggregated over two windows
@stream_feature_view(
    source=kafka,
    aggregations=[
        Aggregation(column='clicks', function='count', time_window=timedelta(hours=1)),
        Aggregation(column='clicks', function='count', time_window=timedelta(days=1)),
    ],
)
def user_clicks(events): ...
```

→ "Clicks in the last hour" is maintained automatically.

### Composition

```python
# Pseudocode: combine two feature views on a shared key
@feature_view(...)
def user_combined(user_features, item_features):
    return user_features.join(item_features, on='user_id')
```

### Feature versioning

```python
# Pseudocode: the version argument is illustrative, not a specific SDK parameter.
@feature_view(version='v2')
def user_features(events): ...

# v1 and v2 coexist; each model uses the version it was trained on.
```

### Push (real-time)

```python
from datetime import datetime, timezone

import pandas as pd

# Right after the event occurs; 'user_clicks' is the name of a registered PushSource.
event_df = pd.DataFrame([
    {'user_id': 123, 'clicks': 5, 'event_timestamp': datetime.now(timezone.utc)},
])
store.push('user_clicks', event_df)
```

→ The online store is updated immediately.

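
As a rough illustration of what maintaining "clicks in the last hour" involves as events are pushed, here is a small sketch in plain Python. It is not how Tecton or Feast implement windows; `WindowedCounter` and its methods are hypothetical names, and the sketch assumes that timestamps arrive in order for each entity.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class WindowedCounter:
    """Counts events per entity inside a trailing time window (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.events = defaultdict(deque)  # entity_id -> event timestamps, oldest first

    def push(self, entity_id, ts):
        """Record one event; assumes timestamps arrive in order per entity."""
        self.events[entity_id].append(ts)

    def count(self, entity_id, now):
        """Evict timestamps that fell out of the window, then count the rest."""
        q = self.events[entity_id]
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


start = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
clicks_1h = WindowedCounter(window=timedelta(hours=1))
for minutes in (0, 10, 50):
    clicks_1h.push(123, start + timedelta(minutes=minutes))

count_at_55 = clicks_1h.count(123, now=start + timedelta(minutes=55))  # all three clicks
count_at_65 = clicks_1h.count(123, now=start + timedelta(minutes=65))  # first click expired
```

A managed streaming feature store takes over this bookkeeping, along with out-of-order events and state recovery, which is most of what the streaming options are paid for.
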
### Drift (data validation)

```python
# Great Expectations + Feast (Feast's data-quality monitoring is an alpha feature;
# assumes a Great Expectations release that still ships the legacy dataset module)
from feast.dqm.profilers.ge_profiler import ge_profiler
from great_expectations.core.expectation_suite import ExpectationSuite
from great_expectations.dataset import PandasDataset


@ge_profiler
def user_features_profiler(ds: PandasDataset) -> ExpectationSuite:
    # Flag rows whose age falls outside a plausible range.
    ds.expect_column_values_to_be_between('age', min_value=0, max_value=120)
    return ds.get_expectation_suite(discard_failed_expectations=False)
```

### Cost

```
Online: Redis / DynamoDB, pay per read.
Offline: Parquet on S3, cheap.
Tecton: managed, $$$, suits large teams.
Feast: open source, you run the infra yourself.
```

### Hopsworks (alternative)

```
- Free + open
- Streaming + batch
- Built-in model registry
```

### Vertex AI Feature Store

```python
from google.cloud import aiplatform_v1

client = aiplatform_v1.FeaturestoreOnlineServingServiceClient()

# Fields other than entity_type must be passed through the request object.
response = client.read_feature_values(
    request={
        'entity_type': 'projects/.../entityTypes/user',
        'entity_id': '123',
        'feature_selector': {'id_matcher': {'ids': ['age', 'total_spent']}},
    }
)
```

### SageMaker Feature Store

```python
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
fg = FeatureGroup(name='user-features', sagemaker_session=session)
fg.create(
    record_identifier_name='user_id',
    event_time_feature_name='ts',
    # ... remaining arguments elided
)

# Online get (runtime client, separate from the FeatureGroup object)
client = boto3.client('sagemaker-featurestore-runtime')
client.get_record(
    FeatureGroupName='user-features',
    RecordIdentifierValueAsString='123',
)
```

### Direct DB (no Feast)

```sql
-- The materialized view is the single source.
CREATE MATERIALIZED VIEW user_features AS
SELECT user_id, age,
       COUNT(orders) AS order_count,
       SUM(amount) AS total_spent,
       now() AS ts  -- snapshot time, set on each refresh
FROM users LEFT JOIN orders USING (user_id)
GROUP BY user_id, age;

-- Train: SELECT * FROM user_features WHERE ts < ?
-- Serve: SELECT * FROM user_features WHERE user_id = ?
```

→ Sufficient for a small ML system.

### Feature reuse

```
Three models use the same 'user_total_spent'.
- Defined once
- Every model references it
→ Change it in one place and every model picks it up.
```

### Naming convention

```
{entity}_{aggregation}_{time}
user_clicks_1h
user_avg_session_7d
item_views_30d
```

### Consistency checks

```python
import pandas as pd
from scipy.stats import ks_2samp

# Compare the distributions of training data and production data.
train_age = pd.read_parquet('train.parquet')['age']
# fetch_recent_features is a placeholder for however recent serving values are sampled.
prod_age = client.fetch_recent_features('age', n=10000)
# A very small p-value means the two samples are unlikely to share a distribution.
assert ks_2samp(train_age, prod_age).pvalue > 0.01
```

### When you don't need one

```
- 1 model + 1 simple feature
- POC / small demo
- Only real-time stateless features (input → pred)
```

## 🤔 Decision Criteria

| Situation | Recommendation |
|---|---|
| Small / 1-2 models | Direct DB / materialized view |
| Open / self-hosted | Feast |
| Streaming + windowed | Tecton / Hopsworks |
| GCP | Vertex AI |
| AWS | SageMaker |
| Minute-level real-time | Streaming (Tecton / Hopsworks) |
| Daily batch | Feast + cron |

## ❌ Anti-patterns

- **Train / serve schemas differ**: silent errors.
- **No time-travel**: data leakage.
- **No online TTL**: stale values.
- **No materialization**: high latency.
- **Scattered feature definitions**: drift.
- **Push + batch with different logic**: unintended results.
- **Privacy ignored**: PII ends up in the store.

## 🤖 LLM Usage Hints

- A feature store is the answer to train/serve consistency.
- Time-travel = data leakage prevention.
- For a small system, a materialized view is enough.
- Use Tecton when streaming + windows are needed.

## 🔗 Related Documents

- [[MLOps_Model_Registry]]
- [[Data_Eng_Streaming_ETL]]
- [[DB_Time_Series_Patterns]]