---
id: testing-test-data-management
title: Test Data - fixture / factory / seed / clean
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [testing, data, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend", "QA"] }
applied_in: []
aliases: [test data, fixture, factory, seed, faker, builder pattern, anonymization]
---

# Test Data Management

> The biggest problem in testing is data. Combine **factory + faker (random) + seed (deterministic) + clean (after each test)**. Never use real PII, and keep every run repeatable.

## 📖 Core concepts

- Fixture: static data, usually kept in a file.
- Factory: a dynamic builder that generates records on demand.
- Seed: the initial DB state.
- Anonymized: production data with the PII removed.

## 💻 Code patterns

### Fixture (static)

```ts
// fixtures/users.json
[
  { "id": 1, "email": "alice@test.com", "role": "admin" },
  { "id": 2, "email": "bob@test.com", "role": "user" }
]

// Usage
import users from './fixtures/users.json';
beforeEach(() => db.users.insertAll(users));
```

→ Good for small data sets that rarely change. Large fixtures become hard to maintain.

### Factory (dynamic)

```ts
import { faker } from '@faker-js/faker';

function userFactory(overrides: Partial<User> = {}): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    age: faker.number.int({ min: 18, max: 80 }),
    role: 'user', // default role; override per test
    createdAt: faker.date.past(),
    ...overrides, // caller-supplied fields win over the defaults
  };
}

// Usage
const admin = userFactory({ role: 'admin' });
const users = Array.from({ length: 10 }, () => userFactory());
```

→ Every test gets fresh data.
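
As a dependency-free illustration of the override mechanics, the sketch below swaps faker for a simple counter so it runs anywhere. The `SketchUser` shape, `makeUser`, and `resetSequence` are hypothetical names chosen for this example, not part of any library.

```typescript
// Hypothetical user shape, assumed for this sketch only.
interface SketchUser {
  id: number;
  email: string;
  role: "user" | "admin";
}

let userSequence = 0;

// Reset the counter between tests so generated ids stay predictable.
function resetSequence(): void {
  userSequence = 0;
}

// Defaults first, overrides last: any field passed in wins over the default.
function makeUser(overrides: Partial<SketchUser> = {}): SketchUser {
  userSequence += 1;
  const defaults: SketchUser = {
    id: userSequence,
    email: "user" + userSequence + "@test.com",
    role: "user",
  };
  return { ...defaults, ...overrides };
}

resetSequence();
const plainUser = makeUser();                  // all defaults
const adminUser = makeUser({ role: "admin" }); // only the role is overridden
```

Because the overrides are spread last, a test states only the fields it cares about and inherits everything else.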
### Builder pattern

```ts
class UserBuilder {
  private user: Partial<User> = {};

  withEmail(e: string) { this.user.email = e; return this; }
  withRole(r: Role) { this.user.role = r; return this; }
  asAdmin() { this.user.role = 'admin'; return this; }

  // Factory defaults first, then the explicitly set fields on top
  build(): User { return { ...userFactory(), ...this.user }; }
}

const admin = new UserBuilder().asAdmin().withEmail('a@x').build();
```

### fishery (TS factory lib)

```ts
import { Factory } from 'fishery';
import { faker } from '@faker-js/faker';

const userFactory = Factory.define<User>(({ sequence }) => ({
  id: sequence,
  email: `user${sequence}@test.com`,
  name: faker.person.fullName(),
}));

const u = userFactory.build({ name: 'Alice' });
const list = userFactory.buildList(10);
```

### factory_boy (Python)

```python
import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    id = factory.Sequence(lambda n: n)
    email = factory.Sequence(lambda n: f'user{n}@test.com')
    role = 'user'

class AdminFactory(UserFactory):
    role = 'admin'

admin = AdminFactory()
```

### Seed (DB)

```ts
// seed.ts
import { faker } from '@faker-js/faker';

async function seed() {
  // 100 users; ids are generated here so that orders can reference them
  const users = Array.from({ length: 100 }, () => ({
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
  }));
  await db.users.insertAll(users);

  // 1000 orders, each assigned to a random user
  const orders = Array.from({ length: 1000 }, () => ({
    userId: faker.helpers.arrayElement(users).id,
    amount: faker.number.int({ min: 10, max: 1000 }),
  }));
  await db.orders.insertAll(orders);
}

await seed();
```

→ Initializes dev / staging environments.

### Faker

```ts
faker.person.fullName();                 // e.g. 'Alice Johnson'
faker.internet.email();                  // e.g. 'alice@example.com'
faker.location.city();                   // e.g. 'New York'
faker.commerce.productName();            // e.g. 'Laptop'
faker.number.int({ min: 1, max: 100 });
faker.date.recent();
faker.lorem.paragraphs(3);
faker.image.url();
faker.string.uuid();
```

→ Locales are supported: with the v8+ API used here, import a localized instance such as `fakerKO` from `@faker-js/faker` (assigning `faker.locale = 'ko'` only works in older versions).
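
Generated values like these are pseudo-random, so fixing the seed makes them repeatable. The sketch below illustrates the idea with a small linear congruential generator. It is not how faker is implemented, and `createRng` and `pickFrom` are hypothetical helpers written for this example.

```typescript
// A tiny linear congruential generator (Numerical Recipes constants).
// Illustrative only: this is NOT faker's internal algorithm.
function createRng(seed: number): () => number {
  let state = seed >>> 0; // force an unsigned 32-bit starting state
  return function next(): number {
    // The product stays below 2^53, so the arithmetic is exact.
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Pick one element using the supplied generator instead of Math.random.
function pickFrom<T>(items: T[], rng: () => number): T {
  return items[Math.floor(rng() * items.length)];
}

const cities = ["Seoul", "Busan", "Incheon"];
const rngA = createRng(42);
const rngB = createRng(42);

// Same seed, same sequence: both lists are identical on every run.
const firstRun = [pickFrom(cities, rngA), pickFrom(cities, rngA)];
const secondRun = [pickFrom(cities, rngB), pickFrom(cities, rngB)];
```

A failing test can then be replayed exactly by logging the seed and reusing it.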
### Determinism

```ts
faker.seed(42);
const u1 = faker.person.fullName(); // same result on every run

// Sequence
let _id = 0;
function nextId() { return _id++; }
```

→ Tests become reproducible.

### Reset (test isolation)

```ts
afterEach(async () => {
  await db.query('TRUNCATE users, orders CASCADE');
});

// Or roll back a transaction (all queries must share one connection)
beforeEach(async () => { await db.query('BEGIN'); });
afterEach(async () => { await db.query('ROLLBACK'); });
```

→ Every test starts clean.

### Snapshot DB

```bash
# Fast way to reset: dump once
pg_dump test_db > snapshot.sql

# Each test: restore from the dump
dropdb test_db && createdb test_db && psql test_db < snapshot.sql
```

→ A large seed does not have to be regenerated every time.

### Testcontainers

```ts
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let pg: StartedPostgreSqlContainer;

beforeAll(async () => {
  pg = await new PostgreSqlContainer().start();
  await runMigrations(pg.getConnectionUri());
  await seed();
});

afterAll(() => pg.stop());
```

→ A Docker container is started and stopped together with the test run.

### Production data anonymization

```sql
-- Prod → staging dump (run on the staging copy)
UPDATE users SET
  email = 'user' || id || '@test.com',
  phone = '000-0000-0000',
  ssn = NULL,
  full_name = 'User ' || id;

DELETE FROM payment_methods;
DELETE FROM messages WHERE created_at < NOW() - INTERVAL '30 days';
```

→ Removes PII and payment data.

### Synthetic prod data

```python
# Learn the distribution of prod data → generate fake rows
# (sdv.tabular is the pre-1.0 SDV API; SDV 1.x moved this to sdv.single_table)
from sdv.tabular import GaussianCopula

model = GaussianCopula()
model.fit(prod_users_df)
fake_users = model.sample(10000)
```

→ See [[AI_Synthetic_Data]].

### Masking vs fake vs synthetic

```
Masking:    existing value → blurred (Alice → A*****)
Faker:      new random value
Synthetic:  preserves the distribution + new values

→ Statistical analysis = synthetic. General tests = faker.
```

### Time travel

```ts
// Time-dependent tests
import MockDate from 'mockdate';

MockDate.set('2026-05-09T00:00:00Z');
// ... test ...
afterEach(() => MockDate.reset());
```

### UUID pitfall

```ts
// ❌ A different UUID on every run breaks the snapshot
const u = { id: faker.string.uuid() };
expect(u).toMatchSnapshot();

// ✅ Fixed
const fixed = { id: '00000000-0000-0000-0000-000000000001' };
```

### Sharing test data

```
beforeAll  = shared by all tests.
beforeEach = fresh for every test.

→ Read-only tests: beforeAll is fine. Tests that write: beforeEach is required (isolation).
```

### Weakness of builders

```
Large entity = long builder chain:
  new UserBuilder()
    .withName(...)
    .withEmail(...)
    .withRole(...)
    .withAddress(...)
    .build();

→ Good defaults plus overriding only what matters is often better.
```

→ Factory + override (`{...}`) is more ergonomic.

### Data dependencies in tests

```
"This user must exist for X"        → setup.
"This order must exist for Y"       → setup + relation.
"This user + orders must sum to Z"  → complex setup.

→ Builders / factories help.
```

### Idempotent seed

```ts
async function seedIdempotent() {
  const exists = await db.users.findOne({ email: 'admin@x.com' });
  if (!exists) {
    await db.users.insert({ email: 'admin@x.com', ... });
  }
}
```

→ Safe to run again.

### Data in CI

```
CI gets a new DB / container on every run.
- Run migrations
- Run seed
- Run tests

→ Hermetic.
```

### Performance test data

```
Large volumes (100k users, 1M orders):
- Bulk insert (COPY in Postgres)
- Generate a file → load it
- Do not regenerate for every test

bulk_insert(users, 100_000)  # fast
```

### Pitfall: tests that assume prod data

```ts
// ❌ Assumes "user_id 1 is always admin"
test('admin can delete', () => { ... });

// ✅ Each test creates its admin explicitly
test('admin can delete', () => {
  const admin = userFactory({ role: 'admin' });
  ...
});
```

### Fixtures vs factory

```
Fixture:
- Small / static / shared
- "5 typical users"

Factory:
- Dynamic / parameterized
- "an arbitrary admin / banned user"

→ The two can be used together.
```

## 🤔 Decision criteria

| Task | Recommendation |
|---|---|
| Simple unit test | Factory (fishery / factory_boy) |
| Integration test | Factory + DB seed |
| E2E test | Docker + large seed |
| Big data | Bulk insert + snapshot |
| Time-dependent | MockDate |
| Snapshot test | Fixed values (no random UUID) |
| Prod-like data | Anonymized / synthetic |

## ❌ Anti-patterns

- **Tests assume prod data**: fragile.
- **All tests share one fixture**: no isolation.
- **PII in tests**: risk of leakage.
- **No cleanup**: state leaks into the next test.
- **Non-deterministic faker**: breaks snapshots.
- **Huge fixtures (10MB+)**: unmaintainable.
- **Non-idempotent seed**: breaks on re-run.

## 🤖 LLM usage hints

- Factory + faker = ergonomic.
- Isolate every test (clean / transaction).
- Prod-like = anonymized / synthetic.
- Determinism (seed) = reproducible.
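
The isolation and idempotency points above can be sketched without a real database. The in-memory `UserStore` below stands in for a DB table; the class, its methods, and `seedAdmin` are hypothetical and exist only for this illustration.

```typescript
interface SeedUser {
  email: string;
  role: string;
}

// In-memory stand-in for a users table; email acts as the unique key.
class UserStore {
  private rows: SeedUser[] = [];

  findByEmail(email: string): SeedUser | undefined {
    for (const row of this.rows) {
      if (row.email === email) return row;
    }
    return undefined;
  }

  insert(user: SeedUser): void {
    if (this.findByEmail(user.email)) {
      throw new Error("duplicate email: " + user.email);
    }
    this.rows.push(user);
  }

  count(): number {
    return this.rows.length;
  }

  // Similar in spirit to TRUNCATE: call it after each test.
  clear(): void {
    this.rows = [];
  }
}

// Idempotent seed: check first, insert only when the row is missing.
function seedAdmin(target: UserStore): void {
  if (!target.findByEmail("admin@x.com")) {
    target.insert({ email: "admin@x.com", role: "admin" });
  }
}

const demoStore = new UserStore();
seedAdmin(demoStore);
seedAdmin(demoStore); // second run is a no-op, not a duplicate-key error
```

A naive seed that always inserts would throw on the second call, which is the re-run failure listed among the anti-patterns.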

## 🔗 Related documents

- [[Testing_Faker_and_Builders]]
- [[AI_Synthetic_Data]]
- [[DB_Migration_Safety]]