id: testing-test-data-management
title: Test Data: fixture / factory / seed / clean
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: testing, data, vibe-coding
tech_stack: language: TS / Python; applicable_to: Backend, QA
applied_in:
aliases: test data, fixture, factory, seed, faker, builder pattern, anonymization

Test Data Management

The hardest part of testing is usually the data. Combine factories with faker (random values), a seed (deterministic), and cleanup after each test. Never use real PII, and keep every run repeatable.

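📖 Core concepts

  • Fixture: static data, typically stored in a file.
  • Factory: a function or class that builds data dynamically.
  • Seed: the initial state of the database.
  • Anonymized data: production data with the PII removed.

💻 Code patterns

Fixture (static)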

// fixtures/users.json
[
  { "id": 1, "email": "alice@test.com", "role": "admin" },
  { "id": 2, "email": "bob@test.com", "role": "user" }
]

// Usage
import users from './fixtures/users.json';
beforeEach(() => db.users.insertAll(users));

→ Good for small data sets that rarely change. Large fixtures become hard to maintain.

Factory (dynamic)

import { faker } from '@faker-js/faker';

// Builds a valid User with random defaults; pass overrides for the fields a test cares about.
function userFactory(overrides: Partial<User> = {}): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: 'user', // default role; tests override it when they need an admin
    age: faker.number.int({ min: 18, max: 80 }),
    createdAt: faker.date.past(),
    ...overrides,
  };
}

// Usage
const admin = userFactory({ role: 'admin' });
const users = Array.from({ length: 10 }, () => userFactory());

→ Every test gets fresh data.

Builder pattern

class UserBuilder {
  private user: Partial<User> = {};

  // Each setter returns `this` so calls can be chained.
  withEmail(e: string) { this.user.email = e; return this; }
  withRole(r: Role) { this.user.role = r; return this; }
  asAdmin() { this.user.role = 'admin'; return this; }

  // Factory defaults fill in every field the test did not set.
  build(): User {
    return { ...userFactory(), ...this.user };
  }
}

const admin = new UserBuilder().asAdmin().withEmail('a@x').build();

fishery (TS factory lib)

import { faker } from '@faker-js/faker';
import { Factory } from 'fishery';

const userFactory = Factory.define<User>(({ sequence }) => ({
  id: sequence,
  email: `user${sequence}@test.com`,
  name: faker.person.fullName(),
}));

const u = userFactory.build({ name: 'Alice' });
const list = userFactory.buildList(10);

factory_boy (Python)

import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User
    
    id = factory.Sequence(lambda n: n)
    email = factory.Sequence(lambda n: f'user{n}@test.com')
    role = 'user'

class AdminFactory(UserFactory):
    role = 'admin'

admin = AdminFactory()

Seed (DB)

// seed.ts
import { faker } from '@faker-js/faker';

async function seed() {
  // 100 users; ids are generated up front so orders can reference them
  const users = Array.from({ length: 100 }, () => ({
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
  }));
  await db.users.insertAll(users);

  // 1,000 orders, each assigned to a random user from the list above
  const orders = Array.from({ length: 1000 }, () => ({
    userId: faker.helpers.arrayElement(users).id,
    amount: faker.number.int({ min: 10, max: 1000 }),
  }));
  await db.orders.insertAll(orders);
}

await seed();

→ Initializes dev / staging environments.

Faker

faker.person.fullName();           // 'Alice Johnson'
faker.internet.email();            // 'alice@example.com'
faker.location.city();             // 'New York'
faker.commerce.productName();      // 'Laptop'
faker.number.int({ min: 1, max: 100 });
faker.date.recent();
faker.lorem.paragraphs(3);
faker.image.url();
faker.string.uuid();

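→ Locales are supported. In @faker-js/faker v8 and later, import a localized instance such as fakerKO (import { fakerKO } from '@faker-js/faker'); the older faker.locale = 'ko' assignment no longer works.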

Determinism

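faker.seed(42);
const u1 = faker.person.fullName();  // same value on every run (for a given faker version)

// Sequence: monotonically increasing ids instead of random ones
let _id = 0;
function nextId() { return _id++; }

→ Tests become reproducible.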
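
To make the link between a seed and reproducible data concrete, here is a minimal sketch that has no faker dependency. The names `mulberry32`, `makeUserGenerator`, and `SketchUser` are hypothetical and exist only for this illustration; mulberry32 is a small public-domain PRNG swapped in here, not the generator faker uses internally.

```typescript
// Sketch: a seeded PRNG drives a tiny data generator.
type SketchUser = { id: number; name: string; age: number };

// mulberry32: a small 32-bit PRNG. The same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

const SAMPLE_NAMES: string[] = ["Alice", "Bob", "Carol", "Dave"];

// Each generator owns its PRNG and its id counter, so tests do not share state.
function makeUserGenerator(seed: number): () => SketchUser {
  const rand = mulberry32(seed);
  let nextId = 1;
  return () => ({
    id: nextId++,
    name: SAMPLE_NAMES[Math.floor(rand() * SAMPLE_NAMES.length)] ?? "Alice",
    age: 18 + Math.floor(rand() * 63), // 18..80 inclusive
  });
}

// Two generators created with the same seed produce identical users.
const genA = makeUserGenerator(42);
const genB = makeUserGenerator(42);
const sameData = JSON.stringify(genA()) === JSON.stringify(genB());
```

Scoping the sequence counter to the generator, rather than using a module-level variable, keeps ids stable even when test order changes.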

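Reset (test isolation)

// Option 1: truncate the tables after every test
afterEach(async () => {
  await db.query('TRUNCATE users, orders CASCADE');
});

// Option 2: wrap each test in a transaction and roll it back.
// This only works if the test and the code under test share one connection.
beforeEach(async () => {
  await db.query('BEGIN');
});
afterEach(async () => {
  await db.query('ROLLBACK');
});

→ Every test starts from a clean state.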

Snapshot DB

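# Fast reset: dump the seeded database once
pg_dump test_db > snapshot.sql

# Before each test (or suite): recreate the database from the snapshot
dropdb test_db && createdb test_db && psql test_db < snapshot.sql

→ A large seed does not have to be regenerated every time.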

Testcontainers

import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let pg: StartedPostgreSqlContainer; // shared by the hooks below

beforeAll(async () => {
  pg = await new PostgreSqlContainer().start();
  await runMigrations(pg.getConnectionUri());
  await seed();
});

afterAll(() => pg.stop());

→ A Docker container starts before the tests and stops after them.

Production data anonymization

-- Run against the staging copy restored from a prod dump, never against prod itself
UPDATE users SET
  email = 'user' || id || '@test.com',
  phone = '000-0000-0000',
  ssn = NULL,
  full_name = 'User ' || id;

DELETE FROM payment_methods;
DELETE FROM messages WHERE created_at < NOW() - INTERVAL '30 days';

→ Removes PII and payment data.
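
When the anonymization step runs in application code (for example in an export script) rather than in SQL, the same rules can be sketched as a pure function. The `ProdUser` shape and the `anonymizeUser` name are assumptions for illustration; the replacement values mirror the UPDATE statement above.

```typescript
// Sketch: row-level anonymization that mirrors the SQL rules above.
type ProdUser = {
  id: number;
  email: string;
  phone: string;
  ssn: string | null;
  fullName: string;
};

function anonymizeUser(u: ProdUser): ProdUser {
  return {
    id: u.id, // keep the primary key so foreign keys still resolve
    email: `user${u.id}@test.com`, // derived from id, so it stays unique
    phone: "000-0000-0000",
    ssn: null,
    fullName: `User ${u.id}`,
  };
}

const anonymized = anonymizeUser({
  id: 7,
  email: "real.person@corp.example",
  phone: "555-123-4567",
  ssn: "123-45-6789",
  fullName: "Real Person",
});
// anonymized.email is "user7@test.com"; no original PII value survives.
```

Building a new object field by field, instead of spreading the input and overwriting some keys, means a column added to the type later cannot leak through unnoticed: the compiler forces a decision for it.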

Synthetic prod data

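# Learn the distribution of prod data, then sample fake rows from it.
# Note: sdv.tabular is the pre-1.0 SDV API; SDV 1.x moved this model to
# sdv.single_table.GaussianCopulaSynthesizer, which also takes a metadata object.
from sdv.tabular import GaussianCopula

model = GaussianCopula()
model.fit(prod_users_df)
fake_users = model.sample(10000)

See also: AI_Synthetic_Data.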

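Masking vs. faker vs. synthetic

Masking: obscure existing values (Alice → A****)
Faker: brand-new random values
Synthetic: new values that preserve the original distribution

→ Statistical analysis: synthetic.
General tests: faker.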
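
As an illustration of the masking option, the sketch below keeps the first character and replaces the rest with asterisks. `maskName` and `maskEmail` are hypothetical helpers, and the exact masking rule (how many characters to keep, whether to keep the email domain) is an assumption that a real policy would need to define.

```typescript
// Sketch: masking keeps the shape of the original value but hides most of it.
function maskName(value: string): string {
  if (value.length <= 1) return "*".repeat(value.length);
  return value.charAt(0) + "*".repeat(value.length - 1);
}

// Masks only the local part; the domain is kept (an assumption for this sketch).
function maskEmail(email: string): string {
  const at = email.indexOf("@");
  if (at <= 0) return maskName(email); // not a well-formed address: mask it all
  return maskName(email.slice(0, at)) + email.slice(at);
}

const maskedName = maskName("Alice"); // "A****"
const maskedEmail = maskEmail("alice@example.com"); // "a****@example.com"
```

Masked values still reveal length and first character, so masking is weaker than replacing the value outright with faker output.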

Time travel

// Time-dependent tests: freeze the clock at a known instant
import MockDate from 'mockdate';
MockDate.set('2026-05-09T00:00:00Z');

// ... test ...

afterEach(() => MockDate.reset());
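
An alternative to patching the global Date is to pass the clock in as a parameter. This is a different technique from MockDate, sketched here under the assumption that the code under test can accept such a parameter; `Clock`, `isTrialExpired`, and `fixedClock` are hypothetical names.

```typescript
// Sketch: dependency-injected clock instead of a globally mocked Date.
type Clock = () => Date;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Production callers omit `now` and get the real clock.
function isTrialExpired(
  startedAt: Date,
  trialDays: number,
  now: Clock = () => new Date(),
): boolean {
  return now().getTime() - startedAt.getTime() >= trialDays * MS_PER_DAY;
}

// Tests pin the clock to a fixed instant.
const fixedClock: Clock = () => new Date("2026-05-09T00:00:00Z");

const expired = isTrialExpired(new Date("2026-04-01T00:00:00Z"), 30, fixedClock); // 38 days elapsed
const active = isTrialExpired(new Date("2026-05-01T00:00:00Z"), 30, fixedClock); // 8 days elapsed
```

With an injected clock there is no global state to reset after each test, at the cost of threading one extra parameter through the code.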

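The UUID pitfall

// ❌ A different UUID on every run breaks the snapshot
const u = { id: faker.string.uuid() };
expect(u).toMatchSnapshot();

// ✅ Fixed id, so the snapshot is stable
const fixedUser = { id: '00000000-0000-0000-0000-000000000001' };
expect(fixedUser).toMatchSnapshot();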
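
When a test needs several stable ids, hard-coding each one gets tedious. A small counter-based generator is one way to produce them; `makeUuidSequence` is a hypothetical helper, and its output is only UUID-shaped (it does not set the RFC 4122 version bits), so it may fail strict UUID validation.

```typescript
// Sketch: deterministic, UUID-shaped ids for snapshot tests.
function makeUuidSequence(): () => string {
  let n = 0;
  return () => {
    n += 1;
    // The counter goes into the last group as 12 hex digits.
    return `00000000-0000-0000-0000-${n.toString(16).padStart(12, "0")}`;
  };
}

const nextUuid = makeUuidSequence();
const firstId = nextUuid(); // "00000000-0000-0000-0000-000000000001"
const secondId = nextUuid(); // "00000000-0000-0000-0000-000000000002"
```

Creating the sequence inside beforeEach, rather than once per file, keeps the ids independent of test order.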

Sharing test data

beforeAll = set up once, shared by all tests.
beforeEach = fresh data for every test.

→ Read-only tests: beforeAll is fine.
Tests that write: beforeEach is required (isolation).

Weaknesses of the builder

For a large entity the builder chain gets long:
new UserBuilder()
  .withName(...)
  .withEmail(...)
  .withRole(...)
  .withAddress(...)
  .build();

→ In many cases good defaults plus a few overrides work better.

→ A factory with an override object ({...}) is more ergonomic.

Data dependencies in tests

"This user must exist for X" → setup.
"This order must exist for Y" → setup + relation.
"This user's orders must add up to Z" → complex setup.

→ Builders / factories help here.
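
The third case (a user whose orders add up to a given total) can be sketched with a composite factory that builds the related records together. All names here (`DepUser`, `DepOrder`, `buildUser`, `buildOrder`, `buildUserWithOrders`) are hypothetical and in-memory only; a real version would also persist the rows.

```typescript
// Sketch: a composite factory that sets up related records in one call.
type DepUser = { id: number; role: "user" | "admin" };
type DepOrder = { id: number; userId: number; amount: number };

let depSeq = 0;
const nextDepId = (): number => ++depSeq;

function buildUser(overrides: Partial<DepUser> = {}): DepUser {
  return { id: nextDepId(), role: "user", ...overrides };
}

// An order always needs an owner, so the owner is a required argument.
function buildOrder(owner: DepUser, overrides: Partial<DepOrder> = {}): DepOrder {
  return { id: nextDepId(), userId: owner.id, amount: 100, ...overrides };
}

// One order per amount, all owned by the same freshly built user.
function buildUserWithOrders(
  amounts: number[],
  userOverrides: Partial<DepUser> = {},
): { user: DepUser; orders: DepOrder[] } {
  const user = buildUser(userOverrides);
  const orders = amounts.map((amount) => buildOrder(user, { amount }));
  return { user, orders };
}

const scenario = buildUserWithOrders([10, 20, 30], { role: "admin" });
const orderTotal = scenario.orders.reduce((sum, o) => sum + o.amount, 0); // 60
```

The test then states only what matters to it (the amounts and the role), and the relation between user and orders cannot be wired up incorrectly.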

Idempotent seed

async function seedIdempotent() {
  const exists = await db.users.findOne({ email: 'admin@x.com' });
  if (!exists) {
    await db.users.insert({ email: 'admin@x.com', ... });
  }
}

→ Safe to run again.
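
The check-then-insert pattern above can also be expressed as an upsert keyed on a natural identifier, which is idempotent by construction. The sketch below uses a hypothetical in-memory store (`InMemoryUsers`, `seedBaseline`) to show the property; in PostgreSQL the equivalent is a unique constraint on the key plus `INSERT ... ON CONFLICT`.

```typescript
// Sketch: an upsert-based seed that can run any number of times.
type SeedUser = { email: string; role: string };

class InMemoryUsers {
  private rows = new Map<string, SeedUser>();

  // Insert or overwrite, keyed by email (the natural key in this sketch).
  upsert(user: SeedUser): void {
    this.rows.set(user.email, user);
  }

  count(): number {
    return this.rows.size;
  }

  get(email: string): SeedUser | undefined {
    return this.rows.get(email);
  }
}

function seedBaseline(store: InMemoryUsers): void {
  store.upsert({ email: "admin@x.com", role: "admin" });
  store.upsert({ email: "viewer@x.com", role: "user" });
}

const seedStore = new InMemoryUsers();
seedBaseline(seedStore);
seedBaseline(seedStore); // second run changes nothing
const rowCountAfterTwoRuns = seedStore.count(); // 2
```

Unlike find-then-insert, an upsert has no window between the check and the write, so two seed processes running at once cannot create duplicates.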

Data in CI

CI creates a fresh DB / container on every run.
- Run migrations
- Run the seed
- Run the tests

→ Hermetic.

Performance test data

Large volumes (100k users, 1M orders):
- Bulk insert (COPY in Postgres)
- Generate a file, then load it
- Do not regenerate for every test

bulk_insert(users, 100_000)  # fast
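
Where COPY is not available, a common fallback is to send rows in fixed-size batches instead of one statement per row. The sketch below shows only the batching logic; `chunk` and `bulkInsert` are hypothetical helpers, and `insertBatch` stands in for whatever multi-row insert the database client provides.

```typescript
// Sketch: split a large data set into batches and insert batch by batch.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// insertBatch is supplied by the caller (assumption: one multi-row INSERT per call).
async function bulkInsert<T>(
  rows: readonly T[],
  batchSize: number,
  insertBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let inserted = 0;
  const batches = chunk(rows, batchSize);
  for (let i = 0; i < batches.length; i++) {
    await insertBatch(batches[i]!);
    inserted += batches[i]!.length;
  }
  return inserted;
}

const batchSizes = chunk([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4).map((b) => b.length); // [4, 4, 2]
```

The right batch size depends on row width and driver limits, so it is worth measuring rather than guessing.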

Pitfall: tests that assume prod data

// ❌ Assumes "user_id 1 is always an admin"
test('admin can delete', () => {
  ...
});

// ✅ Each test explicitly creates the admin it needs
test('admin can delete', () => {
  const admin = userFactory({ role: 'admin' });
  ...
});

Fixtures vs factory

Fixture:
- Small / static / shared
- "Five typical users"

Factory:
- Dynamic / parameterized
- "An arbitrary admin / banned user"

→ The two can be used together.

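🤔 Decision criteria

| Task | Recommendation |
| --- | --- |
| Simple unit test | Factory (fishery / factory_boy) |
| Integration test | Factory + DB seed |
| E2E test | Docker + large seed |
| Big data | Bulk insert + snapshot |
| Time-dependent | MockDate |
| Snapshot test | Fixed values (no random UUIDs) |
| Prod-like | Anonymized / synthetic |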

Anti-patterns

  • Tests that assume prod data: fragile.
  • Every test sharing one fixture: no isolation.
  • PII in tests: risk of leaks.
  • No cleanup: state bleeds into the next test.
  • Unseeded faker: snapshots break.
  • Huge fixtures (10 MB+): unmaintainable.
  • Non-idempotent seed: breaks on re-run.

🤖 Hints for working with LLMs

  • Factory + faker is ergonomic.
  • Isolate every test (cleanup / transaction).
  • Prod-like data: anonymized / synthetic.
  • Determinism (seed) makes tests reproducible.

🔗 Related documents