---
id: wiki-2026-0508-data-augmentation
title: Data Augmentation Strategies
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [data augmentation, AutoAugment, RandAugment, MixUp, CutMix, back translation, Mosaic]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [data-augmentation, vision, nlp, audio, regularization, autoaugment, mixup, cutmix, generative-augment]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: torchvision / Albumentations / AugLy / nlpaug / Diffusers
---

# Data Augmentation

## In One Line

> **"Not more data, but more varied views of the data you have."** Augmentation teaches invariances and guards against overfitting. Vision: rotation, flip, crop, MixUp, CutMix, AutoAugment. NLP: back-translation and LLM-aided rewriting. Modern: generative augmentation (Stable Diffusion).

## Core Strategies

### Computer Vision

- **Geometric**: rotation, flip, crop, scale.
- **Color**: brightness, contrast, hue, saturation.
- **Noise**: Gaussian, salt-and-pepper.
- **MixUp**: linear combination of two images and their labels.
- **CutMix**: paste a patch from one image into another and mix the labels by area.
- **Cutout**: mask out a random region.
- **AutoAugment / RandAugment**: AutoAugment searches for an augmentation policy; RandAugment replaces the search with two hyperparameters (number of ops, magnitude).
- **TrivialAugment**: one random op at a random strength per image, with nothing to tune.
- **Mosaic** (YOLOv4 and later): four images tiled into one grid.

### NLP

- **Synonym replacement** (SR).
- **Random insertion / deletion / swap** (together with SR, these make up EDA).
- **Back-translation**: en → fr → en.
- **Paraphrase**: generated by an LLM.
- **Token noise**: BERT-MLM-style masking.
- **Mixup-NLP**: mix hidden representations instead of raw tokens.

### Audio

- **Speed / pitch shift**.
- **SpecAugment**: time and frequency masking on the spectrogram.
- **Noise injection**.
- **Reverb / EQ**.
- **Mixup**.

### Tabular

- **SMOTE**: synthetic samples for the minority class.
- **Feature noise** (Gaussian).
- **Mixup-tabular**.

### Modern (Generative)

- **Diffusion-based augmentation**: variations generated with Stable Diffusion.
- **GAN-based**.
- **LLM-aided text**: paraphrase or extend existing samples.
- **Domain randomization** (sim → real).
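Among the NLP strategies listed above, random deletion and random swap (EDA) are simple enough to sketch without any library. The following is a minimal illustration in plain Python, assuming whitespace-tokenized input; the function names, the `p` and `n_swaps` parameters, and the `rng` argument are illustrative choices, not taken from any reference implementation.

```python
import random


def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token independently with probability p; keep at least one."""
    rng = rng or random.Random()
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # If everything was dropped, fall back to a single random token.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps=1, rng=None):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    rng = rng or random.Random()
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


tokens = "the quick brown fox jumps over the lazy dog".split()
rng = random.Random(0)  # seeded only so the demo is repeatable
print(random_deletion(tokens, p=0.2, rng=rng))
print(random_swap(tokens, n_swaps=2, rng=rng))
```

Both operations can change meaning on short sentences, so a small `p` and few swaps are the cautious starting point.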
### Task-Specific

- **Detection**: bbox-aware augmentation (Albumentations).
- **Segmentation**: mask-aware.
- **Pose**: keypoint-aware.
- **OCR**: distortion + perspective.

### Modern Best Practices

1. **Strong but realistic**: do not over-augment.
2. **Test-time augmentation** (TTA): average predictions over multiple views at inference.
3. **AutoML for augmentation**: search for a task-specific policy.
4. **Curriculum**: move from weak to strong augmentation during training.
5. **Domain awareness**: whether vertical or horizontal flips are valid depends on the task.

## 💻 Patterns

### torchvision (vision)

```python
import torch
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.RandomRotation(15),
    transforms.RandAugment(num_ops=2, magnitude=9),  # common modern default
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```

### Albumentations (detection / segmentation)

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    # Newer Albumentations releases take size=(224, 224) instead of two ints.
    A.RandomResizedCrop(224, 224, scale=(0.8, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.OneOf([
        A.GaussianBlur(),
        A.MotionBlur(),
    ], p=0.3),
    A.RandomBrightnessContrast(p=0.5),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']))
# The image and its bounding boxes are transformed together.
```

### MixUp (loss-level)

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_data(x, y, alpha=0.2):
    lam = np.random.beta(alpha, alpha)   # mixing coefficient
    idx = torch.randperm(x.size(0))      # partner sample for each item
    mixed_x = lam * x + (1 - lam) * x[idx]
    return mixed_x, y, y[idx], lam

def mixup_loss(criterion, pred, y_a, y_b, lam):
    # Mix the losses with the same coefficient used for the inputs.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

# Training loop; model, loader and optimizer are assumed to be defined.
for x, y in loader:
    mixed_x, y_a, y_b, lam = mixup_data(x, y)
    pred = model(mixed_x)
    loss = mixup_loss(F.cross_entropy, pred, y_a, y_b, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

### CutMix

```python
import numpy as np
import torch

def cutmix(x, y, alpha=1.0):
    x = x.clone()  # avoid mutating the caller's batch
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    H, W = x.size(2), x.size(3)
    # Patch size chosen so that its area is roughly (1 - lam) of the image.
    cut_w = int(W * (1 - lam) ** 0.5)
    cut_h = int(H * (1 - lam) ** 0.5)
    cx, cy = np.random.randint(W), np.random.randint(H)
    x1, y1 = max(cx - cut_w // 2, 0), max(cy - cut_h // 2, 0)
    x2, y2 = min(cx + cut_w // 2, W), min(cy + cut_h // 2, H)
    x[:, :, y1:y2, x1:x2] = x[idx, :, y1:y2, x1:x2]
    # Recompute lam from the clipped patch actually pasted.
    lam = 1 - ((x2 - x1) * (y2 - y1) / (W * H))
    return x, y, y[idx], lam
```

### NLP: back-translation

```python
from transformers import pipeline

en_to_fr = pipeline('translation', model='Helsinki-NLP/opus-mt-en-fr')
fr_to_en = pipeline('translation', model='Helsinki-NLP/opus-mt-fr-en')

def back_translate(text):
    fr = en_to_fr(text, max_length=512)[0]['translation_text']
    return fr_to_en(fr, max_length=512)[0]['translation_text']

# The round trip acts as a paraphrase.
original = 'The quick brown fox jumps over the lazy dog.'  # sample input
augmented = back_translate(original)
```

### nlpaug (NLP utility)

```python
import nlpaug.augmenter.word as naw

# Contextual (BERT-based) word substitution
aug = naw.ContextualWordEmbsAug(model_path='bert-base-uncased', action='substitute')
augmented = aug.augment('The quick brown fox jumps')
```

### LLM-aided augmentation

```python
def llm_paraphrase(text, n=5):
    # `llm` is a placeholder for any client exposing generate(prompt) -> str.
    prompt = (
        f"Paraphrase the following sentence in {n} different ways "
        "while preserving meaning:\n\n"
        f"Original: {text}\n\n"
        f"Output {n} paraphrases, each on a new line."
    )
    lines = llm.generate(prompt).split('\n')
    return [line.strip() for line in lines if line.strip()]
```

### Audio (SpecAugment)

```python
import torch
import torchaudio.transforms as T

aug = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=30),
    T.TimeMasking(time_mask_param=80),
)
# mel_spec is assumed to be a spectrogram tensor shaped (..., freq, time).
mel_spec_augmented = aug(mel_spec)
```

### SMOTE (tabular)

```python
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
# X, y are the training features and labels; resample the training split only.
X_resampled, y_resampled = smote.fit_resample(X, y)
```

### Diffusion-based augmentation

```python
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained('runwayml/stable-diffusion-v1-5').to('cuda')

# Variation of the original image guided by a prompt.
# class_name and original_image are assumed to be defined by the caller.
augmented = pipe(
    prompt=f'a {class_name} in different lighting / angle',
    image=original_image,
    strength=0.3,  # small change, stays close to the original
    num_inference_steps=20,
).images[0]
```

### Test-time augmentation (TTA)

```python
import torch

def tta_predict(model, image, augments, n=5):
    """Average predictions over up to n augmented views of one input."""
    model.eval()
    with torch.no_grad():
        preds = [model(aug(image)) for aug in augments[:n]]
    return torch.stack(preds).mean(dim=0)

# augments is a list of transform callables, e.g.
# [normal_transform, flip_transform, crop1_transform, ...]
```

## Decision Criteria

| Situation | Strategy |
|---|---|
| Image classification | RandAugment + MixUp |
| Detection | Albumentations + Mosaic |
| Segmentation | Mask-aware augment |
| NLP | Back-translation + LLM paraphrase |
| Audio | SpecAugment |
| Imbalanced tabular | SMOTE |
| Long-tail vision | Class-balanced augment |
| Generative augment | Diffusion (img2img) |

**Defaults**: RandAugment / TrivialAugment + MixUp/CutMix (vision). LLM paraphrase (NLP).

## 🔗 Graph

- Parent: [[Data-Engineering]] · [[L1-and-L2-Regularization|Regularization]]
- Variants: [[MixUp]] · [[CutMix]] · [[AutoAugment]] · [[Back-Translation]] · [[SMOTE]]
- Adjacent: [[Bias-vs-Variance]] · [[Cross-Entropy Loss]] · [[CV_Synthesis]] · [[Antifragility]]

## 🤖 LLM Usage

**When**: ML training in general, and especially with small datasets, imbalanced classes, or a need for robustness.

**When not**: the model is already strong and data is abundant.

## ❌ Anti-Patterns

- **Augmenting the test set** (or augmenting before the train/test split): leakage.
- **Over-augmentation**: the training distribution drifts away from the test distribution.
- **Wrong-domain augmentation**: for example, horizontally flipping text images, where a mirrored "B" is no longer a valid character.
- **Non-bbox-aware augmentation** (detection): the boxes no longer match the objects, so the labels are wrong.
- **Keeping hard targets with MixUp**: the loss is wrong unless both labels are mixed with the same coefficient as the inputs.
- **Out-of-distribution generative samples**: they add noise rather than signal.

## 🧪 Verification / Duplicates

- Verified (Cubuk AutoAugment, Zhang MixUp, DeVries CutOut, SpecAugment).
- Trust level A.
- Related: [[Bias-vs-Variance]] · [[Cross-Entropy Loss]] · [[CV_Synthesis]] · [[Computer_Vision]] · [[Antifragility]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: strategies plus torchvision / Albumentations / MixUp / back-translate / TTA code |
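
The bbox anti-pattern listed above comes down to one piece of arithmetic: when an image is mirrored, its boxes must be mirrored too. The sketch below shows that arithmetic for a horizontal flip, assuming `pascal_voc`-style pixel boxes `(x_min, y_min, x_max, y_max)`; the helper name `hflip_bbox_pascal_voc` is hypothetical and is not part of Albumentations, which handles this internally when `bbox_params` is set.

```python
def hflip_bbox_pascal_voc(bbox, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) pixel box across the vertical axis."""
    x_min, y_min, x_max, y_max = bbox
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


box = (10, 20, 50, 80)
flipped = hflip_bbox_pascal_voc(box, image_width=100)
print(flipped)  # (50, 20, 90, 80)
```

Flipping the image without applying this to the box would leave the label at `(10, 20, 50, 80)` while the object now sits at `(50, 20, 90, 80)`.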