---
id: wiki-2026-0508-point-cloud-processing
title: Point Cloud Processing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [point-cloud, 3d-deep-learning, lidar-processing]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [3d, point-cloud, pointnet, lidar, sparse-conv]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch / Open3D
---

# Point Cloud Processing

## In one line

> **"Deep learning on unordered 3D point sets."** PointNet (Qi et al., 2017) was the first permutation-invariant neural network for raw points. The field then moved to PointNet++, sparse convolution (MinkowskiEngine, Spconv), transformers (PTv3), and now 3D foundation models. Point cloud processing is a core component of LiDAR-based autonomous driving, robotics, and AR/VR.

## Core

### Challenges

- **Unordered**: N points admit N! orderings, so the network needs a symmetric (order-independent) function such as max pooling.
- **Sparse and irregular**: most cells of a voxel grid over a scene are empty.
- **Scale variance**: a LiDAR sweep has ~100K points, a sampled CAD model ~1K.
- **No canonical orientation**: equivariance to SE(3) (3D rotations and translations) is desirable.

### Lineage

- **PointNet (2017)**: shared per-point MLP + global max-pool. Permutation invariant, but captures no local structure.
- **PointNet++ (2017)**: hierarchical sampling and grouping (farthest point sampling + ball query) to add local context.
- **Voxels + sparse convolution** (2018-2020): MinkowskiEngine, Spconv; computation happens only at non-empty voxels.
- **Graph and point-convolution methods**: DGCNN (dynamic k-NN graph), KPConv (kernel point convolution).
- **Transformers**: Point Transformer v1 (2021), v2 (2022), v3 (2024); PTv3 reported state-of-the-art results on ScanNet at release.
- **3D foundation models (2024-2025)**: Sonata, PointTransformerV3 (as backbone), Uni3D.

### Tasks

1. **Classification**: ModelNet40 (CAD), ScanObjectNN (real scans).
2. **Part segmentation**: ShapeNet-Part.
3. **Semantic segmentation**: ScanNet, S3DIS, SemanticKITTI (LiDAR).
4. **Detection**: KITTI, nuScenes, Waymo (3D bounding boxes).
5. **Registration**: ICP, DGR, GeoTransformer.
6. **Reconstruction**: NeRF, Gaussian Splatting.

### Applications

1. Autonomous driving (LiDAR perception).
2. Robotic manipulation (depth → grasp).
3. AR/VR (scene understanding).
4. BIM / construction (as-built scans).

## 💻 Patterns

### Open3D: load + preprocess + visualize

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")
print(pcd)  # e.g. "PointCloud with 1234567 points."

pcd = pcd.voxel_down_sample(voxel_size=0.05)  # one point per voxel (5 cm if units are metres)
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)  # returns (cloud, kept indices)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30)
)
xyz = np.asarray(pcd.points)  # (N, 3) float64 array for NumPy processing
o3d.visualization.draw_geometries([pcd])
```

### PointNet (minimal)

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Minimal PointNet classifier (no T-Net input/feature transforms)."""

    def __init__(self, num_classes=40):
        super().__init__()
        # Shared per-point MLPs implemented as 1x1 convolutions over the point axis
        self.mlp1 = nn.Sequential(nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU())
        self.mlp2 = nn.Sequential(nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU())
        self.mlp3 = nn.Sequential(nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, x):  # x: (B, 3, N)
        x = self.mlp3(self.mlp2(self.mlp1(x)))  # (B, 1024, N)
        x = x.max(dim=2).values  # symmetric max-pool over points -> permutation invariant
        return self.head(x)  # (B, num_classes) logits

logits = PointNetCls()(torch.randn(8, 3, 1024))  # (8, 40)
```

### Farthest Point Sampling (FPS) for PointNet++

```python
import torch

def fps(xyz, npoint):
    """Farthest point sampling. xyz: (B, N, 3) -> indices (B, npoint)."""
    B, N, _ = xyz.shape
    centroids = torch.zeros(B, npoint, dtype=torch.long, device=xyz.device)
    # Squared distance from every point to the nearest already-chosen centroid
    distance = torch.full((B, N), 1e10, dtype=xyz.dtype, device=xyz.device)
    farthest = torch.randint(0, N, (B,), device=xyz.device)  # random seed point per cloud
    batch_idx = torch.arange(B, device=xyz.device)
    for i in range(npoint):
        centroids[:, i] = farthest
        centroid = xyz[batch_idx, farthest, :].unsqueeze(1)  # (B, 1, 3)
        dist = ((xyz - centroid) ** 2).sum(-1)  # (B, N)
        distance = torch.minimum(distance, dist)
        farthest = distance.argmax(-1)  # next pick: point farthest from the chosen set
    return centroids

idx = fps(torch.rand(2, 4096, 3), 512)  # (2, 512)
```

### MinkowskiEngine: sparse 3D convolution

```python
import MinkowskiEngine as ME
import torch

# Placeholder input (hypothetical): one cloud of points in metres
points = torch.rand(50_000, 3) * 20.0
voxel_size = 0.05

# Build a sparse tensor: integer voxel coordinates, then prepend the batch index
coords = torch.floor(points / voxel_size).int()
coords = ME.utils.batched_coordinates([coords])  # one list entry per cloud -> (N, 4)
feats = torch.ones(coords.shape[0], 3)  # or RGB / normals
x = ME.SparseTensor(features=feats, coordinates=coords)

class SparseUNet(ME.MinkowskiNetwork):
    def __init__(self, in_channels=3, out_channels=20, D=3):
        super().__init__(D)
        self.conv1 = ME.MinkowskiConvolution(in_channels, 32, kernel_size=3, dimension=D)
        self.bn1 = ME.MinkowskiBatchNorm(32)
        self.relu = ME.MinkowskiReLU()
        # ... encoder/decoder stages omitted; a final layer would map to out_channels

    def forward(self, x):
        return self.relu(self.bn1(self.conv1(x)))

out = SparseUNet()(x)  # SparseTensor with 32 feature channels
```

### Spconv (traveller59/spconv): a fast alternative

```python
import spconv.pytorch as spconv
import torch.nn as nn

class SimpleSpconv(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = spconv.SparseSequential(
            spconv.SubMConv3d(3, 32, 3, padding=1),  # submanifold conv: keeps the active sites
            nn.BatchNorm1d(32),  # dense layers act on the (N, C) feature matrix
            nn.ReLU(),
            spconv.SparseConv3d(32, 64, 2, stride=2),  # regular sparse conv: 2x downsample
            nn.BatchNorm1d(64),
            nn.ReLU(),
        )

    def forward(self, voxel_features, voxel_coords, batch_size, spatial_shape):
        # voxel_features: (N, 3) float; voxel_coords: (N, 4) int32 as [batch, i, j, k]
        # in the same axis order as spatial_shape (spconv kernels generally need CUDA)
        x = spconv.SparseConvTensor(voxel_features, voxel_coords, spatial_shape, batch_size)
        return self.net(x)  # SparseConvTensor; call .dense() for a dense grid
```

### KITTI LiDAR: load + crop

```python
import numpy as np

def load_kitti_bin(path):
    # KITTI Velodyne scan: float32 records of (x, y, z, intensity) -> (N, 4)
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

def crop_fov(pts, x_range=(-50, 50), y_range=(-50, 50), z_range=(-3, 1)):
    # Axis-aligned box crop in the LiDAR frame (metres), not an angular field-of-view crop
    mask = ((pts[:, 0] > x_range[0]) & (pts[:, 0] < x_range[1]) &
            (pts[:, 1] > y_range[0]) & (pts[:, 1] < y_range[1]) &
            (pts[:, 2] > z_range[0]) & (pts[:, 2] < z_range[1]))
    return pts[mask]
```

### Gaussian Splatting (modern 3D reconstruction)

```python
# 3DGS: each point carries Gaussian parameters
# (mean μ, covariance Σ stored as a rotation quaternion + per-axis scales, color, opacity).
# Often used in place of NeRF: much faster training and rendering.
import gsplat  # https://github.com/nerfstudio-project/gsplat

# Render: gsplat.rasterization(means, quats, scales, opacities, colors, ...)
```
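The unordered-set challenge and PointNet's max-pool can be checked numerically. Below is a minimal NumPy sketch. It assumes a single randomly initialised shared layer, so the weights are arbitrary rather than trained PointNet weights, and `global_feature` is a hypothetical helper. Shuffling the input rows leaves the pooled global feature unchanged, which is the symmetric-function property in action.

```python
import numpy as np

def global_feature(points, weight, bias):
    """Shared per-point layer (one ReLU layer) followed by a max-pool over points.

    points: (N, 3), weight: (3, C), bias: (C,) -> (C,) global feature.
    """
    per_point = np.maximum(points @ weight + bias, 0.0)  # same weights for every point
    return per_point.max(axis=0)  # max over the point axis ignores ordering

rng = np.random.default_rng(0)
pts = rng.normal(size=(1024, 3))
W, b = rng.normal(size=(3, 64)), rng.normal(size=64)

f = global_feature(pts, W, b)
f_shuffled = global_feature(pts[rng.permutation(len(pts))], W, b)
print(np.allclose(f, f_shuffled))  # True
```

Replacing `max(axis=0)` with an order-dependent readout, such as taking the first row, would break this property. That is why PointNet pools with a symmetric function.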
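FPS only picks centroids. PointNet++ then gathers each centroid's neighbours with a ball query and expresses them in local coordinates. Below is a minimal single-cloud NumPy sketch. It assumes the common convention of padding short groups by repeating the first neighbour found. `ball_query` and `group_local` are illustrative names, not a library API.

```python
import numpy as np

def ball_query(xyz, centroids, radius, nsample):
    """Group up to `nsample` neighbours within `radius` of each centroid.

    xyz: (N, 3) points, centroids: (M, 3) -> (M, nsample) int indices into xyz.
    Short groups are padded with their first index. Each centroid should itself
    be a point of xyz (as with FPS), so that every group is non-empty.
    """
    # Squared distances between every centroid and every point: (M, N)
    d2 = ((centroids[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    groups = np.empty((len(centroids), nsample), dtype=np.int64)
    for m, row in enumerate(d2):
        idx = np.flatnonzero(row <= radius ** 2)[:nsample]  # first hits in index order
        pad = np.full(nsample - len(idx), idx[0], dtype=np.int64)
        groups[m] = np.concatenate([idx, pad])
    return groups

def group_local(xyz, centroids, groups):
    """Gather grouped points relative to their centroid: (M, nsample, 3)."""
    return xyz[groups] - centroids[:, None, :]

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(2048, 3))
centers = pts[:128]  # stand-in for FPS-selected centroids
grp = ball_query(pts, centers, radius=0.2, nsample=32)
local = group_local(pts, centers, grp)
print(grp.shape, local.shape)  # (128, 32) (128, 32, 3)
```

Padding keeps every group the same size, so the groups stack into one tensor that a shared mini-PointNet can process. The brute-force distance matrix costs O(M·N) memory. Practical implementations typically run this step batched on the GPU.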
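The sparse-convolution entries rest on one observation: a scanned scene occupies only a small fraction of its bounding voxel grid. Below is a minimal NumPy sketch of voxelization that keeps only occupied voxels, returning their unique integer coordinates plus the mean feature per voxel. It runs on a synthetic ground-plane cloud. `voxelize` is a hypothetical helper, and the data is made up for illustration.

```python
import numpy as np

def voxelize(points, feats, voxel_size):
    """Quantize points to voxels, keeping only occupied ones.

    points: (N, 3) float, feats: (N, C) ->
    coords (M, 3) int64, mean feats (M, C), inverse (N,) point -> voxel index.
    """
    grid = np.floor(points / voxel_size).astype(np.int64)
    coords, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # some NumPy 2.x versions return a 2-D inverse here
    sums = np.zeros((len(coords), feats.shape[1]))
    np.add.at(sums, inverse, feats)  # scatter-add features of points sharing a voxel
    counts = np.bincount(inverse, minlength=len(coords))[:, None]
    return coords, sums / counts, inverse

rng = np.random.default_rng(0)
n = 100_000
# Synthetic scan: points on a 40 m x 40 m ground plane with small height noise
pts = np.column_stack([
    rng.uniform(0.0, 40.0, n),
    rng.uniform(0.0, 40.0, n),
    rng.normal(0.0, 0.05, n),
])
coords, centroids, inv = voxelize(pts, pts, voxel_size=0.2)
# Dense grid over a 40 m x 40 m x 4 m box at the same voxel size
dense_cells = int(np.prod(np.ceil(np.array([40.0, 40.0, 4.0]) / 0.2)))
print(f"{len(coords)} occupied of {dense_cells} dense voxels")
```

The integer `coords` correspond to the voxel coordinates fed to MinkowskiEngine or Spconv before the batch index is prepended. `inverse` maps each voxel's output back to its original points.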
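DGCNN's EdgeConv builds a k-NN graph. For each point i and neighbour j, it applies a shared MLP to the edge feature [x_i, x_j − x_i], then takes a max over neighbours. The graph is recomputed from the current features at every layer. Below is a minimal NumPy sketch of one such layer. It assumes a random matrix in place of the learned MLP and excludes each point from its own neighbourhood. `knn` and `edge_features` are illustrative names.

```python
import numpy as np

def knn(x, k):
    """Indices of the k nearest neighbours of each row of x, excluding itself: (N, k)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour in this sketch
    return np.argsort(d2, axis=1)[:, :k]

def edge_features(x, k):
    """EdgeConv input: for point i and neighbour j, concat [x_i, x_j - x_i] -> (N, k, 2C)."""
    idx = knn(x, k)
    neighbours = x[idx]  # (N, k, C)
    center = np.repeat(x[:, None, :], k, axis=1)  # (N, k, C)
    return np.concatenate([center, neighbours - center], axis=-1), idx

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 3))  # first layer: features are xyz
e, idx = edge_features(x, k=20)  # (512, 20, 6)
W = rng.normal(size=(6, 64))  # stand-in for the learned shared MLP
h = np.maximum(e @ W, 0.0).max(axis=1)  # max over neighbours -> (512, 64)
```

In a deeper layer, `x` would be the previous layer's output `h`. The k-NN graph is then built in feature space rather than in xyz space, which is what makes the graph "dynamic".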
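The normalization advice in the anti-pattern list usually means centering each cloud and scaling it into the unit sphere, as is common for ModelNet-style classification. Below is a minimal NumPy sketch of both variants. The cube version assumes the convention of fitting the longest bounding-box side into [-0.5, 0.5]. The function names are illustrative.

```python
import numpy as np

def normalize_unit_sphere(points):
    """Center at the centroid and scale so the farthest point lies on the unit sphere."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered

def normalize_unit_cube(points):
    """Center the bounding box and scale its longest side to 1 (fits in [-0.5, 0.5]^3)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    centered = points - (lo + hi) / 2.0
    extent = (hi - lo).max()
    return centered / extent if extent > 0 else centered

rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 100.0, size=(1000, 3)) * np.array([1.0, 0.5, 0.2])
s = normalize_unit_sphere(cloud)
c = normalize_unit_cube(cloud)
```

The sphere variant uses the maximum distance from the centroid, so its scale does not change when the object is rotated. The cube variant depends on the axis-aligned bounding box, so its scale does.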
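The registration task starts from ICP, which alternates between nearest-neighbour correspondences and a closed-form rigid fit. Below is a minimal point-to-point sketch in NumPy with two assumptions. First, nearest neighbours are found by brute force, which is adequate for a few hundred points; library versions such as Open3D's `o3d.pipelines.registration.registration_icp` use a spatial index instead. Second, each rigid fit uses the SVD-based Kabsch solution. The example data is a synthetic cloud with a small known misalignment, because ICP only converges from a reasonable initial pose.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ≈ dst_i (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(source, target, iters=50, tol=1e-10):
    """Point-to-point ICP. Returns a 4x4 transform mapping source onto target."""
    T = np.eye(4)
    cur = source.copy()
    prev_err = np.inf
    for _ in range(iters):
        # Brute-force nearest neighbour in target for every current source point
        d2 = ((cur[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        nn_idx = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(cur)), nn_idx]).mean()
        R, t = best_rigid_transform(cur, target[nn_idx])
        cur = cur @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T  # the newest increment is applied last
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return T

rng = np.random.default_rng(0)
target = rng.uniform(-0.5, 0.5, size=(150, 3))
theta = np.deg2rad(2.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
source = target @ R_true.T + np.array([0.01, 0.005, -0.005])  # known small misalignment
T = icp(source, target)
aligned = source @ T[:3, :3].T + T[:3, 3]
print(np.abs(aligned - target).max())  # close to zero once ICP has converged
```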
### Point Transformer v3 (SOTA, 2024)

```python
# Assumes the Pointcept codebase is installed (see the Pointcept repository),
# together with spconv; flash-attn is used unless enable_flash=False.
import torch.nn as nn
from pointcept.models.point_transformer_v3 import PointTransformerV3

backbone = PointTransformerV3(
    in_channels=6,  # e.g. xyz + rgb per point
    enc_depths=(2, 2, 2, 6, 2),
    enc_channels=(32, 64, 128, 256, 512),
)
# PTv3 is a backbone that returns per-point features; a head maps them to classes
seg_head = nn.Linear(64, 20)  # 64 = default dec_channels[0]; 20 classes (ScanNet)

# input_dict: dict with 'feat', 'coord', 'grid_coord', 'offset' (from Pointcept's data pipeline)
point = backbone(input_dict)
logits = seg_head(point.feat)  # (total points in batch, 20)
```

## Decision criteria

| Situation | Method |
|---|---|
| Small clouds (<10K points), classification | PointNet++ |
| LiDAR scene segmentation (>100K points) | Sparse conv (Spconv / MinkowskiEngine) |
| SOTA segmentation | PTv3 |
| 3D detection (autonomous driving) | CenterPoint / TransFusion |
| Reconstruction from images | Gaussian Splatting |
| Registration | GeoTransformer |
| Quick prototyping | Open3D |

**Defaults**: Spconv (LiDAR) / PTv3 (indoor) / Open3D (general utilities).

## 🔗 Graph

- Parent: [[3D-Deep-Learning]] · [[Computer Vision|Computer-Vision]]
- Applications: [[Autonomous-Driving]] · [[Gaussian-Splatting]]
- Adjacent: [[NeRF]]

## 🤖 LLM usage

**When**: LiDAR scene understanding, semantic segmentation of indoor scans, robot perception.

**When not**: the data is already dense images (an image CNN is sufficient); mesh-native tasks (use mesh networks).

## ❌ Anti-patterns

- **Dense voxel grid**: runs out of memory; use a sparse representation.
- **Ignoring normalization**: normalize each cloud into the unit sphere or unit cube.
- **PointNet for large scenes**: a single global max-pool over 100K points loses local information.
- **Forgetting the T-Net (PointNet)**: omitting the input transform leaves the model more sensitive to input rotations (SO(3)).

## 🧪 Verification / duplicates

- Verified against PointNet/PointNet++ (Qi et al., 2017), the MinkowskiEngine docs, the Spconv repo, and PTv3 (2024).
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PointNet/Sparse/PTv3 + LiDAR + Open3D + GS |