---
id: wiki-2026-0508-spatial-data-analysis
title: Spatial Data Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Geospatial-Analysis, GIS-Analysis]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [geospatial, gis, spatial, analysis, statistics]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: geopandas-shapely-pysal
---

# Spatial Data Analysis

## In one line

> **"Location matters: Tobler's First Law."** Near things are more related than distant things, and spatial analysis measures and models that spatial autocorrelation. The field traces back to John Snow's 1854 cholera map. In 2026 it is central to epidemiology, urban planning, climate science, and autonomous driving.

## Core concepts

### Data types

- **Vector**: points (cities), lines (roads), polygons (districts). Formats: GeoJSON, Shapefile, GeoParquet.
- **Raster**: gridded data (satellite imagery, DEMs, climate fields). Formats: GeoTIFF, Zarr, COG (Cloud-Optimized GeoTIFF).
- **Network**: routable graphs (roads, transit). Tools: OSMnx, pgRouting.
- **Trajectory**: time-stamped point sequences. Tool: MovingPandas.

### Operations

- **Spatial join**: attaches attributes across layers using a spatial predicate, for example by matching each point to the polygon that contains it.
- **Buffer**: the region within distance d of a geometry.
- **Overlay**: intersection, union, or difference of two layers.
- **Reprojection**: converts between coordinate reference systems (CRS) such as WGS84, UTM, and Web Mercator.
- **Aggregation**: summary statistics per pixel or per zone.

### Statistics

- **Moran's I**: global spatial autocorrelation, the classic way to quantify Tobler's law.
- **Getis-Ord G\***: local hotspot statistic that locates clusters of unusually high or low values.
- **Variogram / Kriging**: spatial interpolation (geostatistics).
- **Geographically Weighted Regression (GWR)**: regression with spatially varying coefficients.

## 💻 Patterns

### 1. GeoPandas: vector load + filter

```python
import geopandas as gpd

gdf = gpd.read_file("districts.geojson").to_crs("EPSG:3857")  # reproject to Web Mercator (projected meters)
seoul = gdf[gdf["name"].str.contains("Seoul", na=False)]  # na=False skips rows with a missing name
```
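Pattern 1 reprojects to EPSG:3857 with `to_crs`. The stdlib-only sketch below shows what that projection does to a single coordinate by implementing the spherical Mercator forward equations. It assumes the sphere radius R = 6,378,137 m (the WGS84 semi-major axis) that EPSG:3857 uses. It also computes the local scale factor 1/cos(latitude), which causes distances and areas to inflate away from the equator. `to_web_mercator` and `scale_factor` are illustrative names, not library functions. For real data, keep using `to_crs` or pyproj.

```python
import math

# Assumption: EPSG:3857 treats the Earth as a sphere whose radius is the WGS84 semi-major axis
R = 6378137.0  # meters


def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Spherical Mercator forward projection: lon/lat in degrees -> x/y in meters."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def scale_factor(lat_deg: float) -> float:
    """Linear scale distortion at a latitude; the area distortion is its square."""
    return 1.0 / math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    print(to_web_mercator(126.978, 37.566))  # approximately Seoul City Hall
    for lat in (0.0, 37.5, 60.0):
        k = scale_factor(lat)
        print(f"lat {lat:5.1f}: linear x{k:.2f}, area x{k * k:.2f}")
```

The scale factor grows without bound toward the poles. This is why EPSG:3857 is clipped at roughly ±85.05° latitude.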
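### 2. Spatial join: points in polygons

```python
import pandas as pd
df = pd.read_csv("incidents.csv")  # assumes "lon"/"lat" columns in WGS84 degrees
points = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="EPSG:4326").to_crs(gdf.crs)
joined = gpd.sjoin(points, gdf, how="left", predicate="within")  # both layers must share a CRS
counts = joined.groupby("name").size()  # incidents per district ("name" column from pattern 1)
```

### 3. Buffer + overlay

```python
roads = gpd.read_file("roads.shp")
roads = roads.to_crs(roads.estimate_utm_crs())  # metric CRS, so buffer(500) means 500 m
buffer_500m = gpd.GeoDataFrame(geometry=roads.buffer(500), crs=roads.crs)  # overlay needs a GeoDataFrame, not a GeoSeries
flood = gpd.read_file("flood.geojson").to_crs(roads.crs)  # overlay requires matching CRSs
risk = gpd.overlay(buffer_500m, flood, how="intersection")
```

### 4. Moran's I (PySAL)

```python
from libpysal.weights import Queen
from esda.moran import Moran

w = Queen.from_dataframe(gdf)  # queen contiguity: polygons sharing an edge or a vertex are neighbors
moran = Moran(gdf["income"], w)  # row-standardizes w by default; 999 permutations
print(moran.I, moran.p_sim)  # global autocorrelation + permutation-based pseudo p-value
```

### 5. Local hotspot (Getis-Ord G*)

```python
from esda.getisord import G_Local

g = G_Local(gdf["crime"], w, transform="R", star=True)  # star=True computes G*_i (each unit counts itself)
gdf["z"] = g.Zs  # z > 2.58: hotspot at the 99% level (two-sided, no multiple-testing correction)
```

### 6. Raster: Rasterio + xarray

```python
import rioxarray
da = rioxarray.open_rasterio("landsat.tif", masked=True)  # masked=True turns nodata into NaN (float)
# Band numbers assume Landsat 4-7 TM/ETM+ order (3 = red, 4 = NIR); Landsat 8/9 OLI uses 4 = red, 5 = NIR
red, nir = da.sel(band=3), da.sel(band=4)
ndvi = (nir - red) / (nir + red)
ndvi.rio.to_raster("ndvi.tif")
```

### 7. Kriging interpolation

```python
import numpy as np
from pykrige.ok import OrdinaryKriging
# x, y, z: 1-D NumPy arrays of sample coordinates (projected CRS) and measured values
ok = OrdinaryKriging(x, y, z, variogram_model="spherical")
gridx, gridy = np.linspace(x.min(), x.max(), 100), np.linspace(y.min(), y.max(), 100)
grid_z, grid_var = ok.execute("grid", gridx, gridy)  # predictions + kriging variance
```

### 8. STAC + COG (cloud-native, 2026)

```python
import pystac_client
import stackstac
bbox = [126.76, 37.41, 127.18, 37.70]  # (min lon, min lat, max lon, max lat); illustrative, roughly Seoul
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
items = catalog.search(collections=["sentinel-2-l2a"], bbox=bbox, datetime="2025-06-01/2025-06-30").item_collection()
# Earth Search v1 uses common-name asset keys ("red" = B04, "nir" = B08)
stack = stackstac.stack(items, assets=["red", "nir"])  # lazy, Dask-backed xarray.DataArray
```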
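Patterns 4 and 5 hand the statistics to `esda`. The stdlib-only sketch below makes the global Moran's I from pattern 4 less of a black box. It computes `I = (n / S0) * Σi Σj w_ij z_i z_j / Σi z_i²` on a toy 4 × 4 grid, where `z_i` is each value's deviation from the mean and `S0` is the sum of all weights. The sketch rests on these assumptions:

- rook contiguity (cells sharing an edge are neighbors);
- row-standardized weights, matching `esda`'s default;
- made-up grid values;
- a folded one-sided permutation pseudo p-value of the form (extreme + 1) / (permutations + 1).

That p-value follows the idea behind `p_sim` but is not claimed to match `esda` exactly. All function names are hypothetical.

```python
import random


def rook_neighbors(rows: int, cols: int) -> dict[int, list[int]]:
    """Rook contiguity on a rows x cols grid (row-major ids): cells sharing an edge are neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values: list[float], nbrs: dict[int, list[int]]) -> float:
    """Global Moran's I with row-standardized weights (w_ij = 1 / number of neighbors of i)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0  # sum_i sum_j w_ij * z_i * z_j
    s0 = 0.0   # sum of all weights
    for i, js in nbrs.items():
        w = 1.0 / len(js)
        for j in js:
            num += w * z[i] * z[j]
            s0 += w
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_p(values, nbrs, permutations=999, seed=0):
    """Observed I plus a folded one-sided pseudo p-value from random relabelings of the values."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, nbrs) >= observed:
            larger += 1
    extreme = min(larger, permutations - larger)  # fold: count the smaller tail
    return observed, (extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    nbrs = rook_neighbors(4, 4)
    clustered = [1, 1, 5, 5] * 4  # left half low, right half high
    checkerboard = [1, 5, 1, 5, 5, 1, 5, 1, 1, 5, 1, 5, 5, 1, 5, 1]
    print(permutation_p(clustered, nbrs))     # similar values adjacent: positive I
    print(permutation_p(checkerboard, nbrs))  # dissimilar values adjacent: negative I
```

Under spatial randomness the expected value is E[I] = -1/(n-1), not 0. Values well above it (the clustered grid) indicate positive autocorrelation. Values below it indicate negative autocorrelation; the checkerboard reaches exactly -1 with these weights. For real datasets, stick with the `esda` call in pattern 4.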
### 9. H3 hexagonal indexing (Uber)

```python
from collections import Counter
import h3  # h3-py v4 API
# coords: iterable of (lat, lng) pairs in WGS84 degrees
hexes = [h3.latlng_to_cell(lat, lng, 9) for lat, lng in coords]  # resolution-9 cells
counts = Counter(hexes)  # points per hex cell, for zone-based statistics
```

## Decision guide

| Situation | Approach |
|---|---|
| Vector ops | GeoPandas / Shapely |
| Raster ops | Rasterio / rioxarray / xarray |
| Cloud-scale (TB+) | STAC + COG + Dask |
| Hotspot detection | Getis-Ord G* |
| Continuous interpolation | Kriging |
| Discrete zoning / aggregation | H3 / S2 cells |
| Routing | OSMnx / pgRouting |
| Visualization | Folium, Kepler.gl, Deck.gl |

**Default**: GeoPandas in EPSG:4326 → projected CRS (UTM/3857) for metric operations → ESDA (PySAL) for statistics.

## 🔗 Graph

- Parent: [[Statistics]] · [[Geographic-Information-Systems]]
- Variants: [[Geographic-Information-Systems]] · [[Knowledge Graph]]
- Applications: [[Autonomous-Vehicle-Path-Planning]] · [[Climate Change Mitigation Frameworks]]
- Adjacent: [[Multivariate-Analysis]] · [[Regression-Analysis-Foundations]]

## 🤖 Using LLMs

**When**: disambiguating place names during geocoding, writing narrative descriptions of spatial patterns, and interpreting OSM tags.

**When not**: numerical work such as kriging or reprojection. Use a dedicated geospatial library instead.

## ❌ Anti-patterns

- **Mixing CRS without conversion**: combining coordinates in meters with coordinates in degrees fails silently, with no error raised.
- **Web Mercator for area calculations**: areas are inflated by 1/cos²(latitude) (4× at 60°). Use an equal-area projection such as Mollweide or Equal Earth.
- **Ignoring spatial autocorrelation in regression**: this violates the OLS assumption of independent errors. Use GWR or a spatial lag model.
- **Rasterizing then re-vectorizing**: this loses precision. Keep data in vector form whenever vector operations can do the job.

## 🧪 Verification / duplicates

- Verified against PySAL docs, *Geocomputation with Python* (Lovelace et al.), and USGS standards.
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: vector/raster/STAC + ESDA patterns. |