"매 split → map → shuffle → reduce". MapReduce (Dean & Ghemawat, Google 2004) 는 대규모 batch 처리 의 functional programming 모델. 2026 perspective 에서 raw Hadoop MR 은 legacy, Spark / Flink / BigQuery / Beam 이 후속 표준.
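from collections import defaultdict

def map_with_combiner(doc_id, text):
    # Aggregate counts locally inside the mapper before emitting anything.
    local = defaultdict(int)
    for word in text.split():
        local[word.lower()] += 1
    for w, c in local.items():
        yield (w, c)  # reduces the volume of data shuffled over the network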
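
The effect of local aggregation on shuffle volume can be checked by counting emitted pairs. The sketch below compares a plain mapper with a locally combining one on a single hypothetical sentence; `map_plain` and `map_combined` are illustrative names, and the saving on real data depends on how often keys repeat within one mapper's input.

```python
from collections import defaultdict

def map_plain(doc_id, text):
    # Emits one pair per word occurrence.
    for word in text.split():
        yield (word.lower(), 1)

def map_combined(doc_id, text):
    # Emits one pair per distinct word, with counts pre-summed locally.
    local = defaultdict(int)
    for word in text.split():
        local[word.lower()] += 1
    yield from local.items()

text = "to be or not to be that is the question"
plain = list(map_plain("d1", text))
combined = list(map_combined("d1", text))

# Fewer pairs cross the (simulated) network, but the totals are unchanged.
pairs_saved = len(plain) - len(combined)
```

Because addition is associative and commutative, pre-summing in the mapper does not change the reducer's final result; the same trick is not safe for operations such as a plain average.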
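# Composite key for sort-within-group (secondary sort)
def map_temp(line):
    parts = line.split(",")
    year, temp = parts[0], int(parts[1])
    # Negate temp so the framework's ascending key sort yields descending temps.
    yield ((year, -temp), None)

def partitioner(key, num_reducers):
    # Partition on year only, so every record for a year reaches the same reducer.
    return hash(key[0]) % num_reducers

# Group by year only
def grouping_comparator(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])  # compares year only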
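
How the composite key, the sort, and the year-only grouping fit together can be simulated locally. This is a sketch under assumptions: a single reducer partition, made-up `year,temp` input lines, and `itertools.groupby` on the year standing in for the grouping comparator. `reduce_side` is a hypothetical helper, not part of any framework.

```python
from itertools import groupby

def map_temp(line):
    year, temp = line.split(",")
    # Negated temperature: ascending sort on the key gives descending temps.
    yield ((year, -int(temp)), None)

def reduce_side(pairs):
    # The framework sorts by the full composite key (year, -temp) ...
    ordered = sorted(key for key, _ in pairs)
    # ... but hands records to the reducer grouped by year only.
    for year, keys in groupby(ordered, key=lambda k: k[0]):
        yield year, [-neg_temp for _, neg_temp in keys]

# Hypothetical input records.
lines = ["1950,22", "1949,111", "1950,0", "1949,78", "1950,-11"]
pairs = [pair for line in lines for pair in map_temp(line)]
by_year = dict(reduce_side(pairs))
hottest = {year: temps[0] for year, temps in by_year.items()}
```

The reducer sees each year's temperatures already in descending order, so the maximum is simply the first value and no in-memory sort is needed inside the reducer.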