TeraSort
TeraSort is a benchmark-style distributed sorting procedure for very large datasets. It is commonly associated with sorting one terabyte or more of records on a cluster. The core idea is simple: sample keys, choose range partitions from the sample, shuffle records to the correct partitions, sort each partition locally, and write ordered output shards. Because the partitions cover disjoint, ordered key ranges, reading the shards in partition order yields a globally sorted result. TeraSort is close to MapReduce sort in structure, but it is usually discussed as a performance benchmark....
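
The steps above can be sketched in a single process. This is a minimal illustration, not the benchmark's actual implementation: it assumes each record is just its own comparable key, simulates the shuffle with in-memory lists, and uses hypothetical helper names (choose_splitters, partition_of, range_partition_sort) invented for this example.

```python
import bisect
import random


def choose_splitters(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of keys."""
    ordered = sorted(sample)
    # Evenly spaced quantiles of the sample become the partition boundaries.
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, splitters):
    """Return the index of the range partition that owns this key."""
    # Partition j holds keys k with splitters[j-1] <= k < splitters[j].
    return bisect.bisect_right(splitters, key)


def range_partition_sort(records, num_partitions, sample_size, seed=0):
    """Return sorted shards whose concatenation is globally sorted."""
    if not records:
        return [[] for _ in range(num_partitions)]
    rng = random.Random(seed)
    # 1. Sample keys (assumption: a uniform random sample of the input).
    sample = rng.sample(records, min(sample_size, len(records)))
    # 2. Choose range partitions from the sample.
    splitters = choose_splitters(sample, num_partitions)
    # 3. "Shuffle": route every record to the partition owning its key.
    shards = [[] for _ in range(num_partitions)]
    for record in records:
        shards[partition_of(record, splitters)].append(record)
    # 4. Sort each partition locally; each sorted list is one output shard.
    return [sorted(shard) for shard in shards]
```

Calling range_partition_sort(records, 4, 100) returns four lists; concatenating them in order gives the same result as sorted(records). In a real cluster each shard would live on a different worker and be written to its own output file, and the quality of the sample determines how evenly the work is balanced across partitions.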