Spark Question

Shuffling is the process of redistributing data across partitions, which may require moving data between executors. Spark implements the shuffle operation differently from Hadoop.

Shuffling has two important compression parameters:

spark.shuffle.compress: controls whether the engine compresses shuffle output files (default: true).

spark.shuffle.spill.compress: controls whether intermediate shuffle spill files are compressed (default: true).
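As a rough illustration of how these two properties might be supplied, the sketch below builds the `--conf` arguments for `spark-submit` from a plain dictionary. The helper name `to_submit_flags` and the variable names are hypothetical, chosen only for this example, and the values shown are illustrative rather than a tuning recommendation.

```python
# Shuffle compression properties named in the answer above.
# The values are illustrative; adjust them for your own workload.
shuffle_conf = {
    "spark.shuffle.compress": "true",
    "spark.shuffle.spill.compress": "true",
}


def to_submit_flags(conf):
    """Hypothetical helper: render properties as spark-submit --conf flags."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


submit_flags = to_submit_flags(shuffle_conf)
print(" ".join(submit_flags))
# prints: --conf spark.shuffle.compress=true --conf spark.shuffle.spill.compress=true
```

The resulting flags can be placed between `spark-submit` and the application file. The same key/value pairs could also go in `spark-defaults.conf`.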

A shuffle occurs when joining two tables or when performing byKey operations such as groupByKey or reduceByKey.
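To make the idea concrete, here is a minimal pure-Python sketch of what a shuffle does conceptually for a reduceByKey-style operation: every record is routed to a target partition chosen from a hash of its key, so all values for one key end up together before they are combined. This is not Spark's implementation. The function names are hypothetical, and `zlib.crc32` is only an assumed, deterministic stand-in for a hash partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Assumed stand-in for a hash partitioner: hash of the key modulo
    # the number of target partitions.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(map_outputs, num_partitions):
    """Regroup records from every source partition by target partition."""
    reduce_inputs = defaultdict(list)
    for records in map_outputs:
        for key, value in records:
            target = partition_for(key, num_partitions)
            reduce_inputs[target].append((key, value))
    return reduce_inputs


def reduce_by_key(map_outputs, num_partitions, func):
    """Combine the values of each key after the records have been shuffled."""
    result = {}
    for records in shuffle(map_outputs, num_partitions).values():
        for key, value in records:
            result[key] = func(result[key], value) if key in result else value
    return result


# Two source partitions; the key "a" appears in both, so its records
# must be brought together before they can be summed.
map_outputs = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]
totals = reduce_by_key(map_outputs, 2, lambda x, y: x + y)
print(sorted(totals.items()))
# prints: [('a', 4), ('b', 2), ('c', 4)]
```

In a real cluster the source and target partitions may live on different executors, which is why this regrouping step can involve data movement over the network.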

What is shuffling in Spark?
