A Spark SQL table often contains many small files (files much smaller than the HDFS block size). In this case, Spark starts more tasks to process these small files. When the SQL logic includes a shuffle operation, the larger number of tasks greatly increases the number of hash buckets, which seriously degrades performance.

A. True
B. False

Answer: A
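
To see why the statement holds, here is a rough back-of-the-envelope sketch. It rests on simplifying assumptions that are not part of the question: one map task per block-sized split (so a file smaller than a block still costs a whole task), a hash-based shuffle in which every map task writes one bucket per reduce partition, a 128 MB block size, and 200 reduce partitions (the default value of spark.sql.shuffle.partitions). The function names and the 10 GiB table size are hypothetical, chosen only for illustration. Some Spark versions and data sources pack several small files into one task, which this model ignores.

```python
import math

def map_tasks(file_sizes_mb, block_size_mb=128):
    """Simplified model: one task per block-sized split; a tiny file still costs one task."""
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)

def shuffle_buckets(num_map_tasks, num_reduce_partitions=200):
    """Hash-based shuffle model: each map task writes one bucket per reduce partition."""
    return num_map_tasks * num_reduce_partitions

total_mb = 10 * 1024                      # 10 GiB of table data (illustrative figure)
large_files = [128] * (total_mb // 128)   # 80 block-sized files
small_files = [1] * total_mb              # 10240 files of 1 MB each

large_tasks = map_tasks(large_files)
small_tasks = map_tasks(small_files)

print("block-sized files:", large_tasks, "tasks,", shuffle_buckets(large_tasks), "buckets")
print("1 MB files:       ", small_tasks, "tasks,", shuffle_buckets(small_tasks), "buckets")
```

Under these assumptions, the same 10 GiB of data goes from 80 tasks and 16,000 buckets to 10,240 tasks and 2,048,000 buckets, which is the blow-up the question describes.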
