In scenarios with many small files, Spark launches a large number of tasks. When the SQL logic includes a shuffle operation, this greatly increases the number of hash buckets, which seriously degrades performance. In FusionInsight, for small-file scenarios, the ( ) operator is usually used to merge the partitions generated by the small files in a table, reducing the number of partitions so that the shuffle does not generate too many hash buckets and performance improves.

A . group by
B . coalesce
C . connect
D . join

Answer: B
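
To make the reasoning concrete, here is a minimal sketch in plain Python (not Spark code) of why merging partitions before a shuffle reduces the number of hash buckets. It assumes a simplified model in which every map-side partition writes one bucket per reduce-side partition. The helper names `coalesce_partitions` and `shuffle_buckets` are hypothetical, and the figure of 200 reduce partitions is only an illustrative assumption. In Spark itself the corresponding call is `coalesce(numPartitions)` on an RDD or DataFrame, which chooses its groupings differently from this toy model.

```python
def coalesce_partitions(partitions, target):
    """Merge whole input partitions into at most `target` groups.

    Simplified model: partitions are combined as units, so no record-level
    shuffle is needed. Real Spark also considers data locality.
    """
    if target < 1:
        raise ValueError("target must be >= 1")
    n = len(partitions)
    if n == 0:
        return []
    target = min(target, n)  # merging never increases the partition count
    merged = [[] for _ in range(target)]
    for i, part in enumerate(partitions):
        # Contiguous input partitions land in the same output group.
        merged[i * target // n].extend(part)
    return merged


def shuffle_buckets(num_map_partitions, num_reduce_partitions):
    """Bucket count under the assumed model: one bucket per (map, reduce) pair."""
    return num_map_partitions * num_reduce_partitions


# Assumption: 1000 small files, each read as its own single-record partition.
small_file_partitions = [[i] for i in range(1000)]
merged = coalesce_partitions(small_file_partitions, 10)

before = shuffle_buckets(len(small_file_partitions), 200)
after = shuffle_buckets(len(merged), 200)
print(before, after)
```

Under these assumptions, going from 1000 to 10 map-side partitions cuts the bucket count by a factor of 100 while every record is still present in the merged partitions.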
