How can we address this limitation?

When external tables are defined using the CSV, JSON, TEXT, or BINARY formats, any query on the external table caches the data and file locations for performance reasons. As a result, within a given Spark session, any new files that arrive after the initial query will not be available.

How can we address this limitation?
A. UNCACHE TABLE table_name
B. CACHE TABLE table_name
C. REFRESH TABLE table_name
D. BROADCAST TABLE table_name
E. CLEAR CACHE table_name

Answer: C

Explanation:

The answer is REFRESH TABLE table_name.

REFRESH TABLE table_name forces Spark to invalidate what it has cached for the table and pick up the current set of external files, including any that were added or changed.

When Spark queries an external table, it caches the files associated with it. If the table is queried again, Spark can use the cached files instead of retrieving them again from cloud object storage. The drawback is that if new files become available, Spark does not know about them until the REFRESH command is run.
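The behavior described above can be sketched with a toy model in plain Python. This is not Spark's implementation: the class `CachedExternalTable`, its methods, and the file names are hypothetical, chosen only to show why a cached file listing goes stale and how a refresh fixes it.

```python
import os
import tempfile


class CachedExternalTable:
    """Hypothetical stand-in for an external table with a cached file listing."""

    def __init__(self, location):
        self.location = location
        self._cached_files = None  # populated by the first query

    def query(self):
        # The first query lists the storage location; later queries reuse the cache.
        if self._cached_files is None:
            self._cached_files = sorted(os.listdir(self.location))
        return list(self._cached_files)

    def refresh(self):
        # Similar in spirit to REFRESH TABLE: discard the cached listing
        # so the next query lists the location again.
        self._cached_files = None


def demo():
    with tempfile.TemporaryDirectory() as location:
        open(os.path.join(location, "part-0001.csv"), "w").close()
        table = CachedExternalTable(location)
        first = table.query()   # initial query fills the cache

        # A new file arrives after the initial query.
        open(os.path.join(location, "part-0002.csv"), "w").close()
        stale = table.query()   # still served from the cached listing

        table.refresh()
        fresh = table.query()   # listing is rebuilt and sees the new file
    return first, stale, fresh


if __name__ == "__main__":
    for label, files in zip(("first", "stale", "fresh"), demo()):
        print(label, files)
```

In an actual Spark session the equivalent step is the SQL statement `REFRESH TABLE table_name`, or `spark.catalog.refreshTable("table_name")` from PySpark, issued before re-running the query.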
