Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?
A . from pyspark import StorageLevel transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B . transactionsDf.cache()
C . transactionsDf.storage_level(‘MEMORY_ONLY’)
D . transactionsDf.persist()
E . transactionsDf.clear_persist()
F . from pyspark import StorageLevel transactionsDf.persist(StorageLevel.MEMORY_ONLY)

Answer: F

Explanation:

from pyspark import StorageLevel transactionsDf.persist(StorageLevel.MEMORY_ONLY) Correct. Note that the storage level MEMORY_ONLY means that all partitions that do not fit into memory will be recomputed when they are needed. transactionsDf.cache()

This is wrong because the default storage level of DataFrame.cache() is

MEMORY_AND_DISK, meaning that partitions that do not fit into memory are stored on disk.

transactionsDf.persist()

This is wrong because the default storage level of DataFrame.persist() is

MEMORY_AND_DISK.

transactionsDf.clear_persist()

Incorrect, since clear_persist() is not a method of DataFrame.

transactionsDf.storage_level(‘MEMORY_ONLY’)

Wrong. storage_level is not a method of DataFrame.

More info: RDD Programming Guide – Spark 3.0.0 Documentation, pyspark.sql.DataFrame.persist ― PySpark 3.0.0 documentation (https://bit.ly/3sxHLVC , https://bit.ly/3j2N6B9)

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>