Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?
A. from pyspark import StorageLevel
   transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B. transactionsDf.cache()
C. transactionsDf.storage_level('MEMORY_ONLY')
D. transactionsDf.persist()
E. transactionsDf.clear_persist()
F. from pyspark import StorageLevel
   transactionsDf.persist(StorageLevel.MEMORY_ONLY)
Answer: F
Explanation:
from pyspark import StorageLevel
transactionsDf.persist(StorageLevel.MEMORY_ONLY)
Correct. With the storage level MEMORY_ONLY, partitions that do not fit into memory are not stored and are recomputed each time they are needed.
transactionsDf.cache()
This is wrong because the default storage level of DataFrame.cache() is MEMORY_AND_DISK, meaning that partitions that do not fit into memory are stored on disk.
from pyspark import StorageLevel
transactionsDf.cache(StorageLevel.MEMORY_ONLY)
Wrong. DataFrame.cache() does not accept a storage level argument, so this call raises a TypeError.
transactionsDf.persist()
This is wrong because the default storage level of DataFrame.persist() is MEMORY_AND_DISK.
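transactionsDf.clear_persist()
Incorrect, since clear_persist() is not a method of DataFrame. The method that removes a DataFrame from the cache is unpersist().
transactionsDf.storage_level('MEMORY_ONLY')
Wrong. storage_level is not a method of DataFrame. DataFrame only exposes a read-only storageLevel property, which reports the current storage level and cannot set it.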
More info: RDD Programming Guide – Spark 3.0.0 Documentation, pyspark.sql.DataFrame.persist ― PySpark 3.0.0 documentation (https://bit.ly/3sxHLVC , https://bit.ly/3j2N6B9)