The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error.

Code block:

transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")

A. The format("parquet") expression is inappropriate to use here, "parquet" should be passed as...
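
For reference, a minimal sketch of the corrected write, assuming a running SparkSession and the transactionsDf DataFrame from the question. The DataFrameWriter method is partitionBy; partitionOn does not exist:

# partitionBy (not partitionOn) tells the writer to split the output
# into one directory per distinct storeId value.
transactionsDf.write.format("parquet") \
    .partitionBy("storeId") \
    .save("/FileStore/transactions_split")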

The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

Code block:

transactionsDf.withColumn("transactionNumber", "transactionId")

A. The arguments to the withColumn method need to be reordered.
B. The arguments to the withColumn method...
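
As a quick sketch of the fix, assuming transactionsDf exists: renaming is done with withColumnRenamed, which takes the existing name first and the new name second. withColumn, by contrast, expects a Column object as its second argument and adds or replaces a column rather than renaming one.

# Rename transactionId to transactionNumber; returns a new DataFrame.
transactionsDf.withColumnRenamed("transactionId", "transactionNumber")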

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

A. from pyspark import StorageLevel
   transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B. transactionsDf.cache()
C. transactionsDf.storage_level('MEMORY_ONLY')
D. transactionsDf.persist()
E. transactionsDf.clear_persist()
F. from pyspark import StorageLevel
   transactionsDf.persist(StorageLevel.MEMORY_ONLY)

Answer: F

Explanation: from...
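
A minimal sketch of the correct call, assuming transactionsDf exists. DataFrame.cache() takes no arguments and uses the default storage level (MEMORY_AND_DISK), so only persist lets you request MEMORY_ONLY, which keeps partitions in memory and recomputes any that do not fit:

from pyspark import StorageLevel

# MEMORY_ONLY keeps partitions in memory only; partitions that do not
# fit are recomputed from the DataFrame's lineage when needed.
transactionsDf.persist(StorageLevel.MEMORY_ONLY)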

spark.read.json(filePath, schema=schema)
C. spark.read.json(filePath, schema=schema_of_json(json_schema))
D. spark.read.json(filePath, schema=spark.read.json(json_schema))

Answer: B

Explanation: Spark provides a way to digest JSON-formatted strings as schema. However, it is not trivial to use. Although slightly above exam difficulty, this question is beneficial to your exam preparation, since it helps you familiarize yourself with the concept...
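
As background, a small sketch of reading JSON with an explicit schema. The file path and field names below are made up for illustration, and a SparkSession named spark is assumed; the question itself is about deriving a schema from a JSON-formatted string, but building a StructType directly is the most common way to pass schema= to spark.read.json:

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical schema and path, standing in for the exam's actual values.
schema = StructType([
    StructField("itemId", LongType(), True),
    StructField("itemName", StringType(), True),
])
itemsDf = spark.read.json("/FileStore/items.json", schema=schema)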

spark.sql(statement).drop("value", "storeId", "attributes")

Answer: E

Explanation: This question offers a wide variety of answers for a seemingly simple task. That variety reflects the many ways a join can be expressed in PySpark. You need to understand some SQL syntax to get to the correct answer here.

transactionsDf.createOrReplaceTempView('transactionsDf')...
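
A sketch of the overall pattern, assuming transactionsDf and itemsDf exist; the join condition below is an assumption, since it is not visible in the excerpt:

# Register both DataFrames as SQL temp views, join them with a SQL
# statement, then drop the columns not wanted in the result.
transactionsDf.createOrReplaceTempView("transactionsDf")
itemsDf.createOrReplaceTempView("itemsDf")
statement = """
SELECT * FROM transactionsDf
INNER JOIN itemsDf ON transactionsDf.productId = itemsDf.itemId
"""
spark.sql(statement).drop("value", "storeId", "attributes")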

itemsDf.withColumnRenamed("supplier", "feature1")
C. itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))
D. itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")
E. itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Answer: D

Explanation:

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

Correct! Spark's DataFrame.withColumnRenamed syntax makes it relatively easy to change the name of a column.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

Incorrect. In this code block, the Python interpreter will try to use attributes...
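
A minimal sketch of the correct answer, assuming itemsDf exists. Each withColumnRenamed call returns a new DataFrame, so the renames chain naturally:

# Rename attributes -> feature0, then supplier -> feature1.
itemsDf.withColumnRenamed("attributes", "feature0") \
       .withColumnRenamed("supplier", "feature1")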

articlesDf = articlesDf.groupby("col").count()
B. 4, 5
C. 2, 5, 3
D. 5, 2
E. 2, 3, 4
F. 2, 5, 4

Answer: E

Explanation: Correct code block:

articlesDf = articlesDf.select(explode(col('attributes')))
articlesDf = articlesDf.groupby('col').count()
articlesDf = articlesDf.sort('count', ascending=False).select('col')

Output of correct code block:

+------+
|   col|
+------+
|summer|
|winter|
|  blue|
|...
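
Putting the explanation's correct ordering together, with its import added and comments for each step; an articlesDf DataFrame with an array column named attributes is assumed:

from pyspark.sql.functions import col, explode

# One output row per element of the attributes array; explode names
# its output column "col" by default.
articlesDf = articlesDf.select(explode(col("attributes")))
# Count how often each attribute value occurs.
articlesDf = articlesDf.groupby("col").count()
# Sort by frequency, descending, and keep only the attribute values.
articlesDf = articlesDf.sort("count", ascending=False).select("col")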
