The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error. Code block: transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split") A. The format("parquet") expression is inappropriate to use here, "parquet" should be passed as...
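For reference, a minimal sketch of the corrected call, assuming an active SparkSession: the DataFrameWriter method for partitioned output is partitionBy, not partitionOn.

# partitionBy is the actual DataFrameWriter method; partitionOn does not exist
transactionsDf.write.format("parquet").partitionBy("storeId").save("/FileStore/transactions_split")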
Which of the elements that are labeled with a circle and a number contain an error or are misrepresented? A. 1, 10 B. 1, 8 C. 10 D. 7, 9, 10 E. 1, 4, 6, 9 View Answer Answer: B Explanation: 1: Correct - This should just read "API" or "DataFrame API". The DataFrame...
The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error. Code block: transactionsDf.withColumn("transactionNumber", "transactionId") A. The arguments to the withColumn method need to be reordered. B. The arguments to the withColumn method...
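For reference, a minimal sketch of the intended rename, assuming an active SparkSession: withColumn adds or replaces a column from a Column expression, whereas withColumnRenamed is the method that renames an existing column.

# withColumnRenamed takes (existing_name, new_name) as plain strings
transactionsDf.withColumnRenamed("transactionId", "transactionNumber")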
Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed? A. from pyspark import StorageLevel; transactionsDf.cache(StorageLevel.MEMORY_ONLY) B. transactionsDf.cache() C. transactionsDf.storage_level('MEMORY_ONLY') D. transactionsDf.persist() E. transactionsDf.clear_persist() F. from pyspark import StorageLevel; transactionsDf.persist(StorageLevel.MEMORY_ONLY) View Answer Answer: F Explanation: from...
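A minimal sketch of the answer in context: with StorageLevel.MEMORY_ONLY, Spark keeps partitions in memory and recomputes any partition that does not fit, rather than spilling it to disk.

from pyspark import StorageLevel

# MEMORY_ONLY: cached partitions that do not fit in memory are recomputed on access
transactionsDf.persist(StorageLevel.MEMORY_ONLY)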
spark.read.json(filePath, schema=schema) C. spark.read.json(filePath, schema=schema_of_json(json_schema)) D. spark.read.json(filePath, schema=spark.read.json(json_schema)) View Answer Answer: B Explanation: Spark provides a way to digest JSON-formatted strings as schema. However, it is not trivial to use. Although slightly above exam difficulty, this question is beneficial to your exam preparation, since it helps you to familiarize yourself with the concept...
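Since the option list above is truncated, here is a minimal sketch of the general pattern of passing an explicit schema to spark.read.json; the file path and the fields in this StructType are assumptions for illustration, and an active SparkSession named spark is assumed.

from pyspark.sql.types import StructType, StructField, StringType, LongType

filePath = "path/to/file.json"  # placeholder path
# Hypothetical schema; match the fields to the actual JSON layout.
schema = StructType([
    StructField("itemId", LongType(), True),
    StructField("itemName", StringType(), True),
])

# Supplying a schema avoids the extra pass Spark would need to infer one.
df = spark.read.json(filePath, schema=schema)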
Which of the following is a characteristic of the cluster manager? A. Each cluster manager works on a single partition of data. B. The cluster manager receives input from the driver through the SparkContext. C. The cluster manager does not exist in standalone mode. D. The cluster manager transforms jobs into DAGs. E. In...
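As background, a minimal sketch of how a driver application connects to a cluster manager; the master URL here is a placeholder assumption.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://cluster-manager-host:7077")  # placeholder standalone cluster manager URL
         .appName("clusterManagerDemo")
         .getOrCreate())
# The SparkContext inside this session is the driver's channel to the cluster manager.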
spark.sql(statement).drop("value", "storeId", "attributes") View Answer Answer: E Explanation: This question offers you a wide variety of answers for a seemingly simple question. However, this variety reflects the variety of ways that one can express a join in PySpark. You need to understand some SQL syntax to get to the correct answer here. transactionsDf.createOrReplaceTempView('transactionsDf')...
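A minimal sketch of the temp-view-plus-SQL pattern the explanation refers to; the itemsDf view and the join keys here are assumptions for illustration, not the exact statement from the question.

# Register the DataFrames as temporary views so SQL can reference them.
transactionsDf.createOrReplaceTempView("transactionsDf")
itemsDf.createOrReplaceTempView("itemsDf")

# Hypothetical join statement; substitute the actual key columns.
statement = "SELECT * FROM transactionsDf INNER JOIN itemsDf ON transactionsDf.productId = itemsDf.itemId"

# Run the SQL, then drop the unwanted columns from the result.
spark.sql(statement).drop("value", "storeId", "attributes")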
itemsDf.withColumnRenamed("supplier", "feature1") C. itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1")) D. itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1") E. itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1") View Answer Answer: D Explanation: itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1") Correct! Spark's DataFrame.withColumnRenamed syntax makes it relatively easy to change the name of a column. itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1) Incorrect. In this code block, the Python interpreter will try to use attributes...
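A minimal runnable sketch of the chained rename; the sample row and column values in this itemsDf are assumptions for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("renameDemo").getOrCreate()

# Hypothetical itemsDf containing the two columns being renamed.
itemsDf = spark.createDataFrame(
    [(1, "red,round", "Sportsland")],
    ["itemId", "attributes", "supplier"],
)

# Each withColumnRenamed call takes (existing_name, new_name) as plain strings.
renamed = (itemsDf
           .withColumnRenamed("attributes", "feature0")
           .withColumnRenamed("supplier", "feature1"))
renamed.printSchema()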
articlesDf = articlesDf.groupby("col").count() B. 4, 5 C. 2, 5, 3 D. 5, 2 E. 2, 3, 4 F. 2, 5, 4 View Answer Answer: E Explanation: Correct code block:
articlesDf = articlesDf.select(explode(col('attributes')))
articlesDf = articlesDf.groupby('col').count()
articlesDf = articlesDf.sort('count', ascending=False).select('col')
Output of correct code block:
+------+
|   col|
+------+
|summer|
|winter|
|  blue|
|...
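A minimal runnable sketch of this explode-count-sort pipeline; the sample articlesDf here is an assumption for illustration.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("explodeDemo").getOrCreate()

# Hypothetical articlesDf with an array column named 'attributes'.
articlesDf = spark.createDataFrame(
    [(1, ["blue", "winter", "cozy"]), (2, ["summer", "blue"])],
    ["articleId", "attributes"],
)

# explode() emits one row per array element; the resulting column is named 'col' by default.
articlesDf = articlesDf.select(explode(col("attributes")))
# Count how often each attribute occurs, then sort descending and keep only the attribute.
articlesDf = articlesDf.groupby("col").count()
articlesDf = articlesDf.sort("count", ascending=False).select("col")
articlesDf.show()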