Databricks Certified Associate Developer for Apache Spark 3.0 Exam Online Training

Question #1

Which of the following code blocks silently writes DataFrame itemsDf in avro format to location fileLocation if a file does not yet exist at that location?

  • A. itemsDf.write.avro(fileLocation)
  • B. itemsDf.write.format("avro").mode("ignore").save(fileLocation)
  • C. itemsDf.write.format("avro").mode("errorifexists").save(fileLocation)
  • D. itemsDf.save.format("avro").mode("ignore").write(fileLocation)
  • E. spark.DataFrameWriter(itemsDf).format("avro").write(fileLocation)
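
For reference, a minimal sketch of the silent-write pattern, assuming itemsDf and fileLocation are defined and the external spark-avro package is available on the cluster:

    # mode("ignore") silently skips the write if data already exists
    # at the target location, instead of raising an error.
    itemsDf.write.format("avro").mode("ignore").save(fileLocation)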

Question #2

Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?

[The diagram referenced by this question is not reproduced in this transcript.]

  • A. 1, 10
  • B. 1, 8
  • C. 10
  • D. 7, 9, 10
  • E. 1, 4, 6, 9

Question #30

5

Question #32

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

  • A. transactionsDf.sort(asc(value)).show(10)
  • B. transactionsDf.sort(col("value")).show(10)
  • C. transactionsDf.sort(col("value").desc()).head()
  • D. transactionsDf.sort(col("value").asc()).print(10)
  • E. transactionsDf.orderBy("value").asc().show(10)
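
A minimal sketch of the ascending-sort-and-show pattern, assuming transactionsDf has a numeric column value:

    from pyspark.sql.functions import col

    # sort() orders ascending by default; show(10) prints the first
    # 10 rows as a formatted ASCII table.
    transactionsDf.sort(col("value")).show(10)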

Question #34

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

  • A. from pyspark import StorageLevel; transactionsDf.cache(StorageLevel.MEMORY_ONLY)
  • B. transactionsDf.cache()
  • C. transactionsDf.storage_level('MEMORY_ONLY')
  • D. transactionsDf.persist()
  • E. transactionsDf.clear_persist()
  • F. from pyspark import StorageLevel; transactionsDf.persist(StorageLevel.MEMORY_ONLY)
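
A minimal sketch of memory-only persistence, assuming transactionsDf is defined:

    from pyspark import StorageLevel

    # MEMORY_ONLY keeps partitions in memory only; partitions that do not
    # fit are recomputed from lineage when they are needed again.
    transactionsDf.persist(StorageLevel.MEMORY_ONLY)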

Question #55

spark.read.json(filePath, schema=schema)

C. spark.read.json(filePath, schema=schema_of_json(json_schema))

D. spark.read.json(filePath, schema=spark.read.json(json_schema))
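
The surviving fragments point to reading JSON with an explicit schema. A minimal sketch, assuming filePath points at JSON files; the schema below is purely illustrative, since the question's actual schema is not reproduced:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # Illustrative schema; the question's real schema was lost.
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("name", StringType()),
    ])

    df = spark.read.json(filePath, schema=schema)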

Question #77

"left_semi"

Question #98

parquet
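
Only the answer fragment "parquet" survives here; it names Spark's default columnar file format. A minimal read/write sketch with hypothetical paths:

    # Parquet is the default source for DataFrameReader and DataFrameWriter.
    df = spark.read.parquet("/path/to/input")
    df.write.mode("overwrite").parquet("/path/to/output")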

Question #120

parquet

Question #121

Which of the following is a viable way to improve Spark’s performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

  • A. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
  • B. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
  • C. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
  • D. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
  • E. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
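
A minimal sketch of raising these two properties, with purely illustrative values:

    # spark.sql.shuffle.partitions can be changed at runtime and controls
    # the number of partitions used for DataFrame shuffles.
    spark.conf.set("spark.sql.shuffle.partitions", 2000)

    # spark.default.parallelism applies to RDD operations and must be set
    # when the SparkContext is created, e.g. via
    #   spark-submit --conf spark.default.parallelism=2000 ...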

Question #122

Which of the following is the deepest level in Spark’s execution hierarchy?

  • A. Job
  • B. Task
  • C. Executor
  • D. Slot
  • E. Stage

Question #135

count()

Question #136

Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?

  • A. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')
  • B. transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})
  • C. transactionsDf.select('value', 'productId').distinct()
  • D. transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()
  • E. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
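
A minimal sketch of the union-then-distinct pattern, assuming transactionsDf has columns value and productId:

    # union() stacks the two single-column DataFrames; distinct() then
    # removes duplicates, yielding one column of all unique values.
    transactionsDf.select("value").union(transactionsDf.select("productId")).distinct()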

Question #145

[The question's code block and its expected output table are not reproduced in this transcript.]

  • A. The column names should be listed directly as arguments to the operator and not as a list.
  • B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.
  • C. The select operator should be replaced by a drop operator.
  • D. The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.
  • E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
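
The options revolve around replacing a select of unwanted columns with drop. A minimal sketch of the drop pattern they describe, using the column names productId and f mentioned in option D (the original code block is not reproduced above):

    # drop() takes column names directly as string arguments,
    # not as a list and not wrapped in col().
    transactionsDf.drop("productId", "f")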

Question #166

col("value")

Question #167

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

  • A. transactionsDf.groupBy(col(storeId).avg())
  • B. transactionsDf.groupBy("storeId").avg(col("value"))
  • C. transactionsDf.groupBy("storeId").agg(avg("value"))
  • D. transactionsDf.groupBy("storeId").agg(average("value"))
  • E. transactionsDf.groupBy("value").average()
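
A minimal sketch of the group-and-average pattern, assuming transactionsDf has columns storeId and value:

    from pyspark.sql.functions import avg

    # Groups rows by storeId and computes the mean of value per group.
    transactionsDf.groupBy("storeId").agg(avg("value"))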

Question #169

spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

  • A . Instead of calling spark.createDataFrame, just DataFrame should be called.
  • B . The commas in the tuples with the colors should be eliminated.
  • C . The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.
  • D . Instead of color, a data type should be specified.
  • E . The "color" expression needs to be wrapped in brackets, so it reads ["color"].

Question #171

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

  • A . itemsDf.persist(StorageLevel.MEMORY_ONLY)
  • B . itemsDf.cache(StorageLevel.MEMORY_AND_DISK)
  • C . itemsDf.store()
  • D . itemsDf.cache()
  • E . itemsDf.write.option('destination', 'memory').save()
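
For reference, a minimal sketch of explicit memory-and-disk persistence; note that DataFrame.cache() also uses the MEMORY_AND_DISK level by default:

from pyspark import StorageLevel
# keep partitions in executor memory, spilling serialized data to disk when memory runs out
itemsDf.persist(StorageLevel.MEMORY_AND_DISK)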

Question #183

  • A . transactionsDf.max('value').min('value')
  • B . transactionsDf.agg(max('value').alias('highest'), min('value').alias('lowest'))
  • C . transactionsDf.groupby(col(productId)).agg(max(col(value)).alias("highest"), min(col(value)).alias("lowest"))
  • D . transactionsDf.groupby('productId').agg(max('value').alias('highest'), min('value').alias('lowest'))
  • E . transactionsDf.groupby("productId").agg({"highest": max("value"), "lowest": min("value")})

Question #187

spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))

  • D . spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
  • E . spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
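
A hedged sketch of the two-column creation pattern, assuming a SparkSession named spark:

# a list of tuples plus a list of column names is a valid createDataFrame call
seasonsDf = spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
seasonsDf.show()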

Question #200

articlesDf = articlesDf.groupby("col").count()

  • B . 4, 5
  • C . 2, 5, 3
  • D . 5, 2
  • E . 2, 3, 4
  • F . 2, 5, 4

Question #217

"MM d (EEE)"

Question #220

itemsDf.withColumnRenamed("supplier", "feature1")

  • C . itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))
  • D . itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")
  • E . itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
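
A minimal sketch of chained renames, the pattern option D shows, assuming itemsDf has columns attributes and supplier:

# withColumnRenamed takes the existing name first and the new name second; chain one call per column
itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")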

Question #227

importedDf = spark.read.json(jsonPath)

  • A . 4, 1, 2
  • B . 5, 1, 3
  • C . 5, 2
  • D . 4, 1, 3
  • E . 5, 1, 2
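
A hedged sketch of the JSON import this line belongs to (jsonPath, and the explicit schema, are assumptions):

# schema() pins the structure up front instead of letting spark.read.json infer it
importedDf = spark.read.schema(schema).json(jsonPath)
importedDf.show()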

Question #229

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

  • A . transactionsDf.withColumn("storeId", convert("storeId", "string"))
  • B . transactionsDf.withColumn("storeId", col("storeId", "string"))
  • C . transactionsDf.withColumn("storeId", col("storeId").convert("string"))
  • D . transactionsDf.withColumn("storeId", col("storeId").cast("string"))
  • E . transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Question #230

Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, making sure to substitute any existing data at that location?

  • A . itemsDf.write.mode("overwrite").parquet(filePath)
  • B . itemsDf.write.option("parquet").mode("overwrite").path(filePath)
  • C . itemsDf.write(filePath, mode="overwrite")
  • D . itemsDf.write.mode("overwrite").path(filePath)
  • E . itemsDf.write().parquet(filePath, mode="overwrite")
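
A minimal sketch of an overwriting parquet write, assuming filePath points at the target location:

# mode("overwrite") replaces any data already present at filePath
itemsDf.write.mode("overwrite").parquet(filePath)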

Question #232

Which of the following statements about executors is correct, assuming that one can consider each of the JVMs working as executors as a pool of task execution slots?

  • A . Slot is another name for executor.
  • B . There must be fewer executors than tasks.
  • C . An executor runs on a single core.
  • D . There must be more slots than tasks.
  • E . Tasks run in parallel via slots.

Question #233

Which of the following code blocks returns a DataFrame with an added column to DataFrame transactionsDf that shows the unix epoch timestamps in column transactionDate as strings in the format month/day/year in column transactionDateFormatted?

Excerpt of DataFrame transactionsDf:

  • A . transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate", format="dd/MM/yyyy"))
  • B . transactionsDf.withColumnRenamed("transactionDate", "transactionDateFormatted", from_unixtime("transactionDateFormatted", format="MM/dd/yyyy"))
  • C . transactionsDf.apply(from_unixtime(format="MM/dd/yyyy")).asColumn("transactionDateFormatted")
  • D . transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate", format="MM/dd/yyyy"))
  • E . transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate"))

Question #235

The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error.

Code block:

transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")

  • A . The format("parquet") expression is inappropriate to use here, "parquet" should be passed as first argument to the save() operator and "/FileStore/transactions_split" as the second argument.
  • B . Partitioning data by storeId is possible with the partitionBy expression, so partitionOn should be replaced by partitionBy.
  • C . Partitioning data by storeId is possible with the bucketBy expression, so partitionOn should be replaced by bucketBy.
  • D . partitionOn("storeId") should be called before the write operation.
  • E . The format("parquet") expression should be removed and instead, the information should be added to the write expression like so: write("parquet").

Question #250

spark.sql(statement).drop("value", "storeId", "attributes")
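
A hedged sketch of how a statement-plus-drop pipeline like this fits together (the view name and the SQL string are assumptions):

# a temp view makes the DataFrame queryable from SQL
transactionsDf.createOrReplaceTempView("transactionsDf")
statement = "SELECT * FROM transactionsDf"  # hypothetical statement
spark.sql(statement).drop("value", "storeId", "attributes")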

Question #260

transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

Question #261

Which of the following statements about broadcast variables is correct?

  • A . Broadcast variables are serialized with every single task.
  • B . Broadcast variables are commonly used for tables that do not fit into memory.
  • C . Broadcast variables are immutable.
  • D . Broadcast variables are occasionally dynamically updated on a per-task basis.
  • E . Broadcast variables are local to the worker node and not shared across the cluster.
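
For reference, a minimal broadcast-variable sketch; the variable is shipped to each executor once and is read-only there:

# broadcast a small lookup structure and access it via .value
lookup = spark.sparkContext.broadcast({"store_1": "Berlin", "store_2": "Oslo"})
print(lookup.value["store_1"])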

Question #262

The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.

Code block:

transactionsDf.filter(col('predError').in([3, 6])).count()

  • A . The number of rows cannot be determined with the count() operator.
  • B . Instead of filter, the select method should be used.
  • C . The method used on column predError is incorrect.
  • D . Instead of a list, the values need to be passed as single arguments to the in operator.
  • E . Numbers 3 and 6 need to be passed as string variables.
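
For reference, the working variant uses Column.isin; .in is not even valid Python attribute syntax, since in is a reserved keyword:

from pyspark.sql.functions import col
# count rows whose predError is 3 or 6
transactionsDf.filter(col("predError").isin([3, 6])).count()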

Question #263

Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

  • A . transactionsDf.drop(["predError", "value"])
  • B . transactionsDf.drop("predError", "value")
  • C . transactionsDf.drop(col("predError"), col("value"))
  • D . transactionsDf.drop(predError, value)
  • E . transactionsDf.drop("predError & value")

Question #275

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

  • A . The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark’s DataFrameReader is incorrect.
  • B . Columns in the schema definition use the wrong object type and the syntax of the call to Spark’s DataFrameReader is incorrect.
  • C . The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.
  • D . Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.
  • E . Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
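
A hedged sketch of a corrected reader call (schema and filePath are assumptions, and the file is taken to be parquet):

# option() sets one key-value pair; modifiedBefore filters input files by modification timestamp
spark.read.option("modifiedBefore", "2029-03-20T05:44:46").schema(schema).format("parquet").load(filePath)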

Question #276

Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?

  • A . itemsDf.cache().count()
  • B . itemsDf.cache(eager=True)
  • C . cache(itemsDf)
  • D . itemsDf.cache().filter()
  • E . itemsDf.rdd.storeCopy()
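
For reference, caching is lazy; an action such as count() is what materializes data on the executors:

# count() triggers evaluation, so the scanned partitions end up stored in memory
itemsDf.cache().count()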

Question #277

The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.

Code block:

transactionsDf.join(itemsDf, "itemId", how="broadcast")

  • A . The syntax is wrong, how= should be removed from the code block.
  • B . The join method should be replaced by the broadcast method.
  • C . Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
  • D . The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
  • E . broadcast is not a valid join type.
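
A minimal sketch of a broadcast join expressed through the broadcast() function rather than a join type:

from pyspark.sql.functions import broadcast
# mark the smaller DataFrame as broadcastable; the join itself stays a regular inner join
transactionsDf.join(broadcast(itemsDf), "itemId")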

Question #279

  • A . print(itemsDf.types)
  • B . itemsDf.printSchema()
  • C . spark.schema(itemsDf)
  • D . itemsDf.rdd.printSchema()
  • E . itemsDf.print.schema()
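
For reference, the schema printer is a DataFrame method:

# prints column names, types and nullability as a tree
itemsDf.printSchema()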

Question #281

The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.

Code block:

transactionsDf.agg("storeId").avg("value")

  • A . Instead of avg("value"), avg(col("value")) should be used.
  • B . The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  • C . All column names should be wrapped in col() operators.
  • D . agg should be replaced by groupBy.
  • E . "storeId" and "value" should be swapped.

Question #282

Which of the following statements about the differences between actions and transformations is correct?

  • A . Actions are evaluated lazily, while transformations are not evaluated lazily.
  • B . Actions generate RDDs, while transformations do not.
  • C . Actions do not send results to the driver, while transformations do.
  • D . Actions can be queued for delayed execution, while transformations can only be processed immediately.
  • E . Actions can trigger Adaptive Query Execution, while transformations cannot.
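
For reference, a two-line illustration of that split, assuming transactionsDf exists:

filtered = transactionsDf.filter("value > 0")  # transformation: builds a plan, nothing executes yet
filtered.count()  # action: triggers execution and returns a result to the driver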

Question #290

Code block:

itemsDf.withColumnRenamed("itemNameElements", split("itemName"))

  • A . All column names need to be wrapped in the col() operator.
  • B . Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.
  • C . Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.
  • D . Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.
  • E . The expressions "itemNameElements" and split("itemName") need to be swapped.

Question #291

The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

Code block:

transactionsDf.withColumn("transactionNumber", "transactionId")

  • A . The arguments to the withColumn method need to be reordered.
  • B . The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.
  • C . The copy() operator should be appended to the code block to ensure a copy is returned.
  • D . Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.
  • E . The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.

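For reference, a minimal sketch of the rename described above: withColumnRenamed takes the existing column name first and the new name second, and it already returns a new DataFrame, so no extra copy() call is needed.

# Returns a new DataFrame with transactionId renamed to transactionNumber.
renamedDf = transactionsDf.withColumnRenamed("transactionId", "transactionNumber")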

Question #343

transactionsDf.select(count_to_target_udf('predError'))

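The code line above applies a Python UDF to column predError. A minimal sketch of how such a UDF might be defined and wrapped is shown below; the function body, the target value of 10, and the return type are illustrative assumptions, not taken from the original question.

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def count_to_target(value):
    # Hypothetical logic: count from value up to (but excluding) an assumed target of 10.
    if value is None:
        return None
    return list(range(value, 10))

count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))
transactionsDf.select(count_to_target_udf('predError')).show()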

Question #344

Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?

  • A . transactionsDf.repartition(transactionsDf.getNumPartitions()+2)
  • B . transactionsDf.repartition(transactionsDf.rdd.getNumPartitions()+2)
  • C . transactionsDf.coalesce(10)
  • D . transactionsDf.coalesce(transactionsDf.getNumPartitions()+2)
  • E . transactionsDf.repartition(transactionsDf._partitions+2)

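A minimal sketch of the partition arithmetic behind the options above: the current partition count is exposed on the underlying RDD, not on the DataFrame itself, and repartition() performs a full shuffle.

current = transactionsDf.rdd.getNumPartitions()        # 8 in this question
transactionsDf = transactionsDf.repartition(current + 2)  # full shuffle; now 10 partitions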

Question #352

A file with the name shown below is stored at location filePath:

part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

The code block below, which is meant to read this gzip-compressed CSV file, including its header, into a DataFrame, contains an error:

spark.option("header",True).csv(filePath)

Which of the following code blocks reads the file correctly?

  • A . spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)
  • B . spark.read().option("header",True).load(filePath)
  • C . spark.read.format("csv").option("header",True).load(filePath)
  • D . spark.read.load(filePath)

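As a minimal sketch, assuming filePath points at the file listed above: Spark infers the gzip codec from the .gz extension, so only the header option needs to be set.

df = spark.read.format("csv").option("header", True).load(filePath)
# Equivalent shorthand via the csv() convenience method:
df = spark.read.csv(filePath, header=True)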

Question #353

Which of the following statements about stages is correct?

  • A . Different stages in a job may be executed in parallel.
  • B . Stages consist of one or more jobs.
  • C . Stages ephemerally store transactions, before they are committed through actions.
  • D . Tasks in a stage may be executed by multiple machines at the same time.
  • E . Stages may contain multiple actions, narrow, and wide transformations.


Question #354

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

  • A . spark.mode("parquet").read("/FileStore/imports.parquet")
  • B . spark.read.path("/FileStore/imports.parquet", source="parquet")
  • C . spark.read().parquet("/FileStore/imports.parquet")
  • D . spark.read.parquet("/FileStore/imports.parquet")
  • E . spark.read().format('parquet').open("/FileStore/imports.parquet")

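A minimal sketch of the parquet read: spark.read is a property that returns a DataFrameReader, so it is not called with parentheses.

# DataFrameReader exposes a parquet() convenience method.
df = spark.read.parquet("/FileStore/imports.parquet")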

Question #356

Which of the following describes the difference between client and cluster execution modes?

  • A . In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.
  • B . In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.
  • C . In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.
  • D . In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.
  • E . In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.


Question #357

Which of the following statements about Spark’s configuration properties is incorrect?

  • A . The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.
  • B . The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.
  • C . The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.
  • D . The default number of partitions to use when shuffling data for joins or aggregations is 300.
  • E . The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.

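A minimal sketch of inspecting the properties named above at runtime via spark.conf (assuming an active SparkSession named spark):

print(spark.conf.get("spark.sql.shuffle.partitions"))          # shuffle partitions for joins/aggregations, default "200"
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))  # broadcast-join threshold, default 10MB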

Question #358

Which of the following describes a valid concern about partitioning?

  • A . A shuffle operation returns 200 partitions if not explicitly set.
  • B . Decreasing the number of partitions reduces the overall runtime of narrow transformations if there are more executors available than partitions.
  • C . No data is exchanged between executors when coalesce() is run.
  • D . Short partition processing times are indicative of low skew.
  • E . The coalesce() method should be used to increase the number of partitions.

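A minimal sketch contrasting the two partition operators referenced above: coalesce() only merges existing partitions without a full shuffle and cannot increase the count, while repartition() shuffles and can change the count in either direction.

fewer = transactionsDf.coalesce(2)     # no full shuffle; merges existing partitions
more = transactionsDf.repartition(16)  # full shuffle; can increase the partition count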

Question #359

Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller, or equal to 3?

  • A . transactionsDf.filter(productId==3 or productId<1)
  • B . transactionsDf.filter((col("productId")==3) or (col("productId")<1))
  • C . transactionsDf.filter(col("productId")==3 | col("productId")<1)
  • D . transactionsDf.where("productId"=3).or("productId"<1))
  • E . transactionsDf.filter((col("productId")==3) | (col("productId")<1))

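A minimal sketch of the column-expression syntax at issue: in Python, | binds more tightly than == and <, so each comparison must be wrapped in parentheses.

from pyspark.sql.functions import col

result = transactionsDf.filter((col("productId") == 3) | (col("productId") < 1))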

Question #361

Which of the following describes a difference between Spark’s cluster and client execution modes?

  • A . In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client mode.
  • B . In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.
  • C . In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.
  • D . In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.
  • E . In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.

Reveal Solution Hide Solution

Question #362

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

  • A . spark.read.json(filePath)
  • B . spark.read.path(filePath, source="json")
  • C . spark.read().path(filePath)
  • D . spark.read().json(filePath)
  • E . spark.read.path(filePath)

Reveal Solution Hide Solution
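
For reference, a minimal sketch of the reader pattern this question tests, with filePath assumed to point at an existing JSON file: spark.read is a property, not a method, so it takes no parentheses.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-read").getOrCreate()
filePath = "/tmp/example.json"  # assumed path for illustration

df = spark.read.json(filePath)            # the pattern in option A
# Equivalent long form via the generic load API:
df2 = spark.read.format("json").load(filePath)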

Question #364

The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns in an ordered fashion, arranging first by

column value, showing smaller numbers at the top and greater numbers at the bottom, and then by column predError, for which all values should be arranged in the inverse way of the order of items

in column value. Find the error.

Code block:

transactionsDf.orderBy('value', asc_nulls_first(col('predError')))

  • A . Two orderBy statements with calls to the individual columns should be chained, instead of having both columns in one orderBy statement.
  • B . Column value should be wrapped by the col() operator.
  • C . Column predError should be sorted in a descending way, putting nulls last.
  • D . Column predError should be sorted by desc_nulls_first() instead.
  • E . Instead of orderBy, sort should be used.

Reveal Solution Hide Solution
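
Direction-wise, the description amounts to sorting value ascending and predError descending. A hedged sketch with toy data (leaving aside null placement, which the correct option pins down):

from pyspark.sql import SparkSession
from pyspark.sql.functions import asc, desc

spark = SparkSession.builder.appName("orderby-demo").getOrCreate()
transactionsDf = spark.createDataFrame(
    [(1, 9.0), (1, 3.0), (2, 5.0)], ["value", "predError"])  # toy data

# Smaller `value` values first; within equal values, larger predError first.
transactionsDf.orderBy(asc("value"), desc("predError")).show()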

Question #365

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

  • A . transactionsDf.dropna("any")
  • B . transactionsDf.dropna(thresh=4)
  • C . transactionsDf.drop.na("",2)
  • D . transactionsDf.dropna(thresh=2)
  • E . transactionsDf.dropna("",4)

Reveal Solution Hide Solution
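
The trap here is that dropna()'s thresh parameter counts the minimum number of non-null values a row must retain, not the number of missing ones. With 6 columns, "missing in at least 3" means fewer than 4 non-nulls, hence thresh=4. A small sketch with made-up data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dropna-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, 2, 3, 4, None, None),      # 4 non-null values -> kept by thresh=4
     (1, None, None, None, 5, 6)],  # 3 non-null values -> dropped
    ["a", "b", "c", "d", "e", "f"])

df.dropna(thresh=4).show()  # keeps only the first row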

Question #367

Which of the following describes tasks?

  • A . A task is a command sent from the driver to the executors in response to a transformation.
  • B . Tasks transform jobs into DAGs.
  • C . A task is a collection of slots.
  • D . A task is a collection of rows.
  • E . Tasks get assigned to the executors by the driver.

Reveal Solution Hide Solution

Question #385

Which of the following is a problem with using accumulators?

  • A . Only unnamed accumulators can be inspected in the Spark UI.
  • B . Only numeric values can be used in accumulators.
  • C . Accumulator values can only be read by the driver, but not by executors.
  • D . Accumulators do not obey lazy evaluation.
  • E . Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.

Reveal Solution Hide Solution
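
A minimal sketch of the accumulator contract referenced above, using a toy RDD: executor-side tasks can only add to an accumulator, and reading .value is meaningful only on the driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator-demo").getOrCreate()
sc = spark.sparkContext

counter = sc.accumulator(0)  # numeric accumulator, starts at 0

def count_row(x):
    counter.add(1)  # executors may only write (add), never read

sc.parallelize(range(100)).foreach(count_row)  # foreach is an action
print(counter.value)  # 100 -- reading the value happens on the driver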

Question #386

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching value in column itemId with a value in column transactionId of DataFrame transactionsDf.

Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

  • A . The join statement is incomplete.
  • B . The union method should be used instead of join.
  • C . The join method is inappropriate.
  • D . The merge method should be used instead of join.
  • E . The join expression is malformed.

Reveal Solution Hide Solution
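
For contrast with the broken fragment, here is a hedged sketch of a complete join call on assumed toy frames: join() takes the other DataFrame first, then the join expression, which is exactly what the fragment omits.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-demo").getOrCreate()
itemsDf = spark.createDataFrame([(1, "pen"), (2, "ink")], ["itemId", "name"])
transactionsDf = spark.createDataFrame([(1,), (3,)], ["transactionId"])

# Other DataFrame first, then the join expression.
joined = itemsDf.join(transactionsDf,
                      itemsDf.itemId == transactionsDf.transactionId)
joined.show()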

Question #387

Which of the following statements about lazy evaluation is incorrect?

  • A . Predicate pushdown is a feature resulting from lazy evaluation.
  • B . Execution is triggered by transformations.
  • C . Spark will fail a job only during execution, but not during definition.
  • D . Accumulators do not change the lazy evaluation model of Spark.
  • E . Lineages allow Spark to coalesce transformations into stages.

Reveal Solution Hide Solution
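
A minimal demonstration of the lazy-evaluation model the options revolve around: transformations merely extend the plan, and nothing runs until an action is invoked.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

df = spark.range(1_000_000)
filtered = df.filter(col("id") % 2 == 0)            # transformation: nothing executes yet
doubled = filtered.withColumn("x2", col("id") * 2)  # still nothing executes

print(doubled.count())  # count() is an action -- execution starts here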

Question #388

Which of the following are valid execution modes?

  • A . Kubernetes, Local, Client
  • B . Client, Cluster, Local
  • C . Server, Standalone, Client
  • D . Cluster, Server, Local
  • E . Standalone, Client, Cluster

Reveal Solution Hide Solution

Question #389

Which of the following describes characteristics of the Dataset API?

  • A . The Dataset API does not support unstructured data.
  • B . In Python, the Dataset API mainly resembles Pandas’ DataFrame API.
  • C . In Python, the Dataset API’s schema is constructed via type hints.
  • D . The Dataset API is available in Scala, but it is not available in Python.
  • E . The Dataset API does not provide compile-time type safety.

Reveal Solution Hide Solution

Question #397

[The question text and the itemsDf table preview are missing from the source; the options below operate on the array column attributes of DataFrame itemsDf.]

  • A . itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))
  • B . itemsDf.withColumn('attributes', sort_array(desc('attributes')))
  • C . itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))
  • D . itemsDf.withColumn("attributes", sort_array("attributes", asc=False))
  • E . itemsDf.select(sort_array("attributes"))

Reveal Solution Hide Solution
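
Since the question's table preview did not survive, the sketch below uses made-up attributes values; it shows the sort_array(column, asc=False) pattern from option D for sorting an array column in descending order.

from pyspark.sql import SparkSession
from pyspark.sql.functions import sort_array

spark = SparkSession.builder.appName("sortarray-demo").getOrCreate()
itemsDf = spark.createDataFrame(
    [(1, ["blue", "winter", "cozy"])], ["itemId", "attributes"])

# sort_array takes the column and an asc flag; asc=False sorts descending.
itemsDf.withColumn("attributes", sort_array("attributes", asc=False)).show()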

Question #398

Which is the highest level in Spark’s execution hierarchy?

  • A . Task
  • B . Executor
  • C . Slot
  • D . Job
  • E . Stage

Reveal Solution Hide Solution

Question #410

[Only fragments of this question survive in the source.]

spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

F. transactionsDf.filter(col(transactionId).isin([3,4,6]))

Reveal Solution Hide Solution

Question #411

Which of the following describes a way for resizing a DataFrame from 16 to 8 partitions in the most efficient way?

  • A . Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.
  • B . Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.
  • C . Use a narrow transformation to reduce the number of partitions.
  • D . Use a wide transformation to reduce the number of partitions.
  • E . Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.

Reveal Solution Hide Solution
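
A short sketch of the 16-to-8 resize: coalesce() merges existing partitions through a narrow dependency instead of a full shuffle, which is what makes it the cheaper choice over repartition(8) here.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("coalesce-demo").getOrCreate()

df = spark.range(0, 1000, numPartitions=16)
print(df.rdd.getNumPartitions())       # 16

resized = df.coalesce(8)               # narrow transformation: no full shuffle
print(resized.rdd.getNumPartitions())  # 8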

Question #417

[Only a fragment of one answer option survives in the source.]

col(["transactionId", "predError", "value", "f"])

Reveal Solution Hide Solution

Question #423

[Only fragments of this question survive in the source.]

dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-dd HH:mm:ss"))

E. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])

Reveal Solution Hide Solution

Question #424

Which of the following is a characteristic of the cluster manager?

  • A . Each cluster manager works on a single partition of data.
  • B . The cluster manager receives input from the driver through the SparkContext.
  • C . The cluster manager does not exist in standalone mode.
  • D . The cluster manager transforms jobs into DAGs.
  • E . In client mode, the cluster manager runs on the edge node.

Reveal Solution Hide Solution

Question #425

Which of the following statements about DAGs is correct?

  • A . DAGs help direct how Spark executors process tasks, but are a limitation to the proper execution of a query when an executor fails.
  • B . DAG stands for "Directing Acyclic Graph".
  • C . Spark strategically hides DAGs from developers, since the high degree of automation in Spark means that developers never need to consider DAG layouts.
  • D . In contrast to transformations, DAGs are never lazily executed.
  • E . DAGs can be decomposed into tasks that are executed in parallel.

Reveal Solution Hide Solution
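
To see the plan Spark derives from a query's DAG of transformations, explain() can be used. In the sketch below (toy query, arbitrary app name), the shuffle introduced by groupBy marks a stage boundary, and each stage is split into tasks that run in parallel.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dag-demo").getOrCreate()

df = (spark.range(100)
      .filter(col("id") > 10)
      .groupBy((col("id") % 3).alias("k"))
      .count())
df.explain()  # prints the physical plan; the exchange marks a stage boundary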
