Which code block should be used to create the date Python variable used in the above code block?

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}") Which code block should be used to...
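
The date variable here is typically populated from the job parameter via a notebook widget. A minimal sketch, assuming the parameter is named date (this runs in a Databricks notebook, where dbutils and spark are predefined):

date = dbutils.widgets.get("date")  # parameter name assumed; e.g. "2025-04-27"
df = spark.read.format("parquet").load(f"/mnt/source/{date}")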

April 27, 2025

Which of the following is true of Delta Lake and the Lakehouse?

Which of the following is true of Delta Lake and the Lakehouse?
A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.
B. Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in...
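
For reference, the number of columns Delta collects statistics on is controlled by a table property; a minimal sketch, with an assumed table name (runs in a Databricks notebook):

# delta.dataSkippingNumIndexedCols defaults to 32; statistics on these
# columns drive data skipping at query time. Table name is illustrative.
spark.sql("""
    ALTER TABLE my_table
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '32')
""")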

April 26, 2025

Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?

A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create. Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
A. Three new jobs named "Ingest new data" will be...
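
For context: 2.0/jobs/create registers a new job definition (with a fresh job_id) on every call and does not trigger a run, which is why repeated posts multiply jobs rather than execute one. A rough sketch with placeholder workspace URL, token, and an abbreviated payload:

import requests

resp = requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.0/jobs/create",  # placeholder host
    headers={"Authorization": "Bearer <personal-access-token>"},     # placeholder token
    json={"name": "Ingest new data"},  # abbreviated; the real JSON adds task/cluster config
)
print(resp.json())  # each call returns a new job_id, e.g. {"job_id": 123}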

April 24, 2025

Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30...
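
One knob relevant to this scenario is the trigger interval on the streaming write. A minimal sketch, assuming df is an existing streaming DataFrame and that the path, table name, and interval are illustrative:

query = (
    df.writeStream
      .format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/example")  # path assumed
      .trigger(processingTime="5 seconds")  # interval chosen for illustration only
      .toTable("target_table")              # table name assumed
)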

April 21, 2025

Which statement characterizes the general programming model used by Spark Structured Streaming?

Which statement characterizes the general programming model used by Spark Structured Streaming?
A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
C. Structured Streaming uses specialized hardware and I/O...
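
The model Structured Streaming actually uses treats a stream as an infinite, continuously appended table queried with the same DataFrame API as batch data. A self-contained sketch using the built-in rate test source:

stream_df = (
    spark.readStream
         .format("rate")               # built-in test source emitting rows continuously
         .option("rowsPerSecond", 10)
         .load()
)

query = (
    stream_df.writeStream
             .format("console")        # print each micro-batch for demonstration
             .outputMode("append")
             .start()
)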

April 17, 2025

Which statement describes the execution and results of running the above query multiple times?

A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code...
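
A minimal sketch of reading a Change Data Feed from such a table (the table name and starting version are assumptions):

cdf = (
    spark.read
         .format("delta")
         .option("readChangeFeed", "true")
         .option("startingVersion", 0)   # illustrative; a timestamp can also be used
         .table("bronze")
)
# Rows carry _change_type, _commit_version, and _commit_timestamp metadata columns.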

April 15, 2025

Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?

The data engineering team maintains the following code: Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?
A. A batch job will update the enriched_itemized_orders_by_account table, replacing only...
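
The pattern under discussion, reconstructed loosely (the source table names, join key, and write mode are all assumptions for illustration, not the question's exact code):

orders = spark.table("silver_orders")   # source tables assumed
items = spark.table("silver_items")

enriched = orders.join(items, on="order_id", how="inner")  # join key assumed

(
    enriched.write
            .format("delta")
            .mode("overwrite")   # assumed: full rewrite of the gold table on each run
            .saveAsTable("enriched_itemized_orders_by_account")
)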

April 14, 2025

Which statement describes this implementation?

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table. The following logic is used to process these records. Which statement describes this implementation?
A. The customers table is implemented as a Type 3 table; old values are maintained...
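
Logic of this shape is commonly written as a Delta MERGE; a minimal sketch, assuming customer_id is the key column:

spark.sql("""
    MERGE INTO customers AS c
    USING updates AS u
    ON c.customer_id = u.customer_id        -- key column assumed
    WHEN MATCHED THEN UPDATE SET *          -- overwrites in place (Type 1 behavior)
    WHEN NOT MATCHED THEN INSERT *
""")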

April 12, 2025

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?
A. spark.sql.files.maxPartitionBytes
B. spark.sql.autoBroadcastJoinThreshold
C. spark.sql.files.openCostInBytes
D. spark.sql.adaptive.coalescePartitions.minPartitionNum
E. spark.sql.adaptive.advisoryPartitionSizeInBytes
Answer: A
Explanation: This is the correct answer because spark.sql.files.maxPartitionBytes is a configuration parameter that directly affects the size of a spark-partition upon ingestion...
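
For reference, the parameter can be tuned per session; a small sketch (the 64 MB value and the path are illustrative; the default is 128 MB):

# Caps how many bytes of input files are packed into one partition at read time.
spark.conf.set("spark.sql.files.maxPartitionBytes", "64MB")

df = spark.read.format("parquet").load("/mnt/source/some_dataset")  # path assumed
print(df.rdd.getNumPartitions())  # a smaller cap yields more, smaller partitions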

April 12, 2025

Which statement exemplifies best practices for implementing this system?

The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will...
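
One common way to express that kind of layer isolation, sketched with assumed database names and storage locations:

# Hypothetical layout: one database per layer so access can be granted per audience.
spark.sql("CREATE DATABASE IF NOT EXISTS bronze LOCATION '/mnt/lake/bronze'")
spark.sql("CREATE DATABASE IF NOT EXISTS silver LOCATION '/mnt/lake/silver'")
spark.sql("CREATE DATABASE IF NOT EXISTS gold LOCATION '/mnt/lake/gold'")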

April 9, 2025