Which of the following likely explains these smaller file sizes?
A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were...
If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?
The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings. The below query is used to create the alert:...
If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?
An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable: Assume that the fields customer_id and order_id serve as a composite key...
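The excerpt elides the ingestion code, so the following is only an illustration of one common pattern, not the code from the question: each nightly batch is deduplicated on the composite key (customer_id, order_id) and then appended. The helper name `dedupe_batch`, the sample records, and the in-memory `table` list are all hypothetical. Under that assumption, duplicates inside one batch are dropped, but a duplicate arriving in a later batch is still appended.

```python
def dedupe_batch(records):
    """Drop repeated (customer_id, order_id) pairs within a single batch."""
    seen = set()
    unique = []
    for record in records:
        key = (record["customer_id"], record["order_id"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical data: the same order appears twice on day one and once on day two.
day_one = [
    {"customer_id": 1, "order_id": "A", "hour": 9},
    {"customer_id": 1, "order_id": "A", "hour": 14},
]
day_two = [{"customer_id": 1, "order_id": "A", "hour": 1}]

table = []  # stand-in for the append-only target table
for batch in (day_one, day_two):
    # Deduplication only sees the current batch, never rows already in the table.
    table.extend(dedupe_batch(batch))

print(len(table))
```

Removing duplicates across batches as well would need a comparison against rows already in the target, for example an insert-only merge on the key.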
Which statement describes how data will be filtered?
A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE. This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude...
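Because the filter is on longitude rather than on the partition column date, partition pruning does not apply here. Delta Lake does record per-file minimum and maximum values for columns in its transaction log, and those can be used to skip files. The sketch below mimics that idea in plain Python for the visible half of the filter only (longitude < 20); the `FileStats` class, the file names, and the values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    """Hypothetical per-file summary: partition value plus min/max of one column."""
    path: str
    date: str
    min_longitude: float
    max_longitude: float

def files_to_scan(files, upper_bound):
    """Return paths of files that could contain rows with longitude < upper_bound."""
    # A file is skipped when even its smallest longitude fails the predicate.
    return [f.path for f in files if f.min_longitude < upper_bound]

files = [
    FileStats("part-0", "2024-01-01", -120.0, -70.0),
    FileStats("part-1", "2024-01-01", 25.0, 140.0),
    FileStats("part-2", "2024-01-02", 10.0, 30.0),
]

selected = files_to_scan(files, 20)
print(selected)
```

Note that the date value of each file plays no part in the decision; only the longitude range does.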
Which code block should be used to create the date Python variable used in the above code block?
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}") Which code block should be used to...
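As a minimal sketch of how the path in the load call depends on the date variable, the snippet below builds the same string in plain Python. The helper name `build_source_path` and the sample date are hypothetical. In a Databricks notebook the value would typically be read from a job parameter, for example with `dbutils.widgets.get("date")`; that step is only assumed here and replaced by a literal so the code runs anywhere.

```python
def build_source_path(date: str) -> str:
    """Return the source directory for one batch (hypothetical helper)."""
    # Curly braces interpolate the variable; parentheses would be kept literally.
    return f"/mnt/source/{date}"

# Assumption: stands in for a value passed to the job as a parameter.
date = "2024-01-15"
print(build_source_path(date))
```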
Which of the following is true of Delta Lake and the Lakehouse?
Which of the following is true of Delta Lake and the Lakehouse? A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times. B. Delta Lake automatically collects statistics on the first 32 columns of each table which are leveraged in...
Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create. Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times? A. Three new jobs named "Ingest new data" will be...
Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?
A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30...
Which statement characterizes the general programming model used by Spark Structured Streaming?
Which statement characterizes the general programming model used by Spark Structured Streaming? A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput. B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka. C. Structured Streaming uses specialized hardware and I/O...
Which statement describes the execution and results of running the above query multiple times?
A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code...