Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?
To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries. The data engineering team has been made aware of new requirements from a customer-facing...
Which statement describes this implementation?
The updates view represents an incremental batch of all newly ingested data to be inserted or updated in the customers table. The following logic is used to process these records. Which statement describes this implementation?
A. The customers table is implemented as a Type 3 table; old values are maintained...
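The processing logic itself is not shown, so the sketch below does not claim which type the customers table uses. It contrasts two common slowly changing dimension patterns in pure Python: Type 1 overwrites matched rows in place, while Type 2 expires the current row and appends a new one. In Delta Lake this would normally be a MERGE INTO statement; the column names (email, current, start_date, end_date) and the sample data are hypothetical.

```python
from datetime import date

def merge_type1(customers, updates):
    """Type 1: overwrite matched rows and insert new keys; no history is kept."""
    merged = {row["customer_id"]: dict(row) for row in customers}
    for row in updates:
        merged[row["customer_id"]] = dict(row)  # update if matched, insert if not
    return sorted(merged.values(), key=lambda r: r["customer_id"])

def merge_type2(customers, updates, as_of):
    """Type 2: expire the current row for each key and append a new current row."""
    merged = [dict(row) for row in customers]  # copy so the input is not mutated
    for upd in updates:
        for row in merged:
            if row["customer_id"] == upd["customer_id"] and row["current"]:
                row["current"] = False
                row["end_date"] = as_of
        merged.append({**upd, "current": True, "start_date": as_of, "end_date": None})
    return merged

# Hypothetical sample data.
customers = [{"customer_id": 1, "email": "a@old.example"}]
updates = [{"customer_id": 1, "email": "a@new.example"},
           {"customer_id": 2, "email": "b@example"}]

print(merge_type1(customers, updates))

history = [{**customers[0], "current": True,
            "start_date": date(2024, 1, 1), "end_date": None}]
print(merge_type2(history, updates, as_of=date(2024, 2, 1)))
```

With Type 1 the old email is gone after the merge; with Type 2 it survives as an expired row alongside the new current row.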
Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?
A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple,...
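In Structured Streaming, each query records its source offsets and commit progress in the directory passed through the checkpointLocation option, so two concurrent queries sharing one directory would interfere with each other's progress. The toy model below is plain Python, not Spark's actual checkpoint format; the class name, paths, topics, and offsets are all hypothetical. It only illustrates why one checkpoint directory per query is the safe layout.

```python
class ToyCheckpointStore:
    """Toy stand-in for checkpoint storage: each directory remembers the last
    (topic, offset) committed by whichever query wrote to it. Not Spark's format."""

    def __init__(self):
        self._state = {}

    def commit(self, checkpoint_dir, topic, offset):
        # A later commit to the same directory replaces the earlier one.
        self._state[checkpoint_dir] = (topic, offset)

    def recover(self, checkpoint_dir):
        # On restart, a query resumes from whatever its directory holds.
        return self._state.get(checkpoint_dir)

# Shared directory (hypothetical path): the two jobs overwrite each other's progress.
shared = ToyCheckpointStore()
shared.commit("/checkpoints/bronze", "topic_a", 120)
shared.commit("/checkpoints/bronze", "topic_b", 500)
print(shared.recover("/checkpoints/bronze"))  # job A would resume from topic_b's state

# One directory per query (hypothetical paths): each job recovers its own progress.
separate = ToyCheckpointStore()
separate.commit("/checkpoints/bronze_topic_a", "topic_a", 120)
separate.commit("/checkpoints/bronze_topic_b", "topic_b", 500)
print(separate.recover("/checkpoints/bronze_topic_a"))
```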
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?
Each configuration below has the same total resources: 400 GB of RAM, 160 cores, and only one executor per VM. Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?
A. Total VMs: ...
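Because the totals are fixed, the configurations differ only in how the 400 GB and 160 cores are split across VMs. The sketch below does that arithmetic under a simplified model that assumes shuffle data is spread uniformly, so each task fetches roughly (n - 1)/n of its shuffle input from other machines. The VM counts are hypothetical, not the answer options, and the model ignores real-world effects such as garbage-collection pressure on very large executor heaps.

```python
TOTAL_RAM_GB = 400
TOTAL_CORES = 160

def executor_layout(num_vms):
    """Per-executor resources when fixed totals are split evenly, one executor per VM."""
    if TOTAL_CORES % num_vms or TOTAL_RAM_GB % num_vms:
        raise ValueError("totals must divide evenly across VMs")
    return {
        "vms": num_vms,
        "cores_per_executor": TOTAL_CORES // num_vms,
        "ram_gb_per_executor": TOTAL_RAM_GB // num_vms,
        # Simplified model: fraction of a task's shuffle input that lives on
        # other VMs, assuming shuffle blocks are spread uniformly.
        "remote_shuffle_fraction": (num_vms - 1) / num_vms,
    }

for n in (1, 2, 4, 8, 16):  # hypothetical VM counts
    print(executor_layout(n))
```

Under this model, fewer and larger executors keep more shuffle traffic on the same machine, which is why wide transformations are sensitive to how the same total resources are divided.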
Which describes how Delta Lake can help to avoid data loss of this nature in the future?
A new data engineer notices that a critical field was omitted by an application that writes data from its Kafka source to Delta Lake, even though the field was present in the Kafka source. The field was also missing from data written to dependent, long-term storage. The retention threshold on...
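One commonly recommended safeguard is to land every Kafka record, with its full payload and ingestion metadata, in a bronze Delta table, so that a field dropped by later parsing logic can be re-derived by reprocessing bronze instead of relying on the source's retention window. The pure-Python sketch below shows the pattern; the function names, message contents, and field names are hypothetical.

```python
import json

def to_bronze(kafka_value, ingest_ts):
    """Land the Kafka message value untouched, plus ingestion metadata."""
    return {"value": kafka_value.decode("utf-8"), "ingest_ts": ingest_ts}

def to_silver(bronze_row, fields):
    """Parse only the fields the downstream schema currently asks for."""
    payload = json.loads(bronze_row["value"])
    return {name: payload.get(name) for name in fields}

# Hypothetical message and field names.
bronze = [to_bronze(b'{"device_id": 7, "reading": 21.5, "status": "FAULT"}',
                    "2024-01-01T00:00:00Z")]

# The original silver logic forgot "status"...
silver_v1 = [to_silver(row, ["device_id", "reading"]) for row in bronze]
# ...but because bronze kept the full payload, the field can be re-derived later.
silver_v2 = [to_silver(row, ["device_id", "reading", "status"]) for row in bronze]
print(silver_v2)
```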
Which statement exemplifies best practices for implementing this system?
The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will...
Which of the following is true of Delta Lake and the Lakehouse?
Which of the following is true of Delta Lake and the Lakehouse?
A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.
B. Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in...
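Delta Lake records per-file statistics (such as minimum and maximum values) in its transaction log for the first 32 columns by default, a number controlled by the delta.dataSkippingNumIndexedCols table property. A query with a selective filter can then skip files whose value range cannot match. The pure-Python sketch below models that file-pruning step; the file names and value ranges are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_value: int
    max_value: int

def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the predicate range [lo, hi]."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]

# Hypothetical per-file statistics for one column.
files = [
    FileStats("part-000.parquet", 1, 100),
    FileStats("part-001.parquet", 101, 200),
    FileStats("part-002.parquet", 201, 300),
]
print(files_to_scan(files, 150, 160))  # ['part-001.parquet']
```

Only one of the three files needs to be read for a filter between 150 and 160; columns beyond the indexed set get no such statistics, so filters on them cannot prune files this way.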
Which code block should be used to create the date Python variable used in the above code block?
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:
df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to...
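Notebook parameters passed by a job (for example through notebook_params in a Jobs API run request) are read inside the notebook as widgets, and dbutils.widgets.get is the documented accessor. Since dbutils exists only inside Databricks, this sketch falls back to a hypothetical default when run elsewhere; the helper name get_batch_date and the default value are assumptions for illustration.

```python
def get_batch_date(default=None):
    """Return the "date" parameter passed by the job, or a fallback outside Databricks."""
    try:
        # dbutils is injected into the notebook's global namespace by Databricks.
        return dbutils.widgets.get("date")  # noqa: F821
    except NameError:
        # Running outside Databricks (e.g. local tests): use the supplied default.
        return default

date = get_batch_date(default="2024-01-01")  # hypothetical default value
source_path = f"/mnt/source/{date}"
print(source_path)
# df = spark.read.format("parquet").load(source_path)  # requires a Spark session
```

Inside a scheduled Databricks job, the first branch returns the value the upstream system sent; if no widget named date is defined there, Databricks raises its own error rather than silently using the default.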
Which approach would simplify the identification of these changed records?
A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from...
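Because the table is overwritten nightly, finding changed rows currently means diffing two full snapshots, as in the pure-Python sketch below. Delta Lake's Change Data Feed (enabled with the delta.enableChangeDataFeed table property) records row-level labels in a _change_type column (insert, update_preimage, update_postimage, delete) for writes such as MERGE, UPDATE, and DELETE, so consumers can read only what changed. The sketch mimics those labels; the churn_risk column and sample rows are hypothetical.

```python
def diff_snapshots(previous, current, key="customer_id"):
    """Compare two full snapshots and label rows like Change Data Feed's _change_type."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    changes = []
    for k in sorted(curr.keys() - prev.keys()):
        changes.append({**curr[k], "_change_type": "insert"})
    for k in sorted(curr.keys() & prev.keys()):
        if curr[k] != prev[k]:
            changes.append({**prev[k], "_change_type": "update_preimage"})
            changes.append({**curr[k], "_change_type": "update_postimage"})
    for k in sorted(prev.keys() - curr.keys()):
        changes.append({**prev[k], "_change_type": "delete"})
    return changes

# Hypothetical nightly snapshots.
previous = [{"customer_id": 1, "churn_risk": 0.2}, {"customer_id": 2, "churn_risk": 0.5}]
current = [{"customer_id": 1, "churn_risk": 0.2}, {"customer_id": 2, "churn_risk": 0.7},
           {"customer_id": 3, "churn_risk": 0.1}]

for change in diff_snapshots(previous, current):
    print(change)
```

The diff must scan both snapshots every night; a change feed avoids that by recording the changes at write time, which requires the nightly overwrite to become a row-level write such as MERGE.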
If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?
The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings. The following query is used to create the alert:...
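The alert's query and trigger condition are truncated above, so the following is purely hypothetical. It assumes the query averages temperature per sensor over the 5-minute window, that the alert fires when any sensor's average exceeds a made-up threshold, and that the alert refreshes once a minute and notifies on every evaluation. Under those assumptions, three consecutive minutes of notifications correspond to three consecutive evaluations in which the condition held.

```python
from statistics import mean

def sensors_over_threshold(recordings, threshold):
    """Mean temperature per sensor over the window; return sensors above threshold."""
    by_sensor = {}
    for sensor_id, _timestamp, temperature in recordings:
        by_sensor.setdefault(sensor_id, []).append(temperature)
    return sorted(s for s, temps in by_sensor.items() if mean(temps) > threshold)

def longest_trigger_run(evaluations):
    """Length of the longest run of consecutive evaluations where the alert fired."""
    longest = run = 0
    for triggered in evaluations:
        run = run + 1 if triggered else 0
        longest = max(longest, run)
    return longest

# Hypothetical recordings: (sensor_id, timestamp, temperature).
window = [(1, "t1", 100), (1, "t2", 130), (2, "t1", 90)]
print(sensors_over_threshold(window, threshold=110))  # hypothetical threshold
print(longest_trigger_run([False, True, True, True, False]))
```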