Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion, which statement describes whether successfully executing the above logic guarantees that the records to be deleted are no longer accessible and why?
The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table named users. Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion,...
Which code block accomplishes this task while minimizing potential compute costs?
The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE". The data science team would like predictions saved...
Which statement describes what will happen when the above code is executed?
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database. After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active...
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day.
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day. Assuming users have been...
Which approach would simplify the identification of these changed records?
A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from...
Which code snippet completes this function definition?
A nightly job ingests data into a Delta Lake table using the following code: The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline. Which code...
Which statement regarding stream-static joins and static Delta tables is correct?
Which statement regarding stream-static joins and static Delta tables is correct? A. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch. B. Each microbatch of a stream-static join will use the most recent version of the static Delta...
Which statement characterizes the general programming model used by Spark Structured Streaming?
Which statement characterizes the general programming model used by Spark Structured Streaming? A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput. B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka. C. Structured Streaming uses specialized hardware and I/O...
Which statement describes Delta Lake Auto Compaction?
Which statement describes Delta Lake Auto Compaction? A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB. B. Before a Jobs cluster terminates, optimize is executed on all tables modified...