The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings. The team has decided to process all deletions from the previous week as...
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?
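The concern with default table settings is that a Delta Lake DELETE is logical: the rows disappear from query results, but the files containing them are retained for time travel until a VACUUM runs past the retention threshold (7 days by default). A minimal sketch of the pattern, with a hypothetical table name and predicate:

    # Hypothetical table and predicate. DELETE removes the rows from the
    # current table version, but the old data files remain on storage.
    spark.sql("DELETE FROM user_data WHERE user_id = 'u-123'")

    # The data is only physically removed once VACUUM clears files older
    # than the retention window (168 hours is the 7-day default).
    spark.sql("VACUUM user_data RETAIN 168 HOURS")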
A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings. The source data contains 100 unique fields in a highly nested JSON structure. The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the...
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?
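A common approach in this scenario is to promote only the fields the dashboards and model actually consume out of the nested structure, rather than materializing all 100 fields. A sketch of that pattern, with hypothetical source path and field names:

    from pyspark.sql.functions import col

    # Hypothetical path and field names: flatten only the fields the
    # downstream consumers need.
    raw = spark.read.json("/mnt/raw/device_recordings")

    silver = raw.select(
        col("device_id"),
        col("reading.timestamp").alias("recorded_at"),
        col("reading.temperature").alias("temperature"),
    )

    silver.write.format("delta").mode("append").saveAsTable("silver_device_recordings")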
A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were...
Which of the following likely explains these smaller file sizes?
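The usual explanation is that Auto Optimize and Auto Compaction target files of roughly 128 MB, well below the 1 GB that a manual OPTIMIZE produces by default, so files written after the migration come out smaller. Both features are enabled as table properties; a sketch with a hypothetical table name:

    # Hypothetical table name. Optimized writes and auto compaction both
    # aim for ~128 MB files, smaller than OPTIMIZE's 1 GB default target.
    spark.sql("""
        ALTER TABLE cdc_target SET TBLPROPERTIES (
            'delta.autoOptimize.optimizeWrite' = 'true',
            'delta.autoOptimize.autoCompact' = 'true'
        )
    """)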
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database. After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active...
Which statement describes what will happen when the above code is executed?
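The behavior being tested here is secret redaction. A value fetched with dbutils.secrets.get works normally when passed to a database connection, but notebook output redacts it. A sketch with hypothetical scope and key names:

    # Hypothetical scope and key. The returned string is usable as a
    # credential, but displaying it in notebook output is redacted, so
    # this prints "[REDACTED]" rather than the actual password.
    password = dbutils.secrets.get(scope="db-credentials", key="password")
    print(password)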
Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?
A. spark.sql.files.maxPartitionBytes
B. spark.sql.autoBroadcastJoinThreshold
C. spark.sql.files.openCostInBytes
D. spark.sql.adaptive.coalescePartitions.minPartitionNum
E. spark.sql.adaptive.advisoryPartitionSizeInBytes
Answer: A
Explanation: spark.sql.files.maxPartitionBytes is a configuration parameter that directly affects the size of a Spark partition upon ingestion...
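A sketch of how that parameter is set, with an illustrative value and a hypothetical input path (the default is 128 MB; lowering it yields more, smaller partitions when files are read):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("ingest-sizing")
        .config("spark.sql.files.maxPartitionBytes", "64MB")  # illustrative value
        .getOrCreate()
    )

    # Hypothetical path: each Spark partition now holds at most ~64 MB of input.
    df = spark.read.parquet("/data/events")
    print(df.rdd.getNumPartitions())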
A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device. Streaming DataFrame df has the following...
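The standard construct for non-overlapping intervals is a tumbling window via groupBy. A sketch under the assumption that df has time, humidity, and temperature columns (the actual schema is truncated above); the watermark is an addition that lets the stream purge old aggregation state:

    from pyspark.sql import functions as F

    agg = (
        df.withWatermark("time", "10 minutes")      # assumed event-time column
          .groupBy(F.window("time", "5 minutes"))   # non-overlapping 5-minute windows
          .agg(
              F.avg("humidity").alias("avg_humidity"),
              F.avg("temperature").alias("avg_temperature"),
          )
    )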
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.
Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
A. Three new jobs named "Ingest new data" will be...
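The key behavior is that jobs/create registers a new job on every call; job names are not unique identifiers, so repeating the POST creates additional jobs rather than updating an existing one. A sketch with hypothetical host, token, and job settings:

    import requests

    resp = requests.post(
        "https://<workspace>.cloud.databricks.com/api/2.0/jobs/create",
        headers={"Authorization": "Bearer <token>"},
        json={
            "name": "Ingest new data",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Repos/prod/ingest"},
        },
    )
    # Each invocation returns a new job_id, even with an identical payload.
    print(resp.json()["job_id"])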
Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?
A. Databricks Repos can facilitate the pull request, review, and approval process before merging branches
B. Databricks Repos can merge changes from a secondary Git branch into a main Git branch
C. ...
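In a typical setup, the pull request and merge happen in the Git provider, and Databricks Repos is updated to the merged branch as a deployment step. A sketch of that step using the Repos API, with hypothetical host, token, and repo id:

    import requests

    # After CI merges to main, point the workspace repo at the new head.
    requests.patch(
        "https://<workspace>.cloud.databricks.com/api/2.0/repos/<repo-id>",
        headers={"Authorization": "Bearer <token>"},
        json={"branch": "main"},
    )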
You have been asked to work on building a data pipeline and have noticed that you are working on a very large-scale ETL job with many data dependencies. Which of the following tools can be used to address this problem?
A. AUTO LOADER
B. JOBS and TASKS
C. SQL Endpoints...
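A multi-task job is the natural fit here, since task dependencies can encode the ETL's data dependencies directly. A sketch of a Jobs API 2.1 payload with hypothetical task names and notebook paths (cluster settings omitted for brevity):

    # Each task runs only after the tasks it depends_on have succeeded.
    job_spec = {
        "name": "large-scale-etl",
        "tasks": [
            {"task_key": "ingest",
             "notebook_task": {"notebook_path": "/etl/ingest"}},
            {"task_key": "transform",
             "depends_on": [{"task_key": "ingest"}],
             "notebook_task": {"notebook_path": "/etl/transform"}},
            {"task_key": "publish",
             "depends_on": [{"task_key": "transform"}],
             "notebook_task": {"notebook_path": "/etl/publish"}},
        ],
    }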
You noticed that a colleague is manually copying notebooks with a _bkp suffix to store previous versions. Which of the following features would you recommend instead?
A. Databricks notebooks support change tracking and versioning
B. Databricks notebooks should be copied to a local machine with source control set up locally to version...