You have written a notebook to generate a summary data set for reporting. The notebook was scheduled using a job cluster, but you realized it takes an average of 8 minutes to start the cluster. What feature can be used to start the cluster in a timely fashion?
A. Set up an additional job to run ahead of the actual job so the cluster is running when the second job starts
B. Use the Databricks cluster pools feature to reduce the startup time
C. Use Databricks Premium edition instead of Databricks standard edition
D. Pin the cluster in the cluster UI page so it is always available to the jobs
E. Disable auto termination so the cluster is always running

Answer: B

Explanation:

Cluster pools let us reserve VMs ahead of time, so when a new job cluster is created its VMs are taken from the pool instead of being provisioned from scratch. Note: while the VMs are idle in the pool waiting to be used, the only cost incurred is the Azure infrastructure cost. Databricks runtime cost is billed only once a VM is allocated to a cluster.
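
As a rough illustration of the idea, the sketch below builds the two JSON payloads involved: one describing a pool with a few idle VMs kept warm, and one describing a job cluster that draws its nodes from that pool. The field names follow the Databricks Instance Pools and Jobs REST APIs as I understand them; the pool name, node type, runtime version, sizes, and the pool ID placeholder are all assumptions chosen for illustration, and no request is actually sent.

```python
import json


def build_pool_spec(name, node_type_id, min_idle, max_capacity, idle_minutes):
    """Payload for creating an instance pool (values are illustrative)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Idle VMs kept ready; these incur cloud cost but no Databricks charge.
        "min_idle_instances": min_idle,
        "max_capacity": max_capacity,
        "idle_instance_autotermination_minutes": idle_minutes,
    }


def build_job_cluster_spec(instance_pool_id, spark_version, num_workers):
    """Job cluster definition that takes its VMs from an existing pool."""
    return {
        "spark_version": spark_version,
        # The node type comes from the pool, so node_type_id is not set here.
        "instance_pool_id": instance_pool_id,
        "num_workers": num_workers,
    }


# Hypothetical values; replace with ones valid for your workspace.
pool_spec = build_pool_spec(
    name="reporting-pool",
    node_type_id="Standard_DS3_v2",
    min_idle=2,
    max_capacity=10,
    idle_minutes=30,
)

# Placeholder for the ID returned when the pool is created.
job_cluster_spec = build_job_cluster_spec(
    instance_pool_id="<instance-pool-id>",
    spark_version="13.3.x-scala2.12",
    num_workers=2,
)

print(json.dumps(pool_spec, indent=2))
print(json.dumps(job_cluster_spec, indent=2))
```

The job cluster is still created and terminated per run; the time saved comes from skipping VM provisioning, since the nodes are already waiting in the pool.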

Here is a demo of how to set up a pool and follow some best practices: https://www.youtube.com/watch?v=FVtITxOabxg&ab_channel=DatabricksAcademy
