You have written a notebook to generate a summary data set for reporting. The notebook was scheduled using a job cluster, but you realized it takes 8 minutes to start the cluster. What feature can be used to start the cluster in a timely fashion so your job can run immediately?
A . Set up an additional job to run ahead of the actual job so the cluster is already running when the second job starts
B . Use the Databricks cluster pools feature to reduce the startup time
C . Use Databricks Premium edition instead of Databricks standard edition
D . Pin the cluster in the cluster UI page so it is always available to the jobs
E . Disable auto termination so the cluster is always running

Answer: B

Explanation:

Cluster pools allow us to reserve VMs ahead of time; when a new job cluster is created, its VMs are taken from the pool instead of being provisioned from scratch. Note: while the VMs are idle in the pool waiting to be used by a cluster, the only cost incurred is the Azure VM cost. The Databricks runtime cost is billed only once a VM is allocated to a cluster.
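As a rough illustration of the idea above, the following sketch builds the two request bodies involved: one that creates a pool with idle VMs held in reserve, and one `new_cluster` block that makes a job cluster draw its nodes from that pool. The field names follow the Databricks Instance Pools and Jobs REST APIs; the pool name, VM size, runtime version, pool ID placeholder, and the helper function names are all assumptions chosen for the example, and nothing is sent over the network.

```python
import json


def build_pool_spec(name, node_type_id, min_idle_instances,
                    idle_autotermination_minutes=30):
    """Request body for POST /api/2.0/instance-pools/create (hypothetical helper)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Number of VMs kept warm in the pool, ready to be handed to a cluster.
        "min_idle_instances": min_idle_instances,
        # Idle VMs above the minimum are released after this many minutes.
        "idle_instance_autotermination_minutes": idle_autotermination_minutes,
    }


def build_job_cluster_spec(instance_pool_id, spark_version, num_workers):
    """`new_cluster` block for a job task that takes its nodes from a pool."""
    return {
        "spark_version": spark_version,
        # The VM size comes from the pool, so node_type_id is not set here.
        "instance_pool_id": instance_pool_id,
        "num_workers": num_workers,
    }


# Illustrative values only: 2 workers + 1 driver = 3 idle VMs kept ready.
pool_spec = build_pool_spec("reporting-pool", "Standard_DS3_v2",
                            min_idle_instances=3)
cluster_spec = build_job_cluster_spec("<pool-id-returned-by-create>",
                                      "13.3.x-scala2.12", num_workers=2)

print(json.dumps(pool_spec, indent=2))
print(json.dumps(cluster_spec, indent=2))
```

Sizing `min_idle_instances` to cover the driver plus all workers is one possible choice; a smaller value lowers the idle VM cost but means some nodes may still need to be provisioned when the job starts.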

Here is a demo of how to set up a pool and follow some best practices:
